by Brandon Rhodes

Untangling the Big Pharaoh’s “Terrifying” Chart

Date: 3 September 2013
Tags: python

An interesting diagram has been making the rounds on the Internet. Attributed to a Twitter personality named @TheBigPharaoh — whose tweets draw attention to the humanitarian and human rights situation in Egypt — it has been cited by no less an authority than the Washington Post, who calls it a “sort of terrifying” depiction of the modern Middle East. But as a consumer of data, I was immediately skeptical: there are many ways to make quite simple information look like chaos if it is presented poorly.

Is the information in the diagram really that complex?

I decided to try building a very simple data model to see if it could predict every single relationship on the diagram. Not because I think that the real Middle East (or anywhere else) can be adequately described with a simple model, but because I strongly suspected that the diagram itself was in fact modeled on only a few basic regional divisions.

Update: reader Alex Burr points out five missing edges in [`pharoahs-chart.json`](http://rhodesmill.org/brandon/2013/pharoahs-chart.json) so please use [`pharoahs-chart-v2.json`](http://rhodesmill.org/brandon/2013/pharoahs-chart-v2.json) instead, which inspired an improvement in the article below: Iran has been added to the `islamists` set, which now overrules the Shia-Sunni split to match the diagram’s assertion that they support Hamas.

Diving into the data

So I opened an IPython Notebook and got to work! This blog post is, in fact, the notebook itself, with some Markdown cells full of paragraphs and text added to provide structure and commentary. You can download the original notebook here:

untangling-big-pharoah.ipynb

So that every IPython Notebook does not begin with the same series of verbose import statements, IPython provides a %pylab directive that pulls the commonly used NumPy and matplotlib names into the interactive namespace. It is the first step that I took in getting ready to code:

%pylab inline
Populating the interactive namespace from numpy and matplotlib

So that other people can play with the diagram — and probably do an even better job of analysis than I will here! — I have chosen to represent it as JSON instead of using a Python-specific format. You can download my small data file here:

pharoahs-chart-v2.json

Once this file is saved to the current directory, Python can load it quite easily with the json.load() function:

import json
with open('pharoahs-chart-v2.json') as f:
    edges = sorted(json.load(f))

a, verb, b = array(edges).T
print 'Loaded', len(verb), 'edges'
Loaded 42 edges

Note that I have not left the data as the list-of-lists found in the underlying JSON file, which is a common move when processing data in modern Python. Instead, I have passed the entire data structure to the NumPy array() function and then transposed the result, so that the input’s list of 3-element items becomes three big vectors: a vector of actors, a vector of verbs, and finally another vector of actors at which those verbs are respectively directed.
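
To make the transposition concrete, here is a minimal sketch in Python 3 (the notebook itself is Python 2) that uses only the standard library. The three sample edges are inferred from the predictor reports later in this post rather than copied from the data file, and `zip(*edges)` stands in for the `array(edges).T` step.

```python
import json

# Three edges in the same list-of-triples shape as the JSON file.
# Assumption: these are inferred from the reports later in this post,
# not copied from pharoahs-chart-v2.json.
sample_json = '''
[["USA", "clueless", "Sisi"],
 ["Iran", "supports", "Hamas"],
 ["Al Qaeda", "hates", "Saudi & Gulf"]]
'''

edges = sorted(json.loads(sample_json))

# zip(*edges) turns a list of [a, verb, b] rows into three columns,
# a pure-Python counterpart of array(edges).T
a, verb, b = zip(*edges)

print(a)
print(verb)
print(b)
```

Each of the three tuples has one entry per edge, so position `i` in `a`, `verb`, and `b` together describes a single arrow.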

Counting the unique nodes is a quick way to check for misspellings, since a misspelling will create two unique nodes where the original diagram had only one. Happily, computing the number of unique strings across a and b combined yields exactly the number of unique nodes in the actual diagram:

print 'Loaded', len(unique(append(a, b))), 'nodes'
Loaded 15 nodes
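
The misspelling check can be sketched without NumPy as well. In this Python 3 sketch the edge lists are small illustrative samples (inferred from the reports in this post, not the full data file), the helper name `count_nodes` is my own, and the misspelled name "Al Qaida" is introduced deliberately to show how the node count gives it away.

```python
def count_nodes(edges):
    """Count distinct actor names appearing on either end of any edge."""
    return len({name for a, _verb, b in edges for name in (a, b)})

# Illustrative sample, not the full data file.
good = [
    ("Al Qaeda", "hates", "Saudi & Gulf"),
    ("USA", "hates", "Al Qaeda"),
]
# Same edges, but with one deliberate misspelling of "Al Qaeda".
bad = [
    ("Al Qaeda", "hates", "Saudi & Gulf"),
    ("USA", "hates", "Al Qaida"),
]

print(count_nodes(good))  # 3
print(count_nodes(bad))   # 4
```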

We are nearly ready to explore the data!

I will propose a series of simple political models of the Middle East, each of which is a function that, given a political actor a like "Turkey" and a potential client b like "Syria Rebels", returns one of three predictions: "supports", "hates", or "clueless".

These predictions can then be compared to the actual arrows on the diagram to rate the political model for its accuracy. Note carefully that these models are only being judged for their ability to correctly color-code the arrows that actually exist in the diagram; they can return whatever nonsense they want for arrows not in the diagram, like ("USA", "Turkey"), because we are only testing the functions against the input data set.

Because NumPy supports vector operations that operate simultaneously on whole vectors of input values, it only takes a single == operation to compare a series of predictions against the series of actual supports/hates verbs from the diagram. The only catch is that, to perform the actual prediction, we need to “vectorize” each little prediction function to produce a routine that works on a whole vector at a time. And we use another trick: since a series of == decisions like True and False are in fact equivalent to a series of numbers 1 or 0, we can use sum() to count how many True values are present! Aside from these two nuances, the reporting routine is rather simple Python:

def try_predictor(predictor, report=True, verbose=False):
    """Report on how well a `predictor` function performs."""

    # What does the predictor predict for each situation?

    prediction = vectorize(predictor)(a, b)
    
    # How does that stack up against the diagram?

    match = (prediction == verb)
    percent = 100.0 * sum(match) / len(match)
    print 'Accuracy: %.03f %%' % percent
    
    # What specific predictions is it making?

    if report and (verbose or not all(match)):
        print
        for is_match, ai, bi, pi in zip(match, a, b, prediction):
            if is_match and not verbose:
                continue
            print '      ' if is_match else 'WRONG:',
            print ai, pi, bi
        print
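
The counting trick does not depend on NumPy: in plain Python, `True` and `False` also behave as 1 and 0 under `sum()`. The following Python 3 sketch scores a predictor the same way; the `accuracy` helper and the four sample edges (inferred from the reports in this post) are my own illustration, not part of the notebook.

```python
def accuracy(predictor, edges):
    """Return the percentage of edges whose verb the predictor gets right."""
    # Each comparison yields True or False, which sum() counts as 1 or 0.
    matches = [predictor(a, b) == verb for a, verb, b in edges]
    return 100.0 * sum(matches) / len(matches)

# Illustrative sample only, not the full 42-edge data set.
sample_edges = [
    ("Iran", "supports", "Hamas"),
    ("Israel", "hates", "Hamas"),
    ("USA", "hates", "Al Qaeda"),
    ("USA", "clueless", "Sisi"),
]

print(sum([True, False, True]))                       # 2
print(accuracy(lambda a, b: "hates", sample_edges))   # 50.0
```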

Before getting all political, we should test this analysis and reporting tool by feeding it one or two dummy predictors that are not actually interesting, to see its output. We will try exercising a pair of functions that represent the perfect optimist and the perfect pessimist: the one assumes that members of the human species always support one another, while the other assumes that "hates" is the universal relationship.

try_predictor(lambda a, b: 'supports', report=False)
try_predictor(lambda a, b: 'hates', report=False)
Accuracy: 47.619 %
Accuracy: 47.619 %

It is a happy fact that the pessimist and optimist are so perfectly balanced in this particular case: the number of friendly links in the diagram, in other words, is equal to the number of enemy relationships. Which almost gives one hope for the world — not quite, but almost.
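
The two identical accuracies also let us recover the raw counts. As a quick check (a Python 3 sketch whose only inputs are the 42 edges and the 47.619 % figure printed above), each of the two verbs must account for 20 edges, which leaves 2 edges that are neither "supports" nor "hates".

```python
total_edges = 42           # from the "Loaded 42 edges" output
reported_percent = 47.619  # from the two accuracy lines

# Invert percent = 100 * matches / total to recover the match count.
matches = round(reported_percent * total_edges / 100)
print(matches)                                   # 20
print('%.3f' % (100.0 * matches / total_edges))  # 47.619

# Edges that neither the optimist nor the pessimist can ever get right.
leftover = total_edges - 2 * matches
print(leftover)                                  # 2
```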

Friends and Enemies

Given this infrastructure, it will take only a few steps to predict every single political relationship in the Big Pharaoh’s diagram. The real Middle East may be more complex than this, but you would not know it from the diagram!

The first thing that strikes me is how many red arrows cut left-to-right across the diagram between the upper right, where we see Russia, Assad, and Iran, and most of the rest of the state and non-state actors that are depicted. This has deep roots: Islam became separated within its first few centuries into a Sunni majority and a Shia minority (as well as many smaller groups), the latter of which claims both Assad and the Iranian leadership as adherents. If we place all of the Shia in a group and throw in Russia (which neighbors Iran across the Caspian Sea and has served as an ally following the overthrow of the United-States-backed Shah in 1979) then we find that we are almost halfway to explaining the entire diagram:

shias = {'Assad', 'Iran', 'Lebanon Shias', 'Russia'}

def p1(a, b):
    if (a in shias) != (b in shias):
        return 'hates'
    else:
        return 'supports'

try_predictor(p1)
Accuracy: 71.429 %

WRONG: Al Qaeda supports Saudi & Gulf
WRONG: Hamas supports Sisi
WRONG: Iran hates Hamas
WRONG: Israel supports Hamas
WRONG: Qatar supports Sisi
WRONG: Saudi & Gulf supports Muslim Brotherhood
WRONG: Sisi supports Muslim Brotherhood
WRONG: Turkey supports Sisi
WRONG: USA supports Muslim Brotherhood
WRONG: USA supports Sisi
WRONG: USA supports Al Qaeda
WRONG: USA supports Hamas

You may be a bit confused about why I am performing a pair of in operations and then comparing the output with an != inequality operator. The reason is that I am looking for situations where the answers are either True and False or else the values False and True, either one of which indicates that a and b fall on opposite sides of the division.
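
Put differently, `!=` applied to two booleans acts as an exclusive-or. This small Python 3 sketch uses the `shias` set from above and a hypothetical helper name, `opposite_sides`, to show the result for pairs with both, one, or neither member in the set.

```python
shias = {'Assad', 'Iran', 'Lebanon Shias', 'Russia'}

def opposite_sides(a, b):
    """True when exactly one of a and b belongs to the shias set."""
    return (a in shias) != (b in shias)

print(opposite_sides('Assad', 'Iran'))    # False: both inside
print(opposite_sides('Assad', 'Turkey'))  # True: one inside, one outside
print(opposite_sides('Turkey', 'Assad'))  # True: order does not matter
print(opposite_sides('Turkey', 'Qatar'))  # False: both outside
```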

This predictor brings our success rate to just over 71%.

But there is obviously more going on here, because nearly 30% of the links in the diagram are still being reported incorrectly. Take a moment to read over the list of mis-predictions above. Do they share anything in common?

What our first predictor seems blind to is the opposition between populist Islamist movements and most of the nation-states involved in the region. The Arab Spring has made it possible that several of these organizations will now make significant political gains if they can turn their popular support into votes in newly created democracies, but they are considered terrorist organizations by many Western nations and their allies.

Three state actors, though, have allied themselves with the Islamist movements instead of opposing them. Theocratic Iran was itself born of an Islamist revolution in 1979. Turkey is a secular democracy that has been flirting with the idea of a more explicitly Islamist government. And Qatar is a more interesting case: while the government itself is an autocracy, it is a Wahhabi state and thus is strongly aligned with the earnestly conservative Islam that motivates many of these political and religious groups.

Adding these two rough allegiances into our model, and assuming that Islamists always aid one another while Islamists and moderates are always at odds, very nearly completes the entire diagram!

islamists = {'Al Qaeda', 'Hamas', 'Muslim Brotherhood', 'Iran', 'Turkey', 'Qatar'}
moderates = {'Saudi & Gulf', 'Sisi', 'Israel', 'USA'}

def p2(a, b):
    either = {a, b}
    if (a in islamists) and (b in islamists):
        return 'supports'
    elif (either & islamists) and (either & moderates):
        return 'hates'
    elif (a in shias) != (b in shias):
        return 'hates'
    else:
        return 'supports'

try_predictor(p2)
Accuracy: 95.238 %

WRONG: USA hates Muslim Brotherhood
WRONG: USA supports Sisi

Note my careful use of Python set operations to contrive a succinct expression for “if one of the players is populist and the other is autocratic” — if it were not for the ability to do a quick test for an intersection between one of the inputs and either the islamists set or the moderates set, this new if statement would have had to run to several lines.
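
To see what the set operations buy us, this Python 3 sketch compares the intersection test with a spelled-out version. The two sets are copied from above; the function names `mixed_pair` and `mixed_pair_longhand` are mine, introduced only for this comparison.

```python
islamists = {'Al Qaeda', 'Hamas', 'Muslim Brotherhood', 'Iran', 'Turkey', 'Qatar'}
moderates = {'Saudi & Gulf', 'Sisi', 'Israel', 'USA'}

def mixed_pair(a, b):
    """Intersection form: an empty set is falsy, a non-empty one truthy."""
    either = {a, b}
    return bool((either & islamists) and (either & moderates))

def mixed_pair_longhand(a, b):
    """The same test without set intersection.

    Equivalent only because the two sets have no members in common.
    """
    return ((a in islamists and b in moderates) or
            (a in moderates and b in islamists))

pairs = [('Israel', 'Hamas'), ('Hamas', 'Israel'),
         ('Iran', 'Hamas'), ('USA', 'Israel'), ('Russia', 'Assad')]
for a, b in pairs:
    # Both forms should agree on every pair.
    assert mixed_pair(a, b) == mixed_pair_longhand(a, b)
    print(a, b, mixed_pair(a, b))
```

Only the first two pairs, which put one Islamist and one moderate together in either order, come out `True`.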

The only thing now missing is that our political predictor never outputs the result "clueless" and thus cannot correctly predict the stance of the United States with respect to the power struggle in Egypt. I will leave to more informed political commentators whether this characterization of the current administration is fair or not; for our purposes, the only point is that it requires the addition of but a third clause to our predictor, yielding an absolutely perfect p3():

egypt = {'Muslim Brotherhood', 'Sisi'}

def p3(a, b):
    either = {a, b}
    if a == 'USA' and b in egypt:
        return 'clueless'
    elif (a in islamists) and (b in islamists):
        return 'supports'
    elif (either & islamists) and (either & moderates):
        return 'hates'
    elif (a in shias) != (b in shias):
        return 'hates'
    else:
        return 'supports'

try_predictor(p3)
Accuracy: 100.000 %

And we are done.

Lessons

For all of its chaotic hand-drawn relationships, the Big Pharaoh diagram really models only two regional feuds, combined with a swipe at the United States for its caution in engaging with either of two warring factions within today’s Egypt.

I draw three lessons about information visualization from the fact that a diagram whose politics are so simplistic has been re-blogged as evidence that the Middle East is complicated.

First, the diagram presents a puzzle for which human vision is simply not optimized. Never, to my knowledge, does Nature present a hunter-gatherer with a web of different-colored links and demand a quick intuition about whether the nodes form only a few basic groupings or are hopelessly splintered into many. So presenting the information this way makes it basically opaque.

Second, our eyes are very sensitive to similarities between shapes, yet the diagram takes a uniform relationship like “supports” and splays it across the page at a half-dozen different angles and sizes to create a perception of chaos. The fact that the arrows are hand-drawn adds an extra level of visual noise that is simply icing on the cake.

Finally, edge-coloring turns out to be a fairly expensive way to illustrate nodes that fall into a few groups, because in the general case you wind up drawing \(n^2\) edges when instead you could just use 3 or 4 colors to label broad groups and then explain the relationships among them. You could even use a mix of node-colorings and edges: imagine a map of the Thirty Years’ War that colors Catholic countries one color, Protestant countries another, and then has a few annotations thrown in to explain the exceptions to those natural allegiances that arose during the protracted conflict. I suspect that the same approach would work better here.
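
A rough count makes the cost concrete. This Python 3 sketch uses the 15 nodes and 42 arrows reported earlier; it assumes arrows are directed and that no actor points at itself, so the ceiling is `n * (n - 1)` rather than exactly \(n^2\).

```python
nodes = 15   # from the "Loaded 15 nodes" output
drawn = 42   # from the "Loaded 42 edges" output

# Assumption: directed arrows, no self-loops, so at most n * (n - 1) edges.
possible_edges = nodes * (nodes - 1)

print(possible_edges)  # 210 arrows an edge-colored diagram could need
print(drawn)           # 42 arrows the diagram actually draws
print(nodes)           # 15 labels a node-colored diagram would need
```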
