{
 "metadata": {
  "date": "3 September 2013",
  "name": "",
  "tags": "Python"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Untangling the Big Pharoah\u2019s \u201cTerrifying\u201d Chart\n",
      "\n",
      "An interesting diagram has been making the rounds on the Internet.\n",
      "Attributed to a Twitter personality\n",
      "named [@TheBigPharaoh](https://twitter.com/TheBigPharaoh) \u2014\n",
      "whose tweets draw attention to the humanitarian\n",
      "and human rights situation in Egypt \u2014\n",
      "it has been cited by no less an authority than the\n",
      "[Washington Post, who calls it a \u201csort of terrifying\u201d](http://www.washingtonpost.com/blogs/worldviews/wp/2013/08/26/the-middle-east-explained-in-one-sort-of-terrifying-chart/)\n",
      "depiction of the modern Middle East.\n",
      "But as a consumer of data,\n",
      "I was immediately skeptical:\n",
      "there are many ways to make quite simple information\n",
      "look like chaos if it is presented poorly.\n",
      "\n",
      "<div class=\"figure\"><a href=\"http://www.washingtonpost.com/blogs/worldviews/files/2013/08/BSm0bOBCYAAAph6.jpg\"><img style=\"width: 420px\" src=\"http://www.washingtonpost.com/blogs/worldviews/files/2013/08/BSm0bOBCYAAAph6.jpg\"></a></div>\n",
      "\n",
      "Is the information in the diagram really that complex?\n",
      "\n",
      "I decided to try building a very simple data model\n",
      "to see if it could predict every single relationship on the diagram.\n",
      "Not because I think that the real Middle East (or anywhere else)\n",
      "can be adequately described with a simple model,\n",
      "but because I strongly suspected that the diagram itself\n",
      "was in fact modeled on only a few basic regional divisions.\n",
      "\n",
      "<p class=\"note\">\n",
      "Update: reader Alex Burr points out five missing edges in\n",
      "[`pharoahs-chart.json`](http://rhodesmill.org/brandon/2013/pharoahs-chart.json)\n",
      "so please use\n",
      "[`pharoahs-chart-v2.json`](http://rhodesmill.org/brandon/2013/pharoahs-chart-v2.json)\n",
      "instead, which inspired an improvement in the article below:\n",
      "Iran has been added to the `islamists` set,\n",
      "which now overrules the Shia-Sunni split\n",
      "to match the diagram\u2019s assertion that they support Hamas.\n",
      "</p>\n",
      "\n",
      "## Diving into the data\n",
      "\n",
      "So I opened an IPython Notebook and got to work!\n",
      "This blog post is, in fact, the notebook itself,\n",
      "with some Markdown calls full of paragraphs and text\n",
      "added to provide structure and commentary.\n",
      "You can download the original notebook here:\n",
      "\n",
      "[`untangling-big-pharoah.ipynb`](http://rhodesmill.org/brandon/2013/untangling-big-pharoah.ipynb)\n",
      "\n",
      "So that every IPython Notebook does not begin\n",
      "with the same series of verbose import statements,\n",
      "IPython provides a `pylab` directive which imports a few dozen\n",
      "essential NumPy features.\n",
      "It is the first step that I took in getting ready to code:<!--more-->"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%pylab inline"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Populating the interactive namespace from numpy and matplotlib\n"
       ]
      }
     ],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "So that other people can play with the diagram \u2014\n",
      "and probably do an even better job of analysis than I will here! \u2014\n",
      "I have chosen to represent it as JSON\n",
      "instead of using a Python-specific format.\n",
      "You can download my small data file here:\n",
      "\n",
      "[`pharoahs-chart-v2.json`](http://rhodesmill.org/brandon/2013/pharoahs-chart-v2.json)\n",
      "\n",
      "Once this file is saved to the current directory,\n",
      "Python can load it quite easily with the `load()` method:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import json\n",
      "with open('pharoahs-chart-v2.json') as f:\n",
      "    edges = sorted(json.load(f))\n",
      "\n",
      "a, verb, b = array(edges).T\n",
      "print 'Loaded', len(verb), 'edges'"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Loaded 42 edges\n"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As is common when doing information processing in modern Python,\n",
      "note that I have not left the data as a list-of-lists\n",
      "as it is represented in the underlying JSON file.\n",
      "Instead, I have passed the entire data structure\n",
      "to the NumPy `array()` method\n",
      "which I have then transposed\n",
      "so that the input\u2019s list of 3-element items\n",
      "becomes three big vectors:\n",
      "a vector of actors, a vector of verbs,\n",
      "and finally another vector of actors at which those verbs\n",
      "are respectively directed.\n",
      "\n",
      "A quick count of the number of unique nodes\n",
      "can be a quick way to check against misspellings,\n",
      "since a misspelling will create two unique nodes\n",
      "where the original diagram had only one.\n",
      "Happily, computing the number of unique strings\n",
      "shared between `a` and `b` yields\n",
      "exactly the number of unique nodes in the actual diagram:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print 'Loaded', len(unique(append(a, b))), 'nodes'"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Loaded 15 nodes\n"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We are nearly ready to explore the data!\n",
      "\n",
      "I will propose a series of simple political models of the Middle East,\n",
      "each of which is a function that,\n",
      "given a political actor `a` like `\"Turkey\"`\n",
      "and a potential client `b` like `\"Syria Rebels\"`,\n",
      "returns one of three predictions:\n",
      "\n",
      "* `\"supports\"` \u2014 diagram\u2019s blue lines\n",
      "* `\"hates\"` \u2014 diagram\u2019s red lines\n",
      "* `\"clueless\"` \u2014 diagram\u2019s green lines \n",
      "\n",
      "These predictions can then be compared\n",
      "to the actual arrows on the diagram\n",
      "to rate the political model for its accuracy.\n",
      "Note carefully that these models\n",
      "are only being judged for their ability to correctly color-code\n",
      "the arrows that actually exist in the diagram;\n",
      "they can return whatever nonsense they want\n",
      "for arrows not in the diagram, like `(\"USA\", \"Turkey\")`,\n",
      "because we are only testing the functions against the input data set.\n",
      "\n",
      "Because NumPy supports vector operations\n",
      "that operate simultaneously on whole vectors of input values,\n",
      "it only takes a single `==` operation to compare a series of\n",
      "predictions against the series of actual supports/hates verbs\n",
      "from the diagram.\n",
      "The only catch is that, to perform the actual prediction,\n",
      "we need to \u201cvectorize\u201d each little prediction function\n",
      "to produce a routine that works on a whole vector at a time.\n",
      "And we use another trick:\n",
      "since a series of `==` decisions like `True` and `False`\n",
      "are in fact equivalent to a series of numbers `1` or `0`,\n",
      "we can use `sum()` to count how many `True` values are present!\n",
      "Aside from these two nuances,\n",
      "the reporting routine is rather simple Python:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def try_predictor(predictor, report=True, verbose=False):\n",
      "    \"\"\"Report on how well a `predictor` function performs.\"\"\"\n",
      "\n",
      "    # What does the predictor predict for each situation?\n",
      "\n",
      "    prediction = vectorize(predictor)(a, b)\n",
      "    \n",
      "    # How does that stack up against the diagram?\n",
      "\n",
      "    match = (prediction == verb)\n",
      "    percent = 100.0 * sum(match) / len(match)\n",
      "    print 'Accuracy: %.03f %%' % percent\n",
      "    \n",
      "    # What specific predictions is it making?\n",
      "\n",
      "    if report and (verbose or not all(match)):\n",
      "        print\n",
      "        for is_match, ai, bi, pi in zip(match, a, b, prediction):\n",
      "            if is_match and not verbose:\n",
      "                continue\n",
      "            print '      ' if is_match else 'WRONG:',\n",
      "            print ai, pi, bi\n",
      "        print"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Before getting all political,\n",
      "we should test this analysis and reporting tool\n",
      "by feeding it one or two dummy predictors\n",
      "that are not actually interesting,\n",
      "to see its output.\n",
      "We will try exercising a pair of functions\n",
      "that represent the perfect optimist and the perfect pessimist:\n",
      "the one assumes that members of the human species always support one another,\n",
      "while the other assumes that `\"hates\"` is the universal relationship."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "try_predictor(lambda a, b: 'supports', report=False)\n",
      "try_predictor(lambda a, b: 'hates', report=False)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Accuracy: 47.619 %\n",
        "Accuracy: 47.619 %\n"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "It is a happy fact that the pessimist and optimist\n",
      "are so perfectly balanced in this particular case:\n",
      "the number of friendly links in the diagram, in other words,\n",
      "is equal to the number of enemy relationships.\n",
      "Which almost gives one hope for the world \u2014 not quite, but almost.\n",
      "\n",
      "## Friends and Enemies\n",
      "\n",
      "Given this infrastructure,\n",
      "it will take only a few steps\n",
      "to predict every single political relationship\n",
      "in the Big Pharaoh\u2019s diagram.\n",
      "The real Middle East may be more complex than this,\n",
      "but you would not know it from the diagram!\n",
      "\n",
      "The first thing that strikes me\n",
      "is how many red arrows cut left-to-right across the diagram\n",
      "between the upper right, where we see Russia, Assad, and Iran,\n",
      "and most of the rest of the state\n",
      "and non-state actors that are depicted.\n",
      "This has deep roots:\n",
      "Islam became separated within its first few centuries\n",
      "into a Sunni majority and a Shia minority (as well as many smaller groups),\n",
      "the latter of which claims both Assad and the Iranian leadership as adherents.\n",
      "If we place all of the Shia in a group and throw in Russia \u2014\n",
      "which shares a border with Iran and has served as an ally\n",
      "following the overthrow of the United-States-backed Shah in 1979 \u2014\n",
      "then we find that we are almost halfway\n",
      "to explaining the entire diagram:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "shias = {'Assad', 'Iran', 'Lebanon Shias', 'Russia'}\n",
      "\n",
      "def p1(a, b):\n",
      "    if (a in shias) != (b in shias):\n",
      "        return 'hates'\n",
      "    else:\n",
      "        return 'supports'\n",
      "\n",
      "try_predictor(p1)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Accuracy: 71.429 %\n",
        "\n",
        "WRONG: Al Qaeda supports Saudi & Gulf\n",
        "WRONG: Hamas supports Sisi\n",
        "WRONG: Iran hates Hamas\n",
        "WRONG: Israel supports Hamas\n",
        "WRONG: Qatar supports Sisi\n",
        "WRONG: Saudi & Gulf supports Muslim Brotherhood\n",
        "WRONG: Sisi supports Muslim Brotherhood\n",
        "WRONG: Turkey supports Sisi\n",
        "WRONG: USA supports Muslim Brotherhood\n",
        "WRONG: USA supports Sisi\n",
        "WRONG: USA supports Al Qaeda\n",
        "WRONG: USA supports Hamas\n",
        "\n"
       ]
      }
     ],
     "prompt_number": 6
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "You may be a bit confused about why I am performing a pair of `in`\n",
      "operations and then comparing the output with an `!=` inequality operator.\n",
      "The reason is that I am looking for situations where the answers\n",
      "are either `True` and `False` or else the values `False` and `True`,\n",
      "either one of which indicates that `a` and `b` fall on opposite sides\n",
      "of the division.\n",
      "\n",
      "This predictor brings our success rate to 70%.\n",
      "\n",
      "But there is obviously more going on here,\n",
      "because nearly 30% of the links in the diagram\n",
      "are still being reported incorrectly.\n",
      "Take a moment to read over the list of mis-predictions above.\n",
      "Do they share anything in common?\n",
      "\n",
      "What our first predictor seems blind to\n",
      "is the opposition between populist Islamist movements \n",
      "and most of the nation-states involved in the region.\n",
      "The Arab Spring has made it possible\n",
      "that several of these organizations\n",
      "will now make significant political gains\n",
      "if they can turn their popular support into votes\n",
      "in newly created democracies,\n",
      "but they are considered terrorist organizations\n",
      "by many Western nations and their allies.\n",
      "\n",
      "Three state actors, though, have allied themselves\n",
      "with the Islamist movements instead of opposing them.\n",
      "Theocratic Iran was itself born of an Islamist revolution in\u00a01979.\n",
      "Turkey is a secular democracy that has been flirting with the idea\n",
      "of a more explicitly Islamist government.\n",
      "And Qatar is a more interesting case:\n",
      "while the government itself is an autocracy,\n",
      "it is a [Wahabi](http://en.wikipedia.org/wiki/Wahhabi_movement) state\n",
      "and thus is strongly aligned with the earnestly conservative Islam\n",
      "that motivates many of these political and religious groups.\n",
      "\n",
      "Adding these two rough allegiances into our model,\n",
      "and assuming that Islamists always aid one another\n",
      "while Islamists and moderates are always at odds,\n",
      "very nearly completes the entire diagram!"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "islamists = {'Al Qaeda', 'Hamas', 'Muslim Brotherhood', 'Iran', 'Turkey', 'Qatar'}\n",
      "moderates = {'Saudi & Gulf', 'Sisi', 'Israel', 'USA'}\n",
      "\n",
      "def p2(a, b):\n",
      "    either = {a, b}\n",
      "    if (a in islamists) and (b in islamists):\n",
      "        return 'supports'\n",
      "    elif (either & islamists) and (either & moderates):\n",
      "        return 'hates'\n",
      "    elif (a in shias) != (b in shias):\n",
      "        return 'hates'\n",
      "    else:\n",
      "        return 'supports'\n",
      "\n",
      "try_predictor(p2)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Accuracy: 95.238 %\n",
        "\n",
        "WRONG: USA hates Muslim Brotherhood\n",
        "WRONG: USA supports Sisi\n",
        "\n"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Note my careful use of Python set operations to contrive\n",
      "a succinct expression for \u201cif one of the players is populist\n",
      "and the other is autocratic\u201d \u2014\n",
      "if it were not for the ability to do a quick test\n",
      "for an intersection between one of the inputs\n",
      "and either the `islamists` set or the `moderates` set,\n",
      "this new `if` statement would have had to run to several lines.\n",
      "\n",
      "The only thing now missing\n",
      "is that our political predictor\n",
      "never outputs the result `\"clueless\"`\n",
      "and thus cannot correctly predict the stance of the United States\n",
      "with respect to the power struggle in Egypt.\n",
      "I will leave to more informed political commentators\n",
      "whether this characterization of the current administration\n",
      "is fair or not;\n",
      "for our purposes, the only point is that it requires\n",
      "the addition of but a third clause to our predictor,\n",
      "yielding an absolutely perfect `p3()`:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "egypt = {'Muslim Brotherhood', 'Sisi'}\n",
      "\n",
      "def p3(a, b):\n",
      "    either = {a, b}\n",
      "    if a == 'USA' and b in egypt:\n",
      "        return 'clueless'\n",
      "    elif (a in islamists) and (b in islamists):\n",
      "        return 'supports'\n",
      "    elif (either & islamists) and (either & moderates):\n",
      "        return 'hates'\n",
      "    elif (a in shias) != (b in shias):\n",
      "        return 'hates'\n",
      "    else:\n",
      "        return 'supports'\n",
      "\n",
      "try_predictor(p3)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Accuracy: 100.000 %\n"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "And we are done.\n",
      "\n",
      "## Lessons\n",
      "\n",
      "For all of its chaotic hand-drawn relationships,\n",
      "the Big Pharoah diagram really models only two regional feuds,\n",
      "combined with a swipe at the United States\n",
      "for its caution in engaging with either of two warring factions\n",
      "within today\u2019s Egypt.\n",
      "\n",
      "I draw three lessons about information visualization from the fact\n",
      "that a diagram whose politics are so simplistic\n",
      "has been re-blogged as evidence that the Middle East is complicated.\n",
      "\n",
      "First, the diagram presents a puzzle\n",
      "for which the human vision is simply not optimized.\n",
      "Never, to my knowledge, does Nature present a hunter-gatherer\n",
      "with a web of different-colored links\n",
      "and demand a quick intuition\n",
      "about whether the nodes form only a few basic groupings\n",
      "or are hopelessly splintered into several.\n",
      "So presenting the information this way\n",
      "makes it basically opaque.\n",
      "\n",
      "Second, our eyes are very sensitive\n",
      "to similarities between shapes,\n",
      "yet the diagram takes a uniform relationship like \u201csupports\u201d\n",
      "and splays it across the page at a half-dozen different\n",
      "angles and sizes to create a perception of chaos.\n",
      "The fact that the arrows are hand-drawn\n",
      "adds an extra level of visual noise\n",
      "that is simply icing on the cake.\n",
      "\n",
      "Finally, edge-coloring turns out to be a fairly expensive way\n",
      "to illustrate nodes that fall into a few groups,\n",
      "because in the general case you wind up drawing $n^2$ edges\n",
      "when instead you could just use 3 or 4 colors\n",
      "to label broad groups\n",
      "and then explain the relationships among them.\n",
      "You could even use a mix of node-colorings and edges:\n",
      "imagine a map of the 30 Years\u2019 War\n",
      "that colors Catholic countries one color,\n",
      "Protestant countries another,\n",
      "and then has a few annotations thrown in\n",
      "to explain the exceptions to those natural allegances\n",
      "that arose during the protracted conflict.\n",
      "I suspect that the same approach would work better here."
     ]
    }
   ],
   "metadata": {}
  }
 ]
}