by Brandon Rhodes

I finally understand nested comprehensions

Date: 11 March 2009
Tags: computing, python

This entire blog post can be summarized in the words of Guido himself, which I have just discovered at the bottom of PEP 202 (“List Comprehensions”):

The form [... for x... for y...] nests, with the last index varying fastest, just like nested for loops.

Have you ever seen a Python list comprehension like that, with two or more for loops inside? I have just written my first one! It was only recently that I discovered they were even possible, when I encountered several in a draft of the upcoming Natural Language Processing with Python book. (Which should be great — watch for O'Reilly to publish it!) They almost never turn up in other code that I encounter, and perhaps for good reason: they were deeply confusing the first time I saw them!
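Here is a small sketch of what "the last index varying fastest" means in practice. It uses plain ranges instead of the Arecibo file, and the names `pairs` and `expected` are just illustrative:

```python
# Two for clauses: x drives the outer loop, y drives the inner loop.
pairs = [(x, y) for x in range(2) for y in range(3)]
print(pairs)
# → [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]

# The same pairs built with ordinary nested for loops, in the same order.
expected = []
for x in range(2):
    for y in range(3):
        expected.append((x, y))

assert pairs == expected
```

The second index, `y`, runs through all of its values before `x` advances even once, exactly as it would in the nested loops.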

The code I have just written is shown below. It uses the Python Imaging Library to produce an image I will use in the series of blog posts that I started yesterday on watermarking PDF files. The code requires a small arecibo.txt file, detailing the radio message that was sent from the Arecibo Observatory in November 1974 to any other civilizations that might be listening. As you can see, I have successfully used two for clauses in the list comprehension that generates the image's pixels:

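"""Draw the Arecibo message (blue on transparent)."""
from PIL import Image

# The message is 1679 bits, laid out as 73 rows of 23 columns.
image = Image.new("RGBA", (23, 73))
with open('arecibo.txt') as f:
    image.putdata([
        (192, 224, 255, 255) if char == '1' else (0, 0, 0, 0)
        for line in f               # outer loop: each row of the message
        for char in line.strip()    # inner loop: each bit within that row
        ])
image.save('arecibo.png')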

Each pixel is a four-value tuple, by the way, because an RGBA image has not only a red, green, and blue channel for each pixel, but also an alpha (“A”) channel specifying its opacity or transparency. The colors in use here are a completely opaque light blue and a completely transparent color (the four zeros). The result looks something like:

[Image: Arecibo message]
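Why does one flat list come out as the right picture? Pillow's `Image.putdata()` reads a flat sequence starting at the upper-left corner and moving along each row before dropping to the next, so the loop over characters within a row has to be the one that varies fastest. The sketch below checks that ordering without needing Pillow or the real file; the two-row `rows` list is a made-up stand-in for arecibo.txt, and `ON`/`OFF` are just illustrative names for the two colors above.

```python
# Hypothetical 3x2 stand-in for arecibo.txt: 2 rows of 3 bits each.
rows = ["101\n", "011\n"]
width, height = 3, 2

ON = (192, 224, 255, 255)   # opaque light blue
OFF = (0, 0, 0, 0)          # fully transparent

pixels = [
    ON if char == '1' else OFF
    for line in rows            # outer loop: rows, top to bottom
    for char in line.strip()    # inner loop: columns, left to right
]

# Row-major order: the pixel at column x, row y sits at index y * width + x.
assert len(pixels) == width * height
for y, line in enumerate(rows):
    for x, char in enumerate(line.strip()):
        assert pixels[y * width + x] == (ON if char == '1' else OFF)
```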

My mistake in reading the multiple for clauses was that, old C-language programmer that I am, I was expecting the comprehension structure to be concentric. That is, I thought that the last for must “enclose” the ones above it, creating a mess of lists inside of lists inside of lists. But it turns out that they are much simpler to read than that. Just read them like normal loops, with the “big loop” described first and the subsequent loops nested inside of it:

# The list comprehension said:
result = [expression
          for line in open('arecibo.txt')
          for char in line.strip()]

# It therefore meant:
result = []
for line in open('arecibo.txt'):
    for char in line.strip():
        result.append(expression)
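To convince yourself that the two forms really agree, here is a runnable version of the same translation. It assumes a hypothetical in-memory list `lines` in place of arecibo.txt, and uses `char == '1'` as a simple stand-in for `expression`:

```python
# Hypothetical in-memory stand-in for the lines of arecibo.txt.
lines = ["110\n", "001\n"]

# The comprehension form, with char == '1' as the expression.
from_comprehension = [
    char == '1'
    for line in lines
    for char in line.strip()
]

# The same computation written as ordinary nested for loops.
from_loops = []
for line in lines:
    for char in line.strip():
        from_loops.append(char == '1')

assert from_comprehension == from_loops
print(from_comprehension)
# → [True, True, False, False, False, True]
```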

So, to read the comprehension, just picture colons appended to each for clause and, finally, the expression moved down inside of the innermost for loop.
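The “concentric” reading I first expected does describe a real construct, just a different one: a comprehension placed inside another comprehension's expression. Here is a small comparison, again using a made-up two-line `lines` list in place of the file:

```python
lines = ["110\n", "001\n"]

# Two for clauses in one comprehension: a single flat list, row after row.
flat = [char for line in lines for char in line.strip()]
print(flat)
# → ['1', '1', '0', '0', '0', '1']

# Lists inside lists appear only when a whole comprehension sits in the
# expression position; here the inner comprehension builds one list per line.
grid = [[char for char in line.strip()] for line in lines]
print(grid)
# → [['1', '1', '0'], ['0', '0', '1']]
```

In `grid`, the `for line` clause is written last yet drives the outer loop, because the inner comprehension is merely the expression being collected.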

Now that I have made this conceptual leap, I can “picture” the normal for loops each time I see a complicated list comprehension, and they are trivial to read and write! It still, I admit, feels odd that the expression, which would be deep inside of normal for loops, goes in front of them in a comprehension instead. And I am not sure that double comprehensions should become part of my normal coding style. (How many other Python programmers understand them? Has everyone else been using them without problems?) But they are a neat trick to have up my sleeve when I need to iterate over an image quickly and want to pack everything into a single, easily-bloggable expression.

©2021