A Python Æsthetic

Beauty, and
Why I Python
Brandon Rhodes
PyCon Canada 2012

Why Do I Write Python?

Beautiful to think about
Beautiful to look at
Language → beautiful ideas
Community → beautiful code
It does not matter how
beautiful the ideas are behind
a programming language
if
its community insists
on writing ugly code
It is the traditions and
practices of our community
that make code look like Python

Why is Python beautiful?

Because of us

So this talk weaves
together two topics:
language design and coding practices
And this talk generally
frames these topics with:
Math → language design
Typesetting → coding practices

Why math and typesetting?

Because those are
my particular backgrounds
You might love Python for
quite different reasons
But to tell my own story:
Math + Typesetting

First, a meta-question

I have a question about
your own thought process
What do you think about,
what occupies mental space,
as you are typing code?
For me:

The Stack

Consider the moment when
I start typing some code
x =
    ^

I type an open-paren

x = canvas.drawString(
                      ^

And another

x = canvas.drawString(margin + (
                                ^

I open a bracket

x = canvas.drawString(margin + (indent[
                                       ^

And close it again

x = canvas.drawString(margin + (indent[LEFT] / 2
                                                ^

(And so forth)

My solution?

Keep the obligation stack short!

You can keep closing
the brackets nearby:
x = canvas.drawString(margin + (indent[ ]))
                                       ^

Fight the Stack!

So

Short stacks
make focus possible
Once we are focused,
what kind of code should
we be trying to write?

Python and Language Design

What other languages have
I used over the years?

My story, in brief:

BASIC → C → awk → Python

(Plus: 6809 assembly and machine code, nroff, TeX, Basic09, sed, LISP, Smalltalk, C++, C#, Java, Scheme, JavaScript)

Honorable Mention

Modula-3 → Java, Python

Python is not radical

It looks very much like
several earlier languages

C++

//  sgp4fix for afspc written intrinsic functions
// nodep used without a trigonometric function ahead
if ((nodep < 0.0) && (opsmode == 'a'))
    nodep = nodep + twopi;
xls    = mp + argpp + cosip * nodep;
dls    = pl + pgh - pinc * nodep * sinip;
xls    = xls + dls;
xnoh   = nodep;
nodep  = atan2(alfdp, betdp);

Python

#   sgp4fix for afspc written intrinsic functions
#  nodep used without a trigonometric function ahead
if nodep < 0.0 and opsmode == 'a':
    nodep = nodep + twopi;
xls    = mp + argpp + cosip * nodep;
dls    = pl + pgh - pinc * nodep * sinip;
xls    = xls + dls;
xnoh   = nodep;
nodep  = atan2(alfdp, betdp);

<Personal Aside>

Q:

Why translate astronomy code to
Python instead of wrapping
the existing C++ library?
xnodce = fmod(4.5236020 - 9.2422029e-4 * day, twopi);
stem   = sin(xnodce);
ctem   = cos(xnodce);
zcosil = 0.91375164 - 0.03568096 * ctem;
zsinil = sqrt(1.0 - zcosil * zcosil);

A:

Thus

extension modules → pain

So I have undertaken
a really Big Project

Rewrite PyEphem in pure Python!

✓ Planets — jplephem
✓ Satellites — sgp4
  Coordinates — ephem

Each piece is an independent package


These pure-Python modules...

And — what about performance?

Quick measurement suggests:

PyPy > Python + C

So, that is why I have
been translating C++ code
to Python and thinking
about languages
Let me know if you
are interested in taking
a look at it during the sprints

</Personal Aside>

So, why do C++ and Python
(and C, Pascal, Java, Algol, …)
look so much alike?
xls    = xls + dls;
xnoh   = nodep;
nodep  = atan2(alfdp, betdp);

Because of math

Let me start with a complaint

Consider the keystrokes
necessary to step through
the following investigation
print foo
print dir(foo)
print foo.bar
print len(foo.bar)
print foo.bar[1]
print dir(foo.bar[1])
print foo.bar[1].baz
What if a language’s operations
let you keep typing instead
of stopping to add parens?

3 popular approaches

  1. jQuery
  2. Unix shell
  3. LISP
jQuery chains everything
through endless method calls
jQuery uses chaining to permit
“just-keep-typing” programming
$('div').parent().find('h1')
     .attr('data-level', '1')
     .css('display', 'block')
If Python were more like
jQuery then debugging
might look more like:
print foo
print foo.dir()
print foo.bar
print foo.bar.len()
print foo.bar[1]
print foo.bar[1].dir()
print foo.bar[1].baz
Downside: cannot symmetrically
express binary operations;
to add two items:
foo.add(bar)
bar.add(foo)

Unix Pipeline

Initial data stream gets modified
by each of a series of filters:
cat log |
    grep 'Connection error' |
    awk '{print $2}' |
    sort |
    uniq -c
Interestingly, McIlroy’s original
idea for Unix pipelines
looked more like this:
inputfile sort paginate printerfile

<Historical Aside>

Donald Knuth once showed
his “literate programming”
approach in a beautifully
documented program
Result: 10+ pages of Pascal
In his review of the paper,
McIlroy not only pointed out
several bugs, but offered a
bug-free alternative:
tr -cs A-Za-z '\n' |
tr A-Z a-z |
sort |
uniq -c |
sort -rn |
sed ${1}q
No one who read his
review ever seems to have
forgotten the lesson:
Simple filters that can
be arbitrarily chained are
more easily re-used, and more
robust, than almost any
other kind of code

</Historical Aside>

So anyway: pipelines
are an alternative when
you want more simplicity
than arbitrary math expressions
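
For comparison, here is a minimal Python sketch of the same word-count idea, built from small filters that each take a stream and yield a stream. The names words, lowercase, and top_words are made up for illustration, and this is only a rough analogue of the shell stages above, not McIlroy's code.

```python
import re
from collections import Counter


def words(lines):
    # Roughly the first tr stage: keep only runs of ASCII letters.
    for line in lines:
        for word in re.findall(r'[A-Za-z]+', line):
            yield word


def lowercase(items):
    # Roughly the second tr stage: fold everything to lower case.
    for item in items:
        yield item.lower()


def top_words(lines, n):
    # Counter stands in for "sort | uniq -c | sort -rn";
    # most_common(n) stands in for the final "sed ${1}q".
    counts = Counter(lowercase(words(lines)))
    return counts.most_common(n)


result = top_words(['The cat and the hat', 'the cat'], 2)
```

Each filter knows nothing about its neighbors, so any of them can be reused or swapped without touching the others.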

LISP

Another kind of consistency

a + b + c

not LISP

f(a)

not LISP

Adding 3 numbers in LISP
looks exactly like calling a
function f with 3 parameters:
(+ a b c)
(f a b c)
No ambiguity or
order-of-operations:
each nested expression
gets its own parens
(* (+ a b) m)
(output (concat page1 page2) printer)

What order?

LISP code always happens
from inside to outside
(list (nth 0 ad-return-value) ;; original word
      (nth 1 ad-return-value) ;; offset in file
      (remove-if 'contains-space-p
                 (nth 2 ad-return-value))
      (remove-if 'contains-space-p
                 (nth 3 ad-return-value))
      ))))
LISP refuses to special-case
the traditional math operators
but insists on

one syntax to rule them all

So that's
jQuery, pipelines, LISP
But, Python chooses
to follow math
  
x + y
      
x + y + z
But what if we
want the logarithm?
We now have to add
symbols to both sides
         
ln(x + y + z)

Negation?

 
- ln(x + y + z)

Lesson

Math symbols fly everywhere
Python has four main ways
to expand an expression
  1. Prefix operator -z
  2. Binary operator x + y
  3. Wrap in callable f(x, y)
  4. Attribute/method f.bar

Order of Operations

a + b * c
b * c + a
In both of these expressions,
multiplication will happen first
PEP-8: you can lay out your code
to make the order look obvious
a + b*c
a + (b * c)
a.b() + c[d]
Because they bind so tightly,
PEP-8 requires there be no spaces
between a name and () or []

Another math advantage

Context Freedom

This is a huge benefit derived
from the syntax of mathematics
Python’s syntax is a
context-free grammar
Means that a given
construct can have
only one meaning
(a -b)  # Python - one possible meaning
(a -b)  # Ruby - two possible meanings
Python’s context-free grammar
not only puts it in the mainstream
of decades of language design—
—but makes it easier to read
snippets of code without having
to examine the entire file
My point is not that Ruby people
ever actually write something like
(a -b) in a context in which
they could confuse themselves
My point is that Python has
a logical inner consistency
And if you happen to be
math-ish and sensitive, the
consistency will make you happy
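
One way to check the single meaning of (a -b) is to ask Python's own parser. This is a small sketch using the standard ast module; the variable names are only for illustration.

```python
import ast

# Parse the expression without evaluating it, so a and b need not exist.
tree = ast.parse('(a -b)', mode='eval')
node = tree.body

# Whatever a and b turn out to be, the parser sees a binary subtraction.
is_subtraction = isinstance(node, ast.BinOp) and isinstance(node.op, ast.Sub)
operands = (node.left.id, node.right.id)
```

The parse never depends on whether a names a number, a function, or anything else.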

Intermediate results

In Python, you can always
evaluate an expression partway
and save the result
a = (3 * 4) + 5

# is ALWAYS equivalent to

t = (3 * 4)
a = t + 5
For many computer languages,
this is not true of method calls!
a = spreadsheet.compute('D4')
A method call looks like
an attribute lookup that returns
a callable that we invoke
c = spreadsheet.compute  # attr lookup
c('D4')              # invoke callable
But appearance misleads!
C++ and JavaScript
special-case method calls as a
ternary operator that is not the
same as lookup + invocation!
# Python is so awesome

draw = canvas.drawString
draw(60, 120, 'A rose is a rose.')
Like every other expression,
you can save a not-yet-called method
or pass it as a callback

Aaaaah, consistency!
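
A minimal sketch of both uses, saving a bound method and passing one as a callback. The Canvas class here is a hypothetical stand-in that only records calls, and for_each_line is an invented helper; neither belongs to any real drawing library.

```python
class Canvas(object):
    """Hypothetical canvas that records what it was asked to draw."""

    def __init__(self):
        self.calls = []

    def drawString(self, x, y, text):
        self.calls.append((x, y, text))


def for_each_line(lines, callback):
    # Calls whatever callable it is given; it never sees the canvas.
    for line in lines:
        callback(60, 120, line)


canvas = Canvas()

draw = canvas.drawString             # attribute lookup, nothing called yet
draw(60, 120, 'A rose is a rose.')   # invoking it later still uses canvas

for_each_line(['is a rose.'], canvas.drawString)   # passed as a callback
```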

There is one final
benefit of Python’s generally
math-friendly syntax that
we should mention
Just like a math paper defines
the symbols and operators it uses,
Python makes you import
the things you need
Explicit import means
you should never have to search
your entire codebase to find
a stray definition

Please Remember

import piano
piano.Bench()      # yes!
The simplest import syntax
expects you to qualify each class
and function with the module name
import piano
piano.PianoBench()  # no!
So you do not have to
qualify names with extra words
to make them “extra unique”

PEP-8

“The X11 library uses
a leading X for all its public
functions. In Python, this style is
generally deemed unnecessary…”
For example, logging
both gets this right:
class Logger
class Handler
class Filter
and gets it wrong:
class LogRecord
The modern json Standard
Library module is an example
of good practice
import json
json.loads(...)
json.dumps(...)

# not json_load() or jdump()

So,

if you keep names short,
then you leave the caller in charge
of whether to qualify them or not
While we are on
the subject of imports:

Import Loops

I used to hate that Python
raises errors on import loops:
But I now suspect that an
import loop often indicates
a failure to carefully architect
my code into proper layers
If you use dependency injection
to keep higher-level code in charge
of lower-level modules, then
import loops do not occur
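
A minimal sketch of that idea, shown in one file for brevity. Assume render_report lives in a low-level module and format_price in the high-level application; both names are invented for illustration. The low-level code receives its collaborator as an argument, so it never needs to import the application.

```python
# Lower layer: imports nothing from the application that uses it.
def render_report(rows, format_row):
    """Return report text, formatting each row with the injected callable."""
    return '\n'.join(format_row(row) for row in rows)


# Higher layer: picks the collaborator and hands it down.
def format_price(row):
    name, cents = row
    return '{}: ${:.2f}'.format(name, cents / 100.0)


report = render_report([('tea', 350), ('scone', 425)], format_price)
```

Dependencies point one way only, from the higher layer down, so there is no loop to trip over.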

anyway

So those are some of the
benefits that Python inherits
from the traditions and the
notation of mathematics

Typesetting!

Confession:

I re-format paragraphs by
hand to make them look good
Depending on the line length,
the browser might split a
particularly difficult paragraph so
long and short lines interleave
(And, of course, that paragraph
is deliberately an example!)

So what do I do?

I break every
paragraph manually

Behold the majesty!

Depending on the line
length, the browser might split a
particularly difficult paragraph so
long and short lines interleave

Confession:

I similarly re-format
email paragraphs

Recent discovery

Many email clients re-format
plain-text email and re-wrap
each paragraph themselves!
In such cases, my carefully
hand-wrapped paragraphs
are for naught

Quiz

How many of you use email
clients that keep 80-column
plain-text emails pristine
in a fixed-width font?

Three words

OFF MY LAWN

anyway

Where did I get so
interested in typesetting?

Motivation

Knuth’s publisher
was cutting costs
So Volume 2
of Knuth's life’s work,
The Art of Computer Programming,
looked pretty ugly
TeX
“a new typesetting system
intended for the creation of
beautiful books” (1978)
He built the whole stack
all by himself
Font design → Computer Modern
Font rasterization → METAFONT
Plain-text markup → TeX macros
Device-independent output → DVI
Printing → DVI device driver
Today we use the same stack
only with different tools
Font design → FontCreator, FontForge
Font rasterization → ClearType, OpenType
Plain-text markup → Markdown, RST
Device-independent output → PDF documents
Printing → OS printer drivers
Doing a full stack,
from designing a typeface
to inventing algorithms for
page layout, was quite a
challenge for Knuth
“then there was the letter S.
None of my mathematical
formulas would handle it, and
I spent several days without
sleep up at the lab”
Donald finally came home and
showed Jill the results
Her comment:

“…why don’t you make it S-shaped?”

He immersed himself deeply
in the history of typography
\ddangerexercise Since \TeX\ reads an entire
paragraph before it makes any decisions about line
breaks, the computer's memory capacity might
^^{capacity exceeded} be exceeded if you are
typesetting the works of some ^^{Joyce, James}
^{philosopher} or modernistic novelist who writes
200-line paragraphs. Suggest a way to cope with such
authors.
\answer Assuming that the author is deceased and/or
set in his or her ways, the remedy is to insert
`|{\parfillskip=0pt\par\parskip=0pt\noindent}|'
in random places, after each 50 lines or so of
text. \ (Every space between words is usually a
feasible breakpoint, when you get sufficiently
far from the beginning of a paragraph.)

A tempting definition—

“TeX”

A computational engine
for converting backslashes
into beautiful documents

<Personal Aside>

I need to design a book,
but can no longer bear to
make myself use TeX
So I have started a new project!

python-bookbinding

It turns text into paragraphs
then paragraphs into pages, then
draws them in a real PDF using the
popular reportlab library
Python does have a built-in
textwrap module for splitting
paragraphs into lines, but its
algorithm is too simplistic
for professional quality
So bookbinding uses the
same high-powered typesetting
algorithms originally developed
by Knuth for TeX!
(Thanks, Andrew Kuchling,
for texlib!)
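
To see why a greedy wrapper can fall short, here is a much-simplified sketch that compares textwrap with a whole-paragraph search. It assumes fixed-width characters and scores a layout by the squared trailing space on every line but the last. The functions raggedness and wrap_total_fit are invented for illustration; this is not Knuth's algorithm (no stretchable glue, hyphenation, or penalties) and not what bookbinding or texlib actually do.

```python
import textwrap


def raggedness(lines, width):
    """Sum of squared trailing space on every line except the last."""
    return sum((width - len(line)) ** 2 for line in lines[:-1])


def wrap_total_fit(text, width):
    """Choose all line breaks together to minimize raggedness."""
    words = text.split()
    n = len(words)
    # best[i] holds (cost, next_break) for the cheapest layout of words[i:].
    best = [None] * n + [(0, n)]
    for i in range(n - 1, -1, -1):
        candidates = []
        length = -1  # cancels the space counted before the first word
        for j in range(i, n):
            length += 1 + len(words[j])
            if length > width and j > i:
                break  # words[i..j] no longer fit on one line
            is_last_line = (j == n - 1)
            cost = 0 if is_last_line else (width - length) ** 2
            candidates.append((cost + best[j + 1][0], j + 1))
        best[i] = min(candidates)
    lines, i = [], 0
    while i < n:
        j = best[i][1]
        lines.append(' '.join(words[i:j]))
        i = j
    return lines


paragraph = ('Depending on the line length, the browser might split a '
             'particularly difficult paragraph so long and short lines '
             'interleave')
greedy = textwrap.wrap(paragraph, 32)
fitted = wrap_total_fit(paragraph, 32)
```

The greedy wrapper fills each line as far as it can and never looks back; the search considers the paragraph as a whole, so its score can never be worse under this cost.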
Let me know if you
are interested in taking
a look at it during the sprints

</Personal Aside>

So

Typesetting. Knuth.

With all those backslashes in TeX,
you might not think that Knuth
would have advice for writing
beautiful Python code

But:

Python code is based
on the syntax of math
and
Knuth became a world expert
on how whitespace should be
used when laying out math!
$$ 1 + \left( 1 \over 1 - x^2 \right)^3 $$
Whitespace. Expressions. Beauty.
Knuth
Whitespace. Expressions. Beauty.
Python
Yes, that brings us
again to considering:

PEP-8

You can think of PEP-8
as a set of compositor’s rules
for typesetting Python code
on your screen

Example

PEP-8 specifies the
basic shape of a “page” of code
“Limit all lines to a
maximum of 79 characters.”
This is an exact analogue to
the standard advice of graphic
designers about paragraph width:

45–75 characters

So how do you handle
the line-length restriction?
When you reach the right edge,
you might be tempted to wrap
a Python statement across
several lines of code
But what if you introduced a new name instead?
canvas.drawString(x, y,
    'Please press {}'.format(key))
message = 'Please press {}'.format(key)
canvas.drawString(x, y, message)

Naming intermediate values

message = 'Please press {}'.format(key)
canvas.drawString(x, y, message)
This is actually an
idea I picked up from those
Extreme Programming (XP) guys
XP people tended to use variable
names to Destroy All Comments
widget.reset(True)  # forces re-draw
yes_force_redraw = True
widget.reset(yes_force_redraw)
XP people also point out that
big “section title” comments can
often be replaced with a function
...

# Open the barn

barn = models.Barn.get()
barn.unlock()
barn.open()

# Saddle the horse

...
    ...
    open_barn()
    saddle_horse()
    ...

def open_barn():
    barn = models.Barn.get()
    barn.unlock()
    barn.open()
The XP movement took it too far
but I really love using more names
that usefully replace comments or
let me avoid really long lines
# React if window too tall

if win.x1 - win.x0 > vp.h:
    ...
too_tall = (win.x1 - win.x0) > vp.h
if too_tall:
    ...
Another traditional
typesetter goal:
The page should be an attractive
block of text without ugly rivers
of whitespace spilling down it
Attention to space
can also help the
look of our code
# Yes:
x = 1
y = 2
long_variable = 3

# No:
x             = 1
y             = 2
long_variable = 3
For example, extra whitespace
to align variable values
is forbidden by PEP-8
Another layout idea
that I use comes from Linux
inventor Linus Torvalds
Torvalds wrote the masterful
“Linux kernel coding style”
for the C language
“Now, some people will
claim that having 8-character
indentations makes the code move
too far to the right and makes it
hard to read on a 80-character
terminal screen.”
“The answer to that is that
if you need more than 3 levels
of indentation, you’re screwed
anyway, and should fix
your program.”
— Linus Torvalds
I actually agree with Linus here
With each year that I keep
programming, I find more value
in code that stays very close
to the screen’s left margin

My indentation settings

That last because web pages just
tend to be deeper than code!
Indentation getting too deep?
Here are four tricks I use!

#1 Use continue

for item in sequence:
    if is_valid(item):
        if not is_inconsequential(item):
            item.do_something()
for item in sequence:
    if not is_valid(item):
        continue
    if is_inconsequential(item):
        continue
    item.do_something()

#2 Factor out a new method

def mymethod(self):
    for item in sequence:
        if item.is_good():
            for widget in item:
                ...
def mymethod(self):
    for item in sequence:
        if item.is_good():
            for widget in item:
                self.finalize(widget)

def finalize(self, widget): ...
But, if self is not involved,
why make the routine a method?

Look again:

def mymethod(self):
    for item in sequence:
        if item.is_good():
            self.finalize_widgets(item)

def finalize_widgets(self, item):
    for widget in item:
        widget.close()
Since the routine does not even
use self you can pull it
out as a plain function

#3 Split out a function

    def mymethod(self):
        for item in self.sequence:
            if item.is_good():
                _finalize_widgets(item)

def _finalize_widgets(item):
    for widget in item:
        widget.close()
This, by the way, is a
significant way that Python
has been training its community
Django made mistakes,
but is far more Pythonic
than many competitors!
It recognizes that a web view
could just be a plain function!

(Flask, Bottle followed later)

#4 Factor out an iterator

for item in sequence:
    for widget in item:
        for bitmap in widget:
            for pixel in bitmap:
                pixel.align()
                pixel.darken()
                pixel.draw()
def widget_pixels(sequence):
    for item in sequence:
        for widget in item:
            for bitmap in widget:
                for pixel in bitmap:
                    yield pixel

for pixel in widget_pixels(sequence):
    pixel.align()
    pixel.darken()
    pixel.draw()
Factoring out iterators (#4)
to keep code shallow is
a Python superpower
Another source
of ugly whitespace:

Large function calls

Unfortunately the following
is a PEP-8 recommendation:
foo = long_function_name(var_one, var_two,
                         var_three, var_four)
Which brings us to the
5 Stages of Function Call Grief

Stage 1: brevity

asymtotic_reduction(arg1, arg2)

Stage 2: >80 columns

asymtotic_reduction(arg1, arg2,
                    arg3, arg4)

Stage 3: leftward collapse

asymtotic_reduction(arg1, arg2,
                    arg3, arg4,
                    arg5)
asymtotic_reduction(arg1, arg2,
    arg3, arg4, arg5)

Stage 4: argument ballooning

asymtotic_reduction(arg1, arg2,
    die_on_error=arg3, height=arg4,
    width=arg5 / 2.0 + COLUMN_WIDTH)

Stage 5: an argument-per-line

asymtotic_reduction(
    x=arg1,
    y=arg2,
    die_on_error=arg3,
    height=arg4,
    width=arg5 / 2.0 + COLUMN_WIDTH,
    )

Argument-per-line is AWESOME

Every argument looks the same
Orthogonal in version-control
asymtotic_reduction(
    die_on_error=arg3,
    height=arg4,
    width=arg5 / 2.0 + COLUMN_WIDTH,
    )
Why would adjacent lines
not be treated separately by
your version-control?
The Problem: when adding or
changing Line n requires
another line (n-1)
to be modified

Example

Most languages today use
a statement terminator
Pascal decided to use
a statement separator

Pascal statements are “highly coupled”

    x := sin(a);
    y := cos(a)
End;
        
    x := sin(a);
    y := cos(a);  { CHANGED }
    z := tan(a)   { NEW }
End;
This, of course, makes your
version control system (git, hg)
flag two lines as changed!

C language

int biglist[] = {
    112,
    223
};

        

int biglist[] = {
    112,
    223,    /* CHANGED */
    334     /* NEW */
};
When you design a language,
every construct that can span
lines should allow utter symmetry
between the first, middle, and
last lines in the construct!

Python always gets this right

Because Python is awesome

big_tuple = (
    12,
    23,
    )
big_list = [
    34,
    45,
    ]
big_dict = {
    'one': 1,
    'two': 2,
    }
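
The version-control claim can be checked with a small sketch using the standard difflib module. The helper diff_counts and the sample strings are invented for illustration; it counts how many lines a line-based diff reports as removed and added when one item is appended.

```python
import difflib


def diff_counts(before, after):
    """Return (removed, added) line counts from a line-based diff."""
    diff = list(difflib.ndiff(before.splitlines(), after.splitlines()))
    removed = sum(1 for line in diff if line.startswith('- '))
    added = sum(1 for line in diff if line.startswith('+ '))
    return removed, added


# Without a trailing comma, appending an item also edits the old last line.
plain = diff_counts(
    'big_list = [\n    34,\n    45\n    ]',
    'big_list = [\n    34,\n    45,\n    56\n    ]',
    )

# With a trailing comma, appending an item touches only the new line.
trailing = diff_counts(
    'big_list = [\n    34,\n    45,\n    ]',
    'big_list = [\n    34,\n    45,\n    56,\n    ]',
    )
```

With the trailing comma the diff is a single added line, so a reviewer sees exactly the change that was intended.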

So Stage 5, argument-per-line, makes VC happy:

asymtotic_reduction(
    x=arg1,
    y=arg2,
    die_on_error=arg3,
    height=arg4,
    width=arg5 / 2.0 + COLUMN_WIDTH,
    )
I do sometimes make
exceptions if parameters can
be grouped logically
canvas.drawString(x + margin, y - line_height,
                  'The Naming of Cats')
But, many experienced Python
programmers immediately snap
into arg-per-line mode
The Python community
keeps developing new practices;
PEP-8 was not the end!

Should PEP-8 continue evolving?

Probably not

PEP-8 should remain an
essential common denominator
It is hard enough to get some
projects to adopt PEP-8 already!
But we should find new ways
to communicate these ideas when
we run across the fact that several
of us have the same coding habit

So

Lists separated by
commas can be pretty

But

What about terms separated
by a series of operators?

+ - * /

Here, PEP-8 is actually harmful

PEP-8

“The preferred place to break
around a binary operator is after
the operator, not before it…”
if (width == 0 and height == 0 and
    color == 'red' and emphasis == 'strong' or
    highlight > 100):
How do I know that
this is bad advice?

KNUTH


Knuth = typesetting + math

It turns out that Knuth
has written hundreds of pages
about formatting expressions

So

What is his advice about
breaking them into lines?
“It’s quite an art to decide
how to break long displayed
formulas into several lines…”
“…it is often desirable to
emphasize some of the symmetry
or other structure that
underlies a formula…”

Laying down the law

“displayed formulas
always break before binary
operations and relations.”

PEP-8: bad

adjusted_income = (gross_wages +
    taxable_interest +
    (dividends - qualified_dividends) -
    ira_deduction -
    student_loan_interest)

Knuth, instead of PEP-8

adjusted_income = (gross_wages
    + taxable_interest
    + (dividends - qualified_dividends)
    - ira_deduction
    - student_loan_interest)

So

There are long traditions in math
that can help us improve how
we write our Python code

What about method chains?

With ORMs everywhere,
the question of chained
methods keeps coming up
I never use backslash
continuation, so I need another
way to do long chains!
# UGH BAD HURTS MY EYES

query = Person.filter(last_name='Smith') \
    .order_by('social_security_number') \
    .select_related('spouse')

Option #1

Close each method on next line

query = Person.filter(last_name='Smith'
    ).order_by('social_security_number'
    ).select_related('spouse')

Option #2

Use outer parens, period ends line

query = (Person.
    filter(last_name='Smith').
    order_by('social_security_number').
    select_related('spouse')
    )

Option #3

Use outer parens, period begins line

query = (Person
    .filter(last_name='Smith')
    .order_by('social_security_number')
    .select_related('spouse')
    )

This option #3 is my favorite

query = (Person
    .filter(last_name='Smith')
    .order_by('social_security_number')
    .select_related('spouse')
    )
VC will be happy that adding a
4th method call does not require
the previous line to be adjusted!
But, method chains still
seem to be an emerging Python
practice; I sometimes use
intermediate variables
q = Person.filter(last_name='Smith')
q = q.order_by('social_security_number')
q = q.select_related('spouse')
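
A minimal sketch showing that the parenthesized chain and the intermediate-variable style build the same thing. The Query class is a hypothetical stand-in that just records each step; it is not any real ORM's API.

```python
class Query(object):
    """Hypothetical chainable query; every method returns a new Query."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def _add(self, name, value):
        return Query(self.steps + ((name, value),))

    def filter(self, **conditions):
        return self._add('filter', conditions)

    def order_by(self, field):
        return self._add('order_by', field)

    def select_related(self, field):
        return self._add('select_related', field)


# Outer parens, period begins each line.
chained = (Query()
    .filter(last_name='Smith')
    .order_by('social_security_number')
    .select_related('spouse')
    )

# The same query built with an intermediate variable.
q = Query()
q = q.filter(last_name='Smith')
q = q.order_by('social_security_number')
q = q.select_related('spouse')
```

Because each method returns a fresh object, either layout can gain a fourth call by adding one line and changing nothing else.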

anyway

So

To be happy like me,
make code pretty
When at work, I avoid
tweaking other people’s code
willy-nilly if I visit a module
for something specific
But if I touch a line of code
in the course of my duties, I am
always trying to find the next
tweak to make that section of
code really beautiful

Your Homework

  1. Re-read PEP-8
  2. “Linux kernel coding style”
  3. Tell me refactoring stories
  4. Ask for drive-by code review

Thank you!