A Python Æsthetic

Beauty, and

Why I Python

Brandon Rhodes

PyCon Canada 2012

Why Do I Write Python?

Beautiful to think about

Beautiful to look at

Language → beautiful ideas

Community → beautiful code

It does not matter how
beautiful the ideas are behind
a programming language

its community insists

on writing ugly code

It is the traditions and

practices of our community

that make code look like Python

Why is Python beautiful?

Because of us

So this talk weaves

together two topics:

Language design
Coding practices

And this talk generally

frames these topics with:

Math → language design

Typesetting → coding practices

Why math and typesetting?

Because those are

my particular backgrounds

First, a meta-question

I have a question about

your own thought process

What do you think about,
what occupies mental space,
as you are typing code?

For me:

My obligation stack
The code’s visual layout

The Stack

Consider the moment when

I start typing some code

x =
    ^

(Obligation "stack" is empty)

I type an open-paren

x = canvas.drawString(
                      ^

Owe a close paren

And another

x = canvas.drawString(margin + (
                                ^

Owe a close paren
Owe another close paren

I open a bracket

x = canvas.drawString(margin + (indent[
                                       ^

Owe a close paren
Owe another close paren
Owe a close bracket

And close it again

x = canvas.drawString(margin + (indent[LEFT] / 2
                                                ^

Owe a close paren
Owe another close paren

(And so forth)

My solution?

Keep the obligation stack short!

You can keep closing

the brackets nearby:

x = canvas.drawString(margin + (indent[ ]))
                                       ^

Fight the Stack!

Add that import now
Stub out functions now
Keep short-term TODOs
Runs now with print
Do not try to remember anything!
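
A minimal sketch of "stub out functions now", using hypothetical names (render_page, load_margin) that are not from the talk. The stub lets the caller run today, and the TODO lives in the code instead of on the mental stack.

```python
def load_margin():
    # TODO: read the real margin from page settings (hypothetical helper)
    return 72  # placeholder value so that callers can run right now


def render_page(text):
    margin = load_margin()
    print('margin:', margin)  # "runs now with print"
    return (margin, text)


result = render_page('Hello')
```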

So

Short stacks

make focus possible

Once we are focused,
what kind of code should
we be trying to write?

Python and Language Design

What other languages have

I used over the years?

My story, in brief:

BASIC → C → awk → Python

(Plus: 6809 assembly and machine code, nroff, TeX, Basic09, sed, LISP, Smalltalk, C++, C#, Java, Scheme, JavaScript)

Honorable Mention

Modula-3 → Java, Python

Interfaces
Exceptions
Objects
Import

Python is not radical

It looks very much like

several earlier languages

C++

//  sgp4fix for afspc written intrinsic functions
// nodep used without a trigonometric function ahead
if ((nodep < 0.0) && (opsmode == 'a'))
    nodep = nodep + twopi;
xls    = mp + argpp + cosip * nodep;
dls    = pl + pgh - pinc * nodep * sinip;
xls    = xls + dls;
xnoh   = nodep;
nodep  = atan2(alfdp, betdp);

Python

#   sgp4fix for afspc written intrinsic functions
#  nodep used without a trigonometric function ahead
if nodep < 0.0 and opsmode == 'a':
    nodep = nodep + twopi
xls    = mp + argpp + cosip * nodep
dls    = pl + pgh - pinc * nodep * sinip
xls    = xls + dls
xnoh   = nodep
nodep  = atan2(alfdp, betdp)

<Personal Aside>

Q:

Why translate astronomy code to
Python instead of wrapping
the existing C++ library?

xnodce = fmod(4.5236020 - 9.2422029e-4 * day, twopi);
stem   = sin(xnodce);
ctem   = cos(xnodce);
zcosil = 0.91375164 - 0.03568096 * ctem;
zsinil = sqrt(1.0 - zcosil * zcosil);

A:

Windows installation: hard
Visual Studio Express: no
Amateur astronomers: not coders

Thus

extension modules → pain

So I have undertaken

a really Big Project

Rewrite PyEphem in pure Python!

✓ Planets — jplephem
✓ Satellites — sgp4
  Coordinates — ephem

Each piece is an independent package

✓ Planets — jplephem
✓ Satellites — sgp4
  Coordinates — ephem

These pure-Python modules...

Work in both Python 2 and 3
Require no C extensions
Use NumPy if available

And — what about performance?

Quick measurement suggests:

PyPy > Python + C

So, that is why I have
been translating C++ code
to Python and thinking
about languages

Let me know if you
are interested in taking
a look at it during the sprints

</Personal Aside>

So, why do C++ and Python
(and C, Pascal, Java, Algol, …)
look so much alike?

xls    = xls + dls;
xnoh   = nodep;
nodep  = atan2(alfdp, betdp);

Because of math

Let me start with a complaint

Consider the keystrokes
necessary to step through
the following investigation

print foo
print dir(foo)
print foo.bar
print len(foo.bar)
print foo.bar[1]
print dir(foo.bar[1])
print foo.bar[1].baz

What if a language’s operations
let you keep typing instead
of stopping to add parens?

3 popular approaches

jQuery
Unix shell
LISP

jQuery chains everything

through endless method calls

jQuery uses chaining to permit

“just-keep-typing” programming

$('div').parent().find('h1')
     .attr('data-level', '1')
     .css('display', 'block')

If Python were more like
jQuery then debugging
might look more like:

print foo
print foo.dir()
print foo.bar
print foo.bar.len()
print foo.bar[1]
print foo.bar[1].dir()
print foo.bar[1].baz

Downside: cannot symmetrically
express a binary operation;
to add two items:

foo.add(bar)
bar.add(foo)
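
For contrast, here is a small sketch of how Python's infix operators let either operand come first. The Length class is hypothetical and exists only for illustration. It relies on the standard __add__ and __radd__ hooks.

```python
class Length:
    """Hypothetical value type, used only for illustration."""

    def __init__(self, points):
        self.points = points

    def __add__(self, other):
        # Handles Length + Length and Length + number
        other_points = other.points if isinstance(other, Length) else other
        return Length(self.points + other_points)

    # Handles number + Length, when the left operand cannot add a Length
    __radd__ = __add__


margin = Length(72)
indent = Length(18)
total = margin + indent
also_total = 18 + margin
```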

Unix Pipeline

Initial data stream gets modified

by each of a series of filters:

cat log |
    grep 'Connection error' |
    awk '{print $2}' |
    sort |
    uniq -c

Interestingly, McIlroy’s original
idea for Unix pipelines
looked more like this:

inputfile sort paginate printerfile

<Historical Aside>

Donald Knuth once showed
his “literate programming”
approach in a beautifully
documented program

Result: 10+ pages of Pascal

In his review of the paper,
McIlroy not only pointed out
several bugs, but offered a
bug-free alternative:

tr -cs A-Za-z '\n' |
tr A-Z a-z |
sort |
uniq -c |
sort -rn |
sed ${1}q

No one who read his
review ever seems to have
forgotten the lesson:

Simple filters that can
be arbitrarily chained are
more easily re-used, and more
robust, than almost any
other kind of code

</Historical Aside>

So anyway: pipelines
are an alternative when
you want more simplicity
than arbitrary math expressions
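
The same filter-chaining idea can be sketched in Python with generators. This is only an illustration that loosely mirrors McIlroy's word-count pipeline. The function names (words, lowercase, most_common) and the sample text are assumptions, not code from the talk.

```python
import re
from collections import Counter


def words(lines):
    # Like tr -cs A-Za-z: emit each run of letters as one word
    for line in lines:
        for word in re.findall(r'[A-Za-z]+', line):
            yield word


def lowercase(items):
    # Like tr A-Z a-z
    for item in items:
        yield item.lower()


def most_common(items, n):
    # Like sort | uniq -c | sort -rn, then keep the first n rows
    return Counter(items).most_common(n)


text = ['The cat and the dog', 'and THE bird']
top = most_common(lowercase(words(text)), 2)
```

Each stage only consumes an iterable and produces another, so stages can be re-ordered or re-used like shell filters.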

LISP

Another kind of consistency

a + b + c

not LISP

f(a)

not LISP

Adding 3 numbers in LISP
looks exactly like calling a
function f with 3 parameters:

(+ a b c)
(f a b c)

No ambiguity or
order-of-operations:
each nested expression
gets its own parens

(* (+ a b) m)
(output (concat page1 page2) printer)

What order?

LISP code always happens

from inside to outside

(list (nth 0 ad-return-value) ;; original word
      (nth 1 ad-return-value) ;; offset in file
      (remove-if 'contains-space-p
                 (nth 2 ad-return-value))
      (remove-if 'contains-space-p
                 (nth 3 ad-return-value))
      ))))

LISP refuses to special-case
the traditional math operators
but insists on

one syntax to rule them all

So that's

jQuery, pipelines, LISP

But, Python chooses

to follow math

  ↘
x + y

      ↘
x + y + z

But what if we

want the logarithm?

We now have to add

symbols to both sides

  ↙       ↘
ln(x + y + z)

Negation?

 ↙
- ln(x + y + z)

Lesson

Math symbols fly everywhere

Prefix operators
Infix operators
Functions

Python has four main ways

to expand an expression

Prefix operator -z
Binary operator x + y
Wrap in callable f(x, y)
Attribute/method f.bar

Order of Operations

a + b * c

b * c + a

In both of these expressions,

multiplication will happen first

PEP-8: you can lay out your code

to make the order look obvious

a + b*c

a + (b * c)

a.b() + c[d]

Because they bind so tightly,
PEP-8 requires there be no spaces
between a name and () or []

Another math advantage

Context Freedom

This is a huge benefit derived

from the syntax of mathematics

Python’s syntax is a

context-free grammar

Means that a given
construct can have
only one meaning

(a -b)  # Python - one possible meaning
(a -b)  # Ruby - two possible meanings

Python’s context-free grammar
not only puts it in the mainstream
of decades of language design—

—but makes it easier to read
snippets of code without having
to examine the entire file

My point is not that Ruby people
ever actually write something like
(a -b) in a context in which
they could confuse themselves

My point is that Python has

a logical inner consistency

And if you happen to be
math-ish and sensitive, the
consistency will make you happy

Intermediate results

In Python, you can always
evaluate an expression partway
and save the result

a = (3 * 4) + 5

# is ALWAYS equivalent to

t = (3 * 4)
a = t + 5

For many computer languages,

this is not true of method calls!

a = spreadsheet.compute('D4')

A method call looks like
an attribute lookup that returns
a callable that we invoke

c = spreadsheet.compute  # attr lookup
c('D4')              # invoke callable

But appearance misleads!

C++ and JavaScript
special-case method calls as a
ternary operator that is not the
same as lookup + invocation!

# Python is so awesome

draw = canvas.drawString
draw(60, 120, 'A rose is a rose.')

Like every other expression,
you can save a not-yet-called method
or pass it as a callback

Aaaaah, consistency!
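
A runnable sketch of that idea. The Canvas class here is a hypothetical stand-in (not the reportlab canvas), and for_each_line is an invented helper that accepts any callable.

```python
class Canvas:
    """Hypothetical stand-in for a drawing canvas."""

    def __init__(self):
        self.drawn = []

    def drawString(self, x, y, text):
        self.drawn.append((x, y, text))


def for_each_line(lines, callback):
    # The callback can be any callable; a bound method works fine.
    y = 120
    for line in lines:
        callback(60, y, line)
        y -= 20


canvas = Canvas()
draw = canvas.drawString  # attribute lookup only, nothing is called yet
draw(60, 140, 'A rose is a rose.')
for_each_line(['is a rose', 'is a rose.'], canvas.drawString)
```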

There is one final
benefit that we should mention
of Python’s having a generally
math-friendly syntax

Just like a math paper defines
the symbols and operators it uses,
Python makes you import
the things you need

Explicit import means
you should never have to search
your entire codebase to find
a stray definition

Please Remember

import piano
piano.Bench()      # yes!

The simplest import syntax
expects you to qualify each class
or function with the module name

import piano
piano.PianoBench()  # no!

So you do not have to
qualify names with extra words
to make them “extra unique”

PEP-8

“The X11 library uses
a leading X for all its public
functions. In Python, this style is
generally deemed unnecessary”

For example, logging

both gets this right:

class Logger
class Handler
class Filter

and gets it wrong:

class LogRecord

The modern json Standard
Library module is an example
of good practice

import json
json.loads(...)
json.dumps(...)

# not json_load() or jdump()

So,

if you keep names short,
then you leave the caller in charge
of whether to qualify them or not
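
A short illustration with the standard json module: because its names are short, both import styles read well, and the choice stays with the caller.

```python
import json                    # this caller chooses to qualify
from json import loads, dumps  # this caller chooses not to

data = json.loads('{"key": 1}')
same = loads('{"key": 1}')
text = dumps(data)
```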

While we are on

the subject of imports:

Import Loops

I used to hate that Python

raises errors on import loops:

Module a needs something
from module b, which tries
to import something from a

But I now suspect that an
import loop often indicates
a failure to carefully architect
my code into proper layers

If you use dependency injection
to keep higher-level code in charge
of lower-level modules, then
import loops do not occur
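
A minimal sketch of that layering, with one file standing in for two modules. The names save_record and log_to_list are hypothetical. The lower layer never imports the higher one; it simply receives the callable it needs.

```python
# Lower layer: knows nothing about the application above it.
def save_record(record, notify):
    """Store a record, then report through the injected callable."""
    stored = dict(record)
    notify('saved {}'.format(stored['id']))
    return stored


# Higher layer: owns the policy and hands it down.
messages = []


def log_to_list(message):
    messages.append(message)


stored = save_record({'id': 7}, notify=log_to_list)
```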

anyway

So those are some of the
benefits that Python inherits
from the traditions and the
notation of mathematics

Typesetting!

Confession:

I re-format paragraphs by

hand to make them look good

Depending on the line length,
the browser might split a
particularly difficult paragraph so
long and short lines interleave

(And, of course, that paragraph

is deliberately an example!)

So what do I do?

I break every

paragraph manually

Behold the majesty!

Depending on the line
length, the browser might split a
particularly difficult paragraph so
long and short lines interleave

Confession:

I similarly re-format

email paragraphs

Recent discovery

Many email clients re-format
plain-text email and re-wrap
each paragraph themselves!

In such cases, my carefully
hand-wrapped paragraphs
are for naught

Quiz

How many of you use email
clients that keep 80-column
plain-text emails pristine
in a fixed-width font?

Three words

OFF MY LAWN

anyway

Where did I get so

interested in typesetting?

Motivation

Knuth’s publisher

was cutting costs

So Volume 2
of Knuth’s life’s work,
The Art of Computer Programming,
looked pretty ugly

“a new typesetting system
intended for the creation of
beautiful books” (1978)

He built the whole stack

all by himself

Font design: Computer Modern
Font rasterization: METAFONT
Plain-text markup: TeX macros
Device-independent output: DVI
Printing: DVI device driver

Today we use the same stack

only with different tools

FontCreator, FontForge
ClearType, OpenType
Markdown, RST
PDF documents
OS printer drivers

Doing a full stack,
from designing a typeface
to inventing algorithms for
page layout, was quite a
challenge for Knuth

“then there was the letter S.

None of my mathematical
formulas would handle it, and
I spent several days without
sleep up at the lab”

Donald finally came home and

showed Jill the results

Her comment:

“…why don’t you make it S-shaped?”

He immersed himself deeply

in the history of typography

\ddangerexercise Since \TeX\ reads an entire
paragraph before it makes any decisions about line
breaks, the computer's memory capacity might
^^{capacity exceeded} be exceeded if you are
typesetting the works of some ^^{Joyce, James}
^{philosopher} or modernistic novelist who writes
200-line paragraphs. Suggest a way to cope with such
authors.
\answer Assuming that the author is deceased and/or
set in his or her ways, the remedy is to insert
`|{\parfillskip=0pt\par\parskip=0pt\noindent}|'
in random places, after each 50 lines or so of
text. \ (Every space between words is usually a
feasible breakpoint, when you get sufficiently
far from the beginning of a paragraph.)

A tempting definition—

“TeX”

A computational engine
for converting backslashes
into beautiful documents

<Personal Aside>

I need to design a book,
but can no longer bear to
make myself use TeX

So I have started a new project!

python-bookbinding

It turns text into paragraphs
then paragraphs into pages, then
draws them in a real PDF using the
popular reportlab library

Python does have a built-in
textwrap module for splitting
paragraphs into lines, but its
algorithm is too simplistic
for professional quality
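
For reference, this is roughly what the built-in module does. textwrap fills each line greedily, as full as it can, without looking ahead at the rest of the paragraph. The sample text and the width of 34 are arbitrary choices for illustration.

```python
import textwrap

paragraph = ('Depending on the line length, the browser might split a '
             'particularly difficult paragraph so long and short lines '
             'interleave')

# Greedy wrapping: each line takes as many words as fit in the width.
lines = textwrap.wrap(paragraph, width=34)
for line in lines:
    print(line)
```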

So bookbinding uses the
same high-powered typesetting
algorithms originally developed
by Knuth for TeX!

(Thanks, Andrew Kuchling,

for texlib!)

Let me know if you
are interested in taking
a look at it during the sprints

</Personal Aside>

So

Typesetting. Knuth.

With all those backslashes in TeX,
you might not think that Knuth
would have advice for writing
beautiful Python code

But:

Python code is based

on the syntax of math

and

Knuth became a world expert
on how whitespace should be
used when laying out math!

$$ 1 + \left( 1 \over 1 - x^2 \right)^3 $$

Whitespace. Expressions. Beauty.

Knuth

Whitespace. Expressions. Beauty.

Python

Yes, that brings us

again to considering:

PEP-8

You can think of PEP-8
as a set of compositor’s rules
for typesetting Python code
on your screen

Example

PEP-8 specifies the

basic shape of a “page” of code

“Limit all lines to a

maximum of 79 characters.”

This is an exact analogue to
the standard advice of graphic
designers about paragraph width:

45–75 characters

So how do you handle

the line-length restriction?

When you reach the right edge,
you might be tempted to wrap
a Python statement across
several lines of code

But what if you introduced a new name instead?

canvas.drawString(x, y,
    'Please press {}'.format(key))

↓

message = 'Please press {}'.format(key)
canvas.drawString(x, y, message)

Naming intermediate values

message = 'Please press {}'.format(key)
canvas.drawString(x, y, message)

Removes ugly hanging indent
Provides extra documentation

This is actually an
idea I picked up from those
Extreme Programming (XP) guys

XP people tended to use variable

names to Destroy All Comments

widget.reset(True)  # forces re-draw

↓

yes_force_redraw = True
widget.reset(yes_force_redraw)

XP people also point out that
big “section title” comments can
often be replaced with a function

...

# Open the barn

barn = models.Barn.get()
barn.unlock()
barn.open()

# Saddle the horse

...

    ...
    open_barn()
    saddle_horse()
    ...

def open_barn():
    barn = models.Barn.get()
    barn.unlock()
    barn.open()

The XP movement took it too far
but I really love using more names
that usefully replace comments or
let me avoid really long lines

# React if window too tall

if win.x1 - win.x0 > vp.h:
    ...

↓

too_tall = (win.x1 - win.x0) > vp.h
if too_tall:
    ...

Another traditional

typesetter goal:

The page should be an attractive
block of text without ugly rivers
of whitespace spilling down it

Attention to space
can also help the
look of our code

# Yes:
x = 1
y = 2
long_variable = 3

# No:
x             = 1
y             = 2
long_variable = 3

For example, extra whitespace
to align variable values
is forbidden by PEP-8

Another layout idea
that I use comes from Linux
inventor Linus Torvalds

Torvalds wrote the masterful
“Linux kernel coding style”
for the C language

“Now, some people will
claim that having 8-character
indentations makes the code move
too far to the right and makes it
hard to read on a 80-character
terminal screen.”

“The answer to that is that
if you need more than 3 levels
of indentation, you’re screwed
anyway, and should fix
your program.”

— Linus Torvalds

I actually agree with Linus here

With each year that I keep
programming, I find more value
in code that stays very close
to the screen’s left margin

My indentation settings

Python: 4 spaces
JavaScript: 4 spaces
Others: Emacs default
HTML: 2 spaces

That last because web pages just

tend to be deeper than code!

Indentation getting too deep?

Here are four tricks I use!

#1 Use continue

for item in sequence:
    if is_valid(item):
        if not is_inconsequential(item):
            item.do_something()

for item in sequence:
    if not is_valid(item):
        continue
    if is_inconsequential(item):
        continue
    item.do_something()

#2 Factor out a new method

def mymethod(self):
    for item in sequence:
        if item.is_good():
            for widget in item:
                ...

def mymethod(self):
    for item in sequence:
        if item.is_good():
            for widget in item:
                self.finalize(widget)

def finalize(self, widget): ...

But, if self is not involved,

why make the routine a method?

Look again:

def mymethod(self):
    for item in sequence:
        if item.is_good():
            self.finalize_widgets(item)

def finalize_widgets(self, item):
    for widget in item:
        widget.close()

Since the routine does not even
use self you can pull it
out as a plain function

#3 Split out a function

    def mymethod(self):
        for item in self.sequence:
            if item.is_good():
                _finalize_widgets(item)

def _finalize_widgets(item):
    for widget in item:
        widget.close()

This, by the way, is a
significant way that Python
has been training its community

Django made mistakes,
but is far more Pythonic
than many competitors!

It recognizes that a web view

could just be a plain function!

(Flask, Bottle followed later)

#4 Factor out an iterator

for item in sequence:
    for widget in item:
        for bitmap in widget:
            for pixel in bitmap:
                pixel.align()
                pixel.darken()
                pixel.draw()

def widget_pixels(sequence):
    for item in sequence:
        for widget in item:
            for bitmap in widget:
                for pixel in bitmap:
                    yield pixel

for pixel in widget_pixels(sequence):
    pixel.align()
    pixel.darken()
    pixel.draw()

Factoring out iterators (#4)
to keep code shallow is
a Python superpower

Another source

of ugly whitespace:

Large function calls

Unfortunately the following

is a PEP-8 recommendation:

foo = long_function_name(var_one, var_two,
                         var_three, var_four)

Which brings us to the

5 Stages of Function Call Grief

Stage 1: brevity

asymtotic_reduction(arg1, arg2)

Stage 2: >80 columns

asymtotic_reduction(arg1, arg2,
                    arg3, arg4)

Stage 3: leftward collapse

asymtotic_reduction(arg1, arg2,
                    arg3, arg4,
                    arg5)

↓

asymtotic_reduction(arg1, arg2,
    arg3, arg4, arg5)

Stage 4: argument ballooning

asymtotic_reduction(arg1, arg2,
    die_on_error=arg3, height=arg4,
    width=arg5 / 2.0 + COLUMN_WIDTH)

Stage 5: an argument-per-line

asymtotic_reduction(
    x=arg1,
    y=arg2,
    die_on_error=arg3,
    height=arg4,
    width=arg5 / 2.0 + COLUMN_WIDTH,
    )

Argument-per-line is AWESOME

Every argument looks the same

Orthogonal in version-control

asymtotic_reduction(
    die_on_error=arg3,
    height=arg4,
    width=arg5 / 2.0 + COLUMN_WIDTH,
    )

Why would adjacent lines
not be treated separately by
your version-control?

The Problem: when adding or
changing Line n requires
another line (n-1)
to be modified

Example

Most languages today use

a statement terminator

Pascal decided to use

a statement separator

Pascal statements are “highly coupled”

    x := sin(a);
    y := cos(a)
End;
        ↓
    x := sin(a);
    y := cos(a);  { CHANGED }
    z := tan(a)   { NEW }
End;

This, of course, makes your
version control system (git, hg)
flag two lines as changed!

C language

int biglist[] = {
    112,
    223
};

        ↓

int biglist[] = {
    112,
    223,    /* CHANGED */
    334     /* NEW */
};

When you design a language,
every construct that can span
lines should allow utter symmetry
between the first, middle, and
last lines in the construct!

Python always gets this right

Because Python is awesome

big_tuple = (
    12,
    23,
    )
big_list = [
    34,
    45,
    ]
big_dict = {
    'one': 1,
    'two': 2,
    }
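
The version-control benefit can be measured with the standard difflib module. In this sketch the changed_lines helper is my own invention. It counts how many lines a diff touches when one item is appended, with and without a trailing comma.

```python
import difflib


def changed_lines(before, after):
    # Keep only added/removed lines, skipping the '---' and '+++' headers.
    diff = difflib.unified_diff(before, after, lineterm='', n=0)
    return [line for line in diff
            if line.startswith(('+', '-'))
            and not line.startswith(('+++', '---'))]


with_comma = changed_lines(
    ['big_list = [', '    34,', '    45,', '    ]'],
    ['big_list = [', '    34,', '    45,', '    56,', '    ]'],
)
without_comma = changed_lines(
    ['big_list = [', '    34,', '    45', '    ]'],
    ['big_list = [', '    34,', '    45,', '    56', '    ]'],
)
```

With the trailing comma, only the new line shows up in the diff; without it, the previous line is flagged as well.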

So option #5 argument-per-line makes VC happy:

asymtotic_reduction(
    x=arg1,
    y=arg2,
    die_on_error=arg3,
    height=arg4,
    width=arg5 / 2.0 + COLUMN_WIDTH,
    )

I do sometimes make
exceptions if parameters can
be grouped logically

canvas.drawString(x + margin, y - line_height,
                  'The Naming of Cats')

But, many experienced Python
programmers immediately snap
into arg-per-line mode

The Python community
keeps developing new practices;
PEP-8 was not the end!

Should PEP-8 continue evolving?

Probably not

PEP-8 should remain an

essential common denominator

It is hard enough to get some

projects to adopt PEP-8 already!

But we should find new ways
to communicate these ideas when
we run across the fact that several
of us have the same coding habit

So

Lists separated by

commas can be pretty

But

What about terms separated

by a series of operators?

+ - * /

Here, PEP-8 is actually harmful

PEP-8

“The preferred place to break
around a binary operator is after
the operator, not before it…”

if (width == 0 and height == 0 and
    color == 'red' and emphasis == 'strong' or
    highlight > 100):

How do I know that

this is bad advice?

KNUTH

Knuth = typesetting + math

It turns out that Knuth
has written hundreds of pages
about formatting expressions

So

What is his advice about

breaking them into lines?

“It’s quite an art to decide
how to break long displayed
formulas into several lines…”

“…it is often desirable to
emphasize some of the symmetry
or other structure that
underlies a formula…”

Laying down the law

“displayed formulas
always break before binary
operations and relations.”

PEP-8: bad

adjusted_income = (gross_wages +
    taxable_interest +
    (dividends - qualified_dividends) -
    ira_deduction -
    student_loan_interest)

Eyes bounce back and forth
Operators difficult to find

Knuth, instead of PEP-8

adjusted_income = (gross_wages
    + taxable_interest
    + (dividends - qualified_dividends)
    - ira_deduction
    - student_loan_interest)

Much easier to read
Symmetry between terms
Subtracted terms look negative
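
The same break-before-the-operator layout also works for boolean conditions, such as the PEP-8 example quoted earlier. A small runnable sketch; the sample values are invented for illustration.

```python
width, height = 0, 0
color, emphasis, highlight = 'red', 'strong', 50

# Each line begins with its operator, so the lone lower-precedence
# "or" is easy to spot at the left edge.
matches = (width == 0
    and height == 0
    and color == 'red'
    and emphasis == 'strong'
    or highlight > 100)
```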

So

There are long traditions in math
that can help us improve how
we write our Python code

What about method chains?

With ORMs everywhere,
the question of chained
methods keeps coming up

I never use backslash
continuation, so I need another
way to do long chains!

# UGH BAD HURTS MY EYES

query = Person.filter(last_name='Smith') \
    .order_by('social_security_number') \
    .select_related('spouse')

Option #1

Close each method on next line

query = Person.filter(last_name='Smith'
    ).order_by('social_security_number'
    ).select_related('spouse')

Option #2

Use outer parens, period ends line

query = (Person.
    filter(last_name='Smith').
    order_by('social_security_number').
    select_related('spouse')
    )

Option #3

Use outer parens, period begins line

query = (Person
    .filter(last_name='Smith')
    .order_by('social_security_number')
    .select_related('spouse')
    )

This option #3 is my favorite

query = (Person
    .filter(last_name='Smith')
    .order_by('social_security_number')
    .select_related('spouse')
    )

VC will be happy that adding a
4th method call does not require
the previous line to be adjusted!

But, method chains still
seem to be an emerging Python
practice; I sometimes use
intermediate variables

q = Person.filter(last_name='Smith')
q = q.order_by('social_security_number')
q = q.select_related('spouse')
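
Both styles can be tried without a real ORM. The Query class below is a hypothetical, minimal stand-in whose methods return a new object, which is what makes chaining and intermediate variables interchangeable.

```python
class Query:
    """Hypothetical, minimal stand-in for an ORM query object."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def _extend(self, step):
        # Return a new Query so every call can be chained or saved.
        return Query(self.steps + (step,))

    def filter(self, **conditions):
        return self._extend(('filter', conditions))

    def order_by(self, field):
        return self._extend(('order_by', field))


chained = (Query()
    .filter(last_name='Smith')
    .order_by('social_security_number')
    )

q = Query()
q = q.filter(last_name='Smith')
q = q.order_by('social_security_number')
```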

anyway

So

To be happy like me,

make code pretty

When at work, I avoid
tweaking other people’s code
willy-nilly if I visit a module
for something specific

But if I touch a line of code
in the course of my duties, I am
always trying to find the next
tweak to make that section of
code really beautiful

Your Homework

Re-read PEP-8
“Linux kernel coding style”
Tell me refactoring stories
Ask for drive-by code review

Thank you!