A Python Æsthetic
Brandon Rhodes
PyCon Canada 2012
Why Do I Write Python?
Beautiful to think about
Beautiful to look at
Language → beautiful ideas
Community → beautiful code
It does not matter how
beautiful the ideas are behind
a programming language
if its community insists
on writing ugly code
It is the traditions and
practices of our community
that make code look like Python
Why is Python beautiful?
Because of us
So this talk weaves
together two topics:
Language design
Coding practices
And this talk generally
frames these topics with:
Math → language design
Typesetting → coding practices
Why math and typesetting?
Because those are
my particular backgrounds
You might love Python for
quite different reasons
But to tell my own story:
What do you think about,
what occupies mental space,
as you are typing code?
My obligation stack
The code’s visual layout
The Stack
Consider the moment when
I start typing some code
(Obligation "stack" is empty)
And another
x = canvas.drawString(margin + (
^
Owe a close paren
Owe another close paren
I open a bracket
x = canvas.drawString(margin + (indent[
^
Owe a close paren
Owe another close paren
Owe a close bracket
And close it again
x = canvas.drawString(margin + (indent[LEFT] / 2
^
Owe a close paren
Owe another close paren
My solution?
Keep the obligation stack short!
You can keep closing
the brackets nearby:
x = canvas.drawString(margin + (indent[]))
^
Fight the Stack!
Add that import now
Stub out functions now
Keep short-term TODOs
Runs now with print
Do not try to remember anything!
So
Short stacks
make focus possible
Once we are focused,
what kind of code should
we be trying to write?
Python and Language Design
What other languages have
I used over the years?
My story, in brief:
BASIC → C → awk → Python
(Plus: 6809 assembly and machine code,
nroff, TeX, Basic09, sed, LISP, Smalltalk,
C++, C#, Java, Scheme, JavaScript)
Honorable Mention
Modula-3 → Java, Python
Interfaces
Exceptions
Objects
Import
Python is not radical
It looks very much like
several earlier languages
C++
// sgp4fix for afspc written intrinsic functions
// nodep used without a trigonometric function ahead
if ((nodep < 0.0) && (opsmode == 'a'))
    nodep = nodep + twopi;
xls = mp + argpp + cosip * nodep;
dls = pl + pgh - pinc * nodep * sinip;
xls = xls + dls;
xnoh = nodep;
nodep = atan2(alfdp, betdp);
Python
# sgp4fix for afspc written intrinsic functions
# nodep used without a trigonometric function ahead
if nodep < 0.0 and opsmode == 'a':
    nodep = nodep + twopi
xls = mp + argpp + cosip * nodep
dls = pl + pgh - pinc * nodep * sinip
xls = xls + dls
xnoh = nodep
nodep = atan2(alfdp, betdp)
<Personal Aside>
Q:
Why translate astronomy code to
Python instead of wrapping
the existing C++ library?
xnodce = fmod(4.5236020 - 9.2422029e-4 * day, twopi);
stem = sin(xnodce);
ctem = cos(xnodce);
zcosil = 0.91375164 - 0.03568096 * ctem;
zsinil = sqrt(1.0 - zcosil * zcosil);
A:
Windows installation: hard
Visual Studio Express: no
Amateur astronomers: not coders
Thus
extension modules → pain
So I have undertaken
a really Big Project
Rewrite PyEphem in pure Python!
✓ Planets — jplephem
✓ Satellites — sgp4
Coordinates — ephem
Each piece is an independent package
✓ Planets — jplephem
✓ Satellites — sgp4
Coordinates — ephem
These pure-Python modules...
Work in both Python 2 and 3
Require no C extensions
Use NumPy if available
And — what about performance?
Quick measurement suggests:
PyPy > Python + C
So, that is why I have
been translating C++ code
to Python and thinking
about languages
Let me know if you
are interested in taking
a look at it during the sprints
</Personal Aside>
So, why do C++ and Python
(and C, Pascal, Java, Algol, …)
look so much alike?
xls = xls + dls;
xnoh = nodep;
nodep = atan2(alfdp, betdp);
Because of math
Let me start with a complaint
Consider the keystrokes
necessary to step through
the following investigation
print foo
print dir(foo)
print foo.bar
print len(foo.bar)
print foo.bar[1]
print dir(foo.bar[1])
print foo.bar[1].baz
What if a language’s operations
let you keep typing instead
of stopping to add parens?
3 popular approaches
jQuery
Unix shell
LISP
jQuery chains everything
through endless method calls
jQuery uses chaining to permit
“just-keep-typing” programming
$('div').parent().find('h1')
    .attr('data-level', '1')
    .css('display', 'block')
If Python were more like
jQuery then debugging
might look more like:
print foo
print foo.dir()
print foo.bar
print foo.bar.len()
print foo.bar[1]
print foo.bar[1].dir()
print foo.bar[1].baz
Downside: cannot symmetrically
express binary operation;
to add two items:
foo.add(bar)
bar.add(foo)
Unix Pipeline
Initial data stream gets modified
by each of a series of filters:
cat log |
grep 'Connection error' |
awk '{print $2}' |
sort |
uniq -c
Interestingly, McIlroy’s original
idea for Unix pipelines
looked more like this:
inputfile sort paginate printerfile
<Historical Aside>
Donald Knuth once showed
his “literate programming”
approach in a beautifully
documented program
Result: 10+ pages of Pascal
In his review of the paper,
McIlroy not only pointed out
several bugs, but offered a
bug-free alternative:
tr -cs A-Za-z '\n' |
tr A-Z a-z |
sort |
uniq -c |
sort -rn |
sed ${1}q
No one who read his
review ever seems to have
forgotten the lesson:
Simple filters that can
be arbitrarily chained are
more easily re-used, and more
robust, than almost any
other kind of code
</Historical Aside>
So anyway: pipelines
are an alternative when
you want more simplicity
than arbitrary math expressions
LISP
Another kind of consistency
Adding 3 numbers in LISP
looks exactly like calling a
function f with 3 parameters:
No ambiguity or
order-of-operations:
each nested expression
gets its own parens
(* (+ a b) m)
(output (concat page1 page2) printer)
What order?
LISP code always happens
from inside to outside
(list (nth 0 ad-return-value)  ;; original word
      (nth 1 ad-return-value)  ;; offset in file
      (remove-if 'contains-space-p
                 (nth 2 ad-return-value))
      (remove-if 'contains-space-p
                 (nth 3 ad-return-value))
      ))))
LISP refuses to special-case
the traditional math operators
but insists on
one syntax to rule them all
So that's
jQuery, pipelines, LISP
But, Python chooses
to follow math
But what if we
want the logarithm?
We now have to add
symbols to both sides
Lesson
Math symbols fly everywhere
Prefix operators
Infix operators
Functions
Python has four main ways
to expand an expression
Prefix operator -z
Binary operator x + y
Wrap in callable f(x, y)
Attribute/method f.bar
Order of Operations
In both of these expressions,
multiplication will happen first
PEP-8: you can lay out your code
to make the order look obvious
Because they bind so tightly,
PEP-8 requires there be no spaces
between a name and () or []
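A small sketch of that layout idea, using made-up values (the names x, y, and values are not from the talk): spacing is tightened around the operators that bind first, so the grouping Python will apply is visible at a glance.

```python
# Illustrative values only; none of these names come from the talk.
x, y = 3, 4

# Tighter spacing around the operators that bind first.
hypot2 = x*x + y*y
scaled = x*2 - 1

# Explicit parentheses spell out the same grouping.
assert hypot2 == (x * x) + (y * y)
assert scaled == (x * 2) - 1

# Calls and subscripts bind tightest of all, so no space before ( or [.
values = [10, 20, 30]
total = values[0] + len(values)*2
print(hypot2, scaled, total)
```

The whitespace changes nothing about evaluation; it only makes the existing precedence easier to see.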
Another math advantage
Context Freedom
This is a huge benefit derived
from the syntax of mathematics
Python’s syntax is a
context-free grammar
Means that a given
construct can have
only one meaning
(a -b)  # Python - one possible meaning
(a -b)  # Ruby - two possible meanings
Python’s context-free grammar
not only puts it in the mainstream
of decades of language design—
—but makes it easier to read
snippets of code without having
to examine the entire file
My point is not that Ruby people
ever actually write something like
(a -b) in a context in which
they could confuse themselves
My point is that Python has
a logical inner consistency
And if you happen to be
math-ish and sensitive, the
consistency will make you happy
a = (3 * 4) + 5
# is ALWAYS equivalent to
t = (3 * 4)
a = t + 5
For many computer languages,
this is not true of method calls!
a = spreadsheet.compute('D4')
A method call looks like
an attribute lookup that returns
a callable that we invoke
c = spreadsheet.compute  # attr lookup
c('D4')                  # invoke callable
C++ and JavaScript
special-case method calls as a
ternary operator that is not the
same as lookup + invocation!
# Python is so awesome
draw = canvas.drawString
draw(60, 120, 'A rose is a rose.')
Like any other expression,
a not-yet-called method can be saved
or passed as a callback
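A minimal sketch of that idea. The Canvas class here is a hypothetical stand-in that only records calls (it is not the reportlab API), and for_each_line is an invented helper.

```python
class Canvas:
    """Hypothetical stand-in for a drawing canvas; it only records calls."""

    def __init__(self):
        self.calls = []

    def drawString(self, x, y, text):
        self.calls.append((x, y, text))
        return len(text)


canvas = Canvas()

# One step: look up the attribute and call it immediately.
a = canvas.drawString(60, 120, 'A rose is a rose.')

# Two steps: save the bound method, then invoke it later.
draw = canvas.drawString
b = draw(60, 120, 'A rose is a rose.')


def for_each_line(lines, callback):
    """Invented helper: call callback(x, y, text) once per line."""
    for i, line in enumerate(lines):
        callback(60, 120 - 14 * i, line)


# The bound method is an ordinary value, so it works as a callback.
for_each_line(['first', 'second'], draw)
```

Both spellings reach the same method on the same object, which is why the saved draw can be handed to for_each_line without any wrapper.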
There is one final
benefit that we should mention
of Python’s having a generally
math-friendly syntax
Just like a math paper defines
the symbols and operators it uses,
Python makes you import
the things you need
Explicit import means
you should never have to search
your entire codebase to find
a stray definition
Please Remember
import piano
piano.Bench()  # yes!
The simplest import syntax
expects you to qualify each class
and function with the module name
import piano
piano.PianoBench()  # no!
So you do not have to
qualify names with extra words
to make them “extra unique”
PEP-8
“The X11 library uses
a leading X for all its public
functions. In Python, this style is
generally deemed unnecessary ”
For example, logging
gets this right:
class Logger
class Handler
class Filter
The modern json Standard
Library module is an example
of good practice
import json
json.loads(...)
json.dumps(...)
# not json_load() or jdump()
So,
if you keep names short,
then you leave the caller in charge
of whether to qualify them or not
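As a rough illustration with the standard json module (the record dictionary is invented for the example): because the function names are short, the caller can pick either the qualified or the unqualified spelling.

```python
import json
from json import dumps

# Invented sample data.
record = {'name': 'Ada', 'langs': ['Python']}

# Qualified: the module name supplies the context.
text = json.dumps(record, sort_keys=True)

# Unqualified: the caller chose to import the bare name.
same_text = dumps(record, sort_keys=True)

round_trip = json.loads(text)
print(text)
```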
While we are on
the subject of imports:
Import Loops
I used to hate that Python
raises errors on import loops:
Module a needs something
from module b, which tries
to import something from a
But I now suspect that an
import loop often indicates
a failure to carefully architect
my code into proper layers
If you use dependency injection
to keep higher-level code in charge
of lower-level modules, then
import loops do not occur
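One way this can look, sketched in a single file with two hypothetical layers (Storage and App are invented names; imagine them living in separate modules storage.py and app.py). The lower layer never imports the higher one; it is simply handed the callable it needs.

```python
# Lower layer (imagine storage.py): knows nothing about its callers.
class Storage:
    def __init__(self, on_save):
        # The callback is injected by the caller instead of imported here.
        self.on_save = on_save
        self.items = []

    def save(self, item):
        self.items.append(item)
        self.on_save(item)


# Higher layer (imagine app.py): imports Storage and stays in charge of it.
class App:
    def __init__(self):
        self.log = []
        self.storage = Storage(on_save=self.record)

    def record(self, item):
        self.log.append('saved {}'.format(item))


app = App()
app.storage.save('report')
print(app.log)
```

The import arrow points only downward (app imports storage), so there is no loop for Python to complain about.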
So those are some of the
benefits that Python inherits
from the traditions and the
notation of mathematics
Typesetting!
Confession:
I re-format paragraphs by
hand to make them look good
Depending on the line length,
the browser might split a
particularly difficult paragraph so
long and short lines interleave
(And, of course, that paragraph
is deliberately an example!)
I break every
paragraph manually
Behold the majesty!
Depending on the line
length, the browser might split a
particularly difficult paragraph so
long and short lines interleave
Confession:
I similarly re-format
email paragraphs
Recent discovery
Many email clients re-format
plain-text email and re-wrap
each paragraph themselves!
In such cases, my carefully
hand-wrapped paragraphs
are for naught
Quiz
How many of you use email
clients that keep 80-column
plain-text emails pristine
in a fixed-width font?
Where did I get so
interested in typesetting?
Motivation
Knuth’s publisher
was cutting costs
So Volume 2
of Knuth's life’s work,
The Art of Computer Programming,
looked pretty ugly
“a new typesetting system
intended for the creation of
beautiful books” (1978)
He built the whole stack
all by himself
Font design                 Computer Modern
Font rasterization          METAFONT
Plain-text markup           TeX macros
Device-independent output   DVI
Printing                    DVI device driver
Today we use the same stack
only with different tools
FontCreator, FontForge
ClearType, OpenType
Markdown, RST
PDF documents
OS printer drivers
Doing a full stack,
from designing a typeface
to inventing algorithms for
page layout, was quite a
challenge for Knuth
“then there was the letter S.
None of my mathematical
formulas would handle it, and
I spent several days without
sleep up at the lab”
Donald finally came home and
showed Jill the results
“…why don’t you make it S-shaped?”
He immersed himself deeply
in the history of typography
\ddangerexercise Since \TeX\ reads an entire
paragraph before it makes any decisions about line
breaks, the computer's memory capacity might
^^{capacity exceeded} be exceeded if you are
typesetting the works of some ^^{Joyce, James}
^{philosopher} or modernistic novelist who writes
200-line paragraphs. Suggest a way to cope with such
authors.
\answer Assuming that the author is deceased and/or
set in his or her ways, the remedy is to insert
`|{\parfillskip=0pt\par\parskip=0pt\noindent}|'
in random places, after each 50 lines or so of
text. \ (Every space between words is usually a
feasible breakpoint, when you get sufficiently
far from the beginning of a paragraph.)
“TeX”
A computational engine
for converting backslashes
into beautiful documents
<Personal Aside>
I need to design a book,
but can no longer bear to
make myself use TeX
So I have started a new project!
python-bookbinding
It turns text into paragraphs
then paragraphs into pages, then
draws them in a real PDF using the
popular reportlab library
Python does have a built-in
textwrap module for splitting
paragraphs into lines, but its
algorithm is too simplistic
for professional quality
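For reference, a short sketch of what the standard textwrap module does with the sample paragraph from the earlier slide (the width of 36 is an arbitrary choice). As far as I know, textwrap fills each line with as many words as fit before moving on, without looking ahead at how the remaining lines will turn out.

```python
import textwrap

paragraph = (
    "Depending on the line length, the browser might split a "
    "particularly difficult paragraph so long and short lines interleave"
)

# Wrap to an arbitrary width and show the ragged right edge.
lines = textwrap.wrap(paragraph, width=36)
for line in lines:
    print('{:<36}|'.format(line))
```

The vertical bar marks the right margin, which makes uneven line lengths easy to spot.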
So bookbinding uses the
same high-powered typesetting
algorithms originally developed
by Knuth for TeX!
(Thanks, Andrew Kuchling,
for texlib!)
Let me know if you
are interested in taking
a look at it during the sprints
</Personal Aside>
With all those backslashes in TeX,
you might not think that Knuth
would have advice for writing
beautiful Python code
But:
Python code is based
on the syntax of math
Knuth became a world expert
on how whitespace should be
used when laying out math!
$$1 + \left(1 \over 1-x^2\right)^3$$
Whitespace. Expressions. Beauty.
Yes, that brings us
again to considering:
PEP-8
You can think of PEP-8
as a set of compositor’s rules
for typesetting Python code
on your screen
Example
PEP-8 specifies the
basic shape of a “page” of code
“Limit all lines to a
maximum of 79 characters.”
This is an exact analogue to
the standard advice of graphic
designers about paragraph width:
45–75 characters
So how do you handle
the line-length restriction?
When you reach the right edge,
you might be tempted to wrap
a Python statement across
several lines of code
But what if you introduced a new name instead?
canvas.drawString(x, y,
    'Please press {}'.format(key))
message = 'Please press {}'.format(key)
canvas.drawString(x, y, message)
This is actually an
idea I picked up from those
Extreme Programming (XP) guys
XP people tended to use variable
names to Destroy All Comments
widget.reset(True)  # forces re-draw
yes_force_redraw = True
widget.reset(yes_force_redraw)
XP people also point out that
big “section title” comments can
often be replaced with a function
...
# Open the barn
barn = models.Barn.get()
barn.unlock()
barn.open()
# Saddle the horse
...
...
open_barn()
saddle_horse()
...
def open_barn():
    barn = models.Barn.get()
    barn.unlock()
    barn.open()
The XP movement took it too far
but I really love using more names
that usefully replace comments or
let me avoid really long lines
# React if window too tall
if win.x1 - win.x0 > vp.h:
    ...
too_tall = (win.x1 - win.x0) > vp.h
if too_tall:
    ...
Another traditional
typesetter goal:
The page should be an attractive
block of text without ugly rivers
of whitespace spilling down it
Attention to space
can also help the
look of our code
# Yes:
x = 1
y = 2
long_variable = 3
# No:
x             = 1
y             = 2
long_variable = 3
For example, extra whitespace
to align variable values
is forbidden by PEP-8
Another layout idea
that I use comes from Linux
inventor Linus Torvalds
Torvalds wrote the masterful
“Linux kernel coding style”
for the C language
“Now, some people will
claim that having 8-character
indentations makes the code move
too far to the right and makes it
hard to read on a 80-character
terminal screen.”
“The answer to that is that
if you need more than 3 levels
of indentation, you’re screwed
anyway, and should fix
your program.”
I actually agree with Linus here
With each year that I keep
programming, I find more value
in code that stays very close
to the screen’s left margin
My indentation settings
Python: 4 spaces
JavaScript: 4 spaces
Others: Emacs default
HTML: 2 spaces
That last because web pages just
tend to be deeper than code!
Indentation getting too deep?
Here are four tricks I use!
#1 Use continue
for item in sequence:
    if is_valid(item):
        if not is_inconsequential(item):
            item.do_something()
for item in sequence:
    if not is_valid(item):
        continue
    if is_inconsequential(item):
        continue
    item.do_something()
#2 Factor out a new method
def mymethod(self):
    for item in sequence:
        if item.is_good():
            for widget in item:
                ...
def mymethod(self):
    for item in sequence:
        if item.is_good():
            for widget in item:
                self.finalize(widget)
def finalize(self, widget): ...
But, if self is not involved,
why make the routine a method?
Look again:
def mymethod(self):
    for item in sequence:
        if item.is_good():
            self.finalize_widgets(item)
def finalize_widgets(self, item):
    for widget in item:
        widget.close()
Since the routine does not even
use self you can pull it
out as a plain function
#3 Split out a function
def mymethod(self):
    for item in self.sequence:
        if item.is_good():
            _finalize_widgets(item)
def _finalize_widgets(item):
    for widget in item:
        widget.close()
This, by the way, is a
significant way that Python
has been training its community
Django made mistakes,
but is far more Pythonic
than many competitors!
It recognizes that a web view
could just be a plain function!
(Flask, Bottle followed later)
#4 Factor out an iterator
for item in sequence:
    for widget in item:
        for bitmap in widget:
            for pixel in bitmap:
                pixel.align()
                pixel.darken()
                pixel.draw()
def widget_pixels(sequence):
    for item in sequence:
        for widget in item:
            for bitmap in widget:
                for pixel in bitmap:
                    yield pixel
for pixel in widget_pixels(sequence):
    pixel.align()
    pixel.darken()
    pixel.draw()
Factoring out iterators (#4)
to keep code shallow is
a Python superpower
Another source
of ugly whitespace:
Large function calls
Unfortunately the following
is a PEP-8 recommendation:
foo = long_function_name(var_one, var_two,
                         var_three, var_four)
Which brings us to the
5 Stages of Function Call Grief
Stage 1: brevity
asymtotic_reduction(arg1, arg2)
Stage 2: >80 columns
asymtotic_reduction(arg1, arg2,
                    arg3, arg4)
Stage 3: leftward collapse
asymtotic_reduction(arg1, arg2,
                    arg3, arg4,
                    arg5)
asymtotic_reduction(arg1, arg2,
    arg3, arg4, arg5)
Stage 4: argument ballooning
asymtotic_reduction(arg1, arg2,
    die_on_error=arg3, height=arg4,
    width=arg5 / 2.0 + COLUMN_WIDTH)
Stage 5: an argument-per-line
asymtotic_reduction(
    x=arg1,
    y=arg2,
    die_on_error=arg3,
    height=arg4,
    width=arg5 / 2.0 + COLUMN_WIDTH,
)
Argument-per-line is AWESOME
Every argument looks the same
Orthogonal in version-control
asymtotic_reduction(
    die_on_error=arg3,
    height=arg4,
    width=arg5 / 2.0 + COLUMN_WIDTH,
)
Why would adjacent lines
not be treated separately by
your version-control?
The Problem: when adding or
changing Line n requires
another line (n-1)
to be modified
Example
Most languages today use
a statement terminator
Pascal decided to use
a statement separator
Pascal statements are “highly coupled”
x := sin(a);
y := cos(a)
end;
↓
x := sin(a);
y := cos(a);  { CHANGED }
z := tan(a)   { NEW }
end;
This, of course, makes your
version control system (git, hg)
flag two lines as changed!
C language
int biglist[] = {
    112,
    223
};
↓
int biglist[] = {
    112,
    223,  /* CHANGED */
    334   /* NEW */
};
When you design a language,
every construct that can span
lines should allow utter symmetry
between the first, middle, and
last lines in the construct!
Python always gets this right
Because Python is awesome
big_tuple = (
    12,
    23,
)
big_list = [
    34,
    45,
]
big_dict = {
    'one': 1,
    'two': 2,
}
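A rough way to check the version-control claim, using the standard difflib module. The helper changed_lines and the sample strings are invented for this sketch; it counts how many lines a unified diff marks as added or removed when one item is appended to a list.

```python
import difflib


def changed_lines(before, after):
    """Count the lines a unified diff marks as added or removed."""
    diff = difflib.unified_diff(
        before.splitlines(), after.splitlines(), lineterm='')
    return sum(
        1 for line in diff
        if line.startswith(('+', '-'))
        and not line.startswith(('+++', '---'))  # skip the file headers
    )


# Without a trailing comma, adding an item also touches the old last line.
no_comma_before = "big_list = [\n    34,\n    45\n]"
no_comma_after = "big_list = [\n    34,\n    45,\n    56\n]"

# With a trailing comma, the existing lines are left alone.
comma_before = "big_list = [\n    34,\n    45,\n]"
comma_after = "big_list = [\n    34,\n    45,\n    56,\n]"

without_comma = changed_lines(no_comma_before, no_comma_after)
with_comma = changed_lines(comma_before, comma_after)
print(without_comma, with_comma)
```

The difference is small for one edit, but it is the difference between a diff that shows only the new item and one that also drags in a neighbor.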
So Stage 5, argument-per-line, makes version control happy:
asymtotic_reduction(
    x=arg1,
    y=arg2,
    die_on_error=arg3,
    height=arg4,
    width=arg5 / 2.0 + COLUMN_WIDTH,
)
I do sometimes make
exceptions if parameters can
be grouped logically
canvas.drawString(x + margin, y - line_height,
                  'The Naming of Cats')
But, many experienced Python
programmers immediately snap
into arg-per-line mode
The Python community
keeps developing new practices;
PEP-8 was not the end!
Should PEP-8 continue evolving?
Probably not
PEP-8 should remain an
essential common denominator
It is hard enough to get some
projects to adopt PEP-8 already!
But we should find new ways
to communicate these ideas when
we run across the fact that several
of us have the same coding habit
So
Lists separated by
commas can be pretty
But
What about terms separated
by a series of operators?
+ - * /
Here, PEP-8 is actually harmful
PEP-8
“The preferred place to break
around a binary operator is after
the operator, not before it…”
if (width == 0 and height == 0 and
    color == 'red' and emphasis == 'strong' or
    highlight > 100):
How do I know that
this is bad advice?
KNUTH
Knuth = typesetting + math
It turns out that Knuth
has written hundreds of pages
about formatting expressions
So
What is his advice about
breaking them into lines?
“It’s quite an art to decide
how to break long displayed
formulas into several lines…”
“…it is often desirable to
emphasize some of the symmetry
or other structure that
underlies a formula…”
Laying down the law
“displayed formulas
always break before binary
operations and relations.”
PEP-8: bad
adjusted_income = (gross_wages +
                   taxable_interest +
                   (dividends - qualified_dividends) -
                   ira_deduction -
                   student_loan_interest)
Eyes bounce back and forth
Operators difficult to find
Knuth, instead of PEP-8
adjusted_income = (gross_wages
                   + taxable_interest
                   + (dividends - qualified_dividends)
                   - ira_deduction
                   - student_loan_interest)
Much easier to read
Symmetry between terms
Subtracted terms look negative
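A runnable version of that layout, with made-up figures standing in for real tax data. Inside the outer parentheses Python ignores the line breaks, so each operator can lead its own line.

```python
# Made-up figures, purely for illustration.
gross_wages = 50000
taxable_interest = 120
dividends = 300
qualified_dividends = 100
ira_deduction = 2000
student_loan_interest = 500

# Break before each binary operator; the sign of every term
# sits directly in front of it.
adjusted_income = (gross_wages
                   + taxable_interest
                   + (dividends - qualified_dividends)
                   - ira_deduction
                   - student_loan_interest)
print(adjusted_income)
```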
So
There are long traditions in math
that can help us improve how
we write our Python code
What about method chains?
With ORMs everywhere,
the question of chained
methods keeps coming up
I never use backslash
continuation, so I need another
way to do long chains!
# UGH BAD HURTS MY EYES
query = Person.filter(last_name='Smith') \
    .order_by('social_security_number') \
    .select_related('spouse')
Option #1
Close each method on next line
query = Person.filter(last_name='Smith'
    ).order_by('social_security_number'
    ).select_related('spouse')
Option #2
Use outer parens, period ends line
query = (Person.
    filter(last_name='Smith').
    order_by('social_security_number').
    select_related('spouse')
    )
Option #3
Use outer parens, period begins line
query = (Person
    .filter(last_name='Smith')
    .order_by('social_security_number')
    .select_related('spouse')
    )
This option #3 is my favorite
query = (Person
    .filter(last_name='Smith')
    .order_by('social_security_number')
    .select_related('spouse')
    )
VC will be happy that adding a
4th method call does not require
the previous line to be adjusted!
But, method chains still
seem to be an emerging Python
practice; I sometimes use
intermediate variables
q = Person.filter(last_name='Smith')
q = q.order_by('social_security_number')
q = q.select_related('spouse')
anyway
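To try both styles without an ORM, here is a sketch built on a hypothetical chainable Query class. It is not any real ORM's API; each method just records its call and returns a new Query.

```python
class Query:
    """Hypothetical chainable query; every method returns a new Query."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def filter(self, **conditions):
        return Query(self.steps + (('filter', conditions),))

    def order_by(self, field):
        return Query(self.steps + (('order_by', field),))

    def select_related(self, field):
        return Query(self.steps + (('select_related', field),))


# Stand-in for a model's query entry point.
Person = Query()

# Outer parentheses, with a period beginning each line.
chained = (Person
    .filter(last_name='Smith')
    .order_by('social_security_number')
    .select_related('spouse')
    )

# The same query built with an intermediate variable.
q = Person.filter(last_name='Smith')
q = q.order_by('social_security_number')
q = q.select_related('spouse')

print(chained.steps == q.steps)
```

Both spellings build the same sequence of steps, so the choice between them is purely about layout.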
So
To be happy like me,
make code pretty
When at work, I avoid
tweaking other people’s code
willy-nilly if I visit a module
for something specific
But if I touch a line of code
in the course of my duties, I am
always trying to find the next
tweak to make that section of
code really beautiful
Your Homework
Re-read PEP-8
“Linux kernel coding style”
Tell me refactoring stories
Ask for drive-by code review
Thank you!