Sine Qua Nons

Brandon Rhodes
PyOhio 2013

Tooling

virtualenv

New_Project/
┌─────virtualenv──────────┐
│ /bin: python2, pip      │
│ /include                │
│ /lib: Django 1.5, lxml  │
└─────────────────────────┘

Old_Project/
┌─────virtualenv──────────────────┐
│ /bin: python2, pip              │
│ /include                        │
│ /lib: Django 1.4, BeautifulSoup │
└─────────────────────────────────┘
Keep a requirements.txt in
each project that records
its dependency versions
de421==2008
jplephem==1.1
numpy==1.7.1
sgp4==1.1
Resuscitating a project
can then be as easy as:
$ cd ~/project
$ virtualenv venv
$ venv/bin/pip install -r requirements.txt

SINE QUA NON

Project sandboxes

Q:

What features does a language
need to keep me productive?

Container types

list
[10, 20, 30]
dict
{'one': 1, 'two': 2}

Composable iteration

return sorted(set(
    student.surname for student in students
    if not student.graduated
    ))
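
The expression above can be tried on its own with a few made-up records. The `Student` type and the sample rows below are hypothetical, invented only so the snippet runs; the generator expression, `set()` and `sorted()` are composed exactly as on the slide.

```python
from collections import namedtuple

# Hypothetical record type and sample data, invented for illustration.
Student = namedtuple("Student", "surname graduated")

students = [
    Student("Smith", False),
    Student("Jones", True),
    Student("Brown", False),
    Student("Smith", False),   # duplicate surname, removed by set()
]

def current_surnames(students):
    """Return the sorted, de-duplicated surnames of non-graduates."""
    return sorted(set(
        student.surname for student in students
        if not student.graduated
    ))

print(current_surnames(students))  # ['Brown', 'Smith']
```

Each stage (filter, de-duplicate, sort) consumes the iterable produced by the stage before it, so no intermediate list has to be named.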
How is a dict
really used in practice?
Most fundamental
dict maneuver:
The Join
 semester.txt         students.txt

  N        N           1        N
                          
Smith    CS101      Williams Sophomore
Johnson  EE201      Johnson  Junior
Williams PH131      Smith    Freshman
Smith    PH132      Jones    Sophomore
Jones    PH131      Brown    Freshman
Brown    EE201             
Williams CS101
      

what we are avoiding

for line in open('semester.txt'):
    name, course = line.split()
    for line in open('students.txt'):
        name2, year = line.split()
        if name == name2:
            break
    

O(nm)

n — lines in semester file
m — lines in student file

this will run faster

students = [line.split() for line
            in open('students.txt')]

for line in open('semester.txt'):
    name, course = line.split()
    for name2, year in students:
        if name == name2:
            break
    

but has the same complexity

O(nm)

To avoid this disastrous
performance, we can use
the mighty dictionary

building a dict

years = {}
for line in open('students.txt'):
    name, year = line.split()
    years[name] = year


         students.txt
              
       Williams Sophomore
       Johnson  Junior
       Smith    Freshman
              

Given our dictionary—

{'Williams': 'Sophomore',
 'Johnson': 'Junior',
 'Smith': 'Freshman',
 ...}

—asking for a student takes constant time

y = years['Johnson']
print y
# => 'Junior'

using a dict to join

...
for line in open('semester.txt'):
    name, course = line.split()
    year = years[name]  # the join!
    ...

           semester.txt
                
          Smith    CS101
          Johnson  EE201
          Williams PH131
          Smith    PH132
                
O(n + m)
complexity
years = {}
for line in open('students.txt'):
    name, year = line.split()
    years[name] = year

...
for line in open('semester.txt'):
    name, course = line.split()
    year = years[name]  # the join!
    ...
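
The two approaches can be compared side by side on a few sample rows. This is only a sketch: the data lives in strings instead of files so the snippet is self-contained, and the `comparisons` counter is an addition for illustration, not part of the original code.

```python
SEMESTER = """\
Smith    CS101
Johnson  EE201
Williams PH131
Smith    PH132
"""

STUDENTS = """\
Williams Sophomore
Johnson  Junior
Smith    Freshman
"""

def nested_loop_join(semester_text, students_text):
    """O(nm): scan the student list again for every semester line."""
    students = [line.split() for line in students_text.splitlines()]
    rows, comparisons = [], 0
    for line in semester_text.splitlines():
        name, course = line.split()
        for name2, year in students:
            comparisons += 1
            if name == name2:
                rows.append((name, course, year))
                break
    return rows, comparisons

def dict_join(semester_text, students_text):
    """O(n + m): one pass to build the dict, one pass to look names up."""
    years = {}
    for line in students_text.splitlines():
        name, year = line.split()
        years[name] = year
    rows = []
    for line in semester_text.splitlines():
        name, course = line.split()
        rows.append((name, course, years[name]))  # the join!
    return rows

slow_rows, comparisons = nested_loop_join(SEMESTER, STUDENTS)
fast_rows = dict_join(SEMESTER, STUDENTS)
print(slow_rows == fast_rows)  # True
print(comparisons)             # 9 name comparisons for only 4 semester lines
```

Both functions produce the same joined rows; only the amount of work differs, and the gap widens as the files grow.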
This dict technique is also
used in the world’s most
powerful databases
PostgreSQL’s EXPLAIN output:
Hash Join
  -> Seq Scan on semester
  -> Hash
       -> Seq Scan on student

The whole solution

from collections import Counter, defaultdict

years = {}
for line in open('students.txt'):
    name, year = line.split()
    years[name] = year

enrollment = defaultdict(Counter)
for line in open('semester.txt'):
    name, course = line.split()
    year = years[name]  # the join!
    enrollment[course][year] += 1
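
To try the whole solution without creating the two text files, the same logic can be run against in-memory strings. The sample rows are taken from the earlier table slide; reading from strings rather than `open()` is an assumption made only to keep the sketch self-contained.

```python
from collections import Counter, defaultdict

SEMESTER = """\
Smith    CS101
Johnson  EE201
Williams PH131
Smith    PH132
Jones    PH131
Brown    EE201
Williams CS101
"""

STUDENTS = """\
Williams Sophomore
Johnson  Junior
Smith    Freshman
Jones    Sophomore
Brown    Freshman
"""

# Build the lookup table: student name -> year.
years = {}
for line in STUDENTS.splitlines():
    name, year = line.split()
    years[name] = year

# Count enrollment per course, broken down by year.
enrollment = defaultdict(Counter)
for line in SEMESTER.splitlines():
    name, course = line.split()
    year = years[name]  # the join!
    enrollment[course][year] += 1

for course in sorted(enrollment):
    print(course, dict(enrollment[course]))
```

`defaultdict(Counter)` creates an empty `Counter` the first time a course is seen, and a `Counter` treats a missing year as zero, so the loop body needs no existence checks.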

The result

{'EE220': {'Sophomore': 89, 'Junior': 12},
 'CS125': {'Freshman': 198, 'Sophomore': 22},
 'PH441': {'Junior': 8, 'Senior': 22},
 ...}

Caveat

Powerful data types
do add complexity to a
programming language

Rival approaches

from math import sqrt

points = [(3,4), (5,12), (15,8)]

for p in points:
    print sqrt(p[0]**2 + p[1]**2)

# vs

class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

points = [Point(3,4), Point(5,12), Point(15,8)]

for p in points:
    print sqrt(p.x**2 + p.y**2)

Two variants

Constants can make indexing
more explicit, less error-prone
X = 0
Y = 1
points = [(3,4), (5,12), (15,8)]

for p in points:
    print sqrt(p[X]**2 + p[Y]**2)
Named tuples have real attributes,
which can eliminate indexing entirely
from collections import namedtuple
Point = namedtuple('Point', 'x y')

points = [Point(0,0), Point(3,0), Point(0,4)]

for p in points:
    print sqrt(p.x**2 + p.y**2)
But the choice must still be faced:
Use bare generic data types?
Or, define real classes?
type(points)  # => list? or PointSequence?
type(point)   # => tuple? or Point?
Plain tuples and lists and dicts
are easy to get started with
and require no boilerplate

but

they carry no record of your
intent so the names you give them
become your only documentation!
Error messages from generic data
structures tend to be less helpful
p = (4, 5)
d = sqrt(p[0]**2 + p[1]**2 + p[2]**2)
# => IndexError: tuple index out of range

p = Point(4, 5)
d = sqrt(p.x**2 + p.y**2 + p.z**2)
# => AttributeError: 'Point' object has no attribute 'z'
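
The difference in error messages can be reproduced directly. In this sketch `describe_failure` is a hypothetical helper, added only to capture and print each exception; the exact message wording may vary between Python versions.

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")

def describe_failure(action):
    """Call action() and return the exception it raises, as text."""
    try:
        action()
    except (IndexError, AttributeError) as e:
        return "%s: %s" % (type(e).__name__, e)
    return "no error"

plain = (4, 5)
named = Point(4, 5)

# The plain tuple can only say that some index was out of range.
print(describe_failure(lambda: plain[2]))

# The named tuple names both the type and the missing attribute.
print(describe_failure(lambda: named.z))
```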
Java programmers almost
always seem to write classes:
there is one way to do it
Python programmers have to
watch for the cusp where a problem
becomes too complex for generic types
Whether you use a plain list
or hide it behind a PointSeq class,
the powerful generic Python data types
will usually be behind everything you do!
tuple  list  dict  set

SINE QUA NON

Powerful built-in data types

point = (3, 4)
point = Point(3, 4)
Both of our examples
assumed that x and y
always belong together

NumPy

"""Alternative to:

points = [(3,4), (5,12), (15,8)]

"""
from numpy import array, sqrt

x = array([3, 5, 15])
y = array([4, 12, 8])

print sqrt(x*x + y*y)
#  => [  5.  13.  17.]

NumPy

print sqrt(x*x + y*y)
#  => [  5.  13.  17.]
Lets you build huge
result arrays with expressions
that look both simple and scalar

SINE QUA NON

Vector math

What about when
things go wrong?
or
“Exceptions in a Flask app”
exit()
def f():    def g():    def h():
  ...         ...         sys.exit()
  g()         h()
  ...         ...
                
The code that would have run
had h() returned never runs
def f():    def g():    def h():
  ...         ...         raise ValueError()
  g()         h()
  ...         ...
                
An exception also exits its function
and terminates all enclosing calls

But

You can catch an exception
and stop its propagation
def f():    def g():    def h():
  ...         ...         raise ValueError()
  ...         h()
  try:        ... # skipped
    g()
  except ValueError:
    log('error')
  ... # not skipped!
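
The same three functions can be made runnable to watch which lines are skipped. The `events` list is an addition for illustration, standing in for the `...` placeholders and the `log()` call.

```python
events = []

def h():
    events.append("h starts")
    raise ValueError("something went wrong")

def g():
    events.append("g before h")
    h()
    events.append("g after h")      # skipped: the exception exits g() too

def f():
    events.append("f before g")
    try:
        g()
    except ValueError:
        events.append("f logs error")
    events.append("f continues")    # not skipped: propagation stopped here

f()
print(events)
# ['f before g', 'g before h', 'h starts', 'f logs error', 'f continues']
```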

The problem:

How can a Flask app
return custom JSON errors?
not
IndexError: index out of range
but
{'error': 'Enter a password'}
@app.route('/<user_name>/<project_name>/')
@json_response
def project_page(user_name, project_name):

    user = db_load(User, user_name)
    if user is None:
        return {'error': 'does_not_exist'}, 404

    project = db_load(Project, user, project_name)
    if project is None:
        return {'error': 'does_not_exist'}, 404

    ...

1. Define custom exceptions

class AppError(Exception): ...
class NotFoundError(AppError): ...
class PermissionError(AppError): ...
class DatabaseError(AppError): ...
class AuthError(AppError): ...

2. Raise them in business logic

def db_load(cls, key):
    ...
    if obj is None:
        raise NotFoundError()
    ...

3. Build JSON for each exception

def exception_to_response(e):
    if isinstance(e, NotFoundError):
        name, code = 'not_found', 404
    elif isinstance(e, PermissionError):
        name, code = 'forbidden', 403
    elif isinstance(e, DatabaseError):
        name, code = 'unavailable', 503
    elif isinstance(e, AuthError):
        name, code = 'bad_auth', 401
    else:
        name, code = 'internal', 500

    return {'error': name}, code
So, assume that we are now
raising all the right exceptions
How do we catch them and
call exception_to_response()
in every one of our views?
I suggested decorating
every single view
from functools import wraps

def catch_errors(view):
    @wraps(view)
    def wrapper(*args, **kw):
        try:
            return view(*args, **kw)
        except AppError as e:
            return exception_to_response(e)
    return wrapper

@app.route('...')
@json_response
@catch_errors
def view(...):
    ...
Thus error-checking code disappears
entirely from the view functions!
@app.route('...')
@json_response
@catch_errors
def view(...):
    user = get_row(User, name)
    # => NotFoundError, DatabaseError
    data = user.cloud_open(filename)
    # => PermissionError
    return {'name': name, 'data': data}, 200
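
The decorator approach can be exercised without Flask at all. This is a minimal sketch under stated assumptions: `USERS`, `get_row` and `user_view` are hypothetical stand-ins for a real database and view, and only two of the exception classes are included to keep it short.

```python
from functools import wraps

class AppError(Exception):
    """Base class for errors the application raises on purpose."""

class NotFoundError(AppError):
    pass

class DatabaseError(AppError):
    pass

def exception_to_response(e):
    """Translate an exception into a (body, status code) pair."""
    if isinstance(e, NotFoundError):
        name, code = 'not_found', 404
    elif isinstance(e, DatabaseError):
        name, code = 'unavailable', 503
    else:
        name, code = 'internal', 500
    return {'error': name}, code

def catch_errors(view):
    @wraps(view)
    def wrapper(*args, **kw):
        try:
            return view(*args, **kw)
        except AppError as e:
            return exception_to_response(e)
    return wrapper

# Hypothetical in-memory "table" standing in for a database.
USERS = {'brandon': {'name': 'brandon'}}

def get_row(table, key):
    if key not in table:
        raise NotFoundError(key)
    return table[key]

@catch_errors
def user_view(name):
    user = get_row(USERS, name)       # may raise NotFoundError
    return {'name': user['name']}, 200

print(user_view('brandon'))  # ({'name': 'brandon'}, 200)
print(user_view('nobody'))   # ({'error': 'not_found'}, 404)
```

The view body contains no error handling of its own; the wrapper turns any `AppError` into a response.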
I illustrated this
approach for the student
Then, I suddenly sensed that I should
go read more Flask documentation
[Photo: Armin Ronacher]

flask.pocoo.org/docs/patterns/apierrors/

“to implement RESTful APIs…
implement your own exception type
and install an error handler for
it that produces the errors”
@app.errorhandler(DatabaseError)
@json_response
def handle_database_error(error):
    return {'error': 'unavailable'}, 503
That big elif block?
Goes away
That repeated @catch_errors?
Goes away
Because Flask handles this case already!

SINE QUA NON

Pervasive exceptions
(a design pattern)

caveat

We have made it easier
to write code that wants
to bail out on an error
user = get_row(User, name)
data = user.cloud_open(filename)
return {'name': name, 'data': data}, 200
But what about code that
wants to keep going?
try:
    user = get_row(User, name)
except NotFoundError:
    user = None
try:
    data = user.cloud_open(filename)
except PermissionError:
    data = ''
...

There's a pattern for that

getattr(obj, 'a')        # => AttributeError
getattr(obj, 'a', None)  # => None

mydict['a']              # => KeyError
mydict.get('a', None)    # => None

So your own calls can:

user = get_row(User, name, default=None)
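
One way such a `default` parameter might be implemented is with a private sentinel, so that `None` itself remains a legal default. This is a sketch, not the talk's actual code: `get_row` here takes a plain dict as its hypothetical "table", and `_MISSING` is an invented name.

```python
class NotFoundError(Exception):
    pass

# Unique sentinel: lets the function tell "no default given"
# apart from "the caller passed default=None".
_MISSING = object()

def get_row(table, key, default=_MISSING):
    """Return table[key]; raise NotFoundError unless a default is given."""
    try:
        return table[key]
    except KeyError:
        if default is _MISSING:
            raise NotFoundError(key)
        return default

users = {'brandon': 'row for brandon'}   # hypothetical data

print(get_row(users, 'brandon'))               # row for brandon
print(get_row(users, 'nobody', default=None))  # None

try:
    get_row(users, 'nobody')
except NotFoundError:
    print('not found')                         # not found
```

Callers that want to bail out omit `default`; callers that want to keep going pass one, mirroring `getattr()` and `dict.get()`.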
The Story of loading SSL
certificates from strings

The problem

conn = HTTPSConnection(
    host='www.google.com',
    key_file='/u/brandon/client.key',
    cert_file='/u/brandon/client.crt',
    )
Requires client certificates
to be written to disk
with tempfile.NamedTemporaryFile() as f:
    f.write(cert_pem + key_pem)
    f.flush()
    conn = HTTPSConnection(
        host='www.google.com',
        key_file=f.name,
        cert_file=f.name,
        )
Awkward, expensive
My customer demanded
an alternative
So, of course, I Googled
“python ssl memory”

SINE QUA NON

Public Bug Tracker

http://bugs.python.org/issue16487

“Allow ssl certificates to be specified
from memory rather than files.”

\o/

“Python 3.4”

@#$%&!

Can I backport the feature?
  1. How does HTTPSConnection use ssl?
  2. How does ssl relate to C code?
  3. How can C code use in-RAM certs?

httplib.py

class HTTPSConnection(HTTPConnection):
    ...
    def __init__(self, ...
          key_file=None, cert_file=None...):
        ...
        self.key_file = key_file
        self.cert_file = cert_file

    def connect(self):
        ...
        self.sock = ssl.wrap_socket(
          sock, self.key_file, self.cert_file)

ssl.py

def wrap_socket(sock, keyfile=None,
                certfile=None, ...
                do_handshake_on_connect=True,
                ...):

    return SSLSocket(sock, keyfile=keyfile,
                     certfile=certfile, ...
                     do_handshake_on_connect=...)

ssl.py

class SSLSocket(socket):
    def __init__(
          self, sock, keyfile=None, certfile=None,
          ... do_handshake_on_connect=True, ...):
        ...
        self._sslobj = _ssl.sslwrap(self._sock,
          server_side, keyfile, certfile, ...)
        if do_handshake_on_connect:
            self.do_handshake()
        ...

_ssl.c

static PyObject *
PySSL_sslwrap(PyObject *self, PyObject *args)
{
   ...
   return (PyObject *) newPySSLObject(
       Sock, key_file, cert_file, ...);
}

static PyMethodDef PySSL_methods[] = {
    {"sslwrap", PySSL_sslwrap,
     METH_VARARGS, ssl_doc},
    ...
};

_ssl.c

static PySSLObject *newPySSLObject
    (... char *key_file, char *cert_file, ...)
{
    ...
    if (key_file) {
        ...
        ret = SSL_CTX_use_PrivateKey_file(
          self->ctx, key_file, SSL_FILETYPE_PEM);
        ...
        ret = SSL_CTX_use_certificate_chain_file(
          self->ctx, cert_file);
        ...
    }
    return self;
}

An aside

Bad name: key_file
Better: key_path
So, what does the
new patch call?
in = BIO_new_mem_buf(data, len);
pkey = PEM_read_bio_PrivateKey(in, ...);
ret = SSL_CTX_use_PrivateKey(...);

in = BIO_new_mem_buf(data, len);
x = PEM_read_bio_X509(in, ...);
ret = SSL_CTX_use_certificate(ctx, x);

The way was open!

SINE QUA NON

Complete Object Introspection

Polite convention

object._name
Leading underscore
warns other developers:
“this particular attribute is an
implementation detail”

Freedom!

The API caller in Python decides
whether to assume the cost
of using an internal detail
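
A tiny illustration of the convention: the `Connection` class and its `_retries` attribute below are hypothetical, invented for this sketch. Nothing in the language stops a caller from reading or changing the underscored attribute.

```python
class Connection(object):
    """Hypothetical library class, invented for this illustration."""

    def __init__(self, host):
        self.host = host
        self._retries = 3   # leading underscore: an implementation detail

    def describe(self):
        return "%s (retries=%d)" % (self.host, self._retries)

conn = Connection("example.com")
print(conn.describe())   # example.com (retries=3)

# The caller decides to reach inside, accepting that this attribute is
# undocumented and may be renamed or removed in the next version.
conn._retries = 10
print(conn.describe())   # example.com (retries=10)
```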

Costs:

  1. Trick might not work
  2. There is usually no documentation
  3. Might break with the next version

Benefit:

Code re-use!

Alternatives:
rewrite
fork

So, how can I talk to C?

┌─module.py─────────────────┐
│          Python           │
│                           │
└─ctypes──cffi──────────────┘

            ┌─module.so─┐
            │  Python   │  raw C
            │ Extension │  Cython
            │  Module   │  SWIG
            └───────────┘

┌─library.so────────────────┐
│         C library         │
└───────────────────────────┘

SINE QUA NON

Extensibility

So, how did it work?

class PySSLObject(ctypes.Structure):
    _fields_ = [
        ('ob_refcnt', ctypes.c_int),
        ('ob_type', ctypes.c_void_p),
        ('Socket', ctypes.c_void_p),
        ('ctx', ctypes.c_void_p),
        ('ssl', ctypes.c_void_p),
        ]

def _bio(string):
    bio = libeay.BIO_new_mem_buf(
        string, len(string))
    if not bio:
        raise ssl.SSLError(...)
    return bio
def install_text_cert(conn, cert_text, key_text):

    addr = id(conn.sock._sslobj)
    ptype = ctypes.POINTER(PySSLObject)
    obj = ctypes.cast(addr, ptype).contents

    c = libeay.PEM_read_bio_X509(
          _bio(cert_text), 0, 0, 0)
    k = libeay.PEM_read_bio_PrivateKey(
          _bio(key_text), 0, 0, 0)

    libssl.SSL_CTX_use_certificate(obj.ctx, c)
    libssl.SSL_CTX_use_PrivateKey(obj.ctx, k)
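
The core trick, casting `id(obj)` to a pointer to a `ctypes.Structure`, can be tried safely on an ordinary object. This sketch assumes a standard CPython build, where `id()` returns the object's memory address and every object begins with a reference count followed by a type pointer; `PyObjectHeader` and `object_header` are names invented here, and other interpreters or special builds may lay objects out differently.

```python
import ctypes

class PyObjectHeader(ctypes.Structure):
    # Assumed layout of the start of every object on a standard CPython build.
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
    ]

def object_header(obj):
    """View the first fields of obj's C struct through ctypes."""
    address = id(obj)   # CPython detail: id() is the memory address
    pointer = ctypes.cast(address, ctypes.POINTER(PyObjectHeader))
    return pointer.contents

sample = ["any", "object"]
header = object_header(sample)

# The type pointer stored in the struct is the address of the list type.
print(header.ob_type == id(type(sample)))
```

Reading fields this way is harmless; writing to them, or calling C functions on the pointers as in the SSL example, carries all the costs listed earlier.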
Freedom to access all attributes
means never being limited by what
the original author imagined
you might want to do

One last story

At the Hacker School
in New York this week:
from timeit import timeit

SETUP = """
import re
search = re.compile(r"([a-z])(\1)").search
b1 = 'bubble'
bk = 'bubble' * 1000
"""

print timeit('search(b1)', SETUP, number=10000)
print timeit('search(bk)', SETUP, number=10000)
0.0149960517883
5.89959216118

/usr/lib/python2.7/timeit.py

./timeit.py

—and added a print statement—

def inner(_it, _timer):

    import re
    search = re.compile(r"([a-z])(␁)").search
    b1 = 'bubble'
    bk = 'bubble' * 1000

    _t0 = _timer()
    for _i in _it:
        search(bk)
    _t1 = _timer()
    return _t1 - _t0
Whoops.
from timeit import timeit

SETUP = """
import re
search = re.compile(r"([a-z])(\1)").search
b1 = 'bubble'
bk = 'bubble' * 1000
"""

print timeit('search(b1)', SETUP, number=10000)
print timeit('search(bk)', SETUP, number=10000)
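
What went wrong can be shown without any timing. Because `SETUP` is an ordinary triple-quoted string, Python turns `\1` into the control character `chr(1)` before timeit ever sees it, so the regex looks for a letter followed by that character instead of a repeated letter. The sketch below assumes that making `SETUP` a raw string is an acceptable fix; `load_search` is a hypothetical helper that runs the setup text with `exec`, roughly as timeit does.

```python
PLAIN_SETUP = """
import re
search = re.compile(r"([a-z])(\1)").search
"""

RAW_SETUP = r"""
import re
search = re.compile(r"([a-z])(\1)").search
"""

def load_search(setup_source):
    """Execute a setup string and return the `search` function it defines."""
    namespace = {}
    exec(setup_source, namespace)
    return namespace["search"]

plain_search = load_search(PLAIN_SETUP)
raw_search = load_search(RAW_SETUP)

print("\x01" in PLAIN_SETUP)          # True: the backreference is already gone
print(plain_search("bubble"))         # None: nothing ever matches
print(raw_search("bubble").group())   # bb
```

With the broken pattern every search scans the entire string and fails, which would explain why the long string took so much longer to time than the short one.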

Sine qua nons

While several programming
languages have these features,
Python puts them together
in a way that is powerful
yet simple and elegant

Thank you very much!

(Photo of Armin Ronacher by
Nils Pascal Illenseer from 500px.com)