httplib and its discontents



──────────────────
PyOhio
2011 July 31st
Brandon Craig Rhodes
──────────────────
This is a talk
about what can go wrong
during programming
Not a talk about how
to use httplib or urllib2
For that, use the new wrapper
library by Kenneth Reitz:

http://pypi.python.org/pypi/requests

Public Service Announcement

Never install Python packages
system-wide, except for virtualenv itself
$ virtualenv playground
$ cd playground
$ ls
bin/  include/  lib/
$ . bin/activate
(playground)$ pip install requests
(playground)$ python
>>> import requests
>>> r = requests.get('http://google.com')

“OO”

OO = Object-Oriented Programming

What is an “Object”?

History

There were variables:

firstname = 'Guido'
lastname = 'van Rossum'
state = 'CA'
age = 55
genre = 'Dutch'
firstnames = ['George', 'John', 'Thomas']
lastnames = ['Washington', 'Adams', 'Jefferson']
states = ['VA', 'MA', 'VA']
terms = [2, 1, 2]
But how do you pass
a whole person as a parameter?
ignorant_function(firstnames[i], lastnames[i],
                  states[i], terms[i])

aware_function(firstnames, lastnames,
               states, terms, i)

Records / Structures

presidents = [
    Person(firstname='George',
           lastname='Washington',
           state='VA',
           terms=2),
    ]

p = presidents[0]
print p.firstname, p.lastname

designate_holiday_for(p)
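
The slides never show how Person is defined. One possible sketch, written for Python 3, assumes collections.namedtuple; the helper full_name() is hypothetical and only illustrates passing a whole person as one parameter.

```python
from collections import namedtuple

# A record type: one name for the whole bundle of fields.
Person = namedtuple('Person', ['firstname', 'lastname', 'state', 'terms'])

presidents = [
    Person(firstname='George', lastname='Washington', state='VA', terms=2),
    Person(firstname='John', lastname='Adams', state='MA', terms=1),
    ]

def full_name(p):
    """Hypothetical helper: receives a whole person as a single parameter."""
    return p.firstname + ' ' + p.lastname

print(full_name(presidents[0]))
```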

Quick Digression

presidents = [
    Person(firstname='George',
           lastname='Washington',
           state='VA',
           terms=2),
    ]              
       #    Why did I insert
       # this unnecessary comma?

A:

It creates perfect symmetry in a long list
some_primes = [
    6311,
    6317,
    6323,
    6329,
    6337,
    6343,
    ]
You can cut, paste,
and rearrange code with
the most wild abandon

Terminal commas are always allowed
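
A small sketch of the point, written for Python 3: the trailing comma changes nothing about the value in a list, a dict, or a call. The variable names are made up for illustration.

```python
# The same list, with and without the terminal comma.
with_comma = [
    6311,
    6317,
    ]
without_comma = [
    6311,
    6317
    ]

# Dict displays and keyword arguments accept it too.
ages = {
    'Guido': 55,
    }
point = dict(
    x=1,
    y=2,
    )

print(with_comma == without_comma)
```

One caveat worth knowing: a trailing comma after a bare expression, as in `x = 1,`, builds a one-element tuple.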

♥ Python

So, records are great

p.firstname='George'
p.lastname='Washington'
p.state='VA'
p.terms=2
They fixed our problem with data.
But what about our code?
We wrote many procedures
for processing records
def create_person(): ...
def delete_person(p): ...
def send_paycheck_to(p): ...
def designate_retiree(p): ...
def record_contribution(p): ...
def produce_obituary(p): ...
These procedures became very complex
as more varieties of each record appeared
def send_paycheck_to(p):
    "Produce a paycheck for person `p`."
    if p.role == 'student':
        ...
    elif p.role == 'employee':
        if p.status == 'biweekly':
            ...
        elif p.status == 'salary':
            if p.rank == 'emeritus':
                ...
            else:
                ...
These great forests of
if statements were chopped down
in the great OO revolution
In OO, we do for code
what records did for data
“method”
a function attached to data
myperson.process_payroll()
Now our programming language
implements the if forest for us
class Employee(object):
    def process_payroll(): ...
    def move_to_retirement(): ...

class BiweeklyStaff(Employee):
    def process_payroll(): ...

class EmeritusProf(Employee):
    def move_to_retirement(): ...
OO languages give the method
the object on which it was invoked
myperson.process_payroll()

Employee.process_payroll(myperson)
Python, of course, makes this
explicit rather than implicit
def process_payroll(self):
    ...

OO

Not just convenient, but
powerful in important ways
Packaging data and code together
makes it much easier to abstract
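
The class hierarchy from the earlier slides can be sketched as runnable Python 3. The return strings are placeholders invented for illustration; only the class and method names come from the slides.

```python
class Employee:
    def process_payroll(self):
        return 'monthly salary'        # placeholder behavior

    def move_to_retirement(self):
        return 'standard pension'      # placeholder behavior

class BiweeklyStaff(Employee):
    def process_payroll(self):         # overrides Employee's version
        return 'biweekly paycheck'

class EmeritusProf(Employee):
    def move_to_retirement(self):      # overrides Employee's version
        return 'emeritus status'

myperson = BiweeklyStaff()

# The language picks the right method; no "if" forest required.
print(myperson.process_payroll())

# The method call is shorthand for passing the object explicitly as self.
print(BiweeklyStaff.process_payroll(myperson))
```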

But could OO have a dark side?

“The Case of the HTTP Parser”

Stack Overflow question:

"Does Python have a module for parsing
HTTP requests and responses?"

HTTP

“Hypertext Transfer Protocol”

(hypertext means text with links)

Your web browser uses HTTP

“fetch http://google.com for me!”

Your browser:

An HTTP request is just text

GET / HTTP/1.1
Accept: text/html,application/xhtml+xml,
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Cache-Control: max-age=0
Connection: keep-alive
Cookie:  
Host: www.google.com
User-Agent: Mozilla/5.0 (X11; Linux i686)
    AppleWebKit/535.1 (KHTML, like Gecko)
    Chrome/13.0.782.41 Safari/535.1
Q:
"Does Python have a module for parsing
HTTP requests and responses?"
A:

SimpleHTTPServer

∴ Python MUST have a parser!

Python's built-in web server

$ python -m SimpleHTTPServer
serves static files from
the current directory

Two questions

  1. Where in the Standard Library
    is this parser?
  2. How can I run it on a string?
Server classes take
care of sockets for you
(For all of the details, see
Foundations of Python Network Programming
by Brandon Rhodes and John Goerzen)

Server classes

SocketServer.BaseServer
SocketServer.TCPServer
BaseHTTPServer.HTTPServer
These open a socket,
set its IP address, run listen(),
then run accept() over and over
again accepting connections
But once the connection is
established, the Server classes
have no idea what to do next
So they pass the open
connection to a Handler
to do the actual talking
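
A rough sketch of that handoff, written for Python 3, where the module is named socketserver. To avoid opening a real listening port it assumes socket.socketpair() as a stand-in for an accepted connection; ShoutHandler and serve_one_connection() are hypothetical names.

```python
import socket
import socketserver

class ShoutHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # rfile and wfile wrap the open connection the handler was given.
        line = self.rfile.readline()
        self.wfile.write(line.upper())

def serve_one_connection(handler_class, data):
    """Play the Server's role once: hand an open connection to a handler."""
    client_end, server_end = socket.socketpair()
    with client_end, server_end:
        client_end.sendall(data)
        # Instantiating the handler runs setup(), handle(), and finish().
        handler_class(server_end, ('pretend-client', 0), None)
        return client_end.recv(1024)

print(serve_one_connection(ShoutHandler, b'hello\n'))
```

A real server would obtain server_end from accept() rather than from socketpair().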

Handler classes

SocketServer.BaseRequestHandler
SocketServer.StreamRequestHandler
BaseHTTPServer.BaseHTTPRequestHandler
The HTTP handler is an extensible class
The Standard Library invites you
to create your own subclass of the handler
and add methods like do_GET() or do_POST()
GET / HTTP/1.1
Accept: text/html,text/plain
Host: www.example.com
If this request came in, the
server would call your do_GET()
POST / HTTP/1.1
Accept: text/html,text/plain
Content-Length: 12
Host: www.example.com

query=Python
For this the server would call do_POST()
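
How does the handler find your method? The real BaseHTTPRequestHandler builds the name 'do_' + command and looks it up on self. The toy below, written for Python 3, imitates only that lookup; ToyHandler, dispatch(), and the return values are hypothetical and simplified.

```python
class ToyHandler:
    def dispatch(self, request_line):
        command = request_line.split()[0]
        # Look for a method named after the HTTP verb: do_GET, do_POST, ...
        method = getattr(self, 'do_' + command, None)
        if method is None:
            return 501                 # no such method on this handler
        return method()

class MyHandler(ToyHandler):
    def do_GET(self):
        return 'handled GET'

    def do_POST(self):
        return 'handled POST'

handler = MyHandler()
print(handler.dispatch('GET / HTTP/1.1'))
print(handler.dispatch('POST / HTTP/1.1'))
```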
Note that here OO is not being
used merely to provide a container
Instead, the module author thinks
that OO is a party and you are invited
You are invited to write new methods
on the author's class

Hint: this was probably a bad idea!

Classes — as we will soon see — can
be quite complex pieces of machinery.

Why invite another programmer in?

It breaks the clean barrier
that lets us do abstraction

Modern alternatives

                # inheritance

            BaseHTTPRequestHandler
                      ↑
HTTPServer → MyHTTPRequestHandler

         # API
Mightn't it have been better
to also have an API between
HTTP parser and handler?
HTTPServer → HTTPParser → MyHandler
So how does BaseHTTPRequestHandler
actually parse the HTTP protocol?
BaseRequestHandler.__init__(self, request, client_address)
StreamRequestHandler.setup(self)
BaseHTTPRequestHandler.handle_one_request(self)
BaseHTTPRequestHandler.parse_request(self)

Great! How do I use it?

Q:

What does parse_request() need?

A:
Its needs are not documented anywhere.
Why?
Because in Object-Oriented programming,
your conscience is taught that you can
use any attributes or methods on self
for free and without apologizing.
A:

Freebies:

self.default_request_version = "HTTP/0.9"
self.protocol_version = "HTTP/1.0"
self.MessageClass = mimetools.Message

Needs:

self.raw_requestline # string
self.rfile # used by self.MessageClass
self.send_error(code, message) # HTTP response!
That's quite an invocation protocol!
You have to set several object attributes.
This is called tight coupling

Useful phrase

“You've written some pretty
tightly coupled code there!”
(use the same tone of voice
as when you step in something)
Q:
So what does the Standard Library do
when it wants to test parse_request()?
A:

“Well, you start by opening a socket…”

test/test_httpservers.py

self.server = HTTPServer(('', 0), self.request_handler)
self.test_object.PORT = self.server.socket.getsockname()[1]

Lesson

Python's Standard Library
will not parse an HTTP request …
unless you submit it through
a real live socket
(… or, unless you are willing to write code
that depends on the internal details
of the BaseHTTPRequestHandler)

I was willing.

from BaseHTTPServer import BaseHTTPRequestHandler
from StringIO import StringIO

class HTTPRequest(BaseHTTPRequestHandler):
    def __init__(self, request_text):
        self.rfile = StringIO(request_text)
        self.raw_requestline = self.rfile.readline()
        self.error_code = self.error_message = None
        self.parse_request()

    def send_error(self, code, message):
        self.error_code = code
        self.error_message = message
# Simply instantiate with the request text

request = HTTPRequest(request_text)

print request.error_code      # None
print request.command         # "GET"
print request.path         # "/who/ken/trust.html"
print request.request_version # "HTTP/1.1"
print len(request.headers)    # 3
print request.headers['host'] # "cm.bell-labs.com"

# Parsing can result in an error code and message

request = HTTPRequest(bad_request_text)

print request.error_code     # 400
print request.error_message  # "Bad request syntax"
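
The code above is Python 2. A possible Python 3 port, assuming the handler now lives in http.server and the request arrives as bytes. The sample request_text and bad_request_text are invented to match the comments above.

```python
from http.server import BaseHTTPRequestHandler
from io import BytesIO

class HTTPRequest(BaseHTTPRequestHandler):
    def __init__(self, request_text):
        # Supply by hand the attributes that parse_request() expects on self.
        self.rfile = BytesIO(request_text)
        self.raw_requestline = self.rfile.readline()
        self.error_code = self.error_message = None
        self.parse_request()

    def send_error(self, code, message=None, explain=None):
        # Record the error instead of writing an HTTP response to a socket.
        self.error_code = code
        self.error_message = message

request_text = (
    b'GET /who/ken/trust.html HTTP/1.1\r\n'
    b'Host: cm.bell-labs.com\r\n'
    b'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3\r\n'
    b'Accept: text/html;q=0.9,text/plain\r\n'
    b'\r\n'
)
bad_request_text = b'GET\r\n\r\n'

request = HTTPRequest(request_text)
print(request.command, request.path, request.request_version)

bad_request = HTTPRequest(bad_request_text)
print(bad_request.error_code, bad_request.error_message)
```

This still depends on internal details of the handler class, so it may break if those internals change.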

Lesson

What have we learned?

OO is an incredible tool
for doing abstraction

BaseHTTPServer hides sockets really well!

But OO deadens the conscience
It lets programmers
write heavily coupled code because
“These are internal details of my class!
I can use anything on self that I want!”

Two remedies

  1. Testing
  2. Procedures

What if?

What if the HTTP parser had been written
as a good old-fashioned procedure?
The argument list and return statement
would provide clear documentation of what
data entered and exited the procedure
The code would have been prevented
from accumulating more and more
fiddly dependencies upon self
def parse_http_request(
      infile,
      request,
      version='HTTP/1.0',
      default_version='HTTP/0.9'):
    ...
    if failure:
        raise HTTPError(400, "Bad request")
    ...
    request.code = ...
    request.path = ...
    ...
    return
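
One way the outline above could be filled in, written for Python 3. This is a deliberately simplified sketch, not the Standard Library's parser: it ignores header continuation lines, repeated headers, and version validation. Unlike the outline, it returns a value instead of filling in a request argument. ParsedRequest and this HTTPError class are hypothetical.

```python
from collections import namedtuple
from io import StringIO

ParsedRequest = namedtuple('ParsedRequest', 'command path version headers')

class HTTPError(Exception):
    def __init__(self, code, message):
        Exception.__init__(self, code, message)
        self.code = code
        self.message = message

def parse_http_request(infile, default_version='HTTP/0.9'):
    """Parse one request from a text file object; return a ParsedRequest."""
    words = infile.readline().split()
    if len(words) == 3:
        command, path, version = words
    elif len(words) == 2:
        command, path = words
        version = default_version
    else:
        raise HTTPError(400, 'Bad request syntax')
    headers = {}
    for line in infile:
        line = line.rstrip('\r\n')
        if not line:
            break                      # a blank line ends the headers
        name, sep, value = line.partition(':')
        if not sep:
            raise HTTPError(400, 'Bad header line')
        headers[name.strip().lower()] = value.strip()
    return ParsedRequest(command, path, version, headers)

text = 'GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n'
print(parse_http_request(StringIO(text)))
```

Everything the procedure needs is in its argument list, and everything it produces is in its return value or its exception.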

CherryPy

Robert Brewer is pretty awesome

From _cperror.py:

class HTTPError(CherryPyException):
    ...
    def get_error_page(self, ...):
        return get_error_page(...)

...

def get_error_page(status, ...):
    ...
He defines the procedure separately,
then writes the method that invokes
the procedure with data from self
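
The same pattern in miniature, written for Python 3. This is not CherryPy's code; format_error_page() and PageError are hypothetical names chosen to show the shape of the idea.

```python
def format_error_page(status, message):
    """Plain procedure: everything it needs arrives as an argument."""
    return '%d %s' % (status, message)

class PageError(Exception):
    def __init__(self, status, message):
        Exception.__init__(self, status, message)
        self.status = status
        self.message = message

    def get_error_page(self):
        # The method only gathers data from self and delegates.
        return format_error_page(self.status, self.message)

# The procedure can be called and tested with no object at all.
print(format_error_page(404, 'Not Found'))
print(PageError(500, 'Internal Server Error').get_error_page())
```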

Two remedies

  1. Testing
  2. Procedures
I recommend dependency injection
It can help our procedures just as
it is already helping our classes
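
A minimal sketch of dependency injection applied to a procedure, written for Python 3. Instead of reaching for self.rfile and self.send_error, the procedure receives both collaborators as arguments. read_command() and its parameters are hypothetical.

```python
from io import StringIO

def read_command(infile, on_error):
    """Return the request's command word; report problems via on_error."""
    words = infile.readline().split()
    if not words:
        on_error(400, 'Bad request syntax')
        return None
    return words[0]

# A test can inject an in-memory file and a list that records errors.
errors = []

def record_error(code, message):
    errors.append((code, message))

print(read_command(StringIO('GET / HTTP/1.1\r\n'), record_error))
print(read_command(StringIO('\r\n'), record_error))
print(errors)
```

No socket is needed, because the caller decides what infile and on_error really are.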

Dependency injection

Martin Fowler

http://martinfowler.com/articles/injection.html

Thank you!

Any questions?