by Brandon Rhodes

WSGI and truncated chunked response bodies

Date: 14 February 2013
Tags: computing, python

I may be almost through with WSGI. While it has certainly worked for a number of my close-to-the-wire HTTP projects over the years, I seem finally to have reached an edge case where — as a standard — it cannot guarantee that I even return a correct response to browsers!

The great triumph of WSGI is that Python for the Web was suddenly pluggable. Whether you wrote your application as a raw WSGI callable or built atop a framework like Django or Pyramid, you could move from mod_wsgi running under Apache to flup running behind nginx to gunicorn running on Heroku without batting an eye or rewriting a single line of code.

The great tragedy of WSGI is its complexity. Despite the fact that there are code examples inlined into its PEP, it seems that hardly anyone can put together a fully correct server or piece of middleware. Writers like Armin Ronacher and Graham Dumpleton are good sources of complaints on this subject, as in Graham's recent pair of posts, “WSGI middleware and the hidden write() callable” and “Obligations for calling close() on the iterable returned by a WSGI application.” The latter article makes the telling observation that, “Despite the WSGI specification having been around for so long, one keeps seeing instances where it is implemented wrongly.” The problem is that WSGI makes a very awkward gesture toward asynchronicity, in the form of an iterable response body, but lets the application block while doing all of the rest of its work. The resulting architecture is still completely unusable by actual async folks like the Twisted or Tornado teams, while managing to make life awkward for everybody else. Add in WSGI's other features, like an obscure synchronous write() call and the ability of the application to call start_response() several times if it changes its mind, and correctness starts to become very difficult to achieve.

The great salvation of WSGI is that hardly anyone actually has to touch it. Nearly the entire mass of the world's busy Python web programmers are protected from the Terrible Secret of WSGI by working behind some web framework or other. This lets WSGI's one great benefit shine (that servers and applications can be plugged into one another fairly arbitrarily) without anyone but framework authors having to wallow in its complexity and then attend the Web Summit to vent and recuperate.

But, on to my topic for today.

To my great surprise, it turns out that — for all its complexity — WSGI manages to be under-specified! Consider the following application:

import sys

def simple_app(environ, start_response):
    headers = [('Content-Type', 'text/plain')]
    start_response('200 OK', headers)

    def content():
        # We start streaming data just fine.
        # (Byte strings, as PEP 3333 requires for body data.)
        yield b'The dwarves of yore made mighty spells,'
        yield b'While hammers fell like ringing bells'

        # Then the back-end fails!
        try:
            1/0
        except Exception:
            # Tell the server about the failure after output has begun.
            start_response('500 Error', headers, sys.exc_info())
            return

        # So the rest of the response data is not available.
        yield b'In places deep, where dark things sleep,'
        yield b'In hollow halls beneath the fells.'

    return content()

This tiny example manages to exhibit every essential property of the situation in which a much larger application has placed me:

Many of the resources in play will be cacheable by clients — some thanks to an ETag and others thanks to a far-future Expires header. This means that returning a truncated response without any indication of failure not only ruins the client's current attempt to use the resource, but might render the client permanently unable to proceed because it might never realize that its cached copy is truncated and that it needs to re-fetch the resource.

So it is absolutely imperative that the WSGI server running my application correctly signal truncated responses to HTTP clients. There are, to my knowledge, only two ways of doing so.

First, an HTTP server can specify a Content-Length but then close the socket before sending that much data. Standards-loving HTTP client libraries will always recognize failure in this case. However, one of the limitations that I have already stated is that I do not know the Content-Length until I have finished generating and returning the resource, so that is not an option here.
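
To make the Content-Length case concrete, here is a small sketch of how a client library notices the shortfall. It feeds canned response bytes to Python's own http.client through a FakeSocket class, which is a hypothetical stand-in written for this illustration and not part of any library. The read_body helper name is also made up here.

```python
import http.client
import io


class FakeSocket:
    """Hypothetical stand-in for a socket: replays canned bytes to http.client."""

    def __init__(self, data):
        self._file = io.BytesIO(data)

    def makefile(self, *args, **kwargs):
        return self._file


def read_body(raw_response):
    """Parse raw HTTP response bytes and return the body.

    Raises http.client.IncompleteRead if fewer bytes arrive than
    the Content-Length header promised.
    """
    response = http.client.HTTPResponse(FakeSocket(raw_response))
    response.begin()        # parse the status line and headers
    return response.read()  # read exactly Content-Length bytes, or fail


BODY = b"The dwarves of yore made mighty spells,"

# The server promises one byte more than it delivers before the
# connection (here, the canned byte stream) ends.
truncated = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: " + str(len(BODY) + 1).encode("ascii") + b"\r\n"
    b"\r\n" + BODY
)

try:
    read_body(truncated)
    detected = False
except http.client.IncompleteRead as error:
    detected = True
    print("truncated after", len(error.partial), "bytes")
```

The point of the sketch is only that the failure is visible to the client as an exception rather than as a short but apparently valid body.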

Second, an HTTP server can use chunked encoding but then close the socket prematurely either without finishing the current chunk, or by omitting the concluding zero-length chunk 0\r\n\r\n. An HTTP client will recognize this as a failure to receive the entire response.
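
The chunked framing is simple enough to sketch by hand. The following decoder is a simplified illustration, not a full implementation: it assumes no trailer headers follow the last chunk, and the function names encode_chunk and decode_chunked are invented for this example. It returns the payload it managed to read together with a flag saying whether the terminating zero-length chunk actually arrived.

```python
def encode_chunk(piece):
    """Frame one piece of body data as an HTTP/1.1 chunk."""
    return b"%x\r\n%s\r\n" % (len(piece), piece)


def decode_chunked(data):
    """Decode a chunked body and return (payload, complete).

    `complete` is True only if the final zero-length chunk and its
    closing CRLF were received. Trailers are assumed to be absent.
    """
    payload = []
    pos = 0
    while True:
        line_end = data.find(b"\r\n", pos)
        if line_end == -1:
            return b"".join(payload), False  # chunk-size line missing or cut off
        size_text = data[pos:line_end].split(b";", 1)[0]  # drop chunk extensions
        try:
            size = int(size_text, 16)
        except ValueError:
            return b"".join(payload), False
        if size < 0:
            return b"".join(payload), False
        pos = line_end + 2
        if size == 0:
            # The last chunk must be followed by one more CRLF.
            return b"".join(payload), data[pos:pos + 2] == b"\r\n"
        chunk = data[pos:pos + size]
        payload.append(chunk)
        if len(chunk) < size or data[pos + size:pos + size + 2] != b"\r\n":
            return b"".join(payload), False  # socket closed mid-chunk
        pos += size + 2


lines = [
    b"The dwarves of yore made mighty spells,",
    b"While hammers fell like ringing bells",
]
complete_body = b"".join(encode_chunk(line) for line in lines) + b"0\r\n\r\n"
missing_terminator = complete_body[:-len(b"0\r\n\r\n")]
cut_mid_chunk = complete_body[:10]

print(decode_chunked(complete_body)[1])       # terminator present
print(decode_chunked(missing_terminator)[1])  # terminator omitted
print(decode_chunked(cut_mid_chunk)[1])       # first chunk unfinished
```

Both failure modes described above end up with the flag set to False, which is the signal a client needs in order to discard the data instead of caching it.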

As you can see in the example app above, I am doing everything right:

So, that is my situation.

I need to stream large responses without knowing their length and in circumstances where the client receiving the response body must always be able to recognize a truncated response so that they do not run off and try to operate upon the truncated data.
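
For reference, PEP 3333 says that when start_response() is called with exc_info after the headers have already been sent, the server must re-raise the exception. A minimal sketch of a driver that follows that rule is below. The names run_app and failing_app are hypothetical, failing_app is only a condensed stand-in for simple_app, and a real server would also supply a proper environ, return a write() callable, and actually write to a socket.

```python
import sys


def failing_app(environ, start_response):
    """Condensed stand-in for simple_app: streams one line, then fails."""
    headers = [("Content-Type", "text/plain")]
    start_response("200 OK", headers)

    def content():
        yield b"The dwarves of yore made mighty spells,"
        try:
            1 / 0
        except Exception:
            # Headers are already out, so a compliant server re-raises here.
            start_response("500 Error", headers, sys.exc_info())
            return

    return content()


def run_app(app, sent):
    """Hypothetical driver applying the PEP 3333 start_response rules."""
    state = {"headers_set": None, "headers_sent": False}

    def start_response(status, response_headers, exc_info=None):
        if exc_info:
            try:
                if state["headers_sent"]:
                    # Too late to change the status: re-raise the app's error.
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None  # avoid a dangling circular reference
        elif state["headers_set"] is not None:
            raise AssertionError("Headers already set!")
        state["headers_set"] = (status, response_headers)

    result = app({}, start_response)
    try:
        for data in result:
            if data:
                # Headers go out together with the first body bytes.
                state["headers_sent"] = True
                sent.append(data)
    finally:
        if hasattr(result, "close"):
            result.close()


sent = []
try:
    run_app(failing_app, sent)
    outcome = "finished cleanly"
except ZeroDivisionError:
    # The server now knows the response is broken, so it can drop the
    # connection instead of ending the body as if it were complete.
    outcome = "aborted"

print(outcome, len(sent))
```

What the server does after catching the re-raised exception is exactly the part the specification leaves open, which is why the behavior of individual servers matters here.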

How do four common WSGI servers stack up when presented with the sample application above?

I will probably not use CherryPy in this particular application because, for other reasons, I am building it upon gevent and have therefore figured out how to work around the problems with its pywsgi server (and will soon be putting those changes together into a pull request). But it was heartening to see that at the very gray edges of the WSGI standard, where HTTP itself needs very careful handling because it includes no explicit way to say, “Wait! Never mind! I cannot finish this response after all!”, at least one of the WSGI servers on my short-list manages to put together the most utterly correct behavior I can think of.

I will let you know which brand of beer Robert chooses.
