CherryPy and running out of file descriptors during development

Date:	19 October 2010
Tags:	computing, python

Well, that was interesting! A fellow developer complained that, following my recent introduction of ØMQ into our project, he could only go through a few cycles of editing the source code, saving his changes, and watching CherryPy automatically restart before it would fail with:

IOError: [Errno 24] Too many open files

An lsof against the process confirmed that more files were open after every restart. My guess was that leftover references were keeping a few Python file objects alive from one CherryPy reload to the next, and I remembered having once used a neat utility for exploring the object heap. After some investigation I re-discovered that it was Heapy. So I added a pdb breakpoint, did some investigating, and was puzzled to find that after each restart only four Python file objects existed — one for each log file in our application.

(I also tried using objgraph, which is far easier to use than Heapy, but it could not tell that there were file objects in memory at all. I have no idea why.)

Well, this was a puzzle. How could the number of open file descriptors increase without bound when Python was clearly deleting all of the old file objects? The answer, once I finally tried reading the source code to the Autoreloader plug-in, was of course very simple: CherryPy performs each restart by doing an exec() to completely wipe out the process image and replace it with a new instance of the CherryPy application. Which is certainly a very thorough approach! Except for one thing: file descriptors in Unix are set, by default, to survive an exec() call, but the new instance of Python that spins up inside of the process does not know that they are there, so they never get closed. It appeared that suddenly calling ØMQ out of existence with an exec() also left a few sockets lying around.

Several possible solutions came to mind. What if a more thorough effort was made to delete all Python objects before running the exec() call? That sounded daunting, though — it might take a lot of effort to march through all of the application object trees closing everything down. I could have focused my efforts just on finding the file objects, but that approach felt fragile; what would happen the next time one of our developers wrote a new module that opened a log file?

Monkey patching the open() built-in to create files with their close-on-exec flag already set is, of course, too terrible a solution even to contemplate.

In the end, the simplest solution seemed to be the creation of a little CherryPy plug-in that, as the very last shutdown action, would loop over all existing file descriptors and set their close-on-exec flag. Here is the plug-in, in case the pattern helps anyone else:

"""Make sure file descriptors close when CherryPy exec's."""

import os
import sys
from cherrypy.process.plugins import SimplePlugin

class CloexecPlugin(SimplePlugin):
    def stop(self):
        if hasattr(sys, 'getwindowsversion'):
            return
        import fcntl  # not available under Windows
        max = os.sysconf('SC_OPEN_MAX') if hasattr(os, 'sysconf') else 1024
        for fd in range(3, max):  # skip stdin/out/err
            try:
                flags = fcntl.fcntl(fd, fcntl.F_GETFD)
            except IOError:
                continue
            fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

    stop.priority = 99

Of course, I suspect that this problem was happening all along, even before we added extra logging and then integrated ØMQ into our application. But back then, with maybe only one or two stray file descriptors surviving each restart, it would have taken five hundred or a thousand CherryPy restarts for the problem to be noticed — and, apparently, none of us developers ever reached that impressive total. Now we know to be careful!