Porting a C extension module to Python 3.0

Date:	9 December 2008
Tags:	computing, grok, pyephem, python, zope

With several packages already advertising Python 3.0 compatibility, it seemed high time to look into releasing my PyEphem astronomy package in an edition compatible with the new language. But I hesitated: how difficult is it really, and how many hours of work will it consume, to port a C-language extension module to Python 3.0?

The answer is that, while the necessary changes were surprisingly easy, they took lots of time to figure out because I did not find them documented in any one place. I offer the following notes to assist any other adventurers who want to experiment with porting their extension modules to 3.0. These notes might also suggest useful additions to the official documentation.

But, first, I need to issue three cautions. To develop under 3.0, you may have to forgo several Python tools that you probably thought you could no longer do without. The world of 3.0 is a windswept and icy landscape from which the glaciers have just receded, and you will find the stone tools rather primitive when compared to the comforts of civilization that you enjoy under Python 2. To wit:

I cannot find virtualenv for 3.0, which is a disaster. This means that you have to create a separate Python 3.0 install, built with a different --prefix option to ./configure, for each development environment you want to create on your box.

I cannot find a version of setuptools for 3.0. This means limiting your setup.py instructions to the primitive vocabulary of the distutils package. For example, I find myself unable to run the PyEphem test suite at this late hour because I have been running it for so long with:

$ python setup.py test

that I am not sure how to get it running otherwise.

Should you succeed in porting your extension module, it is not at all clear how to distribute it. I had expected either a new PyPI to spring into being — since every package will need an entirely different version under 3.0 — or for a sophisticated scheme to appear for registering one pyephem.tar.gz as the Python 2 version and another pyephem.tar.gz for 3.0. But while the most recent version of your package can mark itself as 2-compatible or 3-compatible (or both) using classifiers, there is no way to have two “most recent” versions of a package. Are we supposed to start distributing a single tar.gz that includes the source code for both Python series, and that selects the right code by detecting the interpreter version at the top of the setup.py file?

So if you make the effort to port your code right now, you might find that the shiny new version of your module is all dressed up, but has no place to go. If you experiment with the following steps, though, you will at least be ready when an official distribution channel does appear for releasing your package into the wilds of 3.0.

Four Steps To 3.0

Yes, four steps were all that were necessary to convert my quite complex extension module to Python 3.0!

Use PyModule_Create(). The old mechanism that I had been using to initialize my extension module, the rather clunkily named Py_InitModule3(), happily does not even exist in the Python 3.0 header files. Instead, call the PyModule_Create() function, which you can find described in the Module Initialization section of the Extending and Embedding document. And be sure to keep its result: unlike in older Pythons, your module initialization function now has to return the module object that PyModule_Create() builds.

Adjust all Python object headers. Each type object in my code started with a macro to set up the common fields that all Python objects share. This was then followed by the ob_size field, which in my code is always zero, and then the type name:

/* For Python 2 */

static PyTypeObject BinaryStarType = {
     PyObject_HEAD_INIT(NULL)
     0,                   /* ob_size */
     "ephem.BinaryStar",  /* tp_name */
     ...

Though the Python 3.0 documentation still shows this as the way to create types, this technique will now completely fail. (The bug indicating that the documentation gets this wrong has, as its most recent comment, the helpful note “I'm lowering the priority so it doesn't block the release.”) Anyway, the solution is simple: the first two lines in the struct shown above simply have to be combined into a single macro call:

/* For Python 3.0 */

static PyTypeObject BinaryStarType = {
     PyVarObject_HEAD_INIT(NULL, 0)
     "ephem.BinaryStar",  /* tp_name */
     ...

With this change, my objects are now operating fine.

Use plain “static.” PyEphem inherits code from an era when it was popular to use variations on the static keyword so that Python could work around problems with various troublesome C compilers. This filled my code with staticforward declarations like the one you can see at the top of the old Python 2.2 Defining New Types page. It turns out that for well-behaved compilers these were always simply synonyms for the static keyword, which is what you must replace them with when porting your code to 3.0.
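The pattern that these macros supported can be sketched in plain C, with no Python headers involved. The struct and function names below are hypothetical stand-ins. The first declaration is the one that older code would have spelled staticforward; the later full definition was, as far as I know, spelled statichere in the Python 2 headers.

```c
#include <stddef.h>

/* Hypothetical stand-in for a type record; not a real Python struct. */
struct type_record {
    const char *name;
    const char *(*describe)(void);
};

/* Forward declaration: where old code wrote "staticforward". */
static struct type_record binary_star_record;

/* This function refers to the record before its full definition. */
static const char *describe_binary_star(void)
{
    return binary_star_record.name;
}

/* Full definition: plain "static" here as well. */
static struct type_record binary_star_record = {
    "ephem.BinaryStar",
    describe_binary_star
};
```

A standard C compiler treats the first declaration as a tentative definition, so both lines can use the same keyword without conflict.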

Upgrade to Unicode. Python 3.0 makes a clean and sharp distinction between strings, which are sequences of Unicode characters, and byte arrays, which can represent anything. To reflect this sea change down at the C level, the decision was made to eliminate everything from the C API whose name started with PyString. The obvious compiler errors that result from this provide a clear signal to programmers that they have to decide, everywhere that they had been using an old-style Python string, whether they should now represent that data with the PyUnicode or the PyBytes type. This was a brilliant decision; the transition could have been a nightmare had it remained possible for old code to compile without having been properly converted!

When migrating the PyEphem code base, I found that most of the Unicode transition was very easy. Everywhere that my code handled or created a string object, I simply changed the prefix of the function to PyUnicode and everything worked:

 PyString_Check      ... becomes ... PyUnicode_Check
 PyString_FromString ... becomes ... PyUnicode_FromString
 PyString_FromFormat ... becomes ... PyUnicode_FromFormat

Well, okay, the trick does not work everywhere; this one was harder to guess:

 PyString_Size ... becomes ... PyUnicode_GET_SIZE

The situations that require real thought are the places where my code needs to convert a Python string into the sort of simple ASCII character array that the underlying C library can absorb. At the moment, my code is leaning heavily on a pitiful PyUnicode_AsString() routine that I wrote just to get things working; in the morning I will have to look into doing this more correctly, including catching the error if a fancy Unicode character is present that cannot properly be converted.

Overall, I am very impressed with how quickly I was able to get my extension module compiling and running under Python 3.0. The procedure was simple — I just tried, over and over again, to build the module with:

$ python3.0 setup.py build

and then tackled the compiler errors that resulted. Once every last warning had been addressed, the module started up and operated without a single further complaint. This calls for celebration! I'm going to bed.