pyron: Making Python package development DRY to the point of no return
April 22nd, 2009
I finally snapped last week.
After years of writing verbose and repetitive setup.py files for my Python packages, I am unable to write another. Instead, I have started writing Pyron, a tool that gathers the same information by inspecting a Python package itself. Not only does this mean that I get to stop repeating myself, but that my projects will become much more uniform because package metadata will be represented through common conventions instead of explicit (and repetitive) configuration. Though Pyron is still very primitive, it has already allowed me to reduce simple packages to only a README.txt plus their actual Python source code.
The start of the trouble
What happened is that I wanted to create a simple Python package full of tools for professional authors working with rst documents, so that they could monitor their word count while writing, and convert their rst files into the proprietary formats used by various publications. But just to start a new Python project required me to create four entire files, and almost as many directories:
./cursive.tools/setup.py
./cursive.tools/cursive/__init__.py
./cursive.tools/cursive/tools/README.txt
./cursive.tools/cursive/tools/__init__.py
The setup.py file itself repeats the project name over, and over, and over again, reminding me of the old Adventure game's “maze of twisty passages, all alike”:
from setuptools import setup

setup(
    name = 'cursive.tools',
    version = '0.1',
    description = 'Tools for restructured text files',
    author = 'Brandon Craig Rhodes',
    author_email = 'brandon@rhodesmill.org',
    packages = ['cursive.tools', 'cursive'],
    namespace_packages = ['cursive'],
    )
The first __init__.py file shown above of course looks like:
import pkg_resources
pkg_resources.declare_namespace(__name__)
Meanwhile, my stub README.txt and __init__.py files down in the bottom directory contained just enough information to get me started, whether I wanted to start by writing documentation and tests or get started by writing actual code:
``cursive.tools`` -- Tools for restructured text files
------------------------------------------------------

The routines in this ``cursive.tools`` package are designed for authors.
They provide command-line tools that can examine Restructured Text files.
"""Command-line routines for Restructured Text authors."""

__version__ = '0.1'
And, having created these files, I stopped, and stared in horror.
For an entire hour I tried to move on. I tried to start writing actual code and actual documentation. I tried to just ignore the stupidity of what I had just written. Or, in the case of setup.py, what I had just written by cutting and pasting from another project on my hard drive — yes, it's actually become that bad, that we cut-and-paste file contents between Python projects because our boilerplate requires so much repetition while carrying so little information.
But, try though I might, I could not move on to writing code; I was finally defeated. The Python language has done such a wonderful job over the past decade of honing my aesthetics and sharpening my senses that I am now unable to use its own standard packaging techniques! This new package would have to wait until I had resolved the problems that sat staring me in the face. Let us review them, one by one.
- After stating so carefully that this package was named cursive.tools, I then had to inform setup() that the project name would also be — who would have guessed? — cursive.tools as well! This is idiotic. Of course I am giving this project the same name as the package it contains; that is a best-practice from which modern Python projects have no excuse to dissent. Who wants to have to remember that you need the ZODB3 package when all you want to do is import persistent? Who wants to remember to depend on pyephem when all you want is to import ephem (a problem that I, myself, created in my own misguided Python youth)? Not me. And not, if they have any sense, my users.
- This package is named cursive.tools. Of course I want cursive to be a namespace package! That is so painfully obvious that it should not even require mention; it should be inferred.
- Similarly, the mention that cursive is a package in the packages declaration is redundant. Of course if a.b is a package then a is going to be a package as well! There's not even a way to avoid that in the Python language, so far as I know. Why even make me type it?
- The entire top-level __init__.py file — the one inside of the cursive directory — is utterly and entirely a boilerplate cut-and-paste. Given that cursive is already stated to be a namespace package, it should not even be necessary to provide the contents of its __init__.py; it's standard and can be copied straight from PEP-382.
- The package, you will note, has started out lacking a long_description despite the fact that it has a perfectly serviceable README.txt file. Many packages jump through the hoops of path manipulation just to find their own README.txt so that they can include it as their long description; but why, in the absence of an override, shouldn't its inclusion as the long description be the default?
- This raises the larger question of where, exactly, should a project README.txt even go — where on the filesystem, that is, should it be placed? There seems to be no consistency on this between different Python packages. Some people place it directly at the project top-level, next to the setup.py file, which is friendliest to developers checking out the source code from a public repository — but which makes the README.txt invisible to users! Others place it down inside of the package directory itself so that it will be included in their distribution, which is better; and still other Python projects have two separate README.txt files so that they have both bases covered!
- The package version is kept in two different places here: in the setup.py and also in the __version__ symbol of the module itself. When the version advances, both places will have to be updated — if the developer remembers! The alternative is for the setup.py to grow more complex by including its own bootstrap code that uses path manipulations to find and introspect the __version__ symbol inside of the module.
- The name of the package occurs both at the top of README.txt and inside of setup.py.
- The short description is repeated twice: once in the title of the README.txt and once in the setup() stanza of the setup.py.
- Finally, the directory structure of this project is ridiculous. If, as the setup.py clearly states, I am writing the cursive.tools module, why should I even include both a cursive and a tools directory? Since the only legitimate activity that I can undertake in constructing this module is to place files inside of cursive.tools, why do directories exist where files could collect outside of this one depository?
Obviously, the above arguments hold only for pure-Python packages; when C extensions and other special effects come into play, then excellent reasons arise for a complicated directory structure, sophisticated metadata, and possibly documentation above and beyond that distributed with binary versions of the package. But for normal packages, I am finished with writing and distributing a setup.py by hand.
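To make the second and third complaints concrete, here is a minimal sketch of how both the `packages` and `namespace_packages` lists could be derived from nothing more than the dotted project name. The function name `infer_packages` is hypothetical and this is not Pyron's actual code; it also assumes that every parent of the named package is meant to be a namespace package.

```python
def infer_packages(dotted_name):
    """Return (packages, namespace_packages) implied by a dotted name."""
    parts = dotted_name.split('.')
    # Every prefix of the dotted name is itself a package: a, a.b, a.b.c ...
    packages = ['.'.join(parts[:i]) for i in range(1, len(parts) + 1)]
    # Assumption: every proper prefix is intended as a namespace package.
    namespace_packages = packages[:-1]
    return packages, namespace_packages


packages, namespace_packages = infer_packages('cursive.tools')
print(packages)            # ['cursive', 'cursive.tools']
print(namespace_packages)  # ['cursive']
```

A single-level name such as `ephem` yields one package and no namespace packages, so the same rule covers the simple case too.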
Toward perfecting Pyron
My new tool for Python package building, Pyron — which, for those keeping score, is my very first bitbucket-hosted project (and I am very much enjoying these first few weeks of using Mercurial, since Guido made the big decision at the end of PyCon last month) — is not yet mature enough to warrant a first release on PyPI. Please check out the development version if you want to take a first look at Pyron. And, yes, Pyron currently has to include a setup.py of its own, which will not disappear until I release the first version and it can become self-hosting!
Please note that Pyron is only for developers! The sdist archives and the eggs produced for a Pyron-powered project are completely standard; the end users and developers installing a module will not be affected by your choice to use Pyron. It simply keeps your project repository cleaner by inferring package metadata on the fly rather than making you maintain a setup.py in version control along with your Python package.
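As a rough illustration of what inferring metadata on the fly could involve, the sketch below pulls the name and short description out of a README title and reads `__version__` from the source of an `__init__.py` without importing it. It is not Pyron's implementation: the helper names `parse_readme` and `read_constants` are hypothetical, and the title pattern is an assumption modeled on the stub README shown earlier.

```python
import ast
import re

# Assumed title convention:  ``name`` -- short description
TITLE_RE = re.compile(r'^``(?P<name>[\w.]+)``\s+--\s+(?P<description>.+?)\s*$')


def parse_readme(readme_text):
    """Return name, description and long_description from README text."""
    lines = readme_text.splitlines()
    match = TITLE_RE.match(lines[0] if lines else '')
    if match is None:
        raise ValueError('README title does not follow the assumed convention')
    return {
        'name': match.group('name'),
        'description': match.group('description'),
        'long_description': readme_text,
    }


def read_constants(init_source, names=('__version__', '__author__')):
    """Collect literal top-level assignments without importing the module."""
    found = {}
    for node in ast.parse(init_source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id in names:
                    found[target.id] = ast.literal_eval(node.value)
    return found


README = """\
``cursive.tools`` -- Tools for restructured text files
------------------------------------------------------

The routines in this ``cursive.tools`` package are designed for authors.
"""

INIT = '''\
"""Command-line routines for Restructured Text authors."""
__version__ = '0.1'
'''

metadata = parse_readme(README)
metadata['version'] = read_constants(INIT)['__version__']
print(metadata['name'], metadata['version'])  # cursive.tools 0.1
```

Parsing the source with `ast` rather than importing the package avoids running its code and sidesteps the path manipulation that a hand-written setup.py would otherwise need.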
A package developed with Pyron needs only two files: README.txt and __init__.py. The two files quoted above will work just fine. These simply need to sit in the same directory, like this:
./cursive.tools/README.txt
./cursive.tools/__init__.py
See? All of the actual meat of the cursive.tools module remains when the files are stored like this, while the repetition and boilerplate disappears! Check out the Pyron README.txt (or, of course, the same information as formatted in its project page on PyPI) for more details about how it works; here, I will just make three last observations:
- Sometimes I had to choose between best practices when deciding how Pyron would operate. Where, for example, should it find the package name? Instead of looking at the title of the README.txt, as it currently does, one could imagine my having written it to look somewhere in __init__.py (but there seems to be no agreed-upon place for a package to name itself), or even at the name of the directory in which the package is sitting (but often the directory will not be named cursive.tools, but something like branches/0.1 or even just trunk). In each case, I have tried to choose the most obvious and easy-to-maintain convention, and the real point is that there be some common idiom for everyone to fall into line with as more and more packages in the future abandon their setup.py files and start using Pyron.
- Sometimes no best practice existed, and I had to, frankly, make things up. Where should the author of a package go, without a setup.py file? In a special metadata file that I would have to invent? In some formatted region of the README.txt file? By choosing instead that it go inside an __author__ symbol in __init__.py, I hope that I have at least preserved symmetry with an existing best-practice while, again, making future Python projects as readable as possible should Pyron use become widespread.
- Pyron should become more sophisticated in the future, and eliminate even more repetition. It currently needs project dependencies, for example, to be defined as a __requires__ constant in a package's __init__.py file. In the future, Pyron will hopefully gain the ability to inspect a project's import statements and make intelligent guesses about its dependencies that could often eliminate any need for explicit dependency declarations.
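The import inspection imagined in the last point might start out something like the sketch below, which walks a module's syntax tree and collects the top-level names it imports. This is only an assumption about one possible approach, not a description of Pyron, and the function name `imported_top_level_names` is hypothetical.

```python
import ast


def imported_top_level_names(source):
    """Return the sorted top-level module names that the source imports."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split('.')[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import, internal to the package.
            if node.level == 0 and node.module:
                names.add(node.module.split('.')[0])
    return sorted(names)


SOURCE = """\
import os
import docutils.core
from ephem import Observer
from . import helpers
"""

print(imported_top_level_names(SOURCE))  # ['docutils', 'ephem', 'os']
```

The result is only a list of guesses: standard-library modules such as `os` would still have to be filtered out, and an import name does not always match the project that provides it (`ephem` comes from pyephem, `persistent` from ZODB3), which is why a tool like this could probably reduce explicit declarations but not always remove them.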
Thanks to Pyron, I am now happily working away on my cursive packages, and they should soon see their first releases. I can now sleep at night, knowing that boilerplate and repetition have finally vanished from my development code.
Posted: Wednesday, April 22nd, 2009 at 4:34 pm
Categories: Computing, Python
This is really exciting! Writing setup.py is not just a pain once you know it – it’s hard to learn, and clear instructions are hard to come by. I recently came by a *very good* Python author who’d never released eggs b/c he didn’t know how!
April 22nd, 2009 at 5:00 pm
Looks like great stuff–I rarely get around to actually releasing any of my hobby code because the business of packaging it up always seems so odious to me. I’ll definitely have to give Pyron a look!
April 22nd, 2009 at 10:46 pm
Awesome. I’ve been struggling with this same problem. I kept thinking that I was supposed to be using something like paster and create a “template” for my project setup boilerplate.
I look forward to trying it out on some of my projects!
April 22nd, 2009 at 10:50 pm
The author of Expert Python Programming, Tarek Ziadé, published his code from the book in some pbp packages. It uses paster for creating boilerplate code via paster’s templates. Check out pbp.skels on PyPI.
April 23rd, 2009 at 9:59 am
Thanks, Allen, I had not heard specifically of pbp.skels before. My only exposure to auto-written templates has been with the grokproject command, which builds a working web app for the developer to start with; and with building a few Zope project skeletons in their repository to help me start projects more easily.
But, even if the boilerplate is written by a tool, it’s still boilerplate, and therefore something that I would like to get rid of. Python has taught me that I should prefer a language that is tidy, rather than writing in an ugly one and giving my IDE the job of producing all the extra verbiage. Sure, templates save typing; but the result is still a verbose mess that’s hard to modify because of how many of its parts repeat information.
April 23rd, 2009 at 10:15 am
I love all this, and I go one step further:
There’s no point writing a short-and-then-long description in README.txt, when that information should *also* be in the docstring of the package’s __init__.py module.
In other words, I commonly get my ‘description’ and ‘long_description’ by parsing from, not the README.txt, but the package’s docstring.
If you have Pyron do this, and then have it automatically write a README.txt containing the content of the package’s __init__.py docstring, you’re down to manually writing *one file* as input to Pyron.
Rock it.
April 24th, 2009 at 1:35 am
This looks like it simplifies basic packaging quite a bit. Have you considered integrating it with Paver?
April 24th, 2009 at 10:56 am
I’ve gotten to the point of having the package ask pkg_resources what its version is. Benefits include collecting the “-dev” and/or revision number automatically added by setuptools. Drawbacks include time lost on the first import each time it runs, dependence on pkg_resources, and loss of version information when it is not installed as an egg.
This sounds like a better solution to the same problem.
April 24th, 2009 at 3:58 pm
Maybe the README should be written in reST, with obvious advantages: parsing that is quite easy with docutils and you could take advantage of the docinfo section to extract author(s) and the like. It could even contain the dependencies…
April 28th, 2009 at 8:11 am
I agree that writing setup.py is painful and annoying. To that end I also have a project for creating boilerplate called poachplate. I described it a little in my recent pycon talk, but the point is to create boilerplate for python scripts (that also need to be libraries).
April 29th, 2009 at 12:42 pm
Why not make the README.txt be the source for __init__.py file in literate-programming style? I can see there might be sync problems when testing as opposed to packaging, and I’m one of those egg-virgins myself, so forgive any misunderstanding, but, starting with the triple quote,
Really enjoying the Python category of your site.
December 19th, 2009 at 3:17 am