Why triple-quotes make PyFlakes hang Emacs

I need to refine the aspersions which I cast against PyFlakes last night in my response to Chris McDonough, answering his bounty against the bug that Emacs would hang in Flymake mode when he typed triple-quotes. My email, I admit, was not entirely fair to PyFlakes; but you must remember that I was writing late at night, and in great haste, wanting to be the first to respond among however many dozens of Emacs LISP programmers were racing to converge on the solution to Chris's problem.

My mistake was to exaggerate somewhat the verbosity with which PyFlakes reports a syntax error. My email to Chris, in fact, made the following rather extravagant claim:

The problem is that ... on a syntax error, PyFlakes prints out an error message, then the entire contents of the module that it cannot import, and finally a line that contains a number of spaces equal to the offset into the file of the syntax error...

Just glancing at the PyFlakes source code is enough to see that this accusation cannot be true:

    try:
        ...
    except (SyntaxError, IndentationError):
        value = sys.exc_info()[1]
        (lineno, offset, line) = value[1][1:]
        ...
        print >> sys.stderr, 'could not compile %r:%d:' \
            % (filename, lineno)
        print >> sys.stderr, line
        print >> sys.stderr, " " * (offset-2), "^"

Clearly, this simply prints the line that the Python exception cites as having caused the problem, followed by a primitive attempt to position a ^ character at the location of the error. For simple syntax errors, this actually produces output which is identical to that of normal Python:

$ python error1.py
  File "error1.py", line 3
    return x y
             ^
SyntaxError: invalid syntax
$ pyflakes error1.py
could not compile 'error1.py':3:
    return x y
             ^

But the PyFlakes behavior is quite different from that of the standard interpreter if the syntax error happens in a Python statement that has been continued across several lines of source code. Imagine that there are two functions in a file, and that we have started typing a docstring for the first one but have not yet closed the triple-quote:

def square(x):
    """Returns the square of x.
    return x * x

def cube(x):
    """Returns the cube of x."""
    return x * x * x

Here, Python and PyFlakes give quite different reports:

$ python error2.py
  File "error2.py", line 6
    """Returns the cube of x."""
             ^
SyntaxError: invalid syntax
$ ~/.emacs.d/usr/bin/pyflakes error2.py
could not compile 'error2.py':6:
    """Returns the square of x.
    return x * x

def cube(x):
    """Returns the cube of x."""
 ...76 spaces... ^

Do you see what has happened? The unterminated triple-quoted string looks as though it ends several lines later, at what we ultimately intend to be the beginning of the next triple-quoted string. (If the file instead contained no further triple-quoted strings, then the new string would appear to extend all the way to the end of the file.)

This is no problem for the Python interpreter itself, which modestly displays only the final line of the multi-line syntax error. But PyFlakes, not checking for this possibility, prints out the entire triple-quoted string, followed by a ^ character that is indented the entire length of the triple-quoted string. In the example that I was testing last night, this produced a line of nearly four thousand spaces that then ended in the lone little caret character.

So while PyFlakes is certainly more verbose than standard Python, it is not being nearly as profligate as I claimed. It does not insist on printing out your whole module, but limits its output to lines that actually appear involved in the error. Either way, it is the final line (the one that starts with all of the spaces) that is really the problem, since too many spaces send one of the Emacs Flymake regular expressions spiralling into exponential oblivion.

Perhaps PyFlakes should support a command-line option that omits code snippets entirely, so that Emacs will have only error messages to process. Either way, Chris and I can be more productive now that we can safely integrate a patched version of this excellent tool into our coding sessions.
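
A gentler fix than dropping the snippet altogether might be to print only the last line of the offending text and to measure the caret from the start of that line. What follows is a minimal sketch of that idea, not the actual PyFlakes patch: the name format_syntax_error is made up for illustration, and the code assumes that the offset counts characters from the start of the whole multi-line text, as the output above suggests.

```python
def format_syntax_error(filename, lineno, offset, text):
    """Build a short report showing only the last line of `text`.

    Assumption: `offset` is a 1-based character count measured from the
    start of `text`, which may span several source lines.
    """
    lines = text.rstrip("\n").split("\n")
    last_line = lines[-1]

    # Characters used up by the earlier lines, counting their newlines.
    consumed = sum(len(line) + 1 for line in lines[:-1])

    # Re-base the offset onto the last line, then clamp it so the caret
    # can never be pushed further right than one column past that line.
    column = offset - consumed
    column = max(1, min(column, len(last_line) + 1))

    header = "could not compile %r:%d:" % (filename, lineno)
    caret_line = " " * (column - 1) + "^"
    return "\n".join([header, last_line, caret_line])
```

However long the unterminated string grows, the caret line produced this way is never more than one character longer than the line printed above it, so there is no run of thousands of spaces for a regular expression to choke on.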

Posted in Computing, Python | 1 Comment »

PyEphem now available for Python 3.0!

Eager not to be left behind by the advance of history, I have released PyEphem tonight for Python 3.0! After updating its C-language routines earlier this week, as described in my previous post, and adjusting its Python syntax, I thought that my work was done — until I received a bug report from Reto Schüttel, an enterprising Swiss programmer. He had read my previous post, asked me for the location of PyEphem's Python 3.0 branch, downloaded it with bzr, and already tried it out, not only on his Linux machine, but also on his OS X machine!

While the first twenty revisions of my branch deal with simple Python 3.0 mechanics, I must congratulate Mr. Schüttel for most of the improvements in the subsequent dozen revisions. We not only exchanged emails all day as I produced one revision after another that I needed him to test, but he then joined me on IRC tonight until 3:30am in his time zone as we worked out the last problems.

The issues were all related to localization under OS X. The astronomy library underlying PyEphem used the C-language functions sscanf() and atof() to turn strings into numbers, and it turns out that these functions are very sensitive to locale under the specific combination of OS X and Python 3.0! Because of his locale, the functions wanted a comma instead of a period to separate whole numbers from their decimal fraction (so that π would have to be input like 3,141 rather than like 3.141). They also wanted month name abbreviations to be in German rather than English, ruining my test cases that check planet positions against Naval Observatory tables which use English month abbreviations like Jan, Feb, and Mar with very little regard for how the months would be spelled in German. We are still mystified by the combination of Python version and operating system that was necessary to cause this problem:

Python 2.6 under Linux: tests passed
Python 3.0 under Linux: tests passed
Python 2.6 under OS X: tests passed
Python 3.0 under OS X: broken: sscanf()/atof() change with locale

But the tests worked fine if he put LANG=C on the command line.

To work around this problem, I discovered a wonderful PyOS_ascii_strtod() function in Python's C API that avoided all of the problems that I was having with localization, and so I gradually rewrote the astronomy routines to use that function instead. Fixing the problems with month names was easier; instead of trying to make Python convert month abbreviations by passing the '%b' conversion character to time.strptime(), I simply converted months to integers myself and then passed the integers in with the simpler '%m' format character. It was only late at night that we finally tracked down every routine that was misbehaving.
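
The month-name workaround can be sketched in a few lines. This is only an illustration under assumptions: the names MONTH_NUMBERS and parse_english_date are invented here, and the "2008 Dec 6" layout is a guess rather than the format PyEphem actually parses. The point is simply that '%m' matches digits, so the result no longer depends on how the current locale spells its months.

```python
import time

# English abbreviations, fixed here rather than taken from the locale.
MONTH_NUMBERS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}

def parse_english_date(text):
    """Parse a date such as '2008 Dec 6' (hypothetical layout).

    The month abbreviation is converted to an integer by hand, so that
    time.strptime() only ever sees digits and the numeric '%m' directive.
    """
    year, month_abbreviation, day = text.split()
    month = MONTH_NUMBERS[month_abbreviation]
    numeric_text = "%s %d %s" % (year, month, day)
    return time.strptime(numeric_text, "%Y %m %d")
```

Calling parse_english_date("2008 Dec 6") returns a struct_time whose tm_mon is 12, whatever LANG happens to be set to.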

The final puzzle was how to release my new software. The Python Package Index does not yet allow a project to offer separate source code archives for both the Python 2.x and the 3.0 version of a project. There might be some clever way of storing both versions of the source code in the same .tar.gz file, but the packages I see on PyPI that already support both 2.x and 3.0 contain enough #ifdef statements to convince me that I want to keep my branches separate!

I was saved by a peculiarity of my project. Though it provides the Python package named ephem, the actual project on PyPI has always been named “PyEphem” instead. The product has to have the “Py” in front, you see, to distinguish it both from the original text-screen ephem command and the modern XEphem graphical application, and my first instinct had been to name the project after the product. But as I gained experience with PyPI, it seemed more and more awkward that programmers who wanted their programs to be able to:

import ephem

had to remember to type something different when installing:

easy_install pyephem

Since I had been wanting to switch PyPI names anyway, I suddenly saw my chance! I have now released the new Python 3.0 version of PyEphem under the actual package name ephem and will continue to maintain the 2.x version of the package separately under the old pyephem name. Obviously, this solution only works because of my project's unique circumstances; I have no idea what other projects should do if they, too, want to come out with their new Python 3.0 versions this weekend.

Fans of PyEphem should rest assured that when I develop new features, I will be adding them to both versions of PyEphem for probably at least the next decade. I have absolutely no intention of abandoning or slowing development of the Python 2.x version of the library; I simply wanted the library available to users of the new platform. As always, feel free to email me with bugs, suggestions, and new features. And enjoy using PyEphem and Python!

Posted in Computing, PyEphem, Python | No Comments »

Porting a C extension module to Python 3.0

With several packages already advertising Python 3.0 compatibility, it seemed high time to look into releasing my PyEphem astronomy package in an edition compatible with the new language. But I hesitated: how difficult is it really, and how many hours of work will it consume, to port a C-language extension module to Python 3.0?

The answer is that, while the necessary changes were surprisingly easy, they took lots of time to figure out because I did not find them documented in any one place. I offer the following notes to assist any other adventurers who want to experiment with porting their extension modules to 3.0. These notes might also suggest useful additions to the official documentation.

But, first, I need to issue three cautions. To develop under 3.0, you may have to forgo several Python tools that you probably thought you could no longer do without. The world of 3.0 is a windswept and icy landscape from which the glaciers have just receded, and you will find the stone tools rather primitive when compared to the comforts of civilization that you enjoy under Python 2. To wit:

  • I cannot find virtualenv for 3.0, which is a disaster. This means that you have to create a separate Python 3.0 install, built with a different --prefix option to ./configure, for each development environment you want to create on your box.
  • I cannot find a version of the setuptools available for 3.0. This means limiting your setup.py instructions to the primitive vocabulary of the distutils package. For example, I find myself unable to run the PyEphem test suite at this late hour because I have been running it for so long with:
    $ python setup.py test
    that I am not sure how to get it running otherwise.
  • Should you succeed in porting your extension module, it is not at all clear how to distribute it. I had expected either a new PyPI to spring into being — since every package will need an entirely different version under 3.0 — or for a sophisticated scheme to appear for registering one pyephem.tar.gz as the Python 2 version and another pyephem.tar.gz for 3.0. But while the most recent version of your package can mark itself as 2-compatible or 3-compatible (or both) using classifiers, there is no way to have two “most recent” versions of a package. Are we supposed to start distributing a single tar.gz that includes the source code for both Python series, and that selects the right code by detecting the interpreter version at the top of the setup.py file?
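
The single-archive idea raised in the last caution might look roughly like the sketch below. It is only a guess at how such a setup.py could begin: the helper pick_source_dir and the directory names src-py2 and src-py3 are hypothetical, and nothing here is an established convention.

```python
import sys

def pick_source_dir(version_info=sys.version_info):
    """Return the (hypothetical) source directory for this interpreter."""
    if version_info[0] >= 3:
        return "src-py3"
    return "src-py2"

# At the top of a setup.py, the result could then be handed to distutils,
# for example through the package_dir option:
#
#     from distutils.core import setup
#     setup(name="...", package_dir={"": pick_source_dir()}, ...)
```

The version tuple is accepted as a parameter only so that the choice can be exercised for both series without installing both interpreters.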

So if you make the effort to port your code right now, you might find that the shiny new version of your module is all dressed up, but has no place to go. If you experiment with the following steps, though, you will at least be ready when an official distribution channel does appear for releasing your package into the wilds of 3.0.

(more...)

Posted in Computing, Grok, PyEphem, Python, Zope | 1 Comment »

Comprehension consistency at last in Python 3.0!

A new era is begun: Python 3.0 has been released, bringing the bright and burning lights of reason, consistency, and symmetry to bear on my favorite language. Guido van Rossum, the creator of Python, has carefully guided this final attempt to remove the warts that have accumulated over the language’s 17-year lifetime, and the result is magical.

Python 3.0 (r30:67503, Dec  4 2008, 10:23:44)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> [ n*n for n in range(5) ]
[0, 1, 4, 9, 16]
>>>
>>> { n*n for n in range(5) }
{0, 1, 4, 16, 9}
>>>
>>> { n: n*n for n in range(5) }
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

Magnificence! Do you feel the waves of beauty crashing over you? No, no, not over the sequence of squares — over the fact that all three basic collection types now support comprehensions!

Comprehensions were first introduced in Python 2.0, but with the terribly awkward stipulation that they were only possible for lists, not for dictionaries. This meant teaching newcomers to the language that list construction was a special case, and that the collections that had deserved their own constructor syntax (lists and dictionaries, at that time; sets came later) were not equally powerful. It also made necessary the awkward and expensive technique of building, and then immediately discarding, a list of tuples to quickly construct dictionaries:

Python 2.5.2 (r252:60911, Nov 14 2008, 19:46:32)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> dict([ (n, n*n) for n in range(5) ])
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

The arrival of generator expressions in 2.4 did, at least, allow us to remove the ugly square brackets and avoid creating the list (though all of the tuples still got created then immediately discarded). But the problem remained that dictionaries made from inline generators did not look like dictionaries syntactically.

But, no more! They have even updated the scandalously withdrawn PEP 274 to announce that the feature has finally arrived. After the aching and painful years of the Python 2 series, the language once again shines bright and clear as a model of clever symmetries and low mental impedance. Python’s famously tight “feature set” can, now more easily than ever, fit comfortably into the programmer’s brain.
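
To see the difference side by side, here are the three spellings discussed above, written for Python 3. The variable names are mine, chosen only for this comparison; all three build the same dictionary, but only the last one looks like a dictionary.

```python
# Python 2.0 through 2.3: build a throwaway list of tuples first.
from_list = dict([(n, n * n) for n in range(5)])

# Python 2.4 and later: a generator expression avoids the list,
# though each tuple is still created and then discarded.
from_genexp = dict((n, n * n) for n in range(5))

# Python 3.0: a real dictionary comprehension.
from_comprehension = {n: n * n for n in range(5)}

assert from_list == from_genexp == from_comprehension
```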

What shall I write first in Python 3.0? I wonder.

But you can be sure that my code will find lots of excuses for constructing dictionaries.

Posted in Computing, Python | 2 Comments »