Let's discuss the matter further

New Year’s meme: What are the oldest files in your home directory?

Celebrate the new year with a blog post discussing the oldest files that are still sitting somewhere beneath your home directory! The procedure is simple:

  1. Run the following script in your home directory. (You might want to pipe its output through less to read it.)
  2. Ignore files whose date does not reflect your own activity.
  3. List the oldest files in a blog post and discuss!
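#!/usr/bin/env python
"""Print last-modified times of files beneath '.', oldest first."""
from __future__ import print_function  # lets the script run under Python 2 and 3
import os, time
# Generate the path of every file in every directory beneath '.'
paths = ( os.path.join(b,f) for (b,ds,fs) in os.walk('.') for f in fs )
# Sort (mtime, path) pairs so that the oldest files come first
for mtime, path in sorted( (os.lstat(p).st_mtime, p) for p in paths ):
    print(time.strftime("%Y-%m-%d", time.localtime(mtime)), path)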

Only include files whose last-modified time is a date on which you really touched the file. The file's time should neither result from an error (a few files beneath my own home directory have an incorrect date of 1970-01-01), nor from unpacking someone else's archive that has old files inside of it. For example, I myself have excluded the following pair of nearly 17-year-old files because their dates reflect their age inside of the Python 3.0 source archive, instead of the actual moment last month when they became part of my home directory:

1992-03-02 ./src/Python-3.0/Demo/scripts/wh.py
1992-03-02 ./src/Python-3.0/Tools/scripts/dutree.doc
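For anyone whose python is now Python 3, here is a minimal sketch of the same walk with two conveniences added: it skips files whose timestamps fall within a day of the Unix epoch, on the assumption that such dates (like the 1970-01-01 files mentioned above) are errors, and it returns only the oldest few entries. The function name oldest_files and its limit and ignore_before parameters are illustrative choices, not part of the script above. It still cannot recognize dates that came out of someone else's archive, so step 2 remains a judgment call.

```python
import os
import time


def oldest_files(root=".", limit=20, ignore_before=86400):
    """Return up to `limit` (date, path) pairs beneath `root`, oldest first.

    Files last modified less than `ignore_before` seconds after the Unix
    epoch are skipped, on the assumption that such dates are errors.
    """
    found = []
    for base, dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(base, name)
            try:
                # lstat() reports a symlink's own time instead of following it
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # the file vanished or cannot be read; skip it
            if mtime >= ignore_before:
                found.append((mtime, path))
    found.sort()  # tuples sort by mtime first, so the oldest come first
    return [(time.strftime("%Y-%m-%d", time.localtime(mtime)), path)
            for mtime, path in found[:limit]]
```

Running `for day, path in oldest_files(): print(day, path)` from your home directory prints a shortened version of the original script's listing.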

But there is no requirement that the actual content of each file you list be your own. Whether you wrote the file yourself long ago, or downloaded it from some ancient and forgotten FTP site, you have a story to share!

Within the rules given above, here are the oldest files beneath my own home directory:


Why triple-quotes make PyFlakes hang Emacs

I need to refine the aspersions which I cast against PyFlakes last night in my response to Chris McDonough, answering the bounty he had offered for fixing the bug that made Emacs hang in Flymake mode whenever he typed triple-quotes. My email, I admit, was not entirely fair to PyFlakes; but you must remember that I was writing late at night, and in great haste, wanting to be the first to respond among however many dozens of Emacs Lisp programmers were racing to converge on the solution to Chris's problem.

My mistake was to exaggerate somewhat the verbosity with which PyFlakes reports a syntax error. My email to Chris, in fact, made the following rather extravagant claim:

The problem is that ... on a syntax error, PyFlakes prints out an error message, then the entire contents of the module that it cannot import, and finally a line that contains a number of spaces equal to the offset into the file of the syntax error...

Just glancing at the PyFlakes source code is enough to see that this accusation cannot be true:

    try:
        ...
    except (SyntaxError, IndentationError):
        value = sys.exc_info()[1]
        (lineno, offset, line) = value[1][1:]
        ...
        print >> sys.stderr, 'could not compile %r:%d:' \
            % (filename, lineno)
        print >> sys.stderr, line
        print >> sys.stderr, " " * (offset-2), "^"

Clearly, this simply prints the line that the Python exception cites as having caused the problem, followed by a primitive attempt to position a ^ character at the location of the error. For simple syntax errors, this actually produces output which is identical to that of normal Python:

$ python error1.py
  File "error1.py", line 3
    return x y
             ^
SyntaxError: invalid syntax
$ pyflakes error1.py
could not compile 'error1.py':3:
    return x y
             ^

But the PyFlakes behavior is quite different from that of the standard interpreter if the syntax error happens in a Python statement that has been continued across several lines of source code. Imagine that there are two functions in a file, and that we have started typing a docstring for the first one but have not yet closed the triple-quote:

def square(x):
    """Returns the square of x.
    return x * x

def cube(x):
    """Returns the cube of x."""
    return x * x * x

Here, Python and PyFlakes give quite different reports:

$ python error2.py
  File "error2.py", line 6
    """Returns the cube of x."""
             ^
SyntaxError: invalid syntax
$ ~/.emacs.d/usr/bin/pyflakes error2.py
could not compile 'error2.py':6:
    """Returns the square of x.
    return x * x

def cube(x):
    """Returns the cube of x."""
 ...76 spaces... ^

Do you see what has happened? The unterminated triple-quoted string looks as though it ends several lines later, at what we ultimately intend to be the beginning of the next triple-quoted string. (If the file instead contained no further triple-quoted strings, then the new string would appear to extend all the way to the end of the file.)

This is no problem for the Python interpreter itself, which modestly displays only the final line of the multi-line syntax error. But PyFlakes, not checking for this possibility, prints out the entire triple-quoted string, followed by a ^ character that is indented the entire length of the triple-quoted string. In the example that I was testing last night, this produced a line of nearly four thousand spaces that then ended in the lone little caret character.

So while PyFlakes is certainly more verbose than standard Python, it is not being nearly as profligate as I claimed. It does not insist on printing out your whole module, but limits its output to lines that actually appear involved in the error. Either way, the real culprit is the final line of output, the one that starts with all of the spaces, since that many spaces send one of the Emacs Flymake regular expressions spiralling into exponential oblivion.
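To make the contrast concrete, here is a minimal sketch, in modern Python, of a reporter that imitates the interpreter by printing only the final line of a multi-line snippet and re-measuring the caret from the start of that line, so the caret can never be preceded by thousands of spaces. The helper format_syntax_error is hypothetical and is not PyFlakes code; it assumes, as the 76-space output above suggests, that offset counts columns from the start of the whole multi-line text.

```python
def format_syntax_error(filename, lineno, offset, text):
    """Build a PyFlakes-style report that shows only the last source line.

    Assumes `offset` is a 1-based column counted from the start of `text`,
    which may span several lines when a string literal runs on too long.
    """
    # Split off everything before the final line of the snippet
    head, newline, last_line = text.rstrip("\n").rpartition("\n")
    # Re-measure the column from the start of that final line
    column = max(offset - len(head) - len(newline), 1)
    return "\n".join([
        "could not compile %r:%d:" % (filename, lineno),
        last_line,
        " " * (column - 1) + "^",
    ])
```

For a single-line error the output matches the old format; for a runaway triple-quoted string only the last line and a short caret line remain.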

Perhaps PyFlakes should support a command-line option that omits code snippets entirely, so that Emacs will have only error messages to process. Either way, Chris and I can be more productive now that we can safely integrate a patched version of this excellent tool into our coding sessions.


PyEphem now available for Python 3.0!

Eager not to be left behind by the advance of history, I have released PyEphem tonight for Python 3.0! After updating its C-language routines earlier this week, as described in my previous post, and adjusting its Python syntax, I thought that my work was done — until I received a bug report from Reto Schüttel, an enterprising Swiss programmer. He had read my previous post, asked me for the location of PyEphem's Python 3.0 branch, downloaded it with bzr, and already tried it out, not only on his Linux machine, but also on his OS X machine!

While the first twenty revisions of my branch deal with simple Python 3.0 mechanics, I must congratulate Mr. Schüttel for most of the improvements in the subsequent dozen revisions. We not only exchanged emails all day as I produced one revision after another that I needed him to test, but he then joined me on IRC tonight until 3:30am in his time zone as we worked out the last problems.

The issues were all related to localization under OS X. The astronomy library underlying PyEphem used the C-language functions sscanf() and atof() to turn strings into numbers, and it turns out that these functions are very sensitive to locale under the specific combination of OS X and Python 3.0! Because of his locale, the functions wanted a comma instead of a period to separate whole numbers from their decimal fraction (so that π would have to be input like 3,141 rather than like 3.141). They also wanted month name abbreviations to be in German rather than English, ruining my test cases that check planet positions against Naval Observatory tables, which use English month abbreviations like Jan, Feb, and Mar with very little regard for how the months would be spelled in German. We are still mystified by the combination of Python version and operating system that was necessary to cause this problem:

Python 2.6 under Linux: tests passed
Python 3.0 under Linux: tests passed
Python 2.6 under OS X: tests passed
Python 3.0 under OS X: broken: sscanf()/atof() change with locale

But the tests worked fine if he put LANG=C on the command line.

To work around this problem, I discovered a wonderful PyOS_ascii_strtod() function in Python's C library that avoided all of the problems that I was having with localization, and so I gradually rewrote the astronomy routines to use that function instead. Fixing the problems with month names was easier; instead of trying to make Python convert month abbreviations by passing the '%b' conversion character to time.strptime(), I simply converted months to integers myself and then passed the integers in with the simpler '%m' format character. It was only late at night that we finally tracked down every routine that was misbehaving.
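The month-name half of that fix can be sketched in a few lines of modern Python. This is an illustration, not PyEphem's actual code: the parse_table_date name and the "YYYY Mon D" input format are assumptions chosen for the example. The English abbreviations live in a fixed table, and only numeric directives reach time.strptime(), so the result should not depend on the user's locale.

```python
import time

# English abbreviations in a fixed table, independent of the current locale
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def parse_table_date(text):
    """Parse a date such as '2008 Dec 5' without using the '%b' directive."""
    year, month_name, day = text.split()
    # Convert the month ourselves so strptime() only sees digits
    month = MONTHS[month_name[:3].title()]
    return time.strptime("%s %d %s" % (year, month, day), "%Y %m %d")
```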

The final puzzle was how to release my new software. The Python Package Index does not yet allow a project to offer separate source code archives for both the Python 2.x and the 3.0 version of a project. There might be some clever way of storing both versions of the source code in the same .tar.gz file, but the packages I see on PyPI that already support both 2.x and 3.0 contain enough #ifdef statements to convince me that I want to keep my branches separate!

I was saved by a peculiarity of my project. Though it provides the Python package named ephem, the actual project on PyPI has always been named “PyEphem” instead. The product has to have the “Py” in front, you see, to distinguish it both from the original text-screen ephem command and the modern XEphem graphical application, and my first instinct had been to name the project after the product. But as I gained experience with PyPI, it seemed more and more awkward that programmers who wanted their programs to be able to:

import ephem

had to remember to type something different when installing:

easy_install pyephem

Since I had been wanting to switch PyPI names anyway, I suddenly saw my chance! I have now released the new Python 3.0 version of PyEphem under the actual package name ephem and will continue to maintain the 2.x version of the package separately under the old pyephem name. Obviously, this solution only works because of my project's unique circumstances; I have no idea what other projects that also want to release Python 3.0 versions this weekend should do.

Fans of PyEphem should rest assured that when I develop new features, I will be adding them to both versions of PyEphem for probably at least the next decade. I have absolutely no intention of abandoning or slowing development of the Python 2.x version of the library; I simply wanted the library available to users of the new platform. As always, feel free to email me with bugs, suggestions, and new features — and, enjoy using PyEphem and Python!


Porting a C extension module to Python 3.0

With several packages already advertising Python 3.0 compatibility, it seemed high time to look into releasing my PyEphem astronomy package in an edition compatible with the new language. But I hesitated: how difficult is it really, and how many hours of work will it consume, to port a C-language extension module to Python 3.0?

The answer is that, while the necessary changes were surprisingly easy, they took lots of time to figure out because I did not find them documented in any one place. I offer the following notes to assist any other adventurers who want to experiment with porting their extension modules to 3.0. These notes might also suggest useful additions to the official documentation.

But, first, I need to issue three cautions. To develop under 3.0, you may have to forgo several Python tools that you probably thought you could no longer do without. The world of 3.0 is a windswept and icy landscape from which the glaciers have just receded, and you will find the stone tools rather primitive when compared to the comforts of civilization that you enjoy under Python 2. To wit:

  • I cannot find virtualenv for 3.0, which is a disaster. This means that you have to create a separate Python 3.0 install, built with a different --prefix option to ./configure, for each development environment you want to create on your box.
  • I cannot find a version of setuptools for 3.0. This means limiting your setup.py instructions to the primitive vocabulary of the distutils package. For example, I find myself unable to run the PyEphem test suite at this late hour because I have been running it for so long with:
    $ python setup.py test
    that I am not sure how to get it running otherwise.
  • Should you succeed in porting your extension module, it is not at all clear how to distribute it. I had expected either a new PyPI to spring into being — since every package will need an entirely different version under 3.0 — or for a sophisticated scheme to appear for registering one pyephem.tar.gz as the Python 2 version and another pyephem.tar.gz for 3.0. But while the most recent version of your package can mark itself as 2-compatible or 3-compatible (or both) using classifiers, there is no way to have two “most recent” versions of a package. Are we supposed to start distributing a single tar.gz that includes the source code for both Python series, and that selects the right code by detecting the interpreter version at the top of the setup.py file?
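If the answer to that last question turns out to be yes, the version check itself is simple. Here is a minimal sketch, assuming a hypothetical single archive that carries two source trees, src2/ and src3/; the directory names and the pick_source_dir helper are illustrative, and the setup() call is left as a comment.

```python
import sys


def pick_source_dir(version_info=sys.version_info):
    """Return the source tree that matches the running interpreter.

    Assumes a hypothetical archive layout with 'src2/' holding the
    Python 2.x code and 'src3/' holding the Python 3.x code.
    """
    return "src3" if version_info[0] >= 3 else "src2"


# At the top of setup.py one might then write (not run here):
#   from distutils.core import setup
#   setup(name="pyephem", package_dir={"": pick_source_dir()}, ...)
```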

So if you make the effort to port your code right now, you might find that the shiny new version of your module is all dressed up, but has no place to go. If you experiment with the following steps, though, you will at least be ready when an official distribution channel does appear for releasing your package into the wilds of 3.0.


Comprehension consistency at last in Python 3.0!

A new era is begun: Python 3.0 has been released, bringing the bright and burning lights of reason, consistency, and symmetry to bear on my favorite language. Guido van Rossum, the creator of Python, has carefully guided this final attempt to remove the warts that have accumulated over the language’s 17-year lifetime, and the result is magical.

Python 3.0 (r30:67503, Dec  4 2008, 10:23:44)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> [ n*n for n in range(5) ]
[0, 1, 4, 9, 16]
>>>
>>> { n*n for n in range(5) }
{0, 1, 4, 16, 9}
>>>
>>> { n: n*n for n in range(5) }
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

Magnificence! Do you feel the waves of beauty crashing over you? No, no, not over the sequence of squares — over the fact that all three basic collection types now support comprehensions!

Comprehensions were first introduced in Python 2.0, but with the terribly awkward stipulation that they were only possible for lists, not for dictionaries. This meant teaching newcomers to the language that list construction was a special case, and that the collections that had deserved their own constructor syntax (lists and dictionaries, at that time; sets came later) were not equally powerful. It also made necessary the awkward and expensive technique of building, and then immediately discarding, a list of tuples to quickly construct dictionaries:

Python 2.5.2 (r252:60911, Nov 14 2008, 19:46:32) 
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> dict([ (n, n*n) for n in range(5) ])
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

The arrival of generator expressions in 2.4 did, at least, allow us to remove the ugly square brackets and avoid creating the list (though all of the tuples were still created and then immediately discarded). But the problem remained that dictionaries built from inline generators did not look like dictionaries syntactically.

But, no more! They have even updated the scandalously withdrawn PEP 274 to announce that the feature has finally arrived. After the aching and painful years of the Python 2 series, the language once again shines bright and clear as a model of clever symmetries and low mental impedance. Python’s famously tight “feature set” can, now more easily than ever, fit comfortably into the programmer’s brain.
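Side by side, the generator-expression idiom from 2.4 and the new comprehension build the same dictionary. This short comparison is only an illustration, runnable on any Python 3.

```python
# Python 2.4 idiom: feed a generator of (key, value) tuples to dict()
squares_from_pairs = dict((n, n * n) for n in range(5))

# Python 3.0 idiom: the braces alone announce that a dict is being built
squares = {n: n * n for n in range(5)}

# Sets get the same treatment through set comprehensions
square_set = {n * n for n in range(5)}

assert squares == squares_from_pairs
assert square_set == set(squares.values())
```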

What shall I write first in Python 3.0? I wonder.

But you can be sure that my code will find lots of excuses for constructing dictionaries.


Rise and Fall of the Two Waldos

I am experiencing my Flickr photostream in an entirely new way thanks to the tools they introduced this year for monitoring the traffic received by individual photographs. The old, static parts of my photostream suddenly look dynamic: I can see the rate at which each old photo is continuing to attract viewers. On October 17th, for example, I was stunned to discover that the perennial favorite Harry Potter Lunchbox had, over the previous day, received fewer views than the perpetually distant second, “My Shirt” (both taken at Dragon*Con 2005). Here is Flickr’s graph of how “My Shirt” fared over the month of October, with my mouse over October 17th to highlight the day on which I first noticed its growing popularity:

The rise and fall of the popularity of “My Shirt” over the four weeks from 5 October through 1 November.

And by drilling down into the list of “Referrers” beneath the graph, I was even able to discover the source of its brief popularity! As Halloween approached, people were doing hundreds of Google Image searches for “waldo costume”, “where’s waldo costume”, and “where’s waldo shirt”, which brought them straight to my image. As you can see in the above graph, the swelling interest did not peak until Halloween itself, after which the photo plummeted back to its more typical popularity of one or two dozen views per day.

All sorts of gems are hidden in the statistics, waiting to be discovered. For example, it was very satisfying to learn that Yahoo! Image Search considers my wedding photograph A Grandfather in Attendance to be the most important “grandfather” image on the entire web. Behind every statistic is a story about how people find, and why they wind up visiting, each of my photos. Hopefully the fun of watching my old photographs will not distract me from taking some new ones!


The idea of a term paper

Displaying their usual talent for excerpt, the folks at Arts & Letters Daily directed my attention to a recent article in The Smart Set with this intriguing summary of its contents:

Term paper mill. Need $100 by Friday to keep the lights on? No sweat, if you’re a writer. Plenty of kids need ten pages on Hamlet by Thursday…

The article, entitled “Term Paper Artist”, alternates between hilarity and poignancy as its author shares his adventures writing hundreds of term papers for hire. But near the end, his tone suddenly becomes serious as he turns to the question of why so many students are unable to write term papers of their own. He thinks that the reason is important enough to stand alone in his article as a one-sentence paragraph. Here is his preceding paragraph, and then the zinger:

It’s not that I never felt a little skeevy writing papers. Mostly it was a game, and a way to subsidize my more interesting writing. Also, I’ve developed a few ideas of my own over the years. I don’t have the academic credentials of composition experts, but I doubt many experts spent most of a decade writing between one and five term papers a day on virtually every subject. I know something they don’t know; I know why students don’t understand thesis statements, argumentative writing, or proper citations.

It’s because students have never read term papers.

That is his diagnosis: students never see what term papers are supposed to look like, and so they have no idea how to produce them.

As he continued on, ridiculing the idea that students can produce something of which they are never once shown a good example, I realized that his argument was exactly the same as the one I made in my recent post Reading Code: A Computer Science Curriculum: that the production of any kind of literature, whether an essay in college or an elaborate routine in a computer program, is an essentially imitative act. Without being shown excellent examples from the genre they are expected to produce, students are left in the dark about what, exactly, they are trying to generate — and, more often than not, will fail. They are never even given the opportunity to demonstrate whether they do, in fact, lack the capacity to create, because they are never shown the goal towards which they are supposed to be striving.


Reading Code: A Computer Science Curriculum

I developed the following ideas about how to teach computer programming during a recent conversation with Daniel Rocco, a professor at the University of West Georgia, and Georgia Tech grad student Derek Richardson, and I wanted to expand on the ideas here on my blog. I have been heavily influenced by reading Greg Wilson, and, for all I know, he or someone else may have already put these ideas together into something like the outline below.

The first day of class

To teach computer programming, a professor ought to stride into the room on the semester’s first day, explain that a computer program is a text file full of instructions for the computer to follow, and then proceed to check out an example from version control. His desktop should be displayed on a big screen in front of the class. He should read through the example program with them, discuss it — it should be a Python program that does something requiring two or three dozen lines, like opening a window and displaying the message, “Hello, world!” — and then run it. He should demonstrate how to edit the program so that it prints something else instead (perhaps “Hello, professor!”), run the altered version for them, and, finally, check the modified program into version control.

The class now has the tools they need to complete their first assignment. The students are told that a version-control repository has been created for each of them; that they have each been subscribed to their repository’s email notification list; and that, upon returning home from class, their email inbox will already hold a message informing them that their first assignment — the “Hello, world!” program itself — has been checked into their repository by the course software. By the next class period they are to have checked the assignment out, altered it so that it prints “Hello” followed by their own name, and submitted the assignment by checking the program back in.


Aphorism

Entertainment wants only to satisfy its audience;
but Art will teach us new desires.

(Penned after pondering the talk that Dana Gioia gave at Oxbridge 2008)


PyEphem 3.7.2.4, now on Launchpad!

I have decided to give my PyEphem astronomy library for Python a public source code repository, an open forum for user questions, and a bug tracker where my users can see the progress of their bug reports out in the open rather than having them scattered across our email inboxes. To accomplish all of this, I simply registered PyEphem with Launchpad, a site built to host software projects that is already used by several projects for which I have great respect.

Because users might become confused now that PyEphem is spread across three web sites — the home page is here at rhodesmill.org, releases are posted over at the Python Package Index, and, again, the development project is now hosted at Launchpad — I have completely redesigned the PyEphem home page with the goal of making the three-site distinction clear, coherent, and easy to navigate. The new home page and documentation are generated by the wonderful Sphinx documentation engine, and I am still thrilled about how pretty my code samples look (check out the one on the PyEphem home page!) now that Sphinx is coloring them in with the renowned Pygments system.

I have simultaneously released a new version of PyEphem that includes the new Sphinx-based documentation, along with several important fixes to the software itself. From now on, rather than cluttering my own blog with every minor version of PyEphem that I might release, fans of the software should visit its News and announcements page on Launchpad and subscribe themselves to its Atom/RSS feed. You will still see the project mentioned here whenever a technical or scientific issue becomes interesting enough for me to write about; but the audience of astronomers and hobbyists who just need to know when the next version is released should not have to wade through my blog to do so!

My users have already begun transferring their questions and problems to Launchpad, and I look forward to offering much greater accountability through a fully public development process.


Powered by WordPress