Installing "lxml" for Python under your WebFaction account

Well, drat.

Thanks to more than an hour of work today, I have a pretty list of a few dozen commands that make it easy for a WebFaction account holder to install the powerful lxml Python package for parsing HTML and XML under their hosting account. You can read Ian Bicking's wonderful blog post “lxml: an underappreciated web scraping library” for more information on why you want to be using lxml instead of any of its alternatives.

So, why do I say “drat”?

First, because I just tried out my instructions on another of my WebFaction accounts, and there the extra steps weren't even necessary; this other server of theirs already had lxml's dependencies installed! I suppose, had I been a bit more patient, that this support ticket that I glanced over this morning would have inspired me to ask WebFaction to install the libraries lxml needs on the server where I myself was working. But it felt like some sort of offense against symmetry to rely on something that WebFaction doesn't install everywhere, and I was perhaps just in too big of a hurry. Which, of course, cost more time in the end.

The other reason I say “drat” is that, now that I look at Ian's post again after all these months, I see that he has instructions for making the package install its own dratted copies of the system libraries it needs! Too bad that lxml's own installation instructions omit this crucial piece of information.

How typical, and how predictable. It turns out that I just needed to listen to Ian Bicking more carefully. How often we fail to do that, as individuals and as a Python community. Listen to Ian Bicking, everyone. Listen.

(more...)

Posted in Computing, Python, Web Notes | 4 Comments »

New Year's meme: What are the oldest files in your home directory?

Celebrate the new year with a blog post discussing the oldest files that are still sitting somewhere beneath your home directory! The procedure is simple:

  1. Run the following script in your home directory. (You might want to use less to read the output.)
  2. Ignore files whose date does not reflect your own activity.
  3. List the oldest files in a blog post and discuss!

#!/usr/bin/env python
"""Print last-modified times of files beneath '.', oldest first."""
import os
import time

# Walk the tree and build the full path of every file beneath '.'.
paths = (os.path.join(b, f) for (b, ds, fs) in os.walk('.') for f in fs)
# lstat() reports a symlink's own time instead of following the link.
for mtime, path in sorted((os.lstat(p).st_mtime, p) for p in paths):
    print("%s %s" % (time.strftime("%Y-%m-%d", time.localtime(mtime)), path))

Only include files whose last-modified time is a date on which you really touched the file. The file's time should neither result from an error (a few files beneath my own home directory have an incorrect date of 1970-01-01), nor from unpacking someone else's archive that has old files inside of it. For example, I myself have excluded the following pair of nearly 17-year-old files because their dates reflect their age inside of the Python 3.0 source archive, instead of the actual moment last month when they became part of my home directory:

1992-03-02 ./src/Python-3.0/Demo/scripts/wh.py
1992-03-02 ./src/Python-3.0/Tools/scripts/dutree.doc

But there is no requirement that the actual content of each file you list be your own. Whether you wrote the file yourself long ago, or downloaded it from some ancient and forgotten FTP site, you have a story to share!
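If you would rather have the script do some of the ignoring for you, here is a rough sketch of the same walk with two filters added. It is only an illustration under my own assumptions: the function name `oldest_files`, the `EXCLUDED_DIRS` tuple, and the two-day cutoff for "dates that are really 1970-01-01" are all made up for this example, and the path matching assumes POSIX-style paths like the ones shown above.

```python
import os

# Hypothetical exclusion list: directories unpacked from someone else's archive.
EXCLUDED_DIRS = ("./src/Python-3.0",)

def oldest_files(root=".", excluded=EXCLUDED_DIRS, min_mtime=2 * 86400):
    """Return (mtime, path) pairs beneath root, oldest first.

    Files dated within two days of the Unix epoch are assumed to carry
    an erroneous 1970-01-01 timestamp and are skipped, as is anything
    inside one of the excluded directories.
    """
    results = []
    for base, dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(base, name)
            # Skip files that live beneath an excluded directory.
            if any(path.startswith(d + os.sep) for d in excluded):
                continue
            # lstat() reports a symlink's own time instead of following it.
            mtime = os.lstat(path).st_mtime
            if mtime < min_mtime:
                continue
            results.append((mtime, path))
    return sorted(results)
```

Calling `oldest_files()` from your home directory and formatting each time with `time.strftime("%Y-%m-%d", time.localtime(mtime))` should give the same kind of listing as the original script, minus the excluded entries. You will still need to eyeball the result, since no filter can know which dates reflect your own activity.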

Within the rules given above, here are the oldest files beneath my own home directory:

(more...)

Posted in Computing, Python, Web Notes | 13 Comments »

Rise and Fall of the Two Waldos

I am experiencing my Flickr photostream in an entirely new way thanks to the tools they introduced this year for monitoring the traffic received by individual photographs. The old, static parts of my photostream suddenly look dynamic: I can see the rate at which each old photo is continuing to attract viewers. On October 17th, for example, I was stunned to discover that the perennial favorite Harry Potter Lunchbox had, over the previous day, received fewer views than the perpetually distant second, “My Shirt” (both taken at Dragon*Con 2005). Here is Flickr’s graph of how “My Shirt” fared over the month of October, with my mouse over October 17th to highlight the day on which I first noticed its growing popularity:

Graph: The rise and fall of the popularity of “My Shirt” over the four weeks from 5 October through 1 November.

And by drilling down into the list of “Referrers” beneath the graph, I was even able to discover the source of its brief popularity! As Halloween approached, people were doing hundreds of Google Image searches for “waldo costume”, “where’s waldo costume”, and “where’s waldo shirt”, which brought them straight to my image. As you can see in the above graph, the swelling interest did not peak until Halloween itself, after which the photo plummeted back to its more typical popularity of one or two dozen views per day.

All sorts of gems are hidden in the statistics, waiting to be discovered. For example, it was very satisfying to learn that Yahoo! Image Search considers my wedding photograph A Grandfather in Attendance to be the most important “grandfather” image on the entire web. Behind every statistic is a story about how people find, and why they wind up visiting, each of my photos. Hopefully the fun of watching my old photographs will not distract me from taking some new ones!

Posted in Web Notes | No Comments »

The idea of a term paper

Displaying their usual talent for excerpt, the folks at Arts & Letters Daily directed my attention to a recent article in The Smart Set with this intriguing summary of its contents:

Term paper mill. Need $100 by Friday to keep the lights on? No sweat, if you’re a writer. Plenty of kids need ten pages on Hamlet by Thursday… more»

The article, entitled “Term Paper Artist”, alternates between hilarity and poignancy as its author shares his adventures writing hundreds of term papers for hire. But near the end, his tone suddenly becomes serious as he turns to the question of why so many students are unable to write term papers of their own. He thinks that the reason is important enough to stand alone in his article as a one-sentence paragraph. Here is his preceding paragraph, and then the zinger:

It’s not that I never felt a little skeevy writing papers. Mostly it was a game, and a way to subsidize my more interesting writing. Also, I’ve developed a few ideas of my own over the years. I don’t have the academic credentials of composition experts, but I doubt many experts spent most of a decade writing between one and five term papers a day on virtually every subject. I know something they don’t know; I know why students don’t understand thesis statements, argumentative writing, or proper citations.

It’s because students have never read term papers.

That is his diagnosis: students never see what term papers are supposed to look like, and so they have no idea how to produce them.

As he continued on, ridiculing the idea that students can produce something of which they are never once shown a good example, I realized that his argument was exactly the same as the one I made in my recent post Reading Code: A Computer Science Curriculum: that the production of any kind of literature, whether an essay in college or an elaborate routine in a computer program, is an essentially imitative act. Without being shown excellent examples from the genre they are expected to produce, students are left in the dark about what, exactly, they are trying to generate — and, more often than not, will fail. They are never even given the opportunity to demonstrate whether they do, in fact, lack the capacity to create, because they are never shown the goal towards which they are supposed to be striving.

Posted in Web Notes | No Comments »

PyEphem 3.7.2.4, now on Launchpad!

I have decided to give my PyEphem astronomy library for Python a public source code repository, an open forum for user questions, and a bug tracker where my users can see the progress of their bug reports out in the open rather than having them scattered across our email inboxes. To accomplish all of this, I simply registered PyEphem with Launchpad, a site built to host software projects that is already used by several projects for which I have great respect.

Because users might become confused now that PyEphem is spread across three web sites — the home page is here at rhodesmill.org, releases are posted over at the Python Package Index, and, again, the development project is now hosted at Launchpad — I have completely redesigned the PyEphem home page with the goal of making the three-site distinction clear, coherent, and easy to navigate. The new home page and documentation are generated by the wonderful Sphinx documentation engine, and I am still thrilled about how pretty my code samples look (check out the one on the PyEphem home page!) now that Sphinx is coloring them in with the renowned Pygments system.

I have simultaneously released a new version of PyEphem that includes the new Sphinx-based documentation, along with several important fixes to the software itself. From now on, rather than cluttering my own blog with every minor version of PyEphem that I might release, fans of the software should visit its News and announcements page on Launchpad and subscribe themselves to its Atom/RSS feed. You will still see the project mentioned here whenever a technical or scientific issue becomes interesting enough for me to write about; but the audience of astronomers and hobbyists who just need to know when the next version is released should not have to wade through my blog to do so!

My users have already begun transferring their questions and problems to Launchpad, and I look forward to offering much greater accountability through a fully public development process.

Posted in Computing, PyEphem, Python, Web Notes | 1 Comment »

Wordle

What fun! An application named Wordle has appeared on the Web which, given some paragraphs of text as input, produces very striking images by drawing the most frequent words from your document so that they are largest. The basic idea is a long-standing one on the Web, as exemplified in dozens of sites with busy and ugly tag clouds whose halfhearted attempt to create interest by varying their font size barely makes the idea worthwhile. But viewing Wordle, I am simply startled that word frequency analysis can produce something so beautiful! Here, as an example, someone has submitted the Constitution:

Wordle: United States Constitution

One can spend several minutes just staring at the words so basic to our national life, and pondering the significance of their relative sizes! To make my own contribution to the burgeoning world of Wordle documents, I created a program to extract the memorial messages from the Marshall Booth Guest Book on legacy.com, and then submitted the result to Wordle. After several tries, and after experimenting with the color options, I came up with something I find quite satisfactory:

Wordle: Marshall Booth memorial

I am sure that Wordle documents will look rather formulaic once everyone has used them to generate Christmas cards one or two years in a row. But they are without question of much greater visual interest than any other tag cloud I have ever seen, and are therefore a big step forward simply by making word frequency something worth staring at.

It would be fun to submit novels, or theological treatises, or each of the books of Paul, to Wordle and then see whether students of English literature or Biblical exegesis could identify the original document simply by which words appeared the most often. I think there would be interesting surprises! Could people tell apart the five acts of Hamlet?
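For anyone curious about the counting that sits underneath such an image, here is a minimal sketch of word frequency analysis in Python. It is not Wordle's own algorithm, which I have not seen; the stop-word list, the function name `word_sizes`, and the linear scaling from count to font size are all assumptions made for illustration.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = frozenset(
    "a an and are as be by for in is it of or that the this to was with".split()
)

def word_sizes(text, top=10, min_size=10, max_size=72):
    """Map the `top` most frequent words to font sizes.

    The most frequent word gets max_size, and the others are scaled
    linearly by their count, never dropping below min_size.
    """
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOP_WORDS]
    counts = Counter(words).most_common(top)
    if not counts:
        return {}
    highest = counts[0][1]
    return {word: max(min_size, round(max_size * n / highest))
            for word, n in counts}
```

Feeding each act of Hamlet to a function like this and comparing the top few words would be a quick way to try the guessing game suggested above.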

Posted in Computing, Web Notes | No Comments »

A database symbol for GraphViz

Download the source for my GraphViz database symbol featured in this article: DatabaseShape.ps
I have started using the GraphViz application, which accepts a list of nodes and arrows, and figures out how to attractively arrange them in a diagram. For example, you can very nearly produce a diagram of a client's browser talking to a server's app and database by supplying this rather modest input file to GraphViz (most of whose length comes from my wanting particular colors):
 digraph Application {
    rankdir=LR;
    node [shape=box,style=filled,fillcolor="#C0D0C0"];
    subgraph clusterClient {
       label="Client"; style=filled; bgcolor="#D0C0A0";
       "Browser";
    };
    subgraph clusterServer {
       label="Server"; style=filled; bgcolor="#D0C0A0";
       "App";
       "Database" [shape=DatabaseShape,peripheries=0];
    };
    "Browser" -> "App" [label="HTTP"];
    "App" -> "Database" [label="SQL"];
 }
I used the words “very nearly” because, in fact, GraphViz only knows how to draw simple shapes like rectangles, and is ignorant of the standard cylinder-shaped database symbol that I have used here by asking for a DatabaseShape. Submitting the above code to GraphViz will, normally, produce three nodes that are all rectangles. To teach it about the database shape, I had to write some PostScript. (more...)
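As a rough illustration of how such a PostScript file gets handed to GraphViz, here is a small Python sketch that builds the command line. It rests on assumptions: that the `dot` program is on your path, that its `-l` option loads a PostScript library file when producing `-Tps` output (this is my understanding of the GraphViz command-line documentation), and that the input and output file names, which are hypothetical, match your own.

```python
import subprocess

def dot_command(graph_file, shape_library, output_file):
    """Build the argument list for rendering a graph to PostScript.

    shape_library is a PostScript file defining custom node shapes,
    passed to dot with its -l option (an assumption; see above).
    """
    return ["dot", "-Tps", "-l", shape_library, graph_file, "-o", output_file]

def render(graph_file, shape_library, output_file):
    """Run dot; raises CalledProcessError if GraphViz reports a failure."""
    subprocess.run(dot_command(graph_file, shape_library, output_file),
                   check=True)
```

For instance, `render("application.dot", "DatabaseShape.ps", "application.ps")` would ask GraphViz to draw the graph above with the custom shape available, assuming the graph was saved under that hypothetical name.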

Posted in Computing, Web Notes | No Comments »

Check!

I treated my Facebook account as little more than a curiosity, browsing occasionally for friends of my youth, until discovering the Chess Application! Online chess relieves the game of two horrors for me, of which I had not even been aware until I noticed them during this first online game because of their absence: the horror of a waiting opponent, and the curse of having to wait for them in turn! It turns out that, for a novice like myself who can take upwards of a half-hour to even begin to appreciate the complexity of a given position, to play a live game is only to be rushed through a series of bad decisions. But now, over morning coffee, I can ponder the board for as long as I wish, and can therefore begin, just begin, to glimpse the beauty of the complex possibilities that each move offers. And then, my move complete, I can go about my day without being fixed, inactive, in a chair while my opponent weighs his decision. I will have to thank Ilan for telling me about this, maybe even by playing him in person sometime, like he originally asked!

Posted in Web Notes | No Comments »