Installing "lxml" for Python under your WebFaction account

Well, drat.

Thanks to more than an hour of work today, I have a pretty list of a few dozen commands that make it easy for a WebFaction account holder to install the powerful lxml Python package for parsing HTML and XML under their hosting account. You can read Ian Bicking's wonderful blog post “lxml: an underappreciated web scraping library” for more information on why you want to be using lxml instead of any of its alternatives.

So, why do I say “drat”?

First, because I just tried out my instructions on another of my WebFaction accounts, and there the extra steps weren't even necessary; this other server of theirs already had lxml's dependencies installed! I suppose, had I been a bit more patient, that this support ticket that I glanced over this morning would have inspired me to ask WebFaction to install the libraries lxml needs on the server where I myself was working. But it felt like some sort of offense against symmetry to rely on something that WebFaction doesn't install everywhere, and I was perhaps just in too big of a hurry. Which, of course, cost more time in the end.

The other reason I say “drat” is because, now that I look at Ian's post again after all these months, I see that he has instructions for making the package install its own dratted copies of the system libraries it needs! Too bad that lxml's own installation instructions omit this crucial piece of information.

How typical, and how predictable. It turns out that I just needed to listen to Ian Bicking more carefully. How often we fail to do that, as individuals and as a Python community. Listen to Ian Bicking, everyone. Listen.

In the meantime, here are some successful and unsuccessful ways of installing lxml under your WebFaction account. Consider the following to be a set of choose-your-own-adventure scenarios.

  • If the WebFaction host your account lives on already has libxml and libxslt installed, then installation is simple:

    $ easy_install lxml
    Searching for lxml
    Reading http://pypi.python.org/simple/lxml/
    ...
    Finished processing dependencies for lxml
    
  • If your WebFaction host lacks libxml, but you listen to Ian Bicking and download the source code yourself, then your install will succeed:

    $ wget http://pypi.python.org/.../lxml-2.2.2.tar.gz
    $ tar xfz lxml-2.2.2.tar.gz
    $ cd lxml-2.2.2
    $ STATIC_DEPS=true python setup.py install
    ...
    Finished processing dependencies for lxml==2.2.2

  • If your WebFaction host lacks libxml, and you listen to Ian Bicking but rely on easy_install to fetch the package, then your install will fail because it tries building inside of a temporary directory that, on WebFaction, you apparently cannot access:

    $ STATIC_DEPS=true easy_install lxml
    Searching for lxml
    Reading http://pypi.python.org/simple/lxml/
    ...
    Running "./configure --without-python --disable-dependency-tracking
     --disable-shared --prefix=/tmp/easy_install-81ufo5/lxml-2.2.2/buil
    d/tmp/libxml2" in build/tmp/libxml2-2.7.3
    error: Permission denied

  • If your WebFaction host lacks libxml, and you fail to listen to Ian Bicking, then you can at least install lxml and its dependencies manually using the following commands, as I worked out this morning. The trick is that instead of trying to tell setup.py where you have installed the libraries by using CC= at the beginning of the command line or something like that, you need to make sure that the special command xslt-config is on your path somewhere:

    $ cd ~
    $ mkdir usr
    $ mkdir usr/src
    $ cd usr/src
    $ wget ftp://xmlsoft.org/.../libxml2-2.7.3.tar.gz
    $ wget ftp://xmlsoft.org/.../libxslt-1.1.24.tar.gz
    $ tar xfz libxml2-2.7.3.tar.gz
    $ tar xfz libxslt-1.1.24.tar.gz
    $ cd libxml2-2.7.3
    $ ./configure --prefix ~/usr
    $ make install
    $ cd ..
    $ cd libxslt-1.1.24
    $ ./configure --prefix ~/usr
    $ make install
    $ cd ..
    $ PATH=$HOME/usr/bin:$PATH
    $ wget http://pypi.python.org/.../lxml-2.2.2.tar.gz
    $ tar xfz lxml-2.2.2.tar.gz
    $ cd lxml-2.2.2
    $ python setup.py install
    ...
    Finished processing dependencies for lxml==2.2.2
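
The xslt-config trick in the last scenario is easy to check before you start a long build. Here is a minimal Python sketch, assuming a current Python 3 rather than the 2.x interpreters used above, that reports whether the config scripts are visible once ~/usr/bin is put at the front of the search path. The helper names are made up for illustration, and looking for xml2-config as well is my assumption about what the lxml build consults; only xslt-config is mentioned above.

```python
import os
import shutil

# Hypothetical helpers for illustration; not part of lxml or setuptools.
# xslt-config is the script named in the post; xml2-config is an assumption.
CONFIG_TOOLS = ("xml2-config", "xslt-config")


def with_local_prefix(prefix, search_path=None):
    """Return search_path with prefix/bin prepended, like PATH=$HOME/usr/bin:$PATH."""
    if search_path is None:
        search_path = os.environ.get("PATH", os.defpath)
    return os.pathsep.join([os.path.join(prefix, "bin"), search_path])


def find_config_tools(search_path):
    """Map each config script name to its full path, or None if it is not found."""
    return {tool: shutil.which(tool, path=search_path) for tool in CONFIG_TOOLS}


if __name__ == "__main__":
    # ~/usr is the install prefix used in the manual recipe above.
    path = with_local_prefix(os.path.expanduser("~/usr"))
    for tool, location in find_config_tools(path).items():
        print(f"{tool}: {location or 'not found'}")
```

If xslt-config shows up under ~/usr/bin, the PATH line from the manual recipe has probably done its job.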

But, as I mentioned, Ian's technique is faster. :-)
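
Whichever adventure you chose, it may be worth confirming which libxml2 and libxslt the finished lxml actually picked up, since a mismatch between the libraries it was compiled against and the ones it loads at run time is one plausible cause of import trouble. A small sketch, again assuming Python 3 and hypothetical helper names; the version constants are the ones lxml.etree exposes:

```python
def lxml_library_versions():
    """Return {library: (compiled_version, running_version)}, or None if lxml will not import."""
    try:
        from lxml import etree
    except ImportError:
        return None
    return {
        "libxml2": (etree.LIBXML_COMPILED_VERSION, etree.LIBXML_VERSION),
        "libxslt": (etree.LIBXSLT_COMPILED_VERSION, etree.LIBXSLT_VERSION),
    }


def version_mismatches(versions):
    """List the libraries whose running version differs from the compiled-against one."""
    return [lib for lib, (compiled, running) in versions.items() if compiled != running]


def format_version(version):
    """Turn a version tuple such as (2, 7, 3) into the string '2.7.3'."""
    return ".".join(str(part) for part in version)


if __name__ == "__main__":
    versions = lxml_library_versions()
    if versions is None:
        print("lxml could not be imported; try 'from lxml import etree' to see the full error")
    else:
        for lib, (compiled, running) in versions.items():
            print(f"{lib}: compiled against {format_version(compiled)}, "
                  f"running {format_version(running)}")
        for lib in version_mismatches(versions):
            print(f"note: {lib} versions differ; a different library is being loaded")
```

With a STATIC_DEPS build I would expect the two columns to agree, since the libraries are compiled into the extension itself.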

Posted: Saturday, August 1st, 2009 at 6:03 pm
Categories: Computing, Python, Web Notes

  • David

    Hi!

    Thanks for the info on how to install lxml from sources.
    I am having the hardest time of my life trying to build it from sources. Could you tell me the steps in more detail? I installed libxml2 2.7.2, libxslt 1.1.15, and lxml 2.2, but when I run make on lxml, it gives all sorts of errors.

    You talked about xslt-config; what should I do with it? How about paths to the dependencies? How should I do that?
    Cheers

  • Jorge Vargas

    Brandon, there is a little bug in your commands.

    STATIC_DEPS=true python setup.py install

    Should be STATIC_DEPS=true pythonX.Y setup.py install

    where X.Y is the Python version you want it installed for.

  • ben

    I got an odd error after using the STATIC_DEPS method where libxml/libxslt existed but were older versions.

    >>> from lxml import etree
    ImportError: /home/****/lib/python2.5/lxml-2.2.4-py2.5-linux-i686.egg/lxml/etree.so: undefined symbol: gcry_check_version

    A Google search turned up some references to libgcrypt of all things, so I’m baffled. I assume that there’s an issue between the existing libxml2 and the dep version for my particular context. Anyways, if anyone hits a wall and already has the dependencies on WebFaction, the below method took care of it, but I’m stuck with the older libxml/libxslt builds where I was hoping to take them up to the latest available.

    easy_install-2.5 --always-unzip -s $HOME/bin -d $HOME/lib/python2.5 lxml

  • Emil Stenström

    Thanks a lot for these simple instructions! They work not only on WebFaction, but on other shared hosts, such as Site5! I’m guessing there are many more out there…

    Commands that did it for me:
    $ cd lxml-2.2.4
    $ STATIC_DEPS=true python setup.py install --home=..

    Where home points to your home directory, that is expected to contain a lib directory, with a python directory in it. You need to have added that directory to your PYTHONPATH (via your .bash_profile).

    Happy hacking!

  • Mitch

    Unfortunately, these steps don’t work on Dreamhost. When I follow the steps to bullet point #2 or bullet point #4 and then type the following:
    $ python
    >>> import lxml
    >>> from lxml import etree

    I get:
    Traceback (most recent call last):
    File "<stdin>", line 1, in ?
    ImportError: /home/pshack/virtualenvs/polurls.com/lib/python2.4/site-packages/lxml-2.2.6-py2.4-linux-x86_64.egg/lxml/etree.so: symbol xmlSchematronSetValidStructuredErrors, version LIBXML2_2.6.32 not defined in file libxml2.so.2 with link time reference

    It looks to me like lxml isn’t finding the libxml2 or libxslt libraries unless I set an environment variable called LD_LIBRARY_PATH and point it to my custom install directory (I couldn’t install them in /usr/lib/ since I’m on a shared host with no root access). From my research, having to set LD_LIBRARY_PATH to get things to work is a bad hack.

    Even worse, passenger_wsgi doesn’t seem to recognize the LD_LIBRARY_PATH variable, so my site doesn’t run unless I run the Django development server from the command line. Blah, I f-ing hate lxml.

  • Syntax

    Mitch,
    I managed to solve the same issue on Dreamhost.

    You can remove the need for “LD_LIBRARY_PATH” if you build lxml with “--auto-rpath”, which turns on GCC’s “-rpath” flag at compile time (pointing it at the paths to the dynamic libraries you want to use).

    The steps are:
    - get lxml sources (I used: http://pypi.python.org/packages/source/l/lxml/lxml2.2.6.tar.gz)
    - build and install with: “python setup.py install --auto-rpath”

    and you should be good.

  • Sean F

    Here’s an easy way to install lxml on a shared WebFaction server:
    mkdir $HOME/tmp
    export TEMP=$HOME/tmp
    CFLAGS="$CFLAGS -lgcrypt -fPIC" STATIC_DEPS=true easy_install-2.6 lxml

  • Paulus

    Syntax, thanks, that solved the problem for me.
