<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/"
     >
  <channel>
    <title>Let’s Discuss the Matter Further</title>
    <link>http://rhodesmill.org/brandon</link>
    <description>Thoughts and ideas from Brandon Rhodes</description>
    <pubDate>Tue, 08 May 2012 02:01:57 GMT</pubDate>
    <generator>Blogofile</generator>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    <item>
      <title>One sentence per line, please</title>
      <link>http://rhodesmill.org/brandon/2012/one-sentence-per-line/</link>
      <pubDate>Tue, 03 Apr 2012 01:25:09 EDT</pubDate>
      <category><![CDATA[python]]></category>
      <category><![CDATA[document processing]]></category>
      <category><![CDATA[computing]]></category>
      <guid>http://rhodesmill.org/brandon/2012/one-sentence-per-line/</guid>
      <description>One sentence per line, please</description>

      <content:encoded><![CDATA[<div class="document">
<p>I give some advice each year
in my annual <a class="reference external" href="http://sphinx.pocoo.org/">Sphinx</a> tutorial
at <a class="reference external" href="https://us.pycon.org/">PyCon</a>.
A&nbsp;grateful student asked where I myself had learned the tip.
I&nbsp;have done some archæology and finally have an answer.
Let me share what I teach them about “semantic linefeeds,”
then I will reveal its source —
which turns out to have been written
when I was only a few months old!</p>
<p>In the tutorial,
I ask students whether or not
the Sphinx text files in their project will be read by end-users.
If not, then I encourage students to treat the files
as private “source code” that they are free to format semantically.
Instead of fussing with the lines of each paragraph
so that they all end near the right margin,
they can add linefeeds anywhere
that there is a break between ideas.</p>
<p>The result can be spectacular.</p>
<p><a href="http://rhodesmill.org/brandon/2012/one-sentence-per-line/">Read more</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Visible Indentation in Python Publishing</title>
      <link>http://rhodesmill.org/brandon/2011/visible-indentation/</link>
      <pubDate>Sun, 20 Feb 2011 23:00:28 EST</pubDate>
      <category><![CDATA[python]]></category>
      <category><![CDATA[document processing]]></category>
      <category><![CDATA[books]]></category>
      <category><![CDATA[computing]]></category>
      <guid>http://rhodesmill.org/brandon/2011/visible-indentation/</guid>
      <description>Visible Indentation in Python Publishing</description>

      <content:encoded><![CDATA[
<p>It suddenly occurred to me that I managed to write <a href="http://rhodesmill.org/brandon/2011/foundations-of-python-network-programming/">an entire blog post about my new book last month</a> without so much as mentioning that it represents a landmark, so far as I know, in Python publishing.</p>
<a class="image-reference">
  <img src="http://rhodesmill.org/brandon/2011/chevron-sample.png"
       alt="Sample Python code with indentation marked"
       width="320" height="296" />
</a>
<p>What was my big idea? That in printed Python code, indentation should be visible.</p>
<p>Any of you who have had to read many Python listings printed in books will immediately recognize the problem that I wanted solved. When a listing is long enough to run on to a second page, it is often less than clear whether the code there is continuing at the same level of indentation, or whether the page break has just happened to correspond to a point in the code where it dedented out to the previous level. Languages with braces or explicit “end” statements to close blocks never have to worry about this. But in Python — especially where a script or code snippet ends without ever returning to the outermost level of indentation — the last few lines of the script feel as though they are left hanging if they stand alone at the top of a new page of text.</p>
<p>Of course, there were several practical considerations that had to be settled. A symbol for indentation had to be chosen, for example. I selected the Unicode double-chevron because it is a character that is never valid in actual Python code. Then the publisher had to be convinced to try the experiment; it helped that I had the full support of my editor, <a href="http://www.liveandletwrite.com/">Laurin Becker</a>, who also prepared the layout people for the fact that these chevrons were <i>not</i> part of the code and would need to remain visually distinct from it. Finally — because I had no desire to insert and color each chevron by hand — I had to write an <a href="http://lxml.de/">lxml</a> script to insert the chevrons into my OpenOffice documents, then go back and ruefully remove by hand the chevrons that got nonsensically inserted into snippets of other languages like HTML.</p>
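<p>The chevron insertion described above can be sketched in a few lines of Python. This is only an illustration working on plain text, not the lxml script that processed the OpenOffice documents; the function name <tt>mark_indentation</tt>, the four-space indentation width, and the exact spacing after each chevron are all assumptions made for the example.</p>
<pre>
```python
# Hypothetical sketch: make each indentation level of a listing visible.
CHEVRON = "\u00bb"  # the double chevron character chosen in the post
INDENT_WIDTH = 4    # assumed width of one indentation level


def mark_indentation(source, width=INDENT_WIDTH):
    """Return the listing with one chevron drawn per indentation level."""
    marked = []
    for line in source.splitlines():
        stripped = line.lstrip(" ")
        if not stripped:
            marked.append(line)  # leave blank lines untouched
            continue
        levels, extra = divmod(len(line) - len(stripped), width)
        # Each level becomes a chevron padded out to the indentation width.
        prefix = (CHEVRON + " " * (width - 1)) * levels + " " * extra
        marked.append(prefix + stripped)
    return "\n".join(marked)


sample = "def f(x):\n    if x:\n        return 1\n    return 0"
print(mark_indentation(sample))
```
</pre>
<p>Because the chevrons replace only leading spaces, a reader can count them at the top of a new page to recover the nesting depth. A real pipeline would also need to skip listings in other languages, which is the manual cleanup step mentioned above.</p>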
<p>But now I want to hear what readers think! I have not yet seen a review that mentions whether the visible indentation helps, hurts, or is simply irrelevant to the Python listings. If you happen to have seen the new edition of <a href="http://www.amazon.com/gp/product/1430230037?ie=UTF8&tag=letsdisthemat-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=1430230037">Foundations of Python Network Programming</a><img src="http://www.assoc-amazon.com/e/ir?t=letsdisthemat-20&l=as2&o=1&a=1430230037" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />, let me know what you think. While creating the effect cost some time and effort, I will happily do it again in my next book if it turns out to have actually helped readers scan the program listings.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Sphinx + Mercurial = My favorite CMS</title>
      <link>http://rhodesmill.org/brandon/2010/sphinx-mercurial-cms/</link>
      <pubDate>Sun, 21 Mar 2010 22:58:28 EDT</pubDate>
      <category><![CDATA[python]]></category>
      <category><![CDATA[document processing]]></category>
      <category><![CDATA[computing]]></category>
      <guid>http://rhodesmill.org/brandon/2010/sphinx-mercurial-cms/</guid>
      <description>Sphinx + Mercurial = My favorite CMS</description>

      <content:encoded><![CDATA[
<p>
Though I write and maintain some of the content for our <a href="http://pyatl.org/">Python Atlanta web site</a>, updates and additional content often come in from other users. For example, our Plone interest group — headed up by <a href="http://www.ifpeople.net/about/people/cjj">Christopher Johnson</a> — has their <a href="http://pyatl.org/plone">own page on our web site</a>. And the information about our <a href="http://pyatl.org/bookclub">book club</a> is both written and regularly updated by <a href="http://www.doughellmann.com/">Doug Hellmann</a>.
</p>
<a class="image-reference" href="http://pyatl.org/">
  <img src="http://rhodesmill.org/brandon/static/2010/pyatl-thumb.png"
       alt="Python Atlanta web site"
       width="240" height="135" />
</a>
<p>
How can a collaborative site like ours best be edited and updated? Well, I would like to report some modest initial success with an experimental approach: I now maintain the site as a <a href="http://sphinx.pocoo.org/">Sphinx-powered</a> documentation system stored in a <a href="http://bitbucket.org/brandon/pyatl.org/">BitBucket repository</a> into which I pull changes made by my collaborators. The advantages are several.
</p>
<ul>
<li>The change management tools supported by traditional CMS systems, even at their best, seem somehow anemic when compared to the toolkit provided by a good DVCS like <a href="http://mercurial.selenic.com/">Mercurial</a>. Where, for example, does even a capable CMS like Plone provide anything like Mercurial's “backout” or “blame” commands?
</li>
<li>Markup as well-designed as <a href="http://docutils.sourceforge.net/rst.html">reStructuredText</a> is not only a lot of fun to use, but it also very cleanly separates content from design. Authors working in plain text tend to produce clean, readable content without the messy markup often associated with visual HTML editors, or, worse yet, the disaster that is Microsoft Word.
</li>
<li>Staging — a feature I find essential, but which seems missing from many default CMS configurations — occurs automatically! Each author can see locally how the site will look with their changes, and after doing a pull I can review the site's appearance on my laptop before finally deploying the new content to the production site.
</li>
</ul>
<p>
To top it all off, authors get to use their own editor-of-choice when making contributions, and we all get extra practice cloning and merging in my favorite DVCS. I am optimistic about this direction, but I will post again if we wind up hitting snags in the future. Finally, of course, feel free to clone our repository if you want to see how Sphinx looks when running a generic web site.
</p>
]]></content:encoded>
    </item>
    <item>
      <title>The September 2009 issue of Python Magazine</title>
      <link>http://rhodesmill.org/brandon/2009/pymag-september/</link>
      <pubDate>Tue, 20 Oct 2009 01:41:07 EDT</pubDate>
      <category><![CDATA[python]]></category>
      <category><![CDATA[document processing]]></category>
      <category><![CDATA[computing]]></category>
      <guid>http://rhodesmill.org/brandon/2009/pymag-september/</guid>
      <description>The September 2009 issue of Python Magazine</description>

      <content:encoded><![CDATA[
<a class="image-reference">
  <img src="http://rhodesmill.org/brandon/static/2009/pymag-september.jpg"
       alt="Cover of September 2009 Python Magazine"
       width="200" height="258" />
</a>
<p>
  The September issue of Python Magazine
  appeared on the web late last week
  and only now, as a new week has started,
  am I finally sitting down to announce it!
  The articles range from technically heavy development topics
  to high-level thoughts about the whole Python community,
  with plenty in between.
</p>
<p>
  I have to say that our prettiest article this month
  is “Using Python to Create Beautiful Documents” by Yusdi Santoso,
  who shares the basic secrets to document generation
  that he learned when building the
  <a href="http://www.europython.eu/">EuroPython 2009</a> brochure
  using a Python program.
  Traditional typesetting and computer typography
  were both interests of mine when I was growing up,
  so it was fun to read Yusdi's introduction to using
  <a href="http://www.reportlab.org/">ReportLab</a>
  to generate PDF documents.
  I look forward to his follow-up article
  that we will soon be publishing,
  on the specific techniques that he used in creating
  the EuroPython booklet.
</p>
<p>
  The other technical articles are an introduction
  to using <a href="http://www.w3.org/TR/soap/">SOAP</a> in Python;
  a guide to displaying objects in a Mac OS X GUI created with PyObjC;
  an article introducing Python's own built-in
  <a href="http://docs.python.org/library/tkinter.html">Tkinter</a>
  GUI toolkit;
  and a small excursion of my own
  that attempts to explain the popular “trick”
  (well, it really confused <i>me</i> the first time I saw it!)
  of defining a decorator using a pair of nested functions.
  I should confess that my own article
  contains what is probably this issue's biggest mistake,
  as pointed out quite promptly by alert reader Emanuel Woiski:
  in the code sample that is the whole crux of my example,
  I somehow managed to omit one of the most crucial lines,
  shown here in bold:
</p>


<div class="pygments_autumn"><pre><span class="k">def</span> <span class="nf">log</span><span class="p">(</span><span class="n">function</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">log_wrapper</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">):</span>
        <span class="k">print</span> <span class="s">&quot;called </span><span class="si">%s%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="p">(</span>
            <span class="n">function</span><span class="o">.</span><span class="n">__name__</span><span class="p">,</span> <span class="nb">tuple</span><span class="p">(</span><span class="n">args</span><span class="p">)</span>
            <span class="p">)</span>
        <span class="o">&lt;</span><span class="n">b</span><span class="o">&gt;</span><span class="k">return</span> <span class="n">function</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">)</span><span class="o">&lt;/</span><span class="n">b</span><span class="o">&gt;</span>
    <span class="k">return</span> <span class="n">log_wrapper</span>
</pre></div>



<p>
  I suppose I will now need remedial cut-and-paste training of some sort.
</p>
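<p>For readers on Python 3, the same nested-function decorator can be sketched as follows. The <tt>functools.wraps</tt> line and the sample <tt>add</tt> function are additions for illustration and are not part of the magazine listing.</p>
<pre>
```python
import functools


def log(function):
    """Decorator built from a pair of nested functions."""
    @functools.wraps(function)  # an addition: keeps the wrapped function's name
    def log_wrapper(*args):
        print("called %s%s" % (function.__name__, tuple(args)))
        return function(*args)  # the crucial line: pass the result back out
    return log_wrapper


@log
def add(a, b):  # hypothetical function used only to exercise the decorator
    return a + b


result = add(2, 3)  # prints: called add(2, 3)
```
</pre>
<p>Without the <tt>return</tt> inside <tt>log_wrapper</tt>, the call would still be logged, but every decorated function would appear to return <tt>None</tt>.</p>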
<p>
  Finally, the issue is rounded out
  by three articles that move back from Python coding
  and step out to wider vantage points.
  Justin Lilly provides an excellent guide
  to customizing your Vim setup
  so that it becomes a powerful Python
  integrated development environment.
  Steve Holden muses about why diversity is so difficult
  and reveals some of the recent goings-on
  surrounding the diversity statement
  that the Python Software Foundation has been working on.
  And my own editorial seeks to point any Python Magazine readers
  who do not yet have a strong connection with the wider community
  in the direction of greater engagement
  with the world of Python.
</p>
<p>
  All in all, I think the issue
  is a nice mix of fact, experience, and opinion.
  Please consider subscribing if you would like to hear more
  about what people are doing with Python, and how.
  I enjoy reading it; so might you.
</p>
]]></content:encoded>
    </item>
    <item>
      <title>Applying PDF watermarks upside down</title>
      <link>http://rhodesmill.org/brandon/2009/pdf-watermarks-upside-down/</link>
      <pubDate>Tue, 07 Apr 2009 23:49:33 EDT</pubDate>
      <category><![CDATA[document processing]]></category>
      <category><![CDATA[computing]]></category>
      <guid>http://rhodesmill.org/brandon/2009/pdf-watermarks-upside-down/</guid>
      <description>Applying PDF watermarks upside down</description>

      <content:encoded><![CDATA[
<p>
Now that the excitement
of PyCon 2009 is over,
it is time for me to finish this brief series of blog posts
on watermarking PDF files.
In the
<a href="/brandon/2009/graphicsmagick-saved-the-day/"
   >first post I outlined how GraphicsMagick and Adobe Reader</a>
proved essential to the project
for their ability to produce correct PDF files
and then help me verify their correctness.
The
<a href="/brandon/2009/pdf-watermark-margins/"
   >second post showed how an image can be applied as a watermark</a>
using the <a href="http://www.pdfhacks.com/pdftk/">pdftk</a>
PDF Toolkit utility.
The resulting watermark,
after some margins had been added using a Python script,
looked rather attractive:
</p>
<div class="caption">
<a href="http://rhodesmill.org/brandon/static/2009/wmark2.pdf"
><img src="http://rhodesmill.org/brandon/static/2009/wmark2.png"
alt="Watermark with margins" />
</a><strong>Watermarked page (click for PDF).</strong>
The light blue design is a PDF file
that <tt>pdftk</tt> resizes and centers
on the target document.
</div>
<p>
My last challenge was that, on certain pages,
the watermark we were using
had to be turned upside down.
“Simple,” I thought, “I'll use <tt>pdftk</tt>
to turn the watermark over before applying it!”
I just had to process the watermark image
with the letter <tt>S</tt> (“south”),
which tells <tt>pdftk</tt> to rotate the image by 180°,
and then use the result as the watermark:
</p>
<pre>
$ pdftk arecibo2.pdf cat 1S output arecibo3.pdf
$ pdftk in.pdf background arecibo3.pdf output wmark3.pdf
</pre>
<div class="caption">
<a href="http://rhodesmill.org/brandon/static/2009/wmark3.pdf"
><img src="http://rhodesmill.org/brandon/static/2009/wmark3.png"
alt="Upside-down watermark" />
</a><strong>Upside-down watermark (click for PDF).</strong>
Whoops! After turning the watermark upside down,
<tt>pdftk</tt> lost the ability to properly center it.
</div>
<p><a href="http://rhodesmill.org/brandon/2009/pdf-watermarks-upside-down/">Read more</a></p>]]></content:encoded>
    </item>
    <item>
      <title>Adding margins to PDF watermarks</title>
      <link>http://rhodesmill.org/brandon/2009/pdf-watermark-margins/</link>
      <pubDate>Sun, 15 Mar 2009 23:24:26 EDT</pubDate>
      <category><![CDATA[python]]></category>
      <category><![CDATA[document processing]]></category>
      <category><![CDATA[computing]]></category>
      <guid>http://rhodesmill.org/brandon/2009/pdf-watermark-margins/</guid>
      <description>Adding margins to PDF watermarks</description>

      <content:encoded><![CDATA[
<p>
This is the second article in my series
on adding “watermark” images to PDF files:
images that sit behind any text and graphics that were already on the page.
Last week I outlined
<a
href="http://rhodesmill.org/brandon/2009/graphicsmagick-saved-the-day/"
>the first two lessons that I learned
while developing this watermark process</a>:
first, always use Adobe Acrobat
to verify that you are creating valid PDFs in your toolchain,
and second, the version of GraphicsMagick
that currently comes with Debian unstable
produces better PDF files
than the version of ImageMagick they ship.
</p>
<p>
Then I digressed with a blog entry on a slightly different topic,
<a href="http://rhodesmill.org/brandon/2009/nested-comprehensions/"
>nested list comprehensions in Python</a>,
because I happened to write one
while creating the image we will use as our sample watermark.
It shows the famous Arecibo space message,
and is a <a href="/brandon/static/2009/arecibo.png">tiny image</a>
of only 23×73 pixels
that looks like this when enlarged:
</p>
<div class="caption">
<img src="http://rhodesmill.org/brandon/static/2009/arecibo-big.png" alt="Arecibo message" />
</div>
<p>
The basic watermarking process itself is very simple
thanks to a wonderful tool that I discovered
called <a href="http://www.pdfhacks.com/pdftk/"><tt>pdftk</tt></a>
(short for “PDF toolkit”)
which, as usual, Debian has already packaged for me.
It can rotate documents,
extract pages,
concatenate several files together,
and help fill out PDF forms from data in a file.
Of particular interest here is its ability to either “stamp” an image
on top of each page of a document,
or to place one in the background as a watermark.
</p>
<p>
The watermark image itself has to be a PDF file —
<tt>pdftk</tt> does not deal in any other file formats —
which is why I needed GraphicsMagick
to convert the Arecibo image into a PDF in the first place.
Putting the two steps together,
one has a primitive but workable process
for using a PNG image as a watermark:
</p>
<pre>
$ gm convert arecibo.png arecibo.pdf
$ pdftk in.pdf background arecibo.pdf output wmark1.pdf
</pre>
<div class="caption">
<a href="http://rhodesmill.org/brandon/static/2009/wmark1.pdf"
><img src="http://rhodesmill.org/brandon/static/2009/wmark1.png"
alt="Letter with basic watermark" />
</a><strong>Hefty watermark (click for PDF).</strong>
A first attempt at watermarking results in a huge watermark
that reaches both the top and the bottom edges of the page.
</div>
<p>
As you can see,
<tt>pdftk</tt> automatically adjusts the size of the watermark image
to reach precisely to the edges of the page being marked —
which is a huge favor
given the difficulty I would have had
in resizing the watermark myself
to match the page size of the input file.
But, in the above case,
the result seems less than perfectly attractive;
watermarks usually sit tidily near the center of a page,
rather than running all the way against its edges.
</p>
<p>
Clearly, we want to add some margins to the watermark.
And though margins are easy to add to some image formats —
they would be simple to add
to the <tt>arecibo.png</tt> file that we are using
in this example —
in actual practice I need to support watermarks
that might be in vector formats like SVG or EPS.
While I could go through each possible input format
and contrive some way of adjusting its margins,
it would obviously be much more convenient
to convert everything to PDF first,
and then add margins directly to the PDFs.
</p>
<p>
I used Debian's <tt>apt-cache</tt> <tt>search</tt> command
to look for additional tools that might help me
(which is how I found <tt>pdftk</tt> in the first place!)
and found an old command called <tt>pdfcrop</tt>
that was part of the <tt>texlive</tt> series of packages;
it supports a <tt>--margins</tt> option
with which whitespace can be added around a PDF file.
But I found that it often would refuse to process
a perfectly good PDF file
with a horribly uninformative error message like:
</p>
<pre>
Error: Cannot move `tmp-pdfcrop-10631.pdf' to `out.pdf'!
</pre>
<p>
I tried to investigate the error message,
but discovered that <tt>pdfcrop</tt> is actually a Perl script
that writes LaTeX macros
which are then run against the target PDF file.
And it was last updated in 2004.
I have, alas, elected not to make it part of my toolchain.
</p>
<p>
Then I discovered that Python itself
has a <a href="http://pybrary.net/pyPdf/"
>quite serviceable PDF package named pyPdf</a>,
with the bonus that it is written in pure Python
and therefore requires no external libraries!
Thanks to its ability to adjust the “bounding box”
that defines the edges of an image in PDF coordinates,
adding margins was as simple as loading the image,
doing some addition and subtraction,
and then saving the result.
To add modest 10-point margins to the Arecibo message,
for example, we can create this <tt>wmargins.py</tt> script:
</p>


<div class="pygments_autumn"><pre><span class="kn">from</span> <span class="nn">pyPdf</span> <span class="kn">import</span> <span class="n">PdfFileWriter</span><span class="p">,</span> <span class="n">PdfFileReader</span>
<span class="n">pdf</span> <span class="o">=</span> <span class="n">PdfFileReader</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="s">&#39;arecibo.pdf&#39;</span><span class="p">,</span> <span class="s">&#39;rb&#39;</span><span class="p">))</span>
<span class="n">p</span> <span class="o">=</span> <span class="n">pdf</span><span class="o">.</span><span class="n">getPage</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="k">for</span> <span class="n">box</span> <span class="ow">in</span> <span class="p">(</span><span class="n">p</span><span class="o">.</span><span class="n">mediaBox</span><span class="p">,</span> <span class="n">p</span><span class="o">.</span><span class="n">cropBox</span><span class="p">,</span> <span class="n">p</span><span class="o">.</span><span class="n">bleedBox</span><span class="p">,</span>
                                    <span class="n">p</span><span class="o">.</span><span class="n">trimBox</span><span class="p">,</span> <span class="n">p</span><span class="o">.</span><span class="n">artBox</span><span class="p">):</span>
    <span class="n">box</span><span class="o">.</span><span class="n">lowerLeft</span> <span class="o">=</span> <span class="p">(</span><span class="n">box</span><span class="o">.</span><span class="n">getLowerLeft_x</span><span class="p">()</span> <span class="o">-</span> <span class="mi">10</span><span class="p">,</span>
                     <span class="n">box</span><span class="o">.</span><span class="n">getLowerLeft_y</span><span class="p">()</span> <span class="o">-</span> <span class="mi">10</span><span class="p">)</span>
    <span class="n">box</span><span class="o">.</span><span class="n">upperRight</span> <span class="o">=</span> <span class="p">(</span><span class="n">box</span><span class="o">.</span><span class="n">getUpperRight_x</span><span class="p">()</span> <span class="o">+</span> <span class="mi">10</span><span class="p">,</span>
                      <span class="n">box</span><span class="o">.</span><span class="n">getUpperRight_y</span><span class="p">()</span> <span class="o">+</span> <span class="mi">10</span><span class="p">)</span>
<span class="n">output</span> <span class="o">=</span> <span class="n">PdfFileWriter</span><span class="p">()</span>
<span class="n">output</span><span class="o">.</span><span class="n">addPage</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
<span class="n">output</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="s">&#39;arecibo2.pdf&#39;</span><span class="p">,</span> <span class="s">&#39;wb&#39;</span><span class="p">))</span>
</pre></div>



<p>
You can test this yourself by installing pyPdf
in a convenient temporary directory with <tt>virtualenv</tt>,
running the above script, then calling <tt>pdftk</tt> on the result:
</p>
<pre>
$ virtualenv vpython
$ vpython/bin/easy_install pyPdf
$ vpython/bin/python wmargins.py
$ pdftk in.pdf background arecibo2.pdf output wmark2.pdf
</pre>
<div class="caption">
<a href="http://rhodesmill.org/brandon/static/2009/wmark2.pdf"
><img src="http://rhodesmill.org/brandon/static/2009/wmark2.png"
alt="Watermark with margins" />
</a><strong>Watermark with margins (click for PDF).</strong>
Margins prevent the watermark from reaching the page edges,
which allows the blocks of text to assume the role
of defining the visual shape of the page.
</div>
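<p>The addition and subtraction inside <tt>wmargins.py</tt> can be isolated from pyPdf and checked on its own. The sketch below is a plain-Python illustration; the helper name <tt>add_margins</tt> is hypothetical, and the 23 by 73 point box is an assumption that treats each pixel of the Arecibo image as one PDF point.</p>
<pre>
```python
def add_margins(lower_left, upper_right, margin=10):
    """Grow a box, given as (x, y) corner tuples, by `margin` points per side."""
    (llx, lly), (urx, ury) = lower_left, upper_right
    # The lower-left corner moves down and left; the upper-right moves up and right.
    return (llx - margin, lly - margin), (urx + margin, ury + margin)


# Assumed starting box: origin at (0, 0), 23 points wide and 73 points tall.
print(add_margins((0, 0), (23, 73)))  # ((-10, -10), (33, 83))
```
</pre>
<p>The content itself does not move. Only the box that declares the page edges grows, so the image ends up surrounded by empty space.</p>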
<p>
All pretty simple, right?
Well, it turns out that there was one final complication —
and that, before I was finished, I actually wound up spending
more than an hour reading the PDF specification
in order to understand what, exactly, was going wrong!
But that will be the topic for <a href="http://rhodesmill.org/brandon/2009/pdf-watermarks-upside-down/">my last blog post in this series.</a>
Stay tuned.
</p>
]]></content:encoded>
    </item>
    <item>
      <title>GraphicsMagick saved the day</title>
      <link>http://rhodesmill.org/brandon/2009/graphicsmagick-saved-the-day/</link>
      <pubDate>Tue, 10 Mar 2009 12:32:55 EDT</pubDate>
      <category><![CDATA[document processing]]></category>
      <category><![CDATA[computing]]></category>
      <guid>http://rhodesmill.org/brandon/2009/graphicsmagick-saved-the-day/</guid>
      <description>GraphicsMagick saved the day</description>

      <content:encoded><![CDATA[
<p>
I had never heard of
<a href="http://www.graphicsmagick.org/">GraphicsMagick</a> until yesterday,
when I discovered that the venerable, if clunky,
<a href="http://www.imagemagick.org/">ImageMagick</a> suite
was ruining one of my customer's print jobs
by producing invalid PDF files.
This is actually the third major failure
that this particular project has encountered
because of flaws in standard open-source document tools.
In this and my next two blog posts,
I will outline the bugs that I have encountered,
in the hopes of saving some future reader
the time that it took me to track them down.
</p>
<p>
But I will begin the series rather simply,
with the first two lessons that I learned during the project:
</p>
<dl>
<dt><b>1. Always verify PDF correctness with Adobe Acrobat.</b></dt>
<dd>
<p>
The trusty <a href="http://www.foolabs.com/xpdf/">Xpdf</a>
viewer, with which I have viewed PDF files for years,
turns out to have a remarkable ability
to decipher and display even somewhat damaged PDF files.
That's a great feature — <i>if</i> someone else has produced the PDF,
and you just need to read it, whether it's damaged or not.
But this “feature” becomes a problem
if you have just <i>produced</i>
a PDF, and want to know about any errors in the file
before your customer does!
</p>
<p>
In this situation, Adobe's
<a href="http://get.adobe.com/reader/">Acrobat Reader</a>
should be your viewer of choice.
Not only is it probably the software
that your customers will be using anyway,
but it is — and this seems intentional on Adobe's part —
a very stringent interpreter.
The error it displays for a corrupt PDF, I must admit,
is among the least-informative I have seen this month:
</p>
<pre>
There was a problem reading this document (14)
</pre>
<p>
But the information that this <i>does</i> yield is invaluable:
your customer will not be able to see or print this PDF
until you find the bug in your toolchain and fix the result.
It is, of course, objectionable and problematic
to have to install a closed-source product
on what might otherwise be a completely clean development system;
but I have found it irreplaceable
for its ability to show me problems
before my customers see them.
</p>
<p>
This mistake cost me more time than you might imagine.
The toolchain I built for my customer generates
several intermediate PDF files,
and it turns out that the error was happening
fairly early in the process —
so that one of the first files produced,
and therefore each subsequent PDF,
was invalid and could not be opened with Acrobat.
But because I was viewing them all with Xpdf,
I spent many minutes looking at the final step in the chain
and wondering why that PDF tool was often dying,
when the PDFs it was consuming looked so good in my viewer!
</p>
</dd>
<dt><b>2. Avoid <a href="http://www.imagemagick.org/">ImageMagick</a>
6.3.7 when producing PDFs;
try <a href="http://www.graphicsmagick.org/">GraphicsMagick</a> instead!</b>
</dt>
<dd>
<p>
More recent versions of ImageMagick might not have any problem
producing PDF files from PNG images,
but the ImageMagick that currently ships with Debian unstable,
version 6.3.7,
seems to have routine problems trying to turn some of my customer's
PNG files into valid PDFs.
To avoid having to compile, install, and maintain
my own version of ImageMagick,
I cast around for an alternative,
and was startled when Google brought me to
the <a href="http://www.graphicsmagick.org/">GraphicsMagick</a> project!
Here was ImageMagick done right:
instead of creating dozens of commands on your system,
as though this were the 1970s,
GraphicsMagick defines a single <tt>gm</tt> binary
with multiple sub-commands:
</p>
<pre>
$ gm convert watermark.png watermark.pdf
</pre>
<p>
Check out their web site for more great features;
but I'm simply happy that the PDFs it has produced so far
have been clean, correct, and consistent.
</p>
</dd>
</dl>
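<p>When a conversion like the <tt>gm convert</tt> call above is one step in a longer toolchain, it helps to have each step stop the chain as soon as it fails. The following is a minimal sketch, assuming the steps are driven from Python; the function names are hypothetical. Note that an exit status only catches failures the tool itself reports, so it would not replace opening the resulting PDF in a strict viewer.</p>
<pre>
```python
import subprocess


def gm_convert_command(png_path, pdf_path):
    """Build the argument list for the `gm convert` call shown above."""
    return ["gm", "convert", png_path, pdf_path]


def convert(png_path, pdf_path):
    """Run the conversion; check=True raises CalledProcessError on failure."""
    subprocess.run(gm_convert_command(png_path, pdf_path), check=True)


# Building the command is separate from running it, so it can be inspected.
print(gm_convert_command("watermark.png", "watermark.pdf"))
```
</pre>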
<p>
A question for my readers:
can a good, open-source PDF checker be found somewhere,
that is at least as stringent as Adobe Acrobat?
Leave a comment below if you have a suggestion;
such a tool would have made this project considerably easier!
</p>]]></content:encoded>
    </item>
  </channel>
</rss>

