GraphicsMagick saved the day
March 10th, 2009
I had never heard of GraphicsMagick until yesterday, when I discovered that the venerable, if clunky, ImageMagick suite was ruining one of my customer's print jobs by producing invalid PDF files. This is actually the third major failure that this particular project has encountered because of flaws in standard open-source document tools. In this and my next two blog posts, I will outline the bugs that I have encountered, in the hopes of saving some future reader the time that it took me to track them down.
But I will begin the series rather simply, with the first two lessons that I learned during the project:
1. Always verify PDF correctness with Adobe Acrobat.
The trusty Xpdf viewer, with which I have viewed PDF files for years, turns out to have a remarkable ability to decipher and display even somewhat damaged PDF files. That's a great feature — if someone else has produced the PDF, and you just need to read it, whether it's damaged or not. But this “feature” becomes a problem if you have just produced a PDF, and want to know about any errors in the file before your customer does!
In this situation, Adobe's Acrobat Reader should be your viewer of choice. Not only is it probably the software that your customers will be using anyway, but it is — and this seems intentional on Adobe's part — a very stringent interpreter. The error it displays for a corrupt PDF, I must admit, is among the least-informative I have seen this month:
There was a problem reading this document (14)
But the information that this does yield is invaluable: your customer will not be able to see or print this PDF until you find the bug in your toolchain and fix the result. It is, of course, objectionable and problematic to have to install a closed-source product on what might otherwise be a completely clean development system; but I have found it irreplaceable for its ability to show me problems before my customers see them.
Relying on Xpdf cost me more time than you might imagine. The toolchain I built for my customer generates several intermediate PDF files, and it turns out that the error was happening fairly early in the process, so that one of the first files produced, and therefore each subsequent PDF, was invalid and could not be opened with Acrobat. But because I was viewing them all with Xpdf, I spent many minutes looking at the final step in the chain and wondering why that PDF tool was often dying, when the PDFs it was consuming looked so good in my viewer!
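The lesson generalizes: when a chain produces several intermediate PDFs, check each one in pipeline order and stop at the first failure, instead of staring at the last step. What follows is only a minimal sketch under stated assumptions. The function name `first_bad_pdf` is made up for illustration, and the checker it runs is whatever strict validation command you trust; it must exit non-zero on a bad file.

```shell
# first_bad_pdf CHECKER FILE...
# Run CHECKER on each FILE in the order given (pipeline order) and
# print the first file that CHECKER rejects. Returns 1 if any file
# fails, 0 if every file passes.
# CHECKER is an assumption: any command that exits non-zero on a
# PDF it considers invalid.
first_bad_pdf() {
    checker=$1
    shift
    for pdf in "$@"; do
        if ! "$checker" "$pdf" >/dev/null 2>&1; then
            printf '%s\n' "$pdf"
            return 1
        fi
    done
    return 0
}
```

Called as `first_bad_pdf check_pdf step1.pdf step2.pdf final.pdf` (the checker and file names here are hypothetical), it prints the earliest failing file, which is where the debugging should begin.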
2. Avoid ImageMagick 6.3.7 when producing PDFs; try GraphicsMagick instead!
More recent versions of ImageMagick might not have any problem producing PDF files from PNG images, but the ImageMagick that currently ships with Debian unstable, version 6.3.7, seems to have routine problems trying to turn some of my customer's PNG files into valid PDFs. To avoid having to compile, install, and maintain my own version of ImageMagick, I cast around for an alternative, and was startled when Google brought me to the GraphicsMagick project! Here was ImageMagick done right: instead of creating dozens of commands on your system, as though this were the 1970s, GraphicsMagick defines a single gm binary with multiple sub-commands:
$ gm convert watermark.png watermark.pdf
Check out their web site for more great features; but I'm simply happy that the PDFs it has produced so far have been clean, correct, and consistent.
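If there is a whole directory of PNG files to convert, the same `gm convert` invocation can be wrapped in a loop. This is a minimal sketch rather than a tested production script: the function name `convert_pngs` is made up for illustration, and it assumes each output PDF should sit next to its source PNG under the same base name.

```shell
# convert_pngs DIR
# For every DIR/name.png, write DIR/name.pdf using "gm convert".
# Stops and returns 1 as soon as one conversion fails.
convert_pngs() {
    dir=$1
    for png in "$dir"/*.png; do
        [ -e "$png" ] || continue    # the glob matched nothing
        gm convert "$png" "${png%.png}.pdf" || return 1
    done
}
```

Stopping at the first failed conversion is deliberate: a broken file early in a chain is easier to track down when nothing downstream has been built on top of it.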
A question for my readers: can a good, open-source PDF checker be found somewhere, that is at least as stringent as Adobe Acrobat? Leave a comment below if you have a suggestion; such a tool would have made this project considerably easier!
Posted: Tuesday, March 10th, 2009 at 12:32 pm
Categories: Computing, Document processing
I’ve had some experience with iText, an open-source Java package for PDF manipulation. The package is extremely robust and well-maintained. I would be surprised if it does not provide the validity checks you need… that is, if you’re willing to fire up a JVM.
March 10th, 2009 at 4:28 pm
Meta question: do you have any idea why this (very interesting to me) post didn’t show up on Planet Python? I saw the Feb 26 post and the Mar 11 post about list comprehensions, but nothing in between.
March 17th, 2009 at 7:52 pm
Jason, thanks for letting me know about that resource! I’ll keep it in mind in case I ever deign to, as you say, fire up a virtual machine for one of my projects.
Marius, this post did not appear in Planet Python because I did not actually tag it “Python” here in my blog, and the link that I submitted to the Python planets specifically filters for that tag. I did try my best to work in some Python content, but nothing really fit; so I was careful to refer back to the post in some subsequent, Python-related posts, but as a good citizen couldn’t exactly claim that this was Python material.
March 17th, 2009 at 8:00 pm
Brandon, YOU saved the day! It seems I already had gm installed; it seems I had ImageMagick 6.3.7 [ubuntu hardy] too! I was converting some png [from the gimp] to pdf; converting them for instance to eps in between made no difference; gm did though, thanks!
April 28th, 2009 at 7:33 am
Hi,
I am the principal GraphicsMagick maintainer and just ran across this posting. I am happy to hear that GraphicsMagick worked well for you.
Bob
September 21st, 2009 at 9:20 pm