I finally gave PrinceXML a quick look after my inquiries regarding generating PDFs (on the server) with Firefox/Gecko. I received no fewer than ten recommendations advocating Prince, so I figured I'd write up the experience as recompense.
Prince is basically a print / typesetting / PDF generation tool based on popular web standards (HTML, XML, CSS, SVG, etc). For applications that are developed for the web first and then want PDF/print capabilities added on top, a tool like Prince makes a lot of sense. You can take your web content, sprinkle on a bit of print specific CSS, and get very acceptable PDF output. This beats the pants off having to write and maintain a separate LaTeX, DocBook, FO, or raw PDF apparatus to get good print support for your primarily web-based content.
I'm not going to get into a whole lot of Prince’s more advanced features here; there’s a demo of Prince in action on YouTube that lays everything out nicely and the documentation is available on the web.
First Impressions
The first thing I noticed — after the restrictive license and the $3,800 price tag — was that, for a proprietary, closed source software outfit, these people appear to have their shit together. I was expecting something really bloated and over-engineered, designed primarily for Windows, maybe with a crappy port to one of the “enterprise” Linux distributions. I have no idea why I thought that was the case – maybe because I knew it was payware; or, maybe it was due to the product name having “XML” in it when XML isn’t really all that interesting to what the product actually does. The reality is that the product is well-designed, very lightweight, and has packages available for Windows, MacOS X, Solaris 10, various Linux distros, and FreeBSD.
EDIT: I should have noted that there is “a free Personal license for interactive use on a single computer” that embeds a small P on the upper right-hand area of the first page. The $3,800 license mentioned above is for running Prince on the server, which is what I'm most interested in, and is their most expensive option. Professional and Academic licenses are also available and are significantly more affordable. My apologies for any confusion this may have caused.
You won’t find Prince in your package or ports repository but the simple packages provided are the next best thing. There don’t seem to be any external/dynamic library dependencies, which is weird considering all of the stuff it has to do: SGML/XML/HTML/CSS parsing, PDF generation, HTTP interaction (including SSL and crypto support), multiple image format decode, text layout, embedded SVG support, etc. The compressed distributables are about 4MB on average. The lone prince binary on MacOS weighs in at a mere 11MB (and that includes both PPC and x86 versions of the program). This tiny binary appears to be entirely self contained. A feat of coding, or maybe build engineering – impressive either way.
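If you want to check the no-dependencies impression yourself, the platform's linker tools will list whatever shared libraries a binary pulls in. Here's a minimal sketch, assuming otool (MacOS X developer tools) or ldd (Linux, FreeBSD) is available. The show_deps helper is a made-up name, and if the prince on your PATH turns out to be a wrapper script you would need to point the helper at the real executable instead.

```shell
#!/bin/sh
# show_deps is a hypothetical helper, not part of Prince.
# It lists the shared libraries a binary links against, using
# whichever tool this platform provides.
show_deps() {
    if command -v otool >/dev/null 2>&1; then
        otool -L "$1"        # MacOS X
    elif command -v ldd >/dev/null 2>&1; then
        ldd "$1"             # Linux, FreeBSD
    else
        echo "neither otool nor ldd found" >&2
        return 1
    fi
}

# Only inspect prince if it is actually installed on this machine.
if command -v prince >/dev/null 2>&1; then
    show_deps "$(command -v prince)" ||
        echo "could not inspect prince (is it a wrapper script?)"
fi
```

A short list containing nothing but core system libraries, or a complaint that the file isn't a dynamic executable at all, would back up the self-contained impression.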
I'm assuming it’s written in C (EDIT: it’s actually Mercury but compiles down to C), which is always a good thing in my book so long as I'm not the maintainer. You typically don’t get the aforesaid attributes out of any other language environment. And there’s an abundance of MIT/BSD licensed library code in C floating around out there that could be used to cobble together most of the baseline functionality mentioned above.
All of this to say that Prince passes my don’t waste my time test with flying colors. I'm pleasantly surprised by the whole experience thus far and ready to put it through its paces.
Read The License
Before we do anything, you should really read the license, paying special attention to this bit:
Licensee shall not modify, adapt, translate or create derivative works based upon the Software. Licensee shall not reverse engineer, decompile, disassemble or otherwise attempt to discover the source code of the Software.
We’re about to officially become Licensees so you should be cool with that before proceeding.
Installation
The following assumes you’re running on MacOS X and have /usr/local/bin on your PATH. The process should be basically the same on Linux or FreeBSD:
$ cd /tmp
$ curl http://www.princexml.com/download/prince-6.0r5-macosx.tar.gz | tar xvzf -
$ cd prince*
$ sudo ./install.sh < /dev/null
Prince 6.0
Install directory
This is the directory in which Prince 6.0 will be installed.
Press Enter to accept the default directory or enter an alternative.
[/usr/local]:
Installing Prince 6.0...
Creating directories...
Installing files...
Installation complete.
Thank you for choosing Prince 6.0, we hope you find it useful.
Please visit http://www.princexml.com for updates and development news.
I'm usually one to insist on package management but this was pretty painless and there’s no library/header spew so uninstalling isn’t a big deal.
Kicking The Tires
We’ll start with something simple – this site’s main index:
$ time prince -o tomayko.pdf http://tomayko.com
real 0m7.819s
user 0m1.670s
sys 0m0.222s
$ open tomayko.pdf
The result, tomayko.pdf, looks great considering I've made no print-specific tweaks. The margins I have set for browser viewing are a bit much, though, and we don’t want the navigation, search box, or page numbers. This is trivially fixed with a user stylesheet:
$ cat <<EOF > print.css
body { margin:0 !important }
#footer, #nav, .pages { display:none }
EOF
$ prince -o tomayko-print.pdf -s print.css http://tomayko.com
$ open tomayko-print.pdf
Much better: tomayko-print.pdf. Note that I had to use !important in my user CSS to override attributes specified in the page CSS. As an alternative to passing the user stylesheet on the command line, I could have used a print specific stylesheet embedded directly in my site’s HTML (e.g., <link media='print' ...>).
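The embedded alternative can be tried locally without touching a live site. What follows is only a sketch: the file names (site-print.css, demo.html) and the throwaway page are invented for illustration, and the selectors simply mirror the ones in print.css. Because the print stylesheet is linked after the page's own styles, it should win on source order and not need !important.

```shell
#!/bin/sh
# Assumption: file names and page content are made up for this demo.
# The print rules are the same ones used in the user stylesheet.
cat <<'EOF' > site-print.css
body { margin: 0 }
#footer, #nav, .pages { display: none }
EOF

# A tiny page that links the stylesheet for print media only.
cat <<'EOF' > demo.html
<html>
<head>
  <title>Print stylesheet demo</title>
  <style>body { margin: 4em }</style>
  <link rel="stylesheet" media="print" href="site-print.css">
</head>
<body>
  <div id="nav">Navigation (should not appear in the PDF)</div>
  <p>Body text that should print.</p>
  <div id="footer">Footer (should not appear in the PDF)</div>
</body>
</html>
EOF

# Convert only if prince is installed; otherwise just leave the files.
if command -v prince >/dev/null 2>&1; then
    prince -o demo.pdf demo.html
else
    echo "prince not installed; wrote demo.html and site-print.css only"
fi
```

With this approach the print rules ship with the site, so nobody has to remember to pass -s on the command line.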
Big Documents and Performance
Let’s try something a bit more complex. The single web page version of The FreeBSD Handbook is almost 5MB of DocBook-generated HTML. It uses a wide variety of HTML’s markup capabilities and a custom set of DocBook-based CSS.
$ time prince -o handbook.pdf \
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/book.html
[snip "entity already defined" warnings]
real 2m53.200s
user 1m40.743s
sys 0m8.389s
$ open handbook.pdf
1m40s of CPU time is fast (the 2m53s of wall-clock time includes fetching the 5MB page). It barfed a bunch of “entity already defined” warnings on stderr (as it should) but the result is quite acceptable. I'm not going to link to the handbook PDF here because it’s a little over 11 MB. You can find it with a little digging if you’re really curious or just grab Prince for yourself and generate your own.
Images / SVG Support
There’s one last test I'd like to show.
One of the huge problems with taking web content to print is image resolution. Images on the web, when taken directly to print at the same eye size, usually end up somewhere between 76 and 100 DPI. An image with anything more complex than vertical and horizontal lines looks absolutely horrible in print at 100 DPI. Even simple charts and graphs will appear grainy and jaggy on paper at such low resolution; logos too. Content that needs to go to both mediums will usually have separate web and print optimized versions – a pain to maintain.
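The arithmetic behind the resolution problem is simple enough to sketch. The helper names below are made up, and 300 DPI is just an illustrative print resolution, not a figure from any particular printer.

```shell
#!/bin/sh
# Hypothetical helpers for the pixels / DPI / inches relationship.

# printed_inches PIXELS DPI -> physical width on paper, in inches
printed_inches() {
    awk -v px="$1" -v dpi="$2" 'BEGIN { printf "%.2f\n", px / dpi }'
}

# pixels_needed INCHES DPI -> pixel width required for that size
pixels_needed() {
    awk -v w="$1" -v dpi="$2" 'BEGIN { printf "%d\n", w * dpi }'
}

printed_inches 600 100   # prints 6.00: a 600px image covers 6in at 100 DPI
printed_inches 600 300   # prints 2.00: the same image shrinks to 2in at 300 DPI
pixels_needed 6 300      # prints 1800: pixels needed to fill 6in at 300 DPI
```

So an image that looks fine at six inches wide on screen needs three times the pixels in each dimension to hold up at the same size on paper at that assumed resolution.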
This is one of the reasons I dig SVG so much and Prince supports SVG in a very good way. What I was reluctantly expecting to see here was for the SVG to be rasterized to a high resolution bitmap image and then embedded in the PDF but what Prince does is so much better.
Sam Ruby has been embedding little handcrafted SVG images in his weblog entries for a little while now so we’ll use him as our guinea pig:
$ prince -o svg.pdf http://intertwingly.net/blog/2008/02/01/SVG-Tidy
$ open svg.pdf
Check this out: svg.pdf. It might not seem very interesting at first but zoom in 500% or so and look at the little graphic: it’s perfect. Prince is converting the SVG directly into PDF drawing instructions, retaining its vector goodness. This makes for perfect image output at any DPI.
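To repeat the experiment without depending on someone else's weblog, a small local page with a hand-written graphic should do. This is a sketch under assumptions: the file name svg-demo.html and the drawing are invented, and whether a given Prince version accepts inline SVG in a file named .html (rather than wanting it treated as XML) is not something the test above establishes.

```shell
#!/bin/sh
# Assumption: file name and graphic are made up for this demo.
# The page carries XHTML and SVG namespaces, as inline SVG on an
# XHTML-served weblog would.
cat <<'EOF' > svg-demo.html
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Inline SVG demo</title></head>
<body>
  <p>A hand-written SVG graphic:</p>
  <svg xmlns="http://www.w3.org/2000/svg"
       width="200" height="200" viewBox="0 0 100 100">
    <circle cx="50" cy="50" r="40"
            fill="none" stroke="black" stroke-width="2"/>
    <line x1="20" y1="80" x2="80" y2="20"
          stroke="black" stroke-width="2"/>
  </svg>
</body>
</html>
EOF

# Convert only if prince is installed; otherwise just leave the page.
if command -v prince >/dev/null 2>&1; then
    prince -o svg-demo.pdf svg-demo.html
else
    echo "prince not installed; wrote svg-demo.html only"
fi
```

If the graphic comes through as vector drawing instructions, the circle and the diagonal should stay crisp in svg-demo.pdf no matter how far you zoom in.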
Prince is an extremely impressive piece of Non-Free software that is otherwise very well-suited to the philosophy of Unix. I have little doubt that it would be more than capable of handling anything I would throw at it and can heartily recommend it if you’re comfortable with the license and the price.
Discuss
Thanks for the positive write up! It’s nice when someone actually notices the effort put into build engineering :)
Prince is actually written in Mercury, a high-level declarative programming language that compiles down to C, offering reasonable portability, decent performance and allowing easy integration with other C libraries like libjpeg/png/gif/tiff etc. for image loading.
— Michael on Sunday, February 03, 2008 at 11:55 AM #
There probably is no need for the search form to be visible when media is print, eh? I've removed it.
— Sam Ruby on Sunday, February 03, 2008 at 12:14 PM #
Great find/review. I've been looking for a better way to work on a book in text form for some time, where the resulting PDFs are print quality. There’s a lot of Free tools that do it, but none that are as easy as this (and this may be a better path than writing directly in OpenOffice/Word).
— mx on Sunday, February 03, 2008 at 12:21 PM #
Thanks for the extensive write-up! I've read and heard a lot about PrinceXML before, but never seen such an in-depth analysis of how it can be used to solve practical, real-world problems. The price tag is literally just insane; even the single-user one. I wish they had a “personal license” too with a more affordable price tag.
— Asbjørn Ulsberg on Monday, February 04, 2008 at 10:41 PM #
Asbjørn: From the download page
> We offer a free Personal license for interactive use on a single computer.
— Shawn Wheatley on Tuesday, February 05, 2008 at 07:32 AM #
I've encountered PrinceXML and I consider its baseline support to be quite impressive. However, to go to the next level, and truly be “print” ready, Prince really needs TeX style hyphenation/justification (this goes beyond dictionaries and involves the total-fit algorithm, hyphenation exceptions, and beyond.) Otherwise, you've merely got a “browser” that actually does print related markup properly.
— Edward Z. Yang on Friday, February 08, 2008 at 12:25 PM #
I’ll back all the nice words on Prince –
Exposing myself to a flame war – which by no means is my intention – I risk offering the free alternative!
I was out for this kind of feature without the close to 4K funding on my budget, so I turned to “flying saucer”.
It is not a commercial product – not by far – but it solves your basic PDF'ing just as well as prince.
All the fancy pancy stuff – well, then you’re off to prince again :)
Just thought I'd mention it.
Oh, and please don’t shoot the messenger ;)
Cheers Walther
— Walther H Diechmann on Saturday, February 09, 2008 at 04:32 AM #
it’s just a shame that it doesn’t appear to generate tagged PDFs, but apart from that it looks quite excellent.
— patrick h. lauke on Saturday, February 09, 2008 at 10:26 AM #
I'd like to see a demo of how well Prince does tables that are longer than 1 page.
— Vincent Murphy on Sunday, February 10, 2008 at 11:55 AM #
We use PrinceXML at my company and are entirely pleased with it. The team there is superb and Michael Day has been enormously helpful. We use their product to convert horrible HTML (insane usage of tables) and it handles it quite well. I couldn’t be happier.
— Clayton Magouyrk on Monday, February 11, 2008 at 09:32 AM #
Doesn’t opera use it or one of the opera coders also coded it?
Also google docs (the word processor) is using it for its pdf export.
— Justin Goldberg on Friday, February 15, 2008 at 06:00 AM #
Just stumbled upon the project Pisa: http://www.htmltopdf.org/
I can not give it a good or bad review, as I've just happened upon it today thanks to reddit.com, but it’s not mentioned above and it may work out.
— Jon Miller on Tuesday, February 19, 2008 at 08:51 AM #
True. Prince has been extremely impressive. I still think that the $3800 price tag is a tad too big, and has been forcing me to search for alternatives.
— Avadhut on Friday, May 15, 2009 at 09:26 PM #