You can start counting from jQuery, move on to Backbone, Ubuntu and Google Chrome, and finally arrive at Node.js. From the language it is written in, to the browsers it runs on, to the servers it is hosted on and the technologies that run the server logic, every single building block of Framebench has its origins in open source. So this post is dedicated to one such piece of open source software, which has helped us tackle one of the most difficult problems we were facing: handling PDFs in Framebench. This post is about pdf2htmlEX by Lu Wang (https://github.com/coolwanglu/).
pdf2htmlEX is simply fantastic! It renders PDF files into HTML while preserving the text, fonts and formatting, and keeps the result lightweight for both server and client. One part of the equation slightly bothered us, though: installing pdf2htmlEX. It was not a herculean task, but we felt the process was fairly undocumented and that the community would benefit from an installation article. So here it goes!
An Ubuntu PPA is available for recent versions of Ubuntu. You can install pdf2htmlEX from it with just three commands:
-sudo apt-add-repository ppa:coolwanglu/pdf2htmlex
-sudo apt-get update
-sudo apt-get install pdf2htmlex
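The three PPA commands can also be wrapped in a small script that stops at the first failure. This is only a sketch: the `run` helper, the `install_from_ppa` function and the `DRY_RUN` variable are names made up for this illustration, not part of pdf2htmlEX or apt.

```shell
#!/bin/sh
# Sketch: run the three PPA install steps, stopping at the first failure.
# DRY_RUN is a hypothetical variable introduced for this example; when it
# is set to 1 the commands are printed instead of executed.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

install_from_ppa() {
    run sudo apt-add-repository ppa:coolwanglu/pdf2htmlex &&
    run sudo apt-get update &&
    run sudo apt-get install pdf2htmlex
}

# Usage:
#   DRY_RUN=1; install_from_ppa    # only prints the three commands
#   DRY_RUN=0; install_from_ppa    # actually installs
```

Doing a dry run first is a cheap way to confirm what will be executed with sudo before anything touches the system.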
If you have problems installing it this way, you can always build it on your system.
Building pdf2htmlEX on your system can be a bit tricky because it has many dependencies. So here’s a quick guide to building, installing and using pdf2htmlEX.
First you need to download the source:
-git clone --depth 1 https://github.com/coolwanglu/pdf2htmlEX.git
Once the download finishes, the software will be built from inside the newly created directory.
Before building it, however, we need to install or build various dependencies.
One of the major ones is poppler.
Download the latest version of poppler from http://poppler.freedesktop.org/releases.html
Once it's downloaded, we need to build it.
Run the following commands from your terminal:
-sudo apt-get install libopenjpeg-dev
-tar -xvf poppler-0.22.3.tar.gz
-cd poppler-0.22.3/
-./configure --enable-xpdf-headers
-make
-sudo make install
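After a source install like this one, it can be worth confirming that the library is visible to other builds. The sketch below assumes `pkg-config` is installed and that poppler registered a `poppler` entry with it; `have_pkg` and `report_pkg` are hypothetical helper names introduced here.

```shell
#!/bin/sh
# Sketch: ask pkg-config whether a library is visible to later builds.

# have_pkg NAME: succeed if pkg-config can find the named library.
have_pkg() {
    pkg-config --exists "$1" 2>/dev/null
}

# report_pkg NAME: print a one-line status for the named library.
report_pkg() {
    if have_pkg "$1"; then
        echo "$1 found, version $(pkg-config --modversion "$1")"
    else
        echo "$1 not found by pkg-config"
    fi
}

# Example: report_pkg poppler
```

If the library was installed but is still not found, the usual suspects are a stale linker cache (refreshed with sudo ldconfig) or a PKG_CONFIG_PATH that does not include the install prefix.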
After libpoppler is installed properly, we need to install the latest version of libfontforge. It has some dependencies of its own, which can be installed with the following commands.
-mkdir fontforge
-cd fontforge
-sudo apt-get update
-sudo apt-get install libpng12-dev zlibc zlib1g-dev libtiff-dev libungif4-dev libjpeg-dev libxml2-dev libuninameslist-dev xorg-dev subversion cvs gettext git libpango1.0-dev libcairo2-dev python-dev
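With a package list this long, it is easy to miss one that failed to install. The following sketch lists which of the given packages are not currently installed. It assumes a Debian-style system where `dpkg-query` is available; `is_installed` and `missing_packages` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: report which of the named packages are not installed.

# is_installed NAME: succeed if dpkg reports the package as installed.
is_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q "ok installed"
}

# missing_packages NAME...: print every package that is not installed.
missing_packages() {
    for pkg in "$@"; do
        if ! is_installed "$pkg"; then
            echo "$pkg"
        fi
    done
}

# Example: missing_packages libpng12-dev zlib1g-dev libtiff-dev libxml2-dev
```

An empty result means every package named on the command line is present. Package names change between Ubuntu releases, so a name that shows up as missing may simply have been renamed.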
After this, we need to download and build libfontforge.
Make a separate src folder.
-mkdir src
-cd src
Download the source code from https://github.com/fontforge/fontforge/downloads
-wget https://github.com/fontforge/fontforge/downloads/fontforge_full-20120731-b.tar.bz2
Once it is downloaded, decompress the .bz2 archive (for example with bunzip2 fontforge_full-20120731-b.tar.bz2).
The resulting tar file can then be unpacked using the command:
-tar -xvf fontforge_full-20120731-b.tar
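Since the download is a .tar.bz2 archive, tar can also decompress and unpack it in a single step using its -j flag. Here is a minimal sketch of a helper that picks the flag from the file extension; `extract_archive` is a hypothetical name, and it assumes a tar that supports -j, -z and -C (GNU tar does).

```shell
#!/bin/sh
# Sketch: unpack a tar archive, choosing the decompression flag by extension.

# extract_archive FILE [DEST]: unpack FILE into DEST (default: current dir).
extract_archive() {
    archive=$1
    dest=${2:-.}
    case "$archive" in
        *.tar.bz2)      tar -xjf "$archive" -C "$dest" ;;
        *.tar.gz|*.tgz) tar -xzf "$archive" -C "$dest" ;;
        *.tar)          tar -xf "$archive" -C "$dest" ;;
        *)
            echo "unsupported archive: $archive" >&2
            return 1
            ;;
    esac
}

# Example: extract_archive fontforge_full-20120731-b.tar.bz2
```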
Then download freetype and spiro:
-git clone git://git.sv.gnu.org/freetype/freetype2.git
-svn co http://libspiro.svn.sourceforge.net/svnroot/libspiro/
Now build them. The following commands are run from inside the checked-out source folder (libspiro, in this case):
-./configure
-make
-sudo make install
All fontforge dependencies are now installed.
Come out of the libspiro folder using cd .. and then run the following commands:
-cd ./fontforge-20120731-b/
-./configure
-make
-sudo make install
fontforge is built successfully!!
Now cd into the pdf2htmlEX directory
and run the following command:
-cmake . && make && sudo make install
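Because this final step only fails once cmake is already running, it can help to confirm beforehand that the build tools are on the PATH. The sketch below uses the POSIX `command -v` builtin; `require_tools` is a hypothetical helper name, and the tool list in the example is an assumption based on the commands used in this post.

```shell
#!/bin/sh
# Sketch: verify that the tools needed for the build are on the PATH.

# require_tools NAME...: report each missing tool, fail if any is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   require_tools git cmake make && cmake . && make && sudo make install
```

Chaining the check with && means the build is never started when a tool is missing, which gives a clearer error than a failure halfway through cmake.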
All set, you are good to go!
You are now ready to use pdf2htmlEX. To see the available options, run:
-pdf2htmlEX --help
-man pdf2htmlEX
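As a first real use, here is a rough sketch that converts every PDF in a folder. It relies on the basic `pdf2htmlEX [options] input.pdf` invocation and the `--dest-dir` option; the `convert_all` function and the `PDF2HTMLEX_BIN` override are hypothetical names added for this example, and it is worth checking `pdf2htmlEX --help` for the options your build supports.

```shell
#!/bin/sh
# Sketch: convert every PDF in one directory into HTML in another.
# PDF2HTMLEX_BIN is a hypothetical override for the binary's name or path.

# convert_all SRC_DIR OUT_DIR: run pdf2htmlEX on each SRC_DIR/*.pdf.
convert_all() {
    src_dir=$1
    out_dir=$2
    bin=${PDF2HTMLEX_BIN:-pdf2htmlEX}
    mkdir -p "$out_dir" || return 1
    for pdf in "$src_dir"/*.pdf; do
        [ -e "$pdf" ] || continue   # the glob matched nothing
        "$bin" --dest-dir "$out_dir" "$pdf" || echo "failed: $pdf" >&2
    done
}

# Example: convert_all ./pdfs ./html
```

A failed conversion is reported on stderr and the loop carries on, so one bad PDF does not stop the rest of the batch.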
Enjoy PDF to HTML conversion with this awesome tool!!