Our love for Open Source!

Start counting from jQuery, move on to Backbone, Ubuntu and Google Chrome, and finally arrive at Node.js. From the language it's written in, to the browsers it runs on, to the servers it's hosted on and the technologies that run the server logic, every single building block of Framebench has its origins in open source. So this post is dedicated to one such piece of open source software, which has helped us tackle one of the most difficult problems we were facing: handling PDFs in Framebench. This post is about pdf2htmlEX by Lu Wang (https://github.com/coolwanglu/pdf2htmlEX).

pdf2htmlEX is simply fantastic! It renders PDF files into HTML, preserving the text, fonts and formatting while staying lightweight for both server and client. One part of the equation slightly bothered us: installing pdf2htmlEX. Though it was not a herculean task, we felt the process was poorly documented and that the community would benefit from an installation article. So here it goes!

There is an Ubuntu PPA available for recent versions of Ubuntu. With it, you can install pdf2htmlEX using only 3 commands:

-sudo apt-add-repository ppa:coolwanglu/pdf2htmlex
-sudo apt-get update
-sudo apt-get install pdf2htmlex
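
If you want to confirm that the PPA route worked before moving on, a quick check like the one below can help. This is only a sketch: check_installed is a helper name made up for illustration, and it assumes the package puts a binary called pdf2htmlEX on your PATH.

```shell
# check_installed is a hypothetical helper, not part of pdf2htmlEX.
# It reports whether the given command can be found on PATH.
check_installed() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH"
        return 1
    fi
}

# Assumption: the PPA package installs a binary named pdf2htmlEX.
check_installed pdf2htmlEX || echo "The PPA install did not work; try building from source."
```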

If you have problems installing it this way, you can always build it on your system.

Building pdf2htmlEX on your system can be a bit tricky as it has many dependencies. So here’s a quick guide to building and using pdf2htmlEX.

Clone pdf2htmlEX

First you need to download the source:

-git clone --depth 1 https://github.com/coolwanglu/pdf2htmlEX.git


 

After the download, cd into the newly created directory:

-cd pdf2htmlEX

 

Install Dependencies

Before building the software, we need to install/build various dependencies.

One of the major ones is poppler.

  • Poppler

Download the latest version of poppler from http://poppler.freedesktop.org/releases.html

Once it's downloaded, we need to build it.

Run the following commands from your terminal:

-sudo apt-get install libopenjpeg-dev
-tar -xvf poppler-0.22.3.tar.gz
-cd poppler-0.22.3/
-./configure --enable-xpdf-headers
-make
-sudo make install
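
Since ./configure installs under /usr/local by default, later build steps may not see the new poppler unless pkg-config knows where to look. The sketch below assumes that default prefix, and that poppler registers itself with pkg-config under the module name poppler. Adjust the path if you configured a different prefix.

```shell
# Assumption: poppler was installed with the default prefix, /usr/local.
# Prepend its pkg-config directory, keeping any existing search path.
export PKG_CONFIG_PATH="/usr/local/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"

# Assumption: the pkg-config module is called "poppler".
if pkg-config --exists poppler 2>/dev/null; then
    echo "poppler version: $(pkg-config --modversion poppler)"
else
    echo "pkg-config cannot find poppler; check the install prefix"
fi
```

If the check fails even though make install succeeded, the .pc file probably landed in a different directory than the one assumed here.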

 

  • Fontforge

After libpoppler is installed properly, we need to install the latest version of libfontforge. It has some dependencies of its own, which can be installed with the following commands.

-mkdir fontforge
-cd fontforge
-sudo apt-get update; sudo apt-get install libpng12-dev zlibc \
    zlib1g-dev libtiff-dev libungif4-dev libjpeg-dev libxml2-dev \
    libuninameslist-dev xorg-dev subversion cvs gettext git \
    libpango1.0-dev libcairo2-dev python-dev

 

After this, we need to download and build libfontforge.

Make a separate src folder.

-mkdir src
-cd src

 

Download the source code from https://github.com/fontforge/fontforge/downloads

-wget https://github.com/fontforge/fontforge/downloads/fontforge_full-20120731-b.tar.bz2


Once downloaded, issue the following command:

-bunzip2 fontforge_full-20120731-b.tar.bz2

 

The resulting tar file can be extracted using the command:

-tar -xvf fontforge_full-20120731-b.tar
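
The two extraction steps above can also be done in one go, since tar can decompress bzip2 archives itself with the -j flag. A minimal sketch, assuming the archive sits in the current directory (extract_bz2 is just an illustrative helper name):

```shell
# extract_bz2 is a hypothetical helper: decompress and unpack a .tar.bz2 in one step.
extract_bz2() {
    tar -xjf "$1"
}

archive="fontforge_full-20120731-b.tar.bz2"

# Only attempt the extraction if the download is actually present.
if [ -f "$archive" ]; then
    extract_bz2 "$archive"
else
    echo "$archive not found in $(pwd); download it first"
fi
```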

 

Then download freetype and spiro:

-git clone git://git.sv.gnu.org/freetype/freetype2.git
-svn co http://libspiro.svn.sourceforge.net/svnroot/libspiro/

 

Now build and install libspiro:

-cd ./libspiro
-./configure
-make
-sudo make install

 

All fontforge dependencies are now installed.

  • Build Fontforge

Come out of the libspiro folder using cd .. and then run the following commands:

-cd ./fontforge-20120731-b/
-./configure
-make
-sudo make install
-sudo ldconfig
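
To check that ldconfig picked up the new library, you can look for it in the linker cache. A small sketch, assuming the shared library's file name contains libfontforge (lib_visible is a hypothetical helper, not something fontforge ships):

```shell
# lib_visible is a hypothetical helper: it succeeds if the dynamic linker
# cache lists a library whose name contains the given string.
lib_visible() {
    ldconfig -p 2>/dev/null | grep -q "$1"
}

# Assumption: the installed library file name contains "libfontforge".
if lib_visible libfontforge; then
    echo "libfontforge is visible to the dynamic linker"
else
    echo "libfontforge is not in the linker cache; re-run sudo ldconfig"
fi
```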

 

fontforge is built successfully!!

 

Building pdf2htmlEX

Now cd into the pdf2htmlEX directory and run the following command:

-cmake . && make && sudo make install

 

 

All is set, you are good to go!

Now you are ready to use pdf2htmlEX as

-pdf2htmlEX "your_pdf.pdf"

 

For help:

-pdf2htmlEX --help
-man pdf2htmlEX
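
Once the tool works for a single file, converting a whole folder is a small loop. The sketch below uses only the single-argument form shown above. convert_all is a hypothetical helper name, and it assumes pdf2htmlEX exits with a non-zero status when a conversion fails.

```shell
# convert_all is a hypothetical helper: it runs pdf2htmlEX on every .pdf
# in the given directory (default: the current directory).
convert_all() {
    dir="${1:-.}"
    failed=0
    for pdf in "$dir"/*.pdf; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$pdf" ] || continue
        echo "Converting $pdf"
        # Assumption: a non-zero exit status means the conversion failed.
        pdf2htmlEX "$pdf" || failed=$((failed + 1))
    done
    echo "Conversions that failed: $failed"
    [ "$failed" -eq 0 ]
}
```

Call it as convert_all ./pdfs, replacing ./pdfs with the folder that holds your PDF files.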

 

Enjoy PDF to HTML conversion with this awesome tool!!

4 Comments

  1. Love your article. You missed v in sudo apt-get install libopenjpeg-de(V). Otherwise perfect so far.

  2. One More Mistake : Fontforge – … libxml2-dev l ibuninameslist-dev …( libuninameslist-dev )

  3. One more, found it when I reused this guide. (That’s how good it is).
    -./configure –enable-xpdf-headers

    Should be
    -./configure --enable-xpdf-headers
