Points to keep in mind when working with the iTextSharp HTML to PDF converter (HTMLWorker)

If you are looking for a tool to convert an HTML document to a PDF, then the iTextSharp converter is a great tool. The library is open source and is available for Java (as iText) and for .NET (as iTextSharp). If you wish to use iTextSharp commercially, however, you will need to pay for a license.

Keep in mind that I am basing my observations on the iTextSharp port to .NET. The Java version may behave differently.

I personally like the iTextSharp tool and, even given its quirks, will continue using it. Please note that this article is focused on observations of the iTextSharp HTMLWorker object. Significant advancements have been made with the iTextSharp XMLWorker object, which you can read more about in another article I have written.

Have a look at the collection of articles I’ve written that cover PDF generation with iTextSharp for .NET.

Here are some of the main points to consider when planning/writing your HTML to PDF conversion function using the iTextSharp HTMLWorker object:

  • There are two main ways of using the iTextSharp libraries to generate a PDF document:
    1. You can directly write to an iTextSharp document as you read from your database (this is nice because it bypasses the need to read/write from the server’s file system).
    2. You can read an HTML file from the server’s file system into a StringWriter() object.
  • The iTextSharp converter does not properly handle global page styles applied through a style sheet. In the best case the converter ignores the global style sheet, and in the worst case it renders the style sheet as text in the generated PDF document.
  • To manually control the font size in a converted iTextSharp document, you need to specify the font size at the HTML element level using either the <font size="1"> element or the inline style="font-size:10px;" attribute. This also applies to other page-level styles.
  • When no font is specified, iTextSharp defaults to Helvetica 12 pt.
  • iTextSharp will not correctly handle jagged HTML tables. Specifically, if your 10-column table has a row with only 1 column, then the conversion routine will start taking columns from the rows after the 1-column row in order to make up the 10 columns. This can produce surprising results if you are generating your table columns/rows dynamically from a database. Furthermore, iTextSharp does not support the HTML rowspan attribute.
  • Including images with your new PDF can be tricky. The three most common methods are:
    1. To reference an image file by path and name on the local file system
    2. To use a System.Drawing.Image object
    3. To pass the image location as a URL
  • Images will be imported at 72 dpi and will most likely lose significant resolution. Use the .ScalePercent(24f) method to try to correct for this; 24% is 72/300, so the image is placed as if it had been saved at 300 dpi.
  • It’s best to force your generated PDF document into landscape mode when working with HTML table data.
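  • Use the iTextSharp.text.Document constructor to set your PDF to an A4 page in landscape layout with 1-point left and right margins as follows (the four numeric arguments are the left, right, top and bottom margins, measured in points rather than pixels):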
    • Dim myDocument As iTextSharp.text.Document = New iTextSharp.text.Document(iTextSharp.text.PageSize.A4.Rotate(), 1, 1, -100, 0)
  • iTextSharp does not support page breaks. This is a bit of a glaring omission in my opinion. Some third party add-ons exist that enable page breaks, but given my misgivings about using too many independently developed add-ons, I have gone the route of using <br> tags instead. Thus when rendering a Web page to PDF I pass in a parameter that toggles a series of line breaks that moves my report tables to separate pages. It may not be the best solution, but it works.
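
A minimal sketch of how several of the points above might fit together, assuming iTextSharp 5.x (where HTMLWorker.ParseToList returns a list of IElement objects). The module and function names are hypothetical, the margins are copied from the constructor call above, and the HTML fragments are assumed to carry their styles inline. Calling Document.NewPage() between fragments is one possible alternative to padding with <br> tags, provided the report can be split into one HTML fragment per page before conversion.

```vb
Imports System.Collections.Generic
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports iTextSharp.text.html.simpleparser

Public Module PdfSketch

    ' Hypothetical helper: renders each HTML fragment on its own page and
    ' returns the finished PDF as a byte array (nothing touches the file system).
    ' Assumes at least one fragment produces content; iTextSharp refuses to
    ' close a document that has no pages.
    Public Function HtmlToLandscapePdf(ByVal fragments As IEnumerable(Of String)) As Byte()
        Using output As New MemoryStream()
            ' A4 rotated to landscape; margins (left, right, top, bottom) are in points.
            ' Margin values are copied from the article.
            Dim myDocument As New Document(PageSize.A4.Rotate(), 1, 1, -100, 0)
            PdfWriter.GetInstance(myDocument, output)
            myDocument.Open()

            Dim isFirst As Boolean = True
            For Each fragment As String In fragments
                ' Start a new page before every fragment except the first.
                If Not isFirst Then myDocument.NewPage()
                isFirst = False

                Using reader As New StringReader(fragment)
                    ' Empty StyleSheet: styles are expected inline in the HTML.
                    For Each element As IElement In HTMLWorker.ParseToList(reader, New StyleSheet())
                        myDocument.Add(element)
                    Next
                End Using
            Next

            myDocument.Close()
            Return output.ToArray()
        End Using
    End Function

    ' Hypothetical helper: loads an image from a file path or URL string and
    ' scales it to 24% (72/300) before it is added to a document.
    Public Function LoadScaledImage(ByVal pathOrUrl As String) As iTextSharp.text.Image
        Dim picture As iTextSharp.text.Image = iTextSharp.text.Image.GetInstance(pathOrUrl)
        picture.ScalePercent(24.0F)
        Return picture
    End Function

End Module
```

For example, PdfSketch.HtmlToLandscapePdf(New String() {"<p>Page one</p>", "<p>Page two</p>"}) should return the bytes of a two-page PDF, which can then be streamed to the browser or saved with File.WriteAllBytes.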

Some specific workarounds when using the HTMLWorker object

In this article I’ve been working with the iTextSharp HTMLWorker object. This object has been deprecated, but many older systems still use the HTMLWorker object. Converting from the HTMLWorker to the XMLWorker requires you to reformat your HTML from HTML 4.01 to XHTML, which can be a significant amount of effort. Thus it’s still important to know your way around the HTMLWorker object.

Here are some of the quirks/workarounds I’ve noticed with the HTMLWorker object:

  • If you are inserting a blank table cell, the &nbsp; code for a blank space does not work. In fact the cell won’t even be rendered, and if it is the only cell in a row, then the row won’t be rendered. Using a space character from the spacebar seems to do the trick though and forces the cell to render.
  • If you are applying font-size styles to your text, the HTMLWorker object seems to really dislike non-numeric font sizes. For example:
    • Using font-size: XX-small; will cause the HTMLWorker object to simply skip rendering the HTML object in which the style is being set.
    • Specifically, I set the font size in a span tag that was in a table cell. Surprisingly, this caused the entire table row not to render when the document was exported to PDF.
    • However, I was able to get my table row to render by changing the font-size to a numeric value. When I changed the style of the span tag to font-size:8px; the row rendered in the PDF and the text had the correct font size.
  • If you want to apply a background color to a table cell, note that the HTMLWorker object does not recognize the CSS markup background-color:red; and instead accepts the table cell attribute bgcolor="#FF0000".
  • If you want to size a table cell or table column, you cannot use CSS. Instead use the width attribute on the <td> element. Furthermore, you cannot size the table cell with an absolute number. Since the PDF conversion algorithm automatically sizes your HTML tables to 100% of the PDF document, you must also size your columns using percentages. So, for example, <td width="5%"> works like a charm.
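
The workarounds above can be collected into a small markup helper. This is only a sketch: the module and function names are hypothetical, the 8px font size and the even column widths are arbitrary choices for illustration, and the cell text is assumed to be HTML-encoded already. It emits only the markup the list above reports as working (a literal space for empty cells, a numeric font size, bgcolor, and percentage widths), and it pads short rows so that every row has the same number of cells.

```vb
Imports System.Collections.Generic
Imports System.Text

Public Module HtmlWorkerMarkup

    ' Hypothetical helper: builds one <tr> with exactly columnCount cells.
    ' Assumes columnCount is greater than zero; cells beyond columnCount are ignored.
    Public Function BuildRow(ByVal cells As IList(Of String), ByVal columnCount As Integer, ByVal hexColor As String) As String
        ' Even split of the table width, expressed as a whole percentage.
        Dim widthPercent As Integer = 100 \ columnCount
        Dim sb As New StringBuilder("<tr>")

        For i As Integer = 0 To columnCount - 1
            ' Pad short rows, and use a literal space rather than &nbsp; for blank cells.
            Dim content As String = " "
            If i < cells.Count AndAlso Not String.IsNullOrEmpty(cells(i)) Then content = cells(i)

            ' Width as a percentage attribute and background via bgcolor, not CSS.
            sb.AppendFormat("<td width=""{0}%"" bgcolor=""{1}"">", widthPercent, hexColor)
            ' Numeric font size; keyword sizes such as xx-small are avoided.
            sb.AppendFormat("<span style=""font-size:8px;"">{0}</span></td>", content)
        Next

        sb.Append("</tr>")
        Return sb.ToString()
    End Function

End Module
```

Calling BuildRow(New String() {"Total"}, 3, "#FF0000") returns a row of three cells, each 33% wide, where the second and third cells contain a single space so that the row is not jagged.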

9 thoughts on “Points to keep in mind when working with the iTextSharp HTML to PDF converter (HTMLWorker)”

    1. Thanks for the heads up, I hadn’t heard that iTextSharp deprecated its HTMLWorker library. I will have a look at the XMLWorker library, hopefully they have made some improvements.

  1. width=”10%” does not work for me. I can’t figure out why. All cells are the same width… I guess something has changed.

    1. Table column sizing has always been tricky in iTextSharp and they may have made some updates since I looked into the HtmlWorker object. Does absolute sizing work now? They may have patched the HtmlWorker object to support it properly, but in the process may have broken relative widths.

      I know that when I’m working with tables using the XMLWorker object, I have to use inline styles to set the absolute width of the table as well as the absolute width of each table cell. When the column widths consistently add up to the absolute table width, the XMLWorker object does correctly size the columns. It’s tricky business though.

  2. Hi team,
    I want to set my own width for every column, but it has no effect when I use the iTextSharp DLL to convert HTML rendered from a control to PDF.

  3. iTextSharp does not support page breaks.

    It is possible using Chunk: add Chunk.NEXTPAGE to the document to start a new page.
    pdfDoc is of Type Document
    tableLayout is of type iTextSharp

    pdfDoc.Add(tableLayout);
    pdfDoc.Add(Chunk.NEXTPAGE);
