Java: Word, RTF, PDF convert, read, import, export, write

April 7th, 2009 by jeremychone

Reading and writing Word and PDF documents can be quite tricky.

For Word (.doc) files, the best option seems to be to use OpenOffice to convert the file to a text format (RTF or the OpenOffice XML format), modify the text document, and then use OpenOffice to convert it back to .doc. JODConverter 3.x seems to be a great option for doing this from a web application.
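Because RTF is plain text, the "modify" step between the two OpenOffice conversions can be ordinary string processing. Below is a minimal sketch, assuming the .doc template has already been converted to RTF (the JODConverter/OpenOffice calls are not shown) and that it contains hypothetical placeholder tokens such as `[[name]]`. Whether a token survives as one contiguous piece of text depends on how Word stored it, since formatting or language changes can split typed text into separate runs, so this is worth checking on real files. Values are escaped because `\`, `{` and `}` are RTF control characters, and non-ASCII characters are written with RTF's `\uN?` escape.

```java
import java.util.Map;

public class RtfTemplate {

    /** Escapes plain text so it can be embedded in the body of an RTF document. */
    static String escapeRtf(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' || c == '{' || c == '}') {
                // Backslash and braces are RTF control characters: prefix with a backslash
                sb.append('\\').append(c);
            } else if (c == '\n') {
                // A newline in the value becomes an RTF paragraph break
                sb.append("\\par ");
            } else if (c > 127) {
                // Non-ASCII: RTF's Unicode control word (backslash, 'u', the UTF-16
                // code unit as a signed 16-bit decimal) plus a '?' fallback character
                sb.append("\\u").append((int) (short) c).append('?');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Replaces every occurrence of each (hypothetical) placeholder key with its escaped value. */
    static String fill(String rtf, Map<String, String> values) {
        String result = rtf;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace(entry.getKey(), escapeRtf(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the RTF text that OpenOffice would produce from a .doc template
        String template = "{\\rtf1\\ansi Dear [[name]],\\par Total: [[total]]\\par}";
        String filled = fill(template, Map.of("[[name]]", "Jos\u00e9 {Jr.}", "[[total]]", "42"));
        System.out.println(filled);
    }
}
```

Running `main` prints `{\rtf1\ansi Dear Jos\u233? \{Jr.\},\par Total: 42\par}`; the filled RTF would then go back through OpenOffice to produce the .doc.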

For PDF, iText seems to be the solution.

There also seems to be a very complete and mature commercial product for Java: Aspose. It supports direct access to Office documents such as .doc files, which makes it possible to preserve metadata such as revisions. It's a little pricey but manageable for a SaaS application, although I am not sure how their definition of "location" applies to a SaaS application.

Here are some pointers:

  • iText: Java library to read and create PDF documents.
  • FOP: reads a formatting object (FO) tree and renders the resulting pages to a specified output (the primary output target is PDF).
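FOP's input is an XSL-FO document. Below is a minimal sketch of such an FO tree, built with the JDK's own DOM API so that it runs without FOP on the classpath. The page size, margins and the master name `A4` are illustrative choices, and the step of handing the serialized XML to FOP to render a PDF is not shown.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class FoTreeBuilder {

    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    /** Builds an XSL-FO tree: one A4 page master and one fo:block per paragraph. */
    static Document build(List<String> paragraphs) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }

        Element root = doc.createElementNS(FO_NS, "fo:root");
        root.setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns:fo", FO_NS);
        doc.appendChild(root);

        // Page geometry: a single A4 page master with one body region
        Element masters = child(root, "layout-master-set");
        Element page = child(masters, "simple-page-master");
        page.setAttribute("master-name", "A4");
        page.setAttribute("page-width", "210mm");
        page.setAttribute("page-height", "297mm");
        page.setAttribute("margin", "20mm");
        child(page, "region-body");

        // Content: a page sequence whose flow fills the body region
        Element sequence = child(root, "page-sequence");
        sequence.setAttribute("master-reference", "A4");
        Element flow = child(sequence, "flow");
        flow.setAttribute("flow-name", "xsl-region-body");
        for (String text : paragraphs) {
            child(flow, "block").setTextContent(text);
        }
        return doc;
    }

    /** Appends a new fo:* element to parent and returns it. */
    private static Element child(Element parent, String localName) {
        Element element = parent.getOwnerDocument().createElementNS(FO_NS, "fo:" + localName);
        parent.appendChild(element);
        return element;
    }

    /** Serializes the tree to indented XML, the form FOP would read as input. */
    static String serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(build(List.of("Hello, FOP", "A second paragraph"))));
    }
}
```

In practice the FO document is often produced by an XSLT stylesheet from application XML rather than built node by node; building it in code here just makes the structure of the tree visible.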

2 Responses to “Java: Word, RTF, PDF convert, read, import, export, write”

  1. Adam Markey Says:

    For creating .DOC, .RTF, and .PDF files out of structured data (not reading or converting existing docs), I've seen a lot of promise from the Eclipse BIRT project, which can output all of those formats using its own meta-reporting syntax and Java code. It's quite nice!

  2. jeremychone Says:

    Thanks Adam. I just added the Eclipse BIRT Project to the post.
    In my case, I need to read and write existing content; I do not generate documents from scratch.