Java: Word, RTF, PDF convert, read, import, export, write
April 7th, 2009 by jeremychone
Reading and writing Word and PDF documents from Java can be quite tricky.
For Word (.doc), the best option seems to be using OpenOffice to convert the file to a text format (RTF or OpenOffice XML format), modify the text document, and use OpenOffice to convert back to .doc. JODConverter 3.x seems to be a great option for doing that in a Web Application.
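The convert, modify, convert-back round trip described above can be sketched without any library by shelling out to the office suite. This is only a rough illustration and not the JODConverter API: it assumes a LibreOffice-style `soffice` binary on the PATH that accepts the `--headless`, `--convert-to` and `--outdir` options (older OpenOffice builds may not). The class and method names (`OfficeConvert`, `command`, `convert`) are made up for this example.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

final class OfficeConvert {

    // Builds the command line only, so it can be inspected or logged
    // without launching anything.
    static List<String> command(String targetFormat, Path outDir, Path input) {
        return Arrays.asList(
            "soffice",                      // assumption: binary is on the PATH
            "--headless",                   // no GUI
            "--convert-to", targetFormat,   // e.g. "rtf" or "doc"
            "--outdir", outDir.toString(),
            input.toString());
    }

    // Runs the conversion and returns the exit code of the office process.
    static int convert(String targetFormat, Path outDir, Path input)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(targetFormat, outDir, input))
            .inheritIO()                    // show the converter's own output
            .start();
        return process.waitFor();
    }
}
```

A round trip would then call `convert("rtf", workDir, docFile)`, edit the resulting `.rtf` file as text, and call `convert("doc", outDir, rtfFile)`. Starting a new office process per document is slow, which is one reason a pooled converter such as JODConverter is the better fit for a web application.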
For PDF, iText seems to be the solution.
There also seems to be a very complete and mature commercial product for Java: Aspose. It supports direct access to Office documents such as .doc files, which preserves metadata such as revisions. It's a little pricey but manageable for a SaaS application, although I am not sure how their definition of “location” applies to a SaaS application.
Here are some pointers:
- First, an EXCELLENT article about the Microsoft binary file formats: Why are the Microsoft Office file formats so complicated? (And some workarounds)
- JODConverter 3.x (still beta): Java API using the OpenOffice converter. Pros: Very complete; Cons: Requires a native OpenOffice installation. JODConverter 2.x (not maintained anymore)
- OpenOffice.org UNO: API to OpenOffice (Java, C++, …) (Example 1)
- HWPF (from POI): Java API to read/write OLE documents. Pros: Pure Java; Cons: Not maintained and incomplete (the HWPF part)
- Eclipse BIRT: (see Adam’s comment)
- Microsoft Office (97-2003) Binary format. (Do not waste your time with this; see Why are the Microsoft Office file formats so complicated? (And some workarounds))
- Generate RTF Word Documents Using a Java Application (using iText): iText can only write RTF (not read)
- RTF2FO (commercial $150/CPU, $750/site): RTF to XML converter.
- RTF-TO-XML (priced similarly to RTF2FO)
- Majix: Not maintained anymore (last release was in 2004)
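Because RTF is a plain-text format, very simple documents can also be written by hand when pulling in a library feels like overkill. The sketch below is not iText and covers only the basics (escaping, paragraphs, one font); the class name `MinimalRtf` and the choice of Times New Roman at 12pt are assumptions made for this example.

```java
final class MinimalRtf {

    // Escapes text so it is safe inside an RTF group.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' || c == '{' || c == '}') {
                sb.append('\\').append(c);      // RTF control characters
            } else if (c == '\n') {
                sb.append("\\par ");            // newline becomes a paragraph break
            } else if (c < 128) {
                sb.append(c);                   // plain ASCII passes through
            } else {
                // \uN takes a signed 16-bit decimal; '?' is the fallback
                // character for readers that do not understand \u.
                sb.append("\\u").append((int) (short) c).append('?');
            }
        }
        return sb.toString();
    }

    // Wraps the escaped body in a minimal RTF header (\fs24 = 12pt).
    static String document(String body) {
        return "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\\f0\\fs24 "
            + escape(body) + "\\par}";
    }
}
```

Writing `MinimalRtf.document("Hello")` to a file with a `.rtf` extension should give something Word or OpenOffice can open. Anything beyond plain paragraphs (tables, images, styles) is where a real library earns its keep.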