Do you know a way to convert a Microsoft Word document to XML?
1.> I guess you want the content in XML (not the styling). You can try Word's VB editor (Alt + F11) together with the MSXML parser. You will need to define an XML schema based on the document structure.
2.> Convert the Word document to RTF (Save As) and parse it.
3.> Convert the Word document to HTML (Save As) and parse it (xmlc).
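For option 3, here is a rough sketch of the "parse it" step in Java. It does not use xmlc; it swaps in the HTML parser that ships with the JDK (javax.swing.text.html) and the standard DOM/Transformer classes. The class name HtmlToXml and the element names "document" and "paragraph" are made up for illustration; a real schema would follow your own document structure. It only picks up text inside <p> tags, so treat it as a starting point rather than a finished converter.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: pulls paragraph text out of HTML and wraps it in XML.
class HtmlToXml {

    // Collects the text of every non-empty <p> element in the HTML.
    static List<String> extractParagraphs(String html) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean[] inParagraph = {false};

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.P) {
                    inParagraph[0] = true;
                    current.setLength(0);
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (inParagraph[0]) {
                    current.append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.P && inParagraph[0]) {
                    inParagraph[0] = false;
                    String text = current.toString().trim();
                    if (!text.isEmpty()) {
                        paragraphs.add(text);
                    }
                }
            }
        };

        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new IllegalStateException("Could not parse HTML", e);
        }
        return paragraphs;
    }

    // Builds <document><paragraph>...</paragraph>...</document> (assumed element names).
    static String toXml(List<String> paragraphs) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("document");
            doc.appendChild(root);
            for (String text : paragraphs) {
                Element p = doc.createElement("paragraph");
                p.setTextContent(text); // special characters are escaped on output
                root.appendChild(p);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build XML", e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello</p><p>World</p></body></html>";
        System.out.println(toXml(extractParagraphs(html)));
    }
}
```

HTML saved from Word tends to carry a lot of Office-specific markup, so you would probably need to extend the callback (headings, tables, lists) before this is useful on real documents.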
Can we use the Java Native Interface to do the conversion?
To convert the doc:
1. Use a 3rd party doc format converter (Verity, Fulcrum, maybe something open source?).
2. Open the document and Save As another format using COM.
You could use JNI or a Java/COM/ActiveX bridge to do this.
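As a rough illustration of the "open and Save As using COM" idea, here is a sketch that avoids both JNI and a bridge library. Instead it writes a small VBScript that drives Word over COM and runs it with cscript from Java, which is a different technique from the ones suggested above. It assumes Windows with Word installed. The class and method names are made up; the value 8 is Word's wdFormatHTML constant (6 would be wdFormatRTF).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: asks Word (via COM, through Windows Script Host) to re-save a document.
class WordSaveAs {

    static final int WD_FORMAT_HTML = 8; // Word's wdFormatHTML

    // VBScript string literals escape a double quote by doubling it.
    private static String quote(String s) {
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    // Builds the VBScript that opens inputPath and saves it as outputPath in the given format.
    static String buildScript(String inputPath, String outputPath, int format) {
        return String.join("\r\n",
            "Set word = CreateObject(\"Word.Application\")",
            "word.Visible = False",
            "Set doc = word.Documents.Open(" + quote(inputPath) + ")",
            "doc.SaveAs " + quote(outputPath) + ", " + format,
            "doc.Close False",
            "word.Quit");
    }

    // Writes the script to a temp file and runs it; returns cscript's exit code.
    // Only works on Windows with Word installed (an assumption of this sketch).
    static int convert(String inputPath, String outputPath, int format)
            throws IOException, InterruptedException {
        Path script = Files.createTempFile("word-save-as", ".vbs");
        try {
            Files.write(script, buildScript(inputPath, outputPath, format)
                    .getBytes(StandardCharsets.US_ASCII));
            Process process = new ProcessBuilder("cscript", "//NoLogo", script.toString())
                    .inheritIO()
                    .start();
            return process.waitFor();
        } finally {
            Files.deleteIfExists(script);
        }
    }
}
```

Something like WordSaveAs.convert("C:\\docs\\in.doc", "C:\\docs\\out.html", WordSaveAs.WD_FORMAT_HTML) would then hand you an HTML file to parse. A proper Java/COM bridge gives you better error handling than this, since here a failure inside Word only shows up as a non-zero exit code.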
maybe this might help:
I love POI and regularly use it to interface with Excel spreadsheets. However, the support for Word DOCs has not been written yet and, although it's likely to be excellent when it's finished, it's certainly not an option now.