Hi, we are working with XML files that can be as big as 6 MB. Some XML documents that meet a certain condition must be duplicated, with only 2 elements changing value. I thought of using XPath to select the 2 XML elements and update them. But as far as I know, you can only update elements after selecting them with XPath if you use the JDOM XPath class. Using the JDOM XPath class implies that you first have to build an in-memory JDOM tree representation of the XML document. This can be quite expensive when a JDOM tree is built in memory from a 6 MB XML document. A solution to this problem is to use an XSLT stylesheet with Xalan, where we pass the new values for the 2 elements as parameters to the Transformer class. Xalan will use an event-driven parser like Xerces that makes use of streams. Are there any other possibilities? I think we're not the only people who need to change a couple of element values in an XML document with as little overhead as possible. Best regards, Mark
- Posted by: Mark Noten
- Posted on: December 19 2006 07:36 EST
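For reference, the XSLT-with-parameters idea from the question could look roughly like the sketch below. It uses only the standard JAXP `javax.xml.transform` API, so it runs with Xalan or with the transformer bundled in the JDK. The element names `doc`, `id` and `ref`, the parameter names `newId` and `newRef`, and the class name are all hypothetical, since the real document structure was not posted. One caveat worth keeping in mind: an XSLT 1.0 processor such as Xalan parses the input through SAX but generally still builds its own internal model of the whole document, so this is not a truly streaming approach.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltValueUpdate {
    // Identity transform plus two overriding templates.
    // Assumption: the two elements are /doc/id and /doc/ref (hypothetical names).
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:param name='newId'/>"
      + "<xsl:param name='newRef'/>"
      // Copy every node and attribute unchanged by default.
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      // Replace the content of the two target elements, keeping their attributes.
      + "<xsl:template match='/doc/id'>"
      + "<xsl:copy><xsl:copy-of select='@*'/><xsl:value-of select='$newId'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='/doc/ref'>"
      + "<xsl:copy><xsl:copy-of select='@*'/><xsl:value-of select='$newRef'/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    public static String update(String xml, String newId, String newRef) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        // The new values are passed in as stylesheet parameters.
        t.setParameter("newId", newId);
        t.setParameter("newRef", newRef);
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

For real files you would pass file-based `StreamSource` and `StreamResult` objects instead of strings, and reuse a compiled `Templates` object if many documents are processed.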
If all you are doing is changing simple textual values in two elements in a large document, you would be best off using the SAX API. However, the question is how easily you can identify those two elements when the parser reaches them. Since you didn't post the XPaths, I can't comment on how easy or difficult it will be. If the XPaths aren't too complicated, you can probably track the context in SAX. There is, as it happens, some discussion of this in an XML APIs chapter that I wrote in the soon-to-be-released book "Advanced XML Applications from the Experts at The XML Guild": http://www.amazon.com/XML-Power-Comprehensive-Guide-Guides/dp/1598632140/ Cheers, Tony.
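To illustrate what "tracking the context in SAX" can mean, here is a minimal sketch, assuming the XPath is a plain absolute path of element names such as `/doc/id` (a hypothetical path, since the real XPaths were not posted). The handler keeps a stack of the currently open elements and compares the resulting path with the target; the class and method names are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PathTrackingHandler extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>(); // open elements, root first
    private final String targetPath;                       // e.g. "/doc/id" (hypothetical)
    private final List<String> matches = new ArrayList<>();
    private StringBuilder text;                            // non-null while inside a match

    public PathTrackingHandler(String targetPath) {
        this.targetPath = targetPath;
    }

    private String currentPath() {
        StringBuilder sb = new StringBuilder();
        for (String name : open) {
            sb.append('/').append(name);
        }
        return sb.toString();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        open.addLast(qName);
        if (currentPath().equals(targetPath)) {
            text = new StringBuilder(); // the parser has reached a target element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (text != null && currentPath().equals(targetPath)) {
            matches.add(text.toString());
            text = null;
        }
        open.removeLast();
    }

    // Returns the text of every element found at targetPath.
    public static List<String> find(String xml, String targetPath) throws Exception {
        PathTrackingHandler handler = new PathTrackingHandler(targetPath);
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.matches;
    }
}
```

This only covers simple child-step paths. Predicates, attribute tests or sibling axes need more state than a name stack, which is where the complexity of the actual XPaths starts to matter.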
The XML file is quite big, but the elements that need to change value are at the top (the XXX and YYY values). XXX YYY false ... up to 6 MB of content ... So a simple SAX2 handler from Xerces that extends the DefaultHandler base implementation could delegate the SAX events to an XMLSerializer that acts as a document handler. When a SAX2 event is raised at the start of the element, we could set a flag so that the characters method in the handler writes a specified value to the ByteArrayOutputStream instead of XXX. As far as I know, Xerces supports SAX version 2. Is there any reason (performance, stability, ...) for choosing the SAX API over the Xerces API? Thanks again, Tony, for your quick reply!
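The flag idea described above can be sketched as follows. This is not the Xerces `XMLSerializer` setup from the post: as a swap-in it uses a SAX `XMLFilterImpl` in front of the parser and a JAXP identity `Transformer` as the serializer, both of which are standard APIs. The element names `id` and `ref` used in the usage note are hypothetical, and the sketch assumes the target elements contain only text and that every element with a matching name should be changed.

```java
import java.io.Reader;
import java.io.Writer;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class ElementValueReplacer extends XMLFilterImpl {
    private final Map<String, String> replacements; // element name -> new text
    private boolean suppressText;                   // the "flag" from the post

    public ElementValueReplacer(XMLReader parent, Map<String, String> replacements) {
        super(parent);
        this.replacements = replacements;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        super.startElement(uri, localName, qName, atts);
        String name = localName.isEmpty() ? qName : localName;
        String value = replacements.get(name);
        if (value != null) {
            // Emit the new value right away and drop the original text.
            super.characters(value.toCharArray(), 0, value.length());
            suppressText = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (!suppressText) {
            super.characters(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        suppressText = false; // assumption: target elements have no child elements
        super.endElement(uri, localName, qName);
    }

    // Streams the document from in to out, replacing the text of the named elements.
    public static void rewrite(Reader in, Writer out, Map<String, String> replacements)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader parser = factory.newSAXParser().getXMLReader();
        XMLReader filter = new ElementValueReplacer(parser, replacements);
        // A Transformer created without a stylesheet copies its input to its output.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        identity.transform(new SAXSource(filter, new InputSource(in)), new StreamResult(out));
    }
}
```

Calling `ElementValueReplacer.rewrite(reader, writer, map)` with a map such as `{id=A1, ref=B2}` writes the copy with the new values. With a file-backed reader and writer, nothing larger than the parser's buffers is held in memory.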
The SAX API is implemented by various parsers. By all means use Xerces and its SAX implementation. Cheers, Tony.