
XML & Web services: StAX and DOM/XPath Large XML?

  1. StAX and DOM/XPath Large XML? (4 messages)

    I have a large XML file to process, much larger than can be handled with DOM. I'm looking into using StAX, but ideally I would like to have StAX create DOM documents when it hits a particular element. For example .. couple hundred thousand b elements... I want to get the stuff in the b element, and ideally it would be returned as a DOM Document so I can run XPath queries to extract only the necessary nodes. The DOM document can be garbage collected on each iteration, so there are no memory issues. Basically, I want a solution that combines the power of XPath/DOM with the efficiency of SAX/StAX. Does anyone know of anything? Thanks.

    Threaded Messages (4)

  2. oops

    When I previewed, it unescaped my XML. Here's the XML: <a> <b><stuff><in><here></b> .. couple hundred thousand b elements... </a>
  3. This is something like a SAX filter

    If you only needed to load one section of the document into a DOM tree, you could have implemented a SAX filter, which just passes on certain SAX events and filters out others. In your case, you can still write a SAX event handler along the same lines:

    * It ignores SAX events until it receives the start event for the element whose content is to be loaded into a tree for processing.
    * It implements the same kind of API as a SAX filter, so you use your SAX handler as a source of SAX events, i.e. you implement the XMLReader interface.
    * I recommend you use XOM rather than DOM. The XOM Builder (http://www.xom.nu/apidocs/nu/xom/Builder.html) can take an XMLReader as its input.
    * Each time you come across an element whose content you need to process as a tree, create a new XOM Builder and use your SAX handler as the XMLReader for the Builder. You will have to fake the start/end document events, and pass through the original start element event.
    * Your SAX handler will need to track where it is in the original document to match start/end elements. If there is any possible ambiguity (e.g. the element that you are processing can contain other elements with the same name), you can use a stack to track start/end elements, to determine when you are at the end of the block you need to process.

    That's a quick overview of what you can do.

    Cheers, Tony.
    -- Author, XML APIs chapter, "Advanced XML Applications from the Experts at The XML Guild" http://www.amazon.com/Advanced-XML-Applications-Experts-Guild/dp/1598632140/ref=sr_1_1?ie=UTF8&s=books&qid=1195115964&sr=1-1
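A rough sketch of the handler described above, for readers who want to see the moving parts. It is not Tony's exact design: instead of exposing an XMLReader to a XOM Builder, it swaps in JAXP's TransformerHandler to build a plain DOM Document, which avoids the extra dependency. The class name SubtreeSaxHandler and the collect helper are hypothetical, and the sketch assumes the default JDK TransformerFactory (which supports SAXTransformerFactory) and an element name without a namespace prefix.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: builds one small DOM Document per target element. */
final class SubtreeSaxHandler extends DefaultHandler {
    private final String targetName;
    private final Consumer<Document> sink;
    // Assumption: the default TransformerFactory is also a SAXTransformerFactory.
    private final SAXTransformerFactory factory =
            (SAXTransformerFactory) TransformerFactory.newInstance();
    private TransformerHandler builder; // non-null only while inside a target element
    private DOMResult result;
    private int depth;                  // nesting depth inside the current target element

    SubtreeSaxHandler(String targetName, Consumer<Document> sink) {
        this.targetName = targetName;
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (builder == null) {
            if (!targetName.equals(localName)) {
                return; // ignore everything outside the elements of interest
            }
            try {
                builder = factory.newTransformerHandler(); // identity: SAX events in, DOM out
            } catch (TransformerConfigurationException e) {
                throw new SAXException(e);
            }
            result = new DOMResult();
            builder.setResult(result);
            builder.startDocument(); // faked start-of-document event
            depth = 0;
        }
        depth++; // also counts nested elements that share the target name
        builder.startElement(uri, localName, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (builder != null) {
            builder.characters(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (builder == null) {
            return;
        }
        builder.endElement(uri, localName, qName);
        if (--depth == 0) {
            builder.endDocument(); // faked end-of-document event
            Document doc = (Document) result.getNode();
            builder = null;
            result = null;
            sink.accept(doc); // the caller processes the subtree, then lets it be collected
        }
    }

    /** Convenience wrapper for small inputs; a real run would stream from a file. */
    static List<Document> collect(String xml, String targetName) {
        List<Document> docs = new ArrayList<>();
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            spf.newSAXParser().parse(new InputSource(new StringReader(xml)),
                    new SubtreeSaxHandler(targetName, docs::add));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return docs;
    }
}
```

A depth counter stands in for the stack mentioned in the last bullet; it is enough here because only the end of the outermost target element matters. For a very large file, pass a Consumer that processes each Document immediately rather than collecting them in a list.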
  4. Re: This is something like a SAX filter

    Thanks for the suggestion. I ended up using the Woodstox StAX parser and building a DOM tree for each element with the standard javax..DocumentBuilder. I also found a solution supported by dom4j (http://www.dom4j.org/faq.html#large-doc), but I needed to perform Schema validation on my DOM document (which JDOM doesn't seem to support) and wanted to stick with the standards-based javax.* stack.
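The poster did not share code, so the following is only a guess at what such a solution might look like: a StAX cursor walks the big document, and each matching element is copied by hand into a fresh DOM Document that XPath can query. The names StaxDomChunker, readSubtree and extract are hypothetical. XMLInputFactory.newInstance() returns whichever StAX implementation is available (Woodstox if it is on the classpath, otherwise the JDK default). Namespace declarations are not copied as attributes, and Schema validation is left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical helper: StAX for streaming, one throwaway DOM per element for XPath. */
final class StaxDomChunker {

    private static String emptyToNull(String s) {
        return (s == null || s.isEmpty()) ? null : s;
    }

    private static String qualified(String prefix, String local) {
        return (prefix == null || prefix.isEmpty()) ? local : prefix + ":" + local;
    }

    /** Copies the element the reader is positioned on, with its subtree, into a new Document. */
    static Document readSubtree(XMLStreamReader r, DocumentBuilder db) throws XMLStreamException {
        Document doc = db.newDocument();
        Node current = doc;
        int depth = 0;
        for (int event = r.getEventType(); ; event = r.next()) {
            switch (event) {
                case XMLStreamConstants.START_ELEMENT: {
                    Element e = doc.createElementNS(emptyToNull(r.getNamespaceURI()),
                            qualified(r.getPrefix(), r.getLocalName()));
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        e.setAttributeNS(emptyToNull(r.getAttributeNamespace(i)),
                                qualified(r.getAttributePrefix(i), r.getAttributeLocalName(i)),
                                r.getAttributeValue(i));
                    }
                    current.appendChild(e);
                    current = e;
                    depth++;
                    break;
                }
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE:
                    current.appendChild(doc.createTextNode(r.getText()));
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    current = current.getParentNode();
                    if (--depth == 0) {
                        return doc; // reader is left on the end tag of the copied element
                    }
                    break;
                default:
                    break; // comments and processing instructions are skipped in this sketch
            }
        }
    }

    /** Evaluates xpathExpr against every elementName subtree and returns the string results. */
    static List<String> extract(String xml, String elementName, String xpathExpr) {
        List<String> out = new ArrayList<>();
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            XPath xpath = XPathFactory.newInstance().newXPath();
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(r.getLocalName())) {
                    Document chunk = readSubtree(r, db); // small, short-lived DOM
                    out.add(xpath.evaluate(xpathExpr, chunk));
                }
            }
            r.close();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out;
    }
}
```

For the layout in the question, a call such as StaxDomChunker.extract(xml, "b", "/b/name") would evaluate the expression once per b element, where "/b/name" is a made-up path. Each chunk Document becomes unreachable after its iteration, so memory use stays proportional to one b element rather than to the whole file. To stream from disk, replace the StringReader with a file-backed reader or input stream.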
  5. VTD-XML

    I think you should try VTD-XML. It is capable of random access on large (up to 2GB) XML documents. Its memory usage is 1.3~1.5x the size of the XML text, whereas DOM's is 4~10x. http://vtd-xml.sf.net