nux-1.4 released - easy, efficient and powerful XML processing

  1. Nux is an open-source Java toolkit making efficient and powerful XML processing easy. Improvements and additions in this 1.4 release focus on scalability, reliability and ease of use, maintaining API compatibility with prior releases.

    A detailed changelog is here: http://dsd.lbl.gov/nux/changelog.html
    Downloads are here: http://dsd.lbl.gov/nux-download/releases/

    XQuery and XOM
    • Upgraded to xom-1.1-final (with compatible performance patches). xom-1.0.x and xom-1.1.x continue to work fine, albeit less efficiently.
    • Upgraded to saxonb-8.6.1, implementing the XQuery W3C Candidate Recommendation of 3 November 2005 (Saxon 8.6, 8.5, 8.4 and 8.3 continue to work fine).
    • saxon8-xom.jar is no longer needed, as its contents are now compiled directly into nux.jar, improving simplicity and reliability.
    • Constructing a new compiled XQuery object is now about 20 times faster.
    • Added a driver for the official W3C XQuery Test Suite (XQTS), which contains some 8500 test cases.
    XML Streaming and bnux Binary XML Streaming
    • Added streaming serialization of very large documents in the nux.xom.io package. With memory consumption close to zero, the new StreamingSerializer enables writing arbitrarily large XML documents onto a destination, such as an OutputStream, for both standard textual XML and bnux binary XML (and StAX).
    • Added streaming bnux deserialization for handling arbitrarily large input documents; it uses an InputStream and an application-provided NodeFactory, just like a XOM Builder does.
    • Added bnux serialization to an OutputStream.
    • To enable true streaming, a serialized bnux document now consists internally of one or more independent pages, each at most 64 KB in size. Each page is a tokenized byte array containing a portion of the XML document, in document order. Once a page has been read or written, the related (heavy) state can be discarded, freeing memory. No more than one page needs to be held in memory at any given time. For very large documents this reduces memory consumption, increases throughput and reduces latency. For small to medium sized documents it makes next to no difference.
    • Slightly more compact bnux data format (version number has changed).
    • Improved performance on reuse of BinaryXMLCodec instances (recommended).
    • bnux serialization and deserialization are now roughly 3 times faster for documents containing namespaces, closely matching performance for documents without namespaces.
    • Added streaming conversion of standard textual XML to and from binary format, enabling conversion of arbitrarily large documents. The corresponding fire-bnux command line conversion tool now works in fully streaming mode, too.
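
    The paging idea described above can be illustrated with a small JDK-only sketch. This is not the bnux wire format and does not use any Nux API: the class PagedStreamSketch, the length-prefixed page layout and the zero-length end marker are all assumptions made up for illustration. It only shows how a writer can hold a single page of at most 64 KB in memory and how a reader can consume one page at a time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Illustrative only: NOT the bnux format. All names and the layout are hypothetical.
class PagedStreamSketch {

    // Upper bound per page, taken from the release note (64 KB).
    static final int MAX_PAGE_SIZE = 64 * 1024;

    /** Buffers bytes into one page and flushes it as [int length][payload] when full. */
    static final class PagedWriter {
        private final DataOutputStream out;
        private final byte[] page = new byte[MAX_PAGE_SIZE]; // the only page held in memory
        private int used = 0;
        int pagesWritten = 0;

        PagedWriter(OutputStream out) {
            this.out = new DataOutputStream(out);
        }

        void write(byte[] data) throws IOException {
            int offset = 0;
            while (offset < data.length) {
                int n = Math.min(data.length - offset, MAX_PAGE_SIZE - used);
                System.arraycopy(data, offset, page, used, n);
                used += n;
                offset += n;
                if (used == MAX_PAGE_SIZE) {
                    flushPage();
                }
            }
        }

        /** Flushes the last partial page and writes a zero-length end marker (assumed convention). */
        void finish() throws IOException {
            if (used > 0) {
                flushPage();
            }
            out.writeInt(0);
            out.flush();
        }

        private void flushPage() throws IOException {
            out.writeInt(used);
            out.write(page, 0, used);
            pagesWritten++;
            used = 0; // page contents are no longer needed; the buffer is reused
        }
    }

    /** Reads the next page, or returns null when the end marker is reached. */
    static byte[] readPage(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length == 0) {
            return null;
        }
        if (length < 0 || length > MAX_PAGE_SIZE) {
            throw new IOException("bad page length: " + length);
        }
        byte[] page = new byte[length];
        in.readFully(page);
        return page;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PagedWriter writer = new PagedWriter(sink);
        writer.write(new byte[100_000]);
        writer.write(new byte[100_000]);
        writer.finish();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(sink.toByteArray()));
        int pages = 0;
        long total = 0;
        byte[] page;
        while ((page = readPage(in)) != null) { // only one page is alive per iteration
            pages++;
            total += page.length;
        }
        System.out.println(pages + " pages, " + total + " payload bytes");
    }
}
```

    In this sketch the writer's memory use is bounded by MAX_PAGE_SIZE regardless of how much data passes through it, which is the property the release note attributes to paged bnux documents.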
    Other changes
    • Added AnalyzerUtil.getMostFrequentTerms(). Returns (frequency:text) pairs for the top N distinct terms (aka words), sorted descending by frequency (and ascending by term, if tied).
    • Removed deprecated methods XOMUtil.toByteArray() and XOMUtil.toString(). The methods remain available but have been moved into class FileUtil.
    • Added more test document collections in samples directory.
    • Added package nux.xom.sandbox, a playground for kicking around various ideas and prototypes without any API compatibility guarantees. Code quality varies from sketchy to reliable, but is generally not nearly as well designed and tested as the remainder of Nux. In the future some of these classes may (or may not) graduate into stable packages.
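
    The ordering rule stated for AnalyzerUtil.getMostFrequentTerms() (descending by frequency, ascending by term on ties) can be sketched in plain Java as follows. This is not the Nux implementation and does not reproduce its signature: the class FrequentTermsSketch, the method mostFrequentTerms and the simple lowercase tokenization on non-letter, non-digit characters are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: a hypothetical stand-in, not AnalyzerUtil from Nux.
class FrequentTermsSketch {

    /** Returns up to 'limit' strings of the form "frequency:term". */
    static List<String> mostFrequentTerms(String text, int limit) {
        // Count terms; the tokenization here is a deliberately naive assumption.
        Map<String, Integer> counts = new HashMap<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!term.isEmpty()) {
                counts.merge(term, 1, Integer::sum);
            }
        }

        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byFrequency = Integer.compare(b.getValue(), a.getValue()); // descending frequency
            return byFrequency != 0 ? byFrequency : a.getKey().compareTo(b.getKey()); // ascending term on ties
        });

        int n = Math.min(Math.max(limit, 0), entries.size());
        List<String> result = new ArrayList<>(n);
        for (Map.Entry<String, Integer> e : entries.subList(0, n)) {
            result.add(e.getValue() + ":" + e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [3:the, 2:and, 1:bird]
        System.out.println(mostFrequentTerms("the cat and the dog and the bird", 3));
    }
}
```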
  2. With the new streaming API for XML, I thought things had pretty much reached the end of innovation for XML parsing, but looking at yesterday's VTD and nux, it looks like there's still room for improvement. Congrats to both on pushing the envelope.

    peter