Discussions

News: BEA Releases Preview of Streaming API for XML (StAX)

  1. The Streaming API for XML (StAX) is a new Java API for parsing and writing XML easily and efficiently. Spearheaded by BEA, StAX has passed the final approval ballot of the Java Community Process (see JSR-173).

    Processing XML is a standard task in most computing environments. Until now, developers have typically used two approaches: the Simple API for XML (SAX) and the Document Object Model (DOM). Both have their advantages, but both also have significant disadvantages: SAX lacks iterative processing, and DOM can lose performance by reading the entire XML document into memory.

    StAX solves these problems by providing more control of XML parsing to the programmer, in particular by exposing a simple iterator-based API and an underlying stream of events. Methods such as next() and hasNext() allow an application developer to ask for the next event, or pull the event, rather than handle the event in a callback. StAX also enables the programmer to stop processing the document at any time, skip ahead to sections of the document, and get subsections of the document.
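
    The pull loop described above can be sketched as follows. This is a minimal illustration, not code from the BEA preview: it assumes the javax.xml.stream package as it was later standardized (JSR-173, bundled with Java SE 6 and later), and the class name PullSketch, the method firstTitle, and the <title> element are made up for the example. It also shows the "stop at any time" point: the method returns as soon as it finds what it wants.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; not part of StAX itself.
class PullSketch {

    // Returns the text of the first <title> element, or null if there is none.
    static String firstTitle(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            // The application drives the parser: it asks for the next event.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // Stop early: the rest of the document is never parsed.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<catalog><book><title>StAX Basics</title><price>10</price></book></catalog>";
        System.out.println(firstTitle(xml)); // prints "StAX Basics"
    }
}
```

    With SAX, the same early exit usually means throwing an exception out of a callback; here it is an ordinary return.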

    StAX helps you process XML faster and more easily in these typical use cases:
    - Data binding, a two-way process that reads and writes XML (unmarshaling and marshaling) to and from a programming language data structure
    - SOAP message processing (SOAP is an XML message transport format used predominantly by Web services)
    - Parsing a specific XML vocabulary
    - Processing pipelined XML

    View BEA's StAX Preview Release: http://dev2dev.bea.com/technologies/stax/index.jsp

    View the JSR 173: Streaming API for XML JCP Home

    Threaded Messages (8)

  2. Any downside?

    Is there a comparison checklist for DOM, SAX and StAX? Is StAX a replacement for or a supplement to existing parsers?

    I hope a day will come when a developer does not have to understand so many parsers. Why can't this parser thingy be like SQL? Any thoughts?

    - a frustrated developer.
  3. Any downside?

    > Is there a comparison checklist for DOM, SAX and StAX? Is StAX a replacement for or a supplement to existing parsers?


    It looks closer to SAX, because in both cases you read sequentially. The main difference is that in SAX you have inversion of control: the parser calls *you*. In StAX it's the other way around. The other major difference is that the events are not method calls; they are objects, and they have an int code. So instead of doing this:

    startElement(..) {
    ..
    }

    endElement(..) {
    ..
    }


    > I hope a day will come when a developer does not have to understand so many parsers. Why can't this parser thingy be like SQL? Any thoughts?
  4. Any downside?

    SORRY ABOUT PREVIOUS MESSAGE - GOT CUT OFF

    > Is there a comparison checklist for DOM, SAX and StAX? Is StAX a replacement for or a supplement to existing parsers?

    It looks closer to SAX, because in both cases you read sequentially. The main difference is that in SAX you have inversion of control: the parser calls *you*. In StAX it's the other way around. The other major difference is that the events are not method calls; they are objects, and they have an int code. So instead of doing this:

    startElement(..) {
      ..
    }

    endElement(..) {
      ..
    }

    you do this:

    void add(XMLEvent event) {
      // Dispatch on the event's int code instead of on separate callbacks.
      switch (event.getEventType()) {
        case XMLEvent.START_ELEMENT:
          ..
          break;
        case XMLEvent.END_ELEMENT:
          ..
          break;
      }
    }

    IMHO this fixes a problem in SAX: it has several listener-type interfaces like ContentHandler, ErrorHandler, etc., and if you want to build a pipeline, the number of interfaces and methods makes things complicated. To make a filter, people can extend XMLFilterImpl, but this implements all the major interfaces, which may not be what the developer intended.

    With StAX, making a filter is trivial: you just call add(XMLEvent) on the next link in the chain, even if you are not interested in the particular type of event.
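
    A filter of that kind might look like the sketch below. It assumes the XMLEventConsumer interface from javax.xml.stream.util as later standardized (the preview release may differ); FilterSketch, CommentStripper and stripComments are hypothetical names chosen for illustration. The link drops comments and forwards every other event untouched to the next consumer, which here is an XMLEventWriter.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import javax.xml.stream.util.XMLEventConsumer;

// Hypothetical example class; not part of StAX itself.
class FilterSketch {

    // One link in a pipeline: drops comments, passes everything else along.
    static class CommentStripper implements XMLEventConsumer {
        private final XMLEventConsumer next;

        CommentStripper(XMLEventConsumer next) {
            this.next = next;
        }

        @Override
        public void add(XMLEvent event) throws XMLStreamException {
            if (event.getEventType() != XMLStreamConstants.COMMENT) {
                next.add(event); // forward events this link does not care about
            }
        }
    }

    // Reads the input as events and pushes them through the one-link chain.
    static String stripComments(String xml) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventConsumer chain = new CommentStripper(writer);
        while (reader.hasNext()) {
            chain.add(reader.nextEvent());
        }
        writer.flush();
        writer.close();
        reader.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(stripComments("<a><!-- note --><b>text</b></a>"));
    }
}
```

    Because every link has the same single-method shape, longer chains are built by nesting constructors, e.g. new CommentStripper(new SomeOtherLink(writer)), where SomeOtherLink is again hypothetical.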

    The only problem could be one of performance, because I think method calls are more easily optimized than thousands of XMLEvent objects. On the other hand an implementation could use a pool and re-use the XMLEvent objects.

    > I hope a day will come when a developer does not have to understand so many parsers. Why can't this parser thingy be like SQL? Any thoughts?

    I think what's really annoying are the attempts that were made at tying DOM, SAX, and streams together, meaning JAXP and TrAX. I don't think what came out is very clean. It would be better to keep those separate. But I think all XML access methods so far have their legitimate applications: DOM, SAX, StAX and JAXB (and other mapping tools). It gets messy when you start mixing these together ...

    One problem I see is that the XSLT spec is not designed for streaming documents, and I hope something will fix that soon. Until then, StAX alone may not be very useful in my current project.
  5. Any downside?

    void add(XMLEvent event) {
      // Dispatch on the event's int code instead of on separate callbacks.
      switch (event.getEventType()) {
        case XMLEvent.START_ELEMENT:
          ..
          break;
        case XMLEvent.END_ELEMENT:
          ..
          break;
      }
    }


    Going from interfaces to a switch seems like a step backwards to me.

    > One problem I see is that the XSLT spec is not designed for streaming documents, and I hope something will fix that soon.

    XSLT should designate which expressions can be evaluated in one pass, thereby identifying a subset of XSLT that can be streamed. A stylesheet could then be judged as streamable or not.
  6. Any downside?

    > Going from interfaces to a switch seems like a step backwards to me.


    Not if you have an extensible set of events. The thing is that with a pipeline you do have that. And you still have type checking because each type of event has a different class.
     
    > XSLT should designate which expressions can be evaluated in one pass, thereby identifying a subset of XSLT that can be streamed. A stylesheet could then be judged as streamable or not.

    That's true. It should work that way, but in Xalan it doesn't. And XSLT allows you to do almost anything with nodes, so the full-employment-theorem-for-compiler-writers may become a problem.

    On the other hand, I found this STX API, which is not standard yet but does transformations on streaming data:

    http://www.xml.com/pub/a/2003/02/26/stx.html
  7. events versus states

    Hi Brian,

    StAX supports a low-level view of the data (XMLStreamReader) that is styled after XPP and uses integer event types to mark the states. It also supports an object-event approach that lets you use event objects to process XML (XMLEventReader). So you can pick whichever style fits the type of program you are writing.
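
    For comparison, here is the same small task written in both styles. It is only a sketch under the assumption that the API matches the javax.xml.stream package as later standardized; TwoStylesSketch and its method names are hypothetical. Both methods collect the local names of all start elements, one by checking integer codes on the cursor, the other by asking typed event objects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical example class; not part of StAX itself.
class TwoStylesSketch {

    // Cursor style: the reader itself is the current state, identified by an int.
    static List<String> namesWithCursor(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                names.add(reader.getLocalName());
            }
        }
        reader.close();
        return names;
    }

    // Event style: each step yields an immutable, typed XMLEvent object.
    static List<String> namesWithEvents(String xml) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                StartElement start = event.asStartElement();
                names.add(start.getName().getLocalPart());
            }
        }
        reader.close();
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<a><b/><c>t</c></a>";
        System.out.println(namesWithCursor(xml)); // prints [a, b, c]
        System.out.println(namesWithEvents(xml)); // prints [a, b, c]
    }
}
```

    The cursor version creates no per-event objects, while the event version yields objects that can be stored or handed to another component.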

    Thanks,

    Chris (JSR-173 Spec Lead)
  8. Looks like XML Pull parser

    It's funny, but it looks like the XML Pull parser: the same concept, only a slightly different API in a little more OO style.

    http://www.xmlpull.org/
  9. Looks like XML Pull parser

    Hi Giedrus,

    The creator of XML Pull, Aleksander Slominski, was on the Expert Group for StAX and we based some of the API & RI on the work done in XPP.

    Thanks,

    Chris