
Introducing Axiom


Eran Chinthaka

Published: 01 Jul 2006

Introduction

An XML object model should be both memory-efficient and fast at processing XML. Tree-based models such as DOM load the entire document into memory, while event-based APIs such as SAX are fast but build no model that can be navigated later, so with the technologies available so far these two goals have been very difficult to satisfy at the same time. The introduction of StAX (Streaming API for XML), however, makes them achievable. The Apache Web services community was also looking for a fast, memory-efficient object model for its next-generation Web services engine, Axis2. AXIOM (Apache aXIs Object Model) is the implementation that the Apache WS community came up with, built on StAX, to achieve these goals.

This article first introduces the architecture of Axiom and then shows how to perform common tasks with it. (Among its users and developers, Axiom is also referred to as OM.)

Architecture

Axiom reads the XML stream through StAX readers. One of the available builders pulls events from the StAX reader and constructs the in-memory model from them. What makes this construction special is that the object model is built on demand rather than all at once. Let us see what this means through an example.

<ns1:Article xmlns:ns1='http://www.serverside.com/articles/introducingAxiom'>
 <Name>Introducing Axiom</Name>
 <Author>
  <Name>Eran Chinthaka</Name>
  <Origin>Sri Lanka</Origin>
 </Author>
 <RelatedProject xmlns:apache='http://www.apache.org' apache:organization='Apache'>
  <Name>Apache Axiom</Name>
  <URL>http://ws.apache.org/commons/axiom</URL>
 </RelatedProject>
</ns1:Article>

If I want the name of the author, the parser only needs to read as far as the <Name> element inside <Author>. Axiom builds the object model in memory up to that point and leaves the rest of the document in the stream, without creating any objects for it. (Axiom can even read the author's name without creating anything in memory at all; this depends on caching, which is discussed later.) This technique is useful when processing large XML documents and when mediating XML messages.
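The "read only as far as you need" idea can be seen with plain StAX, which Axiom builds on. The sketch below is a hypothetical helper written against the JDK's javax.xml.stream API, not against Axiom: it pulls events until it reaches the Name element inside Author, reads its text, and stops, without building any tree. The class and its members (AuthorNameReader, ARTICLE, Result) are illustrative names only.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper on the JDK's StAX API (javax.xml.stream); it is not Axiom code.
final class AuthorNameReader {

    // The sample document from this article.
    static final String ARTICLE =
        "<ns1:Article xmlns:ns1='http://www.serverside.com/articles/introducingAxiom'>\n"
        + " <Name>Introducing Axiom</Name>\n <Author>\n  <Name>Eran Chinthaka</Name>\n"
        + "  <Origin>Sri Lanka</Origin>\n </Author>\n"
        + " <RelatedProject xmlns:apache='http://www.apache.org' apache:organization='Apache'>\n"
        + "  <Name>Apache Axiom</Name>\n  <URL>http://ws.apache.org/commons/axiom</URL>\n"
        + " </RelatedProject>\n</ns1:Article>";

    /** The author's name plus how many elements were pulled from the parser to find it. */
    static final class Result {
        final String authorName;
        final int elementsVisited;

        Result(String authorName, int elementsVisited) {
            this.authorName = authorName;
            this.elementsVisited = elementsVisited;
        }
    }

    /** Pulls events only until Author/Name has been read; no tree is ever built. */
    static Result readAuthorName(String xml) {
        int visited = 0;
        boolean inAuthor = false;
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        visited++;
                        if (reader.getLocalName().equals("Author")) {
                            inAuthor = true;
                        } else if (inAuthor && reader.getLocalName().equals("Name")) {
                            // getElementText() reads up to </Name>; we stop right after it
                            return new Result(reader.getElementText(), visited);
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT
                            && reader.getLocalName().equals("Author")) {
                        inAuthor = false;
                    }
                }
                return new Result(null, visited); // no Author/Name in this document
            } finally {
                reader.close(); // no further events are pulled from the parser
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For the sample article, the helper visits only four elements (Article, its Name, Author, and the author's Name) before returning; the RelatedProject part is never reached.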

Now let's see how Axiom achieves this.

The StAX parser lets the Axiom builder pull events whenever the builder wants them, and the builder creates the in-memory object model only from the events it has received. How far the builder advances the parser depends on what the user asks for. If a user needs the name of the author in the XML fragment above, the builder advances the parser up to the Name element inside Author and builds the object model from the events it receives along the way. If the user then needs the Name of the RelatedProject, the builder advances the parser to the end of that Name element and builds the corresponding part of the object model. All of this happens transparently, so the user does not need to worry about it.
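To make the on-demand behaviour concrete, here is a deliberately simplified sketch of a builder that advances the parser only when a query cannot be answered from the nodes already in memory. It uses the JDK's StAX API; the class and its methods (OnDemandBuilder, textOf, builtElementCount) are hypothetical and are not Axiom's builder, which also handles namespaces, attributes, and many more node types.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical, simplified on-demand builder over plain StAX; not Axiom's actual builder.
final class OnDemandBuilder {

    static final String ARTICLE =
        "<ns1:Article xmlns:ns1='http://www.serverside.com/articles/introducingAxiom'>\n"
        + " <Name>Introducing Axiom</Name>\n <Author>\n  <Name>Eran Chinthaka</Name>\n"
        + "  <Origin>Sri Lanka</Origin>\n </Author>\n"
        + " <RelatedProject xmlns:apache='http://www.apache.org' apache:organization='Apache'>\n"
        + "  <Name>Apache Axiom</Name>\n  <URL>http://ws.apache.org/commons/axiom</URL>\n"
        + " </RelatedProject>\n</ns1:Article>";

    /** Minimal element node: local name, parent, children and text content. */
    static final class Node {
        final String name;
        final Node parent;
        final List<Node> children = new ArrayList<>();
        final StringBuilder text = new StringBuilder();
        boolean complete; // set once the element's end tag has been read

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    private final XMLStreamReader reader;
    private final Node document = new Node("#document", null);
    private Node current = document; // node that receives the next events
    private int builtElements;

    OnDemandBuilder(String xml) {
        try {
            reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Number of element nodes created so far. */
    int builtElementCount() {
        return builtElements;
    }

    /** Text of the first childName element under parentName; parses only as far as needed. */
    String textOf(String parentName, String childName) {
        Node found = search(document, parentName, childName); // answer from memory if possible
        try {
            while (found == null && reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    Node child = new Node(reader.getLocalName(), current);
                    current.children.add(child);
                    current = child;
                    builtElements++;
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    current.text.append(reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    current.complete = true;
                    if (matches(current, parentName, childName)) {
                        found = current; // stop pulling events as soon as the match is complete
                    }
                    current = current.parent;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return found == null ? null : found.text.toString();
    }

    private static boolean matches(Node n, String parentName, String childName) {
        return n.complete && n.name.equals(childName)
                && n.parent != null && n.parent.name.equals(parentName);
    }

    private static Node search(Node n, String parentName, String childName) {
        if (matches(n, parentName, childName)) return n;
        for (Node c : n.children) {
            Node hit = search(c, parentName, childName);
            if (hit != null) return hit;
        }
        return null;
    }
}
```

With the sample article, asking for the author's name builds 4 of the 8 element nodes; asking next for the RelatedProject name extends the tree to 7, and repeating the first query is answered from memory without touching the parser.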

The object model that the builder creates can have different implementations. This is made possible by a set of interfaces together with factories that create the concrete objects. In the early stages, Axiom had two implementations: one based on a table model, similar to the one found in Xalan, and one based on a linked-list model. After a performance evaluation, the table model was discontinued, because the linked-list model performed well in general cases. Later, the Axiom API was also implemented on top of the W3C DOM API. The latest releases of Axiom therefore ship with a linked-list-based implementation and a W3C DOM-based implementation.
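The interfaces-plus-factory arrangement can be illustrated with a small, self-contained sketch. All names here (XmlNode, XmlNodeFactory, LinkedNodeFactory, ArrayNodeFactory, FactoryDemo) are hypothetical and are not Axiom's OMElement/OMFactory API; the array-backed variant is only a stand-in for a second implementation, not a reproduction of the table model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not Axiom's OMElement/OMFactory API.
interface XmlNode {
    String localName();
    void addChild(XmlNode child);
    List<String> childNames();
}

interface XmlNodeFactory {
    XmlNode createElement(String localName);
}

/** Implementation 1: children kept as a linked list (firstChild / nextSibling pointers). */
final class LinkedNodeFactory implements XmlNodeFactory {
    private static final class LinkedNode implements XmlNode {
        private final String name;
        private LinkedNode firstChild, lastChild, nextSibling;

        LinkedNode(String name) {
            this.name = name;
        }

        public String localName() {
            return name;
        }

        public void addChild(XmlNode child) {
            LinkedNode node = (LinkedNode) child; // nodes from different factories are not mixed
            if (firstChild == null) {
                firstChild = node;
            } else {
                lastChild.nextSibling = node;
            }
            lastChild = node;
        }

        public List<String> childNames() {
            List<String> names = new ArrayList<>();
            for (LinkedNode c = firstChild; c != null; c = c.nextSibling) {
                names.add(c.name);
            }
            return names;
        }
    }

    public XmlNode createElement(String localName) {
        return new LinkedNode(localName);
    }
}

/** Implementation 2: children kept in an array-backed list. */
final class ArrayNodeFactory implements XmlNodeFactory {
    private static final class ArrayNode implements XmlNode {
        private final String name;
        private final List<XmlNode> children = new ArrayList<>();

        ArrayNode(String name) {
            this.name = name;
        }

        public String localName() {
            return name;
        }

        public void addChild(XmlNode child) {
            children.add(child);
        }

        public List<String> childNames() {
            List<String> names = new ArrayList<>();
            for (XmlNode c : children) {
                names.add(c.localName());
            }
            return names;
        }
    }

    public XmlNode createElement(String localName) {
        return new ArrayNode(localName);
    }
}

/** Client code sees only the interfaces, so either implementation can be plugged in. */
final class FactoryDemo {
    static XmlNode buildArticle(XmlNodeFactory factory) {
        XmlNode article = factory.createElement("Article");
        article.addChild(factory.createElement("Name"));
        article.addChild(factory.createElement("Author"));
        article.addChild(factory.createElement("RelatedProject"));
        return article;
    }
}
```

Because buildArticle only sees the interfaces, switching implementations means passing a different factory; this is also why application code should create model objects through a factory rather than by instantiating implementation classes directly.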

Working With Axiom

Getting Ready for the Action

Before digging into the code, you need to download the Axiom JARs and their dependencies. There are a couple of ways to do this. The easiest is to download the Axiom 1.0 binary release from http://ws.apache.org/commons/axiom/download.cgi. Once it is downloaded, extract it to a directory, say Axiom_Home. You will find axiom-api-1.0.jar and axiom-impl-1.0.jar in the Axiom_Home/build folder. Put these two JARs on your classpath, together with the dependent libraries found in the Axiom_Home/lib folder.

More adventurous users can download the source distribution, or check out the Axiom sources from the Subversion repository, and build from there.

Reading and Writing

Let's first read an XML document and write it out using Axiom (see the XMLReaderAndWriter sample in Resources).

We first need to create an instance of StAXOMBuilder from the input stream we have. For this example, let us read the XML from a file.

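  // the builder reads the file through a StAX parser, pulling events on demand
  StAXOMBuilder builder = new StAXOMBuilder(new FileInputStream(xmlFile));
  OMElement documentElement = builder.getDocumentElement();
  // serialize without caching: events go straight from the input to the output
  System.out.println("xml = " + documentElement.toStringWithConsume());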

You can ask the builder for the document element, which is returned as an OMElement (an OMElement represents an element information item of the XML infoset). You can then print this documentElement to the output stream.

Concept of Caching: When the builder pulls events from the parser, it can either create the in-memory model from them or not. If it does, the information can be retrieved from that model later; this is called caching. The user decides whether caching is on or off while the builder is pulling events. When you call the toStringWithConsume() method, the StAXOMBuilder pulls the events from the input stream without creating an object model and writes them straight to the output stream, so the consumed part of the document is not kept in memory for later navigation. If you want to read and write with caching, call the toString() method instead. Toggling caching is a bit like switching between SAX and DOM. The difference is that, even with caching on, the object model is not built from start to finish unless the whole stream is serialized.
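The difference between consuming and caching can be approximated with the JDK's StAX event API. This is an analogy, not Axiom's mechanism: copyWithoutCaching forwards each XMLEvent to the writer and keeps nothing, while readWithCaching stores the events in a list so that they can be written out (or inspected) again later. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Analogy only, using JDK StAX event objects; Axiom caches into its own object model.
final class CachingDemo {

    /** "Without caching": each event is forwarded to the writer and then forgotten. */
    static String copyWithoutCaching(String xml) {
        try {
            XMLEventReader in = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            while (in.hasNext()) {
                writer.add(in.nextEvent());
            }
            writer.flush();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    /** "With caching": every event is kept in memory so it can be revisited later. */
    static List<XMLEvent> readWithCaching(String xml) {
        try {
            XMLEventReader in = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            List<XMLEvent> cache = new ArrayList<>();
            while (in.hasNext()) {
                cache.add(in.nextEvent());
            }
            return cache;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes cached events out; can be called any number of times on the same cache. */
    static String write(List<XMLEvent> cache) {
        try {
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            for (XMLEvent event : cache) {
                writer.add(event);
            }
            writer.flush();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Both paths produce the same serialized output; only the cached list can be replayed, at the cost of holding every event in memory.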

Programmatic Creation

As mentioned above, several object model implementations can be available. As a best practice, therefore, create objects through a factory: OMFactory gives you instances of OMElement, OMNamespace, OMText, and so on. Let's programmatically create the XML introduced above.

Let's first create the factory.

        // obtain the factory rather than instantiating implementation classes directly
        OMFactory factory = OMAbstractFactory.getOMFactory();

Creating OMElement

There are several ways to create an OMElement. Because Axiom encourages the use of namespaces, all the methods that create an OMElement take an OMNamespace argument. If an element has no namespace, simply pass null for it.

Since the article element is associated with a namespace, we have to first create that namespace and then associate it with the element.

        // create the namespace object for the Article element
        OMNamespace ns = factory.createOMNamespace("http://www.serverside.com/articles/introducingAxiom", "article");
        // now create the Article element with the above namespace
        OMElement articleElement = factory.createOMElement("Article", ns);

Now let us create the Name element of the article and set its text to "Introducing Axiom". Thanks to the flexibility of the Axiom API, this can be done in several ways.

        // Three equivalent alternatives: use only one, since each adds a Name child.

        // method 1: create the element, then attach it to its parent
        OMElement articleName = factory.createOMElement("Name", null);
        articleElement.addChild(articleName);
        articleName.setText("Introducing Axiom");

        // method 2: pass the parent to the factory, which attaches the new element
        OMElement articleName2 = factory.createOMElement("Name", null, articleElement);
        articleName2.setText("Introducing Axiom");

        // method 3: the most compact way
        factory.createOMElement("Name", null, articleElement).setText("Introducing Axiom");

Since this element has no namespace, null is passed as the namespace argument.

Creating Namespaces

You can declare namespaces on an element using the declareNamespace method. For example, you might want to declare the Apache namespace on the RelatedProject element so that it can be associated with the element's organization attribute.

        // create the related project element
        OMElement relatedProject = factory.createOMElement("RelatedProject", null, articleElement);
        // create the Apache namespace
        OMNamespace apacheNS = relatedProject.declareNamespace("http://www.apache.org", "apache");

To declare a default namespace, use the declareDefaultNamespace(uri) method of OMElement.

Creating Attributes

To create the organization attribute, you could create an OMAttribute and add it to the RelatedProject element. The addAttribute method makes this easier by creating the OMAttribute internally.

        relatedProject.addAttribute("organization", "Apache", apacheNS);

Navigating the Object Model

Now let us see how we can retrieve some information from the created Axiom tree:
articleElement.getLocalName() retrieves the local name of the Article element. You can also get the element's name as a QName by calling articleElement.getQName().

Let's navigate through the children of the Article element.
Since Axiom preserves the full infoset, articleElement.getChildren() returns an iterator over all kinds of child nodes (elements, text, comments, and so on). If you want only the element children, call articleElement.getChildElements() instead.

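        // getChildren() returns every child node, including whitespace-only text nodes
        Iterator allChildren = articleElement.getChildren();
        while (allChildren.hasNext()) {
            OMNode omNode = (OMNode) allChildren.next();
            // write this child node to standard output
            omNode.serialize(System.out);
            System.out.println();
        }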

As you can see, all the children are of type OMNode. In Axiom, all node types (elements, text, comments, and so on) extend the OMNode interface, and all of them support the serialize and serializeAndConsume methods, which are invoked when you call the toString and toStringWithConsume methods, respectively.
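The difference between all child nodes and element children is easy to overlook, because the whitespace between tags is part of the infoset too. The W3C DOM API in the JDK shows the same effect; the helper class ChildKinds below is hypothetical and only mirrors the distinction between what getChildren() and getChildElements() iterate over.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper using the JDK's W3C DOM API, not Axiom.
final class ChildKinds {

    /** Parses the XML and returns its document element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts every kind of child node: elements, text (including whitespace), comments, ... */
    static int countAllChildren(Element parent) {
        return parent.getChildNodes().getLength();
    }

    /** Counts element children only. */
    static int countElementChildren(Element parent) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }
}
```

For the Article sample, parsed with its line breaks and indentation, the document element has seven child nodes (four whitespace-only text nodes and three elements) but only three element children.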

How do you retrieve a specific element if you know its name? Suppose you want the RelatedProject element. To get the first match, call getFirstChildWithName(QName); to get all matches, call getChildrenWithName(QName) instead.

        // search for a specific child element by name
        OMElement relatedProjectElement = articleElement.getFirstChildWithName(new QName("RelatedProject"));
        System.out.println("relatedProjectElement = " + relatedProjectElement);

If you are familiar with XMLStreamReader, you can obtain one from any OMElement. The advantage is that you do not need to worry about how the OMElement was created: it may have been created programmatically or built from an input stream, and its object model may not even be fully built in memory yet. Axiom handles all of these cases for you and gives you an XMLStreamReader.

One more thing: when you get this XMLStreamReader, you can turn caching on or off, depending on your needs.

        // get an XMLStreamReader from the Article element, with caching
        XMLStreamReader xmlStreamReader = articleElement.getXMLStreamReader();

        // get an XMLStreamReader from the Article element, without caching
        XMLStreamReader xmlStreamReaderWithoutCaching = articleElement.getXMLStreamReaderWithoutCaching();
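Because the returned reader is a standard JSR 173 javax.xml.stream.XMLStreamReader, code written against that interface does not care where the reader came from. A minimal sketch, assuming nothing beyond the JDK (ElementLister is a hypothetical name): in an Axiom application you could pass it articleElement.getXMLStreamReader(), while the string overload uses the JDK's own XMLInputFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: it depends only on the JSR 173 XMLStreamReader interface.
final class ElementLister {

    /** Local names of all elements, in document order; consumes the reader. */
    static List<String> elementNames(XMLStreamReader reader) {
        List<String> names = new ArrayList<>();
        try {
            // Look at the current event first, in case the reader is already on an element.
            int event = reader.getEventType();
            while (true) {
                if (event == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
                if (!reader.hasNext()) {
                    break;
                }
                event = reader.next();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    /** Convenience overload that parses a string with the JDK's own StAX implementation. */
    static List<String> elementNames(String xml) {
        try {
            return elementNames(XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml)));
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```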

Performance

Axiom continues to improve. The latest performance benchmarks showed that Axiom is on par with other object models. These benchmarks measured Axiom's worst case, in which the object model is fully built in memory; when the tree is only partially built, Axiom uses correspondingly less memory. This gives an XML processing engine such as Apache Axis2 a considerable advantage.

Conclusion

Axiom is a community effort to build a better XML object model. It has been around for more than two years and has proven to be stable. It is the core object model of several XML processing engines, including Apache Axis2.

You can help us in this effort, too. If you want to know more about Axiom, visit Axiom's home page, subscribe to our mailing lists, and give us feedback in any way you can.

Resources

Code samples
Get the most out of XML processing with Axiom - Article published by Eran Chinthaka in IBM developerWorks
JSR 173 - Streaming API for XML
Apache Axiom - Home page of Apache Axiom project
Apache Axis2 - The next generation Apache Web services engine which uses Axiom as the underlying object model
Axiom Performance Testing Results

About the Author

Eran Chinthaka is a pioneering member of the Apache Axis2, Axiom, and Synapse projects, and works full time at WSO2.