Manipulating XML data easily and efficiently in Java remains a challenge. Numerous approaches to XML binding exist in the industry, including DOM, JAXB, XML Beans, Castor and SDO. In this article, Ed Merks and Elena Litani explore, through the use of an example, how the Eclipse Modeling Framework solves the XML binding problem and how it compares to alternatives.
The model that is used to represent models in EMF is called Ecore, and since Ecore is itself a model, it is called a meta model, i.e., the model of a model. EMF supports this core
meta model API, Ecore, analogous to XML Schema, as well as a core instance data model API, EObject, analogous to DOM Node. Ecore is to abstract syntax what XML Schema is to concrete
syntax, i.e., a unifying meta model. But rather than start with vague abstractions, it seems best to start from something well known and concrete on which to draw comparisons.
To bring concreteness to the discussion, we explore the binding problem by way of an example. Consider the problem of creating the following XML instance using W3C DOM, i.e.,
Read Binding XML to Java
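The article's own XML instance is not reproduced in this excerpt. Purely to illustrate how much ceremony the W3C DOM route involves, here is a minimal sketch that builds a small hypothetical document using only JDK classes. The `order` and `item` names and the class name `DomBuildSketch` are assumptions for illustration, not the article's example.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class DomBuildSketch {

    // Builds <order id="42"><item>Pen</item></order> node by node (hypothetical content).
    static Document build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element order = doc.createElement("order");
            order.setAttribute("id", "42");
            doc.appendChild(order);

            Element item = doc.createElement("item");
            item.appendChild(doc.createTextNode("Pen"));
            order.appendChild(item);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("could not build the document", e);
        }
    }

    // Serializes the DOM tree to a string through an identity transform.
    static String serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(build()));
    }
}
```

Every element, attribute and text node is created and attached by hand, and nothing in the API knows which names or types are legal; that is the gap the binding tools discussed here try to close.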
Two thirds of the way down the article the text goes very, very small and continues tiny all the way to the end. The problem seems to be on line 972 of the page source where there is an incorrect opening of a span element rather than the closing of the existing span.
...needs to be replaced with this ...
I am fixing the bug now; the change will be live shortly. Thank you for bringing this to our attention - I apologize for any inconvenience.
From the article:
In the universe of all models, the simplest self describing model plays a singularly unique role as the one model that binds all other models.
This surely belongs in a philosophy paper! Maybe I shouldn't have skipped to the end, but once I read that I decided I didn't have a hope for the rest of the article!
I find a seemingly shallow comment like this most disappointing. The statement "In the universe of all models, the simplest self describing model plays a singularly unique role as the one model that binds all other models." would belong in a philosophy paper if it weren't made absolutely concrete by the fact that Ecore is used to model all the other models including itself. I would challenge anyone to provide or point out something that's similarly expressive yet simpler.
'Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.' —Jamie Zawinski
Substitute "XML" for "regular expressions" in that great quote. It fits perfectly - perhaps is an even better example.
We use JAXB 2.0 and it works perfectly. I wouldn't consider using anything else.
Agree with that. JAXB is a convenient tool for many (not all) XML tasks.
I've started using JiBX for a few projects. I find it very easy to use once I set up a few Ant tasks. Besides, it has killer performance. (It reminds me a lot of Hibernate. It can handle schemas too, though not 100% bulletproof, although I have not had any problems yet.)
I like it much more than JAXB 1, but I have a feeling JAXB 2 is good as well.
I have not tried it yet; my servers are still running JDK 1.4.
I've started using JiBX for a few projects. I find it very easy to use once I set up a few Ant tasks. Besides, it has killer performance.
According to the homepage JiBX uses proprietary "binding definitions" instead of XML-Schemas and byte code enhancement :-(
Yes it does! :O
What I did was use their tool to create the mapping file from an XML Schema, then generate Java classes from the mapping file. But this is done from my two Ant tasks, so it's pretty painless.
However, that extra layer comes in handy if you have a big object structure that would map to a given XML format but the syntax or layout does not match 100%. The mapping file takes your old structure and converts it to "another format" (e.g. from that XML Schema that is 70% the same as your object...).
There is way too much in this explanation for it to be a good solution when you consider that there is a much simpler way to do this. The problem I have found with most binding frameworks is that the mapping is done on the Java side or by using some sort of XML mapping document. This reinvents the wheel as an oval.
The easy way to do this is to generate a schema from a class (not the other way around) and use an XPath (or similar) XML-oriented language to do the mapping to and from XML formats. JAXB 2.0 apparently does the schema generation. I don't know of an existing framework that puts this all together in a neat package.
EMF will also generate a schema starting just with annotated Java, but the more I explain, the more complex it will all seem. Like beauty, simplicity is in the eye of the beholder...
XStream seems a bit easier than any of this.
XStream is a great product, but it usually only solves the XML serialization/deserialization problem, not the XML binding problem.
But I agree it is enough for most situations.
EMF is intended to deal with data from any source, not just XML, so while it may be complex by comparison, it also deals with a broader domain and unifies them under a single umbrella. E.g., http://www.elver.org/ is contributing support for Hibernate integration.
It's an interesting article, but trying to show that a tool is better than DOM is not exactly hard; hitting yourself repeatedly with a cricket bat is better than trying to do complex stuff with DOM.
One problem I have with a lot of the O/X mappers is that everything that is schema driven ends up being horribly inflexible, with the structures it handles being fixed at compile time. Take XmlBeans: it generates hundreds of classes, and wherever it gets to xsd:any, you end up in, yes, DOM.
Now, I don't know how well EMF works in the field, but my requirements for an XML mapping are:
-excellent xsd:any support
-XPath support, so I can navigate around via xpath paths.
-easy to move elements from one tree to another, and to create elements without knowing their final destination.
-Java5 integration; the nodes and attributes should be Iterable and so go into a foreach clause
-let me provide my own factory at parse time. I don't want java classes that represent the schema, I want java classes that represent the objects I use *in* my app.
-stable, good test suite, etc.
-xmlns support. I hate namespaces, but need them, and need tools that understand them.
My test for how well these things work is writing the classes to handle WS-Addressing in all its many variations, that is 2003/03, 2004/04, 2005/08 interim and 2005/08 final, then see how well you can work with the arguments, compare addresses, move stuff between them. Certainly it was enough to put me off XmlBeans.
So far, XOM is the tool that meets my needs. I've had to retrofit the Java 5 support by writing my own extended elements with the Java 5 extensions, but otherwise it's very nice indeed.
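Two of the requirements above, XPath navigation and nodes that work in a foreach loop, can be approximated on plain DOM with JDK classes alone. This is a rough sketch, not XOM or EMF code; the sample document, the class name `XPathNavigationSketch` and its helper methods are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathNavigationSketch {

    // Hypothetical sample document.
    static final String SAMPLE =
        "<order><item sku=\"A1\">Pen</item><item sku=\"B2\">Ink</item></order>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the XML", e);
        }
    }

    // Evaluates an XPath expression relative to the given node and returns the matches.
    static NodeList select(Node context, String expression) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, context, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    // Adapts a DOM NodeList so it can be used in an enhanced for loop.
    static Iterable<Node> iterable(final NodeList list) {
        return () -> new Iterator<Node>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < list.getLength();
            }

            @Override
            public Node next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return list.item(index++);
            }
        };
    }

    // Joins the text content of every node matched by the expression with commas.
    static String joinText(String xml, String expression) {
        StringBuilder out = new StringBuilder();
        for (Node node : iterable(select(parse(xml), expression))) {
            if (out.length() > 0) {
                out.append(',');
            }
            out.append(node.getTextContent());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinText(SAMPLE, "/order/item"));      // element text
        System.out.println(joinText(SAMPLE, "/order/item/@sku")); // attribute values
    }
}
```

The adapter is the same kind of retrofit described above: the underlying API stays untouched and a thin wrapper supplies the `Iterable` view.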
- EMF does support xsd:any and xsd:anyAttribute, including both lax and strict processing.
- EMF does not provide XPath support directly, but projects like TPTP have built that on top. All the information needed for XPath evaluation is available.
- Moving, and copying subtrees is trivial and bidirectional reference integrity (containment semantics) is ensured and enforced.
- Support for Java 5.0 is our primary work item for the 3.0 development cycle. Things like EObject.eContents(), EObject.eCrossReferences(), and EObject.eAllContents() provide trivially simple generic navigation already; the latter walks the entire containment tree with a single iterator loop.
- The resource framework for serialization and deserialization is extensible.
- EMF is highly stable and is used in most of IBM's products. Bugs are typically fixed within a week. EMF will generate a JUnit test suite for your model.
- XML namespaces are properly and fully handled, including support for QNames.
I see lots of points here which have been going round and round the industry for years.
Yes, XML is used as a solve-all solution.
Yes, XML is not a perfect modeling language.
And yes, there are many alternatives for integrating Java and XML. But trying to evaluate one technology against another is an extremely difficult task given the range of applications and approaches taken. As with any reasonably mature technology, each has good and bad points.
But let's not get into yet another discussion on that subject.
This thread is introducing the EMF as an XML binding solution. Great. EMF is good and XML is good, but are they enough?
In my (probably biased) opinion, why just stick to XML binding? What about non-XML binding? Why do developers suggest XML as the solution for all of life's problems? ... because they know of an easy way of integrating with it. But what about non-XML based structures?
We propose a model driven architecture based on MOF/UML which lets you generate Java code (à la JAXB, XMLBeans etc.) which will bind Java code to your model definition. And where does this model come from? ... from schema, DTD, RELAX NG, an RDBMS, one of our pre-built financial services standards such as SWIFT FIN, ISO20022 UNIFI, FpML, TWIST, CREST, FIX etc. or one of your own models you've built to represent your legacy file format. Whatever the source of the model, you end up with the same API for all your data integration. ... and what's more, you can execute XPath, XQuery and XSLT against the bound code.
Take a look!
Product Development Director
Thank you Simon,
There are two features of C24's binding tool I'd like to point out as standing out from the others, and that's apart from the support for xsd:any, idrefs, substitution groups and high performance serialization.
Code generated by C24's Integration Objects ("IO") editor or Ant tasks offers full support for XSLT 2.0 and XQuery (and of course XPath 2.0). This means that you can execute XSLT and XQuery against bound Java objects as if they were the original XML instance. Add to that the fact that C24-IO binds to things like CSVs and more interesting standards (as Simon pointed out), and you can execute XQuery and XSLT on a CSV file with native code performance; the instance is never converted to XML.
We have clients using tools like XMLSpy's MapForce to generate the XSLT and then using C24-IO to execute the transformation in production. Since they're Java objects, the transformation can be executed in a grid on something like Tangosol's Coherence or GigaSpaces.
Before you compare it to XMLBeans: BEA in fact use C24-IO for their nastier integration problems. XMLBeans is great for XML but leaves you dry for most other things. Companies like IONA use it natively to provide extremely powerful distributed SOA/ESB solutions.
Finally, there's another point to consider: most of these complex standards have constraints. Take FpML, for example, a very complex XML standard with just about every feature of schema you can think of. Added to this are constraints that cannot be defined in schema, simply because schema doesn't support constraints beyond types and regular expressions. You need DSLs like Schematron and WS-OCL; C24 of course supports these and can therefore fully support the standards. The constraints are bound into the object along with the model.
For the simple stuff I recommend XMLBeans and JAXB; we use both internally. But when things get complex you're really going to have to think about the tools you use.
Food for thought hopefully,
EMF provides exactly what you are describing. This article merely focuses on the XML support because that seems to be the cat's meow for so many people at the moment and is particularly relevant to the community at this site. But you can start with a model that's specified as annotated Java, UML2, or XML Schema, and convert to Ecore. And, given an Ecore instance, you can generate Java, export an XML Schema, or export a UML2 instance. In other words, all the model forms are interchangeable and any one may be your starting point for producing the others. It's also dirt cheap, i.e., free, there's an active newsgroup where your questions will be answered in hours, and an active development group where your bugs will be fixed typically within a week. The larger community is contributing things like Hibernate integration and the GMF project (http://www.eclipse.org/gmf) will generate a functionally complete graphical editor for your model.
IMHO, a number of readers and the article itself miss a very important concept. As an XML-Object mapping tool, EMF is probably on-par, maybe somewhat better, maybe somewhat inferior, to other existing tools.
However, EMF is also an object modeling tool like UML with extensive code generation functionality. Automatic generation of object data editors is available. The core model, the XML serializer, the code generator, and the editor all have extensive option flags for fine tuning. In addition, there are many third-party components that interoperate with EMF.
True if one is just looking for an XML-Object bridge there are many choices. However if one is building a large project based on Eclipse, EMF has a lot to offer in addition to XML serialization. It is quite efficient and effective to have so much in a single cohesive package.
Chief Technology Officer
I am just wondering why nobody mentions XMLBeans here. It is open source, it was donated by BEA, it has the best performance so far and, most important, it has been used extensively in many commercial software products.
...then the authors just dropped the ball.
I see nothing compelling in the article to make me switch from XMLBeans.
Our conclusion is that all the XML binding solutions produce roughly similar APIs and that from this perspective there is little in the way of differentiating them. So I'm not the least bit surprised that there's nothing compelling to make you switch. But, should you need a structured or graphical editor for editing instances, Hibernate integration, or integration with data from other sources, you'll likely find EMF becomes far more compelling.
As Steve Punte points out, EMF applies to a much broader domain of problems than just XML binding. So while I won't claim EMF is head and shoulders above the other XML binding solutions, it seems fair to claim it's far more generally applicable and extensible than most. Given a schema and a few minutes stepping through the wizards, you'll have a functionally complete integrated Eclipse editor for editing instances...
XML is in many cases not bound to any schemas. VTD-XML is the next generation XML processing API that goes beyond DOM.
I don't understand how one derives meaning from arbitrary unconstrained syntax? (Perhaps I'm being a philosopher again.) I would argue as well that our industry in its relentless drive toward simplicity seems to end up with more and more complexity instead. I think the basis for this is the following observation: for any particular technology, those who do not yet understand it, want it to be simpler, while those who do already understand it, want more and more and more. This to me explains why simplicity is in the eye of the beholder and why it is an unreachable goal, or perhaps more accurately, why it is a starting point, but never the end point.
I tried to use EMF for the FpML schema. It didn't work out, there were unsupported XML Schema features.
(FpML is the schema for the derivatives industry. It is a good test because it exercises many XML Schema features: http://www.fpml.org/)
Key to Model Driven Engineering is that the models must be *precise* and *complete*. The EMF models are precise, but they're not complete. They will be complete when you can do loss-less roundtrip engineering from any XML Schema through eCore and back to a semantically equivalent XML Schema. The enterprise binding tools on the market can do this today.
The EMF binding to XML Schema is based on the MOF to XSD mapping in XMI. As MOF is almost a subset of UML you can use this to map XSD to UML, and eCore is almost a UML subset, so you have a XSD to eCore mapping. The MOF to XSD mapping maps all MOF features, but not all XML Schema features. The upshot is you can roundtrip models from eCore to XSD and back, but not always from XSD to eCore and back. The other consequence is XMI only knows the default UML profile and won't understand your Java/Oracle/EAI/EDOC profile.
To map in the remaining features of XML Schema to eCore would probably require the addition of OCL constraints to support the non-OO features. This would be a big job. To understand all our customization the model transformation from XSD to eCore would need to be modifiable in a model transformation language.
In all my testing of EMF what was outstanding was the quality of the support, with instant bug fixes and private drops of code made available seven days a week by the EMF team. They were brilliant.
I'm not sure if XML Schema -> Ecore -> XML Schema being a round trip is essential. After all, if you have a schema, you really don't need to generate it (and it's always possible to record the entire original schema and spit it back out again). Granted if you would like to transform the Ecore itself and then spit out the modified original XML Schema, it's more important to be non-lossy. In bugzilla https://bugs.eclipse.org/bugs/show_bug.cgi?id=51210 there is a prototype implementation that records the entire particle structure of the original schema as EAnnotations. With that, we'd be better able to spit out the original form (although this is not one of our primary development goals).
Probably the more essential capability is to ensure that all the instance constraints that are checked by the original schema are also checked by the Ecore model for it. Again, the patch in the above bugzilla prototypes a design that is capable of enforcing complex type restrictions, though that's not such an extensively used XML Schema capability according to http://www.xml.com/pub/a/2006/09/20/profiling-xml-schema.html. We also provide no support for key/keyref/unique, but that's another not so frequently used capability.
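For readers less familiar with what "instance constraints checked by the original schema" means in practice, here is a minimal sketch using the JDK's javax.xml.validation API rather than EMF. The tiny `quantity` schema and the class name `SchemaValidationSketch` are assumptions for illustration; any model derived from such a schema would need to reject the same instances to be considered equivalent.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class SchemaValidationSketch {

    // Hypothetical schema: <quantity> must be an integer between 1 and 99.
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"quantity\">"
        + "<xs:simpleType><xs:restriction base=\"xs:int\">"
        + "<xs:minInclusive value=\"1\"/><xs:maxInclusive value=\"99\"/>"
        + "</xs:restriction></xs:simpleType>"
        + "</xs:element></xs:schema>";

    // Compiles the schema text; failure here means the sketch itself is broken.
    static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Returns true when the instance satisfies every constraint in the schema.
    static boolean isValid(String instanceXml) {
        try {
            compile(XSD).newValidator()
                    .validate(new StreamSource(new StringReader(instanceXml)));
            return true;
        } catch (SAXException e) {
            return false; // a constraint was violated or the instance is not well-formed
        } catch (IOException e) {
            throw new IllegalStateException("could not read the instance", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<quantity>5</quantity>"));
        System.out.println(isValid("<quantity>100</quantity>"));
    }
}
```

The second instance is well-formed XML but falls outside the declared range, which is the kind of check that has to survive the trip into another model form.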
The EMF team is always happy to work with the community to make incremental improvements...
My goal scenario was being able to author industry standard XSD in eCore. That way eCore becomes the primary artefact and the XSD is generated, rather than the other way round as it is today.
Thanks for the link. I'm watching it.