A new TSS article by Joseph Chandler presents an architecture that solves the problems caused by XSL's requirement to be well-formed XML when working with entity references and the import and include mechanisms. By processing the XSL as a JSP page, the developer can use JSP include tags to produce a well-formed XSL file that can be compiled into a Template class usable by an XSLT processor.
Skin Emax - An XML/XSLT Architecture for the Web
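A minimal sketch of the step that follows the JSP stage, assuming the page's output has already been captured as a string: the well-formed XSL is compiled once into a JAXP javax.xml.transform.Templates object and then applied to data. The class name, the stylesheet, and the sample document are hypothetical; this is not code from the article.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class GeneratedXslDemo {

    // Hypothetical stand-in for the text a JSP page would emit once its
    // include tags have pulled the stylesheet fragments together.
    static final String GENERATED_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/page'><h1><xsl:value-of select='title'/></h1></xsl:template>"
      + "</xsl:stylesheet>";

    /** Compiles the well-formed XSL text once; the Templates object can be reused. */
    static Templates compile(String xslText) {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xslText)));
        } catch (TransformerException e) {
            throw new IllegalArgumentException("stylesheet is not valid XSL", e);
        }
    }

    /** Applies compiled Templates to an XML string, using a fresh Transformer per call. */
    static String apply(Templates templates, String xml) {
        try {
            StringWriter out = new StringWriter();
            templates.newTransformer().transform(
                    new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Templates templates = compile(GENERATED_XSL);
        System.out.println(apply(templates, "<page><title>Hello</title></page>"));
    }
}
```

Templates is documented as safe to share across threads, whereas a Transformer should be used by one thread at a time, which is why apply() creates a new one per call.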
Are people still pushing XML/XSL as a viable web framework? It has been my (painful) experience that XSL is not a workable, maintainable solution.
1) XSL is a poor language with limited expressiveness. It has one of the most awkward, counterintuitive syntaxes I have ever encountered.
2) It is SLOW SLOW SLOW. I have seen many performance reports that indicate this, and I saw first-hand an XSL-based architecture at my last full-time job that was just a performance nightmare. Not only do you have to deal with the problems of XML serialization, but XSL processors just run a lot slower than other methods of putting together a web page.
What's the general consensus out there? Don't think of this as flamebait but rather as a request for proof to the contrary.
I agree - XSL can be difficult to work with and it certainly is slow, although there are techniques to mitigate that somewhat. However, there are several distinct advantages to such a framework:
1) Strong separation enforced between data and presentation. The awkwardness of XSL is partly to thank. :)
2) Ease with which templates can be included and imported, and transformations pipelined.
3) Great flexibility when integrating with other systems - human or automated. The XML data can be combined with any stylesheet creating HTML, XML, WML, CSV, or any other text-based format. In addition, the XML can be returned as is for loosely coupled integration with other systems (i.e., a portlet in a web portal or a poor man's web service).
4) XSLT skills are cross-platform (used in Java, .NET, Perl, C++, etc).
5) Client-side transformations will reduce bandwidth usage and improve server performance.
Number 3 was the deciding factor on my last project. One of the big issues in the Navy is that they have thousands of stovepipe web applications and want to find ways to leverage their functionality in new applications such as their web portal. A web application written using XML/XSL can easily return XML for processing by the new client, saving the Navy the thousands of dollars that might be required to rewrite parts of the application.
The awkwardness of XSL can be mitigated by pipelining the transformations (à la Cocoon) to keep each transformation small, or by eliminating them altogether. In fact, an XML processing application could use JSP or Velocity to create the content and have it processed by STX or SAX content handlers.
Again, an XML-based solution, which can include XSL but doesn't have to, will probably be slower than other presentation technologies, but what you lose in performance I feel you more than make up in flexibility and maintainability. Of course, that tradeoff can't be made in many cases, but when it can, it might be worth it.
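To make point 3 concrete, here is a minimal JAXP sketch in which the same XML document is rendered by two different stylesheets, one producing a list fragment and one producing CSV, while the raw XML remains available as-is for other systems. The document, stylesheets, and class name are hypothetical examples.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class MultiFormatDemo {

    // Hypothetical data document, produced once by the application.
    static final String ORDERS_XML =
        "<orders><order id='1' total='9.50'/><order id='2' total='12.00'/></orders>";

    private static final String XSL_HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";

    // View 1: an HTML-style list fragment (xml output method for predictable output).
    static final String LIST_XSL = XSL_HEAD
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'><ul><xsl:for-each select='orders/order'>"
        + "<li><xsl:value-of select='@id'/>: <xsl:value-of select='@total'/></li>"
        + "</xsl:for-each></ul></xsl:template></xsl:stylesheet>";

    // View 2: CSV for a spreadsheet or another system; &#10; is a newline.
    static final String CSV_XSL = XSL_HEAD
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'><xsl:text>id,total&#10;</xsl:text>"
        + "<xsl:for-each select='orders/order'>"
        + "<xsl:value-of select='@id'/>,<xsl:value-of select='@total'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each></xsl:template></xsl:stylesheet>";

    /** Applies one stylesheet to one document and returns the serialized result. */
    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(ORDERS_XML);                      // raw XML for loosely coupled clients
        System.out.println(transform(LIST_XSL, ORDERS_XML)); // presentation view
        System.out.print(transform(CSV_XSL, ORDERS_XML));    // CSV export
    }
}
```

Adding another output format means adding another stylesheet; the code that produces ORDERS_XML does not change.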
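A minimal sketch of pipelining small transformations, using only the standard JAXP SAXTransformerFactory (not Cocoon's own API): each stage is a TransformerHandler, and the SAX events produced by the first stylesheet are fed straight into the second without re-parsing. The stylesheets, element names, and class name are illustrative assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SaxPipelineDemo {
    private static final String HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";

    // Stage 1: reshape a hypothetical catalog format into a neutral <items> format.
    static final String NORMALIZE_XSL = HEAD
        + "<xsl:template match='/'><items><xsl:for-each select='catalog/product'>"
        + "<item><xsl:value-of select='@name'/></item></xsl:for-each></items></xsl:template>"
        + "</xsl:stylesheet>";

    // Stage 2: render the neutral format; it knows nothing about <catalog>.
    static final String RENDER_XSL = HEAD
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'><ul><xsl:for-each select='items/item'>"
        + "<li><xsl:value-of select='.'/></li></xsl:for-each></ul></xsl:template>"
        + "</xsl:stylesheet>";

    static String run(String xml) {
        try {
            SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
            Templates normalize = stf.newTemplates(new StreamSource(new StringReader(NORMALIZE_XSL)));
            Templates render = stf.newTemplates(new StreamSource(new StringReader(RENDER_XSL)));

            // Each stage is a SAX ContentHandler; stage 1's output events go directly to stage 2.
            TransformerHandler stage1 = stf.newTransformerHandler(normalize);
            TransformerHandler stage2 = stf.newTransformerHandler(render);
            StringWriter out = new StringWriter();
            stage1.setResult(new SAXResult(stage2));
            stage2.setResult(new StreamResult(out));

            // Parse the input once and push its SAX events into the first stage.
            stf.newTransformer().transform(
                    new StreamSource(new StringReader(xml)), new SAXResult(stage1));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<catalog><product name='A'/><product name='B'/></catalog>"));
    }
}
```

Keeping each stylesheet small and single-purpose is the point: either stage can be swapped or reused without touching the other.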
So for limited integration work, XSL might work fine. For anything more complicated than that, it just doesn't cut it. Though in an integration scenario, I think you do need some sort of XML-based architecture no matter what your transformation mechanism is. Other scenarios that XML might work well for: publishing or content-based sites with distributed authorship; web-service-friendly systems where your focus is on delivering web services (otherwise, web services are better done as an adjunct to your core system and shouldn't dictate the underlying architecture); and maybe something where you want DHTML tree-based navigation and XML data structures help (though this last case is what the people at my old company, which I mentioned in my original post, tried to do).
With that in mind, as a side note, does anyone have any alternatives to XSL that would be good for doing dynamic data transformations? I've always wanted something (preferably open source) that could do that. I'm talking about complex transformations where you're managing the mapping between lots of nodes (into the hundreds), and where keeping track of your transformations and keeping them up to date is important (so ease of mapping and transformation management is key).
For just rearranging data for display in web applications, CSS2 is sufficient and very easy to use, an order of magnitude easier than XSLT. This has been largely overlooked and is not very well known.
Take a look at selectors, pseudo-classes, and display: table, display: table-row, and display: table-cell.
For complicated data transformation in the MS world, you have Data Transformation Services (DTS) in SQL Server.
What I would like to know is: is there any equivalent (open source or commercial) product for Java?
If a GUI is NOT a must for you, check out Cocoon (completely written in Java), especially the Cocoon control flow, which is included in Cocoon 2.1M1 and will be available in the next few days (http://cocoon.apache.org).
This is one good data processing / pipelining tool
As a long-time DTS user myself, I have to say you would struggle to do with DTS what you can do with transformation pipelines - it is simply too limiting. For example, using SQL Server, how would you, say, render a JPEG from a set of coordinates and FTP it to a remote server? Using XSL, this is not only trivial, but I can change it at runtime, without compiling any Java code at all.
DTS is fast, and pretty good at what it does, but it's certainly not a flexible, extensible transformation architecture.
I have to say, personally though, I do not like using XML/XSL for rendering a display - I use it primarily for data transformation pipelines and syndication. Not that I think it's wrong; it's just not what I use it for.
> DTS is fast, and pretty good at what it does, but it's certainly not a flexible, extensible transformation architecture.
That's right. I used it for a long time, and you need to create a lot of VBScript/components because it's only useful for simple transformation mappings.
"I had to use scripting for transformation mappings"
Any "non-trivial data transformation" has to include custom code; that is self-evident. What else?
it is simply too limiting
How can it be too limiting when you can extend the package at any time with, for example, XSLT transformations and/or custom code?
For example, using SQLServer, how would you, say, render a JPEG from a set of coordinates and ftp it to a remote server? Using XSL, this is not only trivial, but I can change it at runtime, without compiling any java code at all
I do not understand what you mean by "render a JPEG from a set of coordinates". FTPing a JPEG to a remote server is trivial with the "FTP task".
How can you compare a little-known product like Babeldoc, with very little activity and low "Usage Statistics", maintained by 4 developers, with no online Javadoc for its API and only 45 posts in total across its "help" and "Open Discussion" forums, with a mainstream product like MS SQL DTS, used by tens of thousands of developers and backed by the full force of MS and the SQL Server team?
A sense of proportion is needed.
Rolf, I was not wishing to hold Babeldoc up as an equivalent to SQL
Server DTS (although rereading your original post, that's what you were
asking for it seems), merely an example of the kind of data
processing/transformation pipelining tool that can be freely obtained in
the open source world (there are others, this is just my current
favourite). I'm also not saying I've switched from DTS to Babeldoc for
everything, and the rest of the world should follow suit. I use
pipelining like this when appropriate, on projects that suit it, and it
is nice to have a flexible, free option available when necessary.
> How can you compare a little-known product like Babeldoc,
> ... with a mainstream product like MS SQL DTS,
> used by tens of thousands of developers and backed
> by the full force of MS and the SQL Server team?
Well I didn't originally intend to (I think the two are very different
and I have used both in tandem before), but you haven't offered a
particularly compelling argument there - it's kind of like saying, how
can I prefer my home-grown organic vegetables, that nobody else has ever
eaten, to a nice McDonald's that millions of people enjoy
every day. It's an argument based on twisted logic that proves nothing.
Fine, argue on technical merit, point out shortcomings (I'm well aware they exist), stress the different focus of the tools. But usage statistics
and market penetration? Sounds like a sales pitch to me. And asking for
an example of an open source processing pipeline, and then rubbishing it
for not having as much muscle behind it as a microsoft product is a poor
I couldn't care less about the thousands of developers using DTS and the
development clout behind it if it doesn't do what I want - which quite
frequently it doesn't. In those instances I will gladly set aside my
hammer, and pick up a screwdriver instead. Different tools, with a
different focus and different capabilities.
Irrespective of the usage statistics, profile and homepage quality,
Babeldoc is very nice to use, under active development and comes with a
hell of a lot of very useful features out of the box (unlike many, more
active SourceForge projects). Yes, of course a free tool will look
shabby compared to the clout of an MS developed tool and its fancy
designer - these guys are developing stuff that interests them; they're
not interested in a DTS competition. But their tool suits a lot of what
I do, and therefore I use it, recommend it and applaud their efforts
(and if I get a chance, maybe I'll contribute).
And do you know what? There are situations where Babeldoc just doesn't cut
it - and so I use something else instead (maybe DTS if the job's right).
I am fortunate in that I have the freedom to be able to do that, rather
than continuing to bash screws into the woodwork with my DTS-shaped hammer.
By the way, what I meant by rendering a JPEG was something like - take an XML format
document or SQL query, transform that into, say, a set of graph coordinates,
transform that into an SVG, rasterize it as a JPEG, ftp (or JMS) it to
your frontend server. All defined in plain, easy to understand, easy to modify XML.
DTS is a tool just like any other, and one which I am happy to use as
long as it fits the job - which is not always the case. Perhaps you can
get by with having to find kludgy workarounds for problems like being limited
to one datarow in / one datarow out when doing a datapump, or being
forced to use a named MAPI profile for sendmail. Perhaps everything you
do is a good fit for DTS, in which case fair play to you. When it comes to the point where almost all of my functionality is provided entirely by custom scripts, I have to ask myself what exactly I am buying by
continuing to use a tool.
Personally, I like the fact that, since Babeldoc's process flow is
defined in XML (unlike DTS's binary... thing) I can generate different
process flows, from scratch, on the fly, using XSL. Who cares if it's
missing a visual designer if you've got elegance like that? Besides,
DTS's visual designer is pretty limited (and phenomenally slow) for most non-trivial tasks
anyway - you end up pratting around in the code, trying to get it to do what
Babeldoc does have online Javadoc documentation. Check out http://www.babeldoc.com/development/build/javadoc
I know Babeldoc is not in the top 10 projects on SourceForge, but we do have a mailing list far more active than our forums on SourceForge. Keep in mind that Babeldoc is still in beta (version 1.0 will be released in May) and that the project started less than one year ago.
But we do have more and more users every day, and many of them are contributors, too. That is the beauty of open source: if you want some feature that is not implemented yet, you are free to implement it! You don't need to wait for Microsoft to do it! Of course we cannot compare to commercial products, but Babeldoc solves problems and it is free.
Babeldoc user and developer
How would you compare Babeldoc to the recent updates to Cocoon? At someone else's suggestion, I checked out Cocoon2 and it seems to be focusing more on generic pipelining and data transformations than XML serialization with XSL translation. I haven't read your whitepapers yet (I don't have OpenOffice installed; PDF versions would be great!) to get a full understanding of your overall design, but it appears (at a cursory glance) to share some architectural concepts with Cocoon2. What are some of your differentiating features?
I guess I should explain why I'm so interested in this. I worked in EAI for several years and never found anything 100% satisfactory, or even 80% satisfactory with room for extension. I'd long thought about writing my own transformation mechanism but never had the time. I really think that a well-designed, easy to manage, OPEN (Rolf!) and extensible transformation mechanism is one of the most powerful tools an enterprise can have. Combine that with a good asynchronous bus (recent postings on TSS make me interested in Sonic) and you've got a tremendously powerful and flexible mechanism for doing really effective systems integration. The great thing is that something like this could offer a solution comparable to what people like WebMethods, Tibco, and Vitria have been charging an arm and a leg for. As long as the bus was fast and reliable, and you had control over the transformation points, you could turn just about anything into anything else and link things up pretty nicely.
I worked in EAI for several years and never found anything 100% satisfactory, or even 80% satisfactory with room for extension. I'd long thought about writing my own transformation mechanism but never had the time
You're welcome? Your point?
I think Babeldoc and Cocoon share some similar architectural concepts, although the products serve different purposes.
I really haven't worked a lot with Cocoon, but as far as I can see it is primarily used for generating sitemaps and is mostly request/response oriented. Sorry if I am wrong!
On the other hand, Babeldoc is more document-oriented. There are many ways of feeding pipelines: standalone applications, scanners, SOAP... Scanners can be used to scan file systems, FTP servers, mail servers...
XSL transformation is just one feature (pipeline stage) that you may or may not use. In fact, documents don't need to be XML at all. You can process both binary and ASCII documents. There are cases where you just want to automate some process (for example, getting an attachment from an e-mail and sending it to a few addresses, uploading it to an FTP server, or just inserting a record in a database) where you don't need XSL at all. But in most other cases you will end up using it in one way or another.
Babeldoc also has a Journal mechanism, so you always know which stages finished successfully and which failed. You can store the whole document at any stage in the Journal, so in case of failure you can reintroduce the document from the last successful pipeline stage. For example, if you use a stage to upload a document to an FTP server, you can store the document in the Journal at some earlier stage. If the FTP server is down for a while, this stage will fail and processing may stop. But later, when the FTP server is up again, you can "replay" processing from the last successful stage with the stored document.
There are many cases where you can use Babeldoc. A lot of users have contributed by writing their own pipeline stages, implementing features that weren't planned at all in the beginning. I started as a user (probably the first one!), but pretty soon I needed some features that weren't there and started to write them. Similar things happen with other users. So keep in mind that you won't find everything you need in Babeldoc, but you can always extend it.
Babeldoc is a pipelined transformation engine. It is configurable through property files or SQL tables. Here are some aspects of the project:
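The replay idea can be illustrated with a small, hypothetical Java sketch. This is not Babeldoc's API: the Stage interface, the in-memory journal, and the method names are invented for illustration, and a real journal would be persisted rather than held in a HashMap. The pipeline records the document handed to each stage, so a failed stage (here, a simulated FTP upload) can be re-run from its stored input without repeating earlier stages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class JournaledPipeline {

    /** A hypothetical pipeline stage: takes a document, returns the next version. */
    interface Stage {
        String process(String document) throws Exception;
    }

    private final List<Stage> stages;
    // Journal: stage index -> the document as it was handed to that stage.
    private final Map<Integer, String> journal = new HashMap<>();
    private String lastOutput;

    JournaledPipeline(List<Stage> stages) {
        this.stages = new ArrayList<>(stages);
    }

    /** Runs stages starting at {@code from}; returns the failing stage index, or -1 on success. */
    int run(String document, int from) {
        String current = document;
        for (int i = from; i < stages.size(); i++) {
            journal.put(i, current); // record the input so this stage can be replayed later
            try {
                current = stages.get(i).process(current);
            } catch (Exception e) {
                return i; // stop here; the input to stage i stays in the journal
            }
        }
        lastOutput = current;
        return -1;
    }

    /** Re-runs from a failed stage using the journaled document instead of starting over. */
    int replay(int failedStage) {
        String saved = journal.get(failedStage);
        if (saved == null) {
            throw new IllegalArgumentException("no journal entry for stage " + failedStage);
        }
        return run(saved, failedStage);
    }

    String lastOutput() {
        return lastOutput;
    }

    public static void main(String[] args) {
        boolean[] ftpUp = {false};
        JournaledPipeline p = new JournaledPipeline(Arrays.<Stage>asList(
            doc -> doc.toUpperCase(),              // stand-in for a transformation stage
            doc -> {                               // stand-in for an FTP upload stage
                if (!ftpUp[0]) throw new IllegalStateException("FTP server down");
                return doc + " [uploaded]";
            }));
        int failed = p.run("invoice", 0); // fails at stage 1
        ftpUp[0] = true;                  // the server comes back
        int status = p.replay(failed);    // resumes from stage 1 only
        System.out.println(failed + " " + status + " " + p.lastOutput()); // 1 -1 INVOICE [uploaded]
    }
}
```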
1. Used by a number of very large financial / banking corporations.
2. Java, open source project
3. flexible and reconfigurable
4. Document based pipeline transformation engine
5. Primarily, but not limited to, XML based operations.
6. Active developer community.
At the moment we are finishing up version 1.0. After this is done, we can start addressing the following aspects:
1. J2EE integration (already mostly done)
2. GUI builder and management tools.
I think Babeldoc and Cocoon share some similar architectural concepts, although the products serve different purposes.
As far as I know, Cocoon is the only transformation tool (pipelined or not) that allows blocks of Java statements within an XSL stylesheet. That's significant.
And I don't consider Velocity to be as technologically advanced as JSP or Cocoon's XSP. Perhaps Velocity's greatest fault is its failure to use XML for its template language. Just my opinion.
And you are correct that Velocity is not as technologically advanced as JSP or XSP; it doesn't have to be. It is clean and simple, it runs fast, and that, in my book, is worth a lot. I have seen my fill of ugly-as-sin JSP and XML, and you can keep it. "Technologically advanced" can be a trap for the naive.
XSL is great. It's just that it's not the only tool in the chest. And that is the babeldoc philosophy.
As far as I know, Xalan only allows non-XSL script to be inlined within a <xalan:script> top-level element, not directly within an <xsl:template>. Cocoon allows both. You might consider it a nitpick, but many real-world transformations could surely benefit from inlining an industrial programming language directly within the XSLT logic. I appreciate the symmetry of being able to embed Java anywhere within XSLT and vice versa. See for example some XSL tags within a Java block within an <xsl:template> at,
The correct spelling is "babeldoc", not "babledoc". Try your google search again.
Thanks for the link. That's kind of what I was looking for. I like how the author said "Not a replacement for webmethods (yet)", because webMethods is probably the best example of this sort of thing that I've seen thus far. It's just too expensive and locked into the webMethods server (I can't easily get a webMethods transformation running from a simple socket/console app, for example).
It's not a replacement for webMethods because of the infrastructure that webMethods has. It has all kinds of help with EDI documents, a graphical builder (which I find to be a PITA), FTP/HTTPS integration, and loads of documentation. But at the core, babeldoc does what webMethods does: it allows for a linear, stepwise transformation of documents.
Agreed on the WM GUI. I think that your biggest challenge with babeldoc will be to present a good UI. Of course, I think you've taken the right approach in addressing configurability from config files first. Past a certain point of complexity and past the newbie stage, UIs can often slow down the configuration process (and of course usually involve configurations that are hard to version control). I think the problem with WM's UI is that it was designed first as a UI to a complex configuration language (FLOW or something like that?). What that does is it makes your first few months of using the tool relatively easy. But past that, when you need to start doing really complex mappings or you need to start working with them on a more regular basis and need to be able to quickly make changes, you start to get hampered. If you make the configuration language more legible, so that it's possible to get good with using it, then your first few months might be more difficult but after that, you'll be much more able to deal with complexity. Then, you can still come out with a GUI tool to address the simple cases.
Another way to sum that up is that if you plan on making the simple jobs easy, everything will be great until you need to do something complicated. If you plan on making the complicated jobs easy(ier), then you can always go back and add wizards or gui helpers to make the simple jobs easier, too.
First things first: Rolf, please do NOT turn this into yet another Java versus MS debate. You're way out of your league here, because I have extensive experience with both the Java side AND MS products such as DTS and BizTalk. I have used them and know what you're talking about, but they were not sufficient for my purposes when I was working on that project and I would not find them sufficient in the future.
It is correct that using DTS does require a lot of hacking around to get anything more than a simple transformation done. If you're mapping from "Vendor1AddressRecord" to "Vendor2AddressRecord", then fine. If you are mapping from one vendor's completely normalized representation of a customer (using a contact point pattern) to another vendor's customer record with a relatively flat structure, you end up having to do a lot of hacking around. Unfortunately, that's a very common scenario when doing a lot of integration transformations. Also, you get into especially hot water when attempting to do something even more complicated, such as the runtime SVG creation the other user talked about.
I don't think DTS is a bad tool; it certainly presents a nice GUI. But I think that when you first look at it, you think it matches the 80/20 rule (80% of what you want to do is easy, the other 20% hard). With extended use, the ratio is more like 40/60, which is just not going to cut it when you have a lot of transformations that you're trying to manage and pipeline through each other. BizTalk Mapper is the same way, but unlike DTS, at least its format is XSL and somewhat open.
The other problem you seem to forget about is that DTS is NOT OPEN. In the situation I mentioned, we were deploying to an HP UNIX box because we had to. I cannot run DTS on that. You can bitch and moan about MS's merits all you want, but it will not run on UNIX. Not now, and no, even with "Mono", it NEVER WILL.
I don't want to shortchange DTS or biztalk mapper because they are fine tools, just not as full-featured as I'd be looking for. What I do want to do is avoid spiraling into another useless MS vs Java debate that avoids the key issue. If you're on a Wintel platform and DTS cuts it for you, fine, and I'd be glad to hear why. I have no desire or patience to hear about how DTS is the greatest thing ever and after doing a job search on hotjobs you've determined that anyone who doesn't use DTS for transformation services is a fool who will soon be consigned to the dustbin of history. I've heard it from you too many times in the past.
What do you mean by "pipelining the transformations"?
has many more examples.
How does cocoon handle transformation mappings? Does it provide a way to do complex mappings between object trees of hundreds of nodes (e.g., a telecom provisioning order in various flavors)? A way to have a mapping potentially decompose one object tree into a collection of others, of various types? Or combine object trees into one on the other side of the pipeline? How easy is it to work with the files or other information that determines the mapping?
I don't say this as a challenge, just out of curiosity. You say Cocoon2, which may have evolved quite a bit since the original Cocoon I was familiar with. My understanding is that the original Cocoon mainly used XSL as a way to transform objects serialized to XML into some sort of XML (or HTML, or ...) output on the other side. If that's changed, I'd be curious to hear more.
<drew>How does cocoon handle transformation mappings? </drew>
Cocoon 2 sets up a SAX-based transformation pipeline. How exactly the SAX stream is manipulated is up to the transformers you define. A transformer is a SAX ContentHandler (and LexicalHandler) that takes SAX events and sends SAX events to the next transformer. Probably the most common transformer is the XSL transformer, but it's really easy to write your own that uses your own algorithms.
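As a rough illustration of the "SAX events in, SAX events out" idea, here is a plain SAX filter (org.xml.sax.helpers.XMLFilterImpl, not Cocoon's Transformer interface) that renames one element and passes everything else through, with a JAXP identity transform serializing the result. The class and element names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** A SAX filter that renames one element and forwards every other event unchanged. */
final class RenamingFilter extends XMLFilterImpl {
    private final String from;
    private final String to;

    RenamingFilter(XMLReader parent, String from, String to) {
        super(parent);
        this.from = from;
        this.to = to;
    }

    private String rename(String name) {
        return from.equals(name) ? to : name;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        super.startElement(uri, rename(localName), rename(qName), atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, rename(localName), rename(qName));
    }

    /** Parses xml, runs the events through the filter, and serializes what comes out. */
    static String run(String xml, String from, String to) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader parser = spf.newSAXParser().getXMLReader();
            XMLFilterImpl filter = new RenamingFilter(parser, from, to);

            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                               new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<customer><name>Ann</name></customer>", "customer", "client"));
    }
}
```

Because the filter never builds a tree, memory use stays flat regardless of document size, which is the same property Cocoon's SAX pipeline relies on.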
<drew>Does it provide a way to do complex mappings between object trees of hundreds of nodes (e.g., a telecom provisioning order in various flavors)? A way to have a mapping potentially decompose one object tree into a collection of others, of various types? Or combine object trees into one on the other side of the pipeline? How easy is it to work with the files or other information that determines the mapping? </drew>
Cocoon only handles XML via SAX events. As I mentioned earlier, Cocoon uses a generator to start the SAX events. Again, really easy to write your own generator. Now, Cocoon also has various techniques to aggregate the results of pipelines, split pipelines, and other such more complicated actions I haven't messed with much. If anything, Cocoon is _too_ powerful.
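Similarly, a generator is just something that starts the stream of SAX events. A minimal sketch, assuming simple comma-separated lines as the non-XML input: the hypothetical CsvGenerator below emits SAX events directly into a JAXP identity TransformerHandler, which serializes them. This is not Cocoon's Generator interface.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

final class CsvGenerator {

    /** Emits <rows><row><col>...</col></row></rows> SAX events for comma-separated lines. */
    static void generate(List<String> lines, ContentHandler out) throws SAXException {
        Attributes none = new AttributesImpl();
        out.startDocument();
        out.startElement("", "rows", "rows", none);
        for (String line : lines) {
            out.startElement("", "row", "row", none);
            for (String cell : line.split(",", -1)) {
                out.startElement("", "col", "col", none);
                out.characters(cell.toCharArray(), 0, cell.length());
                out.endElement("", "col", "col");
            }
            out.endElement("", "row", "row");
        }
        out.endElement("", "rows", "rows");
        out.endDocument();
    }

    /** Feeds the generated events into an identity TransformerHandler that writes XML text. */
    static String toXml(List<String> lines) {
        try {
            SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler serializer = stf.newTransformerHandler(); // no stylesheet: identity
            serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.setResult(new StreamResult(out));
            generate(lines, serializer);
            return out.toString();
        } catch (TransformerException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml(Arrays.asList("a,b", "c,d")));
    }
}
```

Passing a handler created with stf.newTransformerHandler(templates) to generate() instead would run the same events through an XSL stylesheet, so the input never has to exist as an XML file.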
<drew> I don't say this as a challenge, just out of curiosity. You say Cocoon2, which may have evolved quite a bit since the original Cocoon I was familiar with. My understanding is that the original Cocoon mainly used XSL as a way to transform objects serialized to XML into some sort of XML (or HTML, or ...) output on the other side. If that's changed, I'd be curious to hear more. </drew>
I'm not sure how up to date this is, but it should explain how Cocoon 1 migrated into Cocoon 2: http://xml.apache.org/cocoon/userdocs/concepts/index.html
I admit I'm relatively new to Cocoon, but I've yet to find anything that is better at manipulating XML.
Cool, thanks. I'll check it out.
In the book "J2EE & XML", Gabrick and Weiss point out that Cocoon could be integrated directly as the XSLT processing mechanism in an XML/XSLT presentation framework.
Drew expresses the most common XSLT complaint, against XSLT performance:
" 2) It is SLOW SLOW SLOW. I have seen many performance reports that indicate this and my saw first-hand an XSL-based architecture at my last fulltime job that was just a performance nightmare. Not only do you have to deal with the problems of XML serialization, but XSL processors just run a lot slower than other methods of putting together a web page. "
This is certainly the case with the free software engines. But take a look at DataPower's XA35 XML Accelerator, a hardware device that can parse XML and perform XSLT at hundreds of megabits per second. That means doing XSL at the wire speed of your Fast Ethernet network, likely faster than other methods of putting together a web page. Can your current server-side system fill up a 100 Mbit network link with generated pages? Often not.
The idea is that the XA35 can get you all those benefits of XML or XSLT that others in this thread have pointed out (crossplatform, separation of content from presentation, single-source publishing, etc.) without paying the "SLOW, SLOW, SLOW" performance price.
Since XSLT is portable, you can build the app with open-source java engines (e.g., Xalan) and then switch to hardware acceleration for production deployment if necessary.
Check out http://www.datapower.com/products/xa35.html (XSLT Acceleration) for more info.
There are some capabilities in this area, but the big question is: why do you need scripting? XSLT is Turing-complete; you could write MS Word in it if you wished. There are some inconvenient shortcomings in XPath 1.0 with date formatting and other frequently used data manipulations, but those are being addressed in the upcoming XPath 2.0 and in EXSLT (www.exslt.org). It's just a matter of adding some extension elements/functions to make certain kinds of processing more convenient.
I've been using Cocoon 2 (http://cocoon.apache.org) for several years and every day I'm more convinced that it IS the RIGHT choice. It is THE framework for XML applications because:
- Separation of concerns (you can separate between data, view and flow logic --> do the things where they should be done)
- very fast because it is SAX based and supports caching at many places
- it can be easily extended (it is 100% Java), but most things can be done without writing a single line of Java code
- the core of Cocoon is its sitemap; the sitemap makes it easy to manage the URI space
And there are many reasons more!
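One form of caching that any XSLT-based framework can apply is keeping compiled stylesheets between requests. A minimal sketch, assuming stylesheets are identified by a caller-chosen id (the class and method names are hypothetical, and this is not how Cocoon's own caching is implemented): compile on first use, reuse the thread-safe Templates afterwards, and invalidate explicitly when the source changes.

```java
import java.io.StringReader;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

final class TemplatesCache {
    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final ConcurrentMap<String, Templates> cache = new ConcurrentHashMap<>();

    /** Returns the compiled stylesheet for {@code id}, compiling {@code xslText} only on first use. */
    Templates get(String id, String xslText) {
        return cache.computeIfAbsent(id, key -> compile(xslText));
    }

    /** Drops a cached entry, e.g. after the stylesheet source has changed. */
    void invalidate(String id) {
        cache.remove(id);
    }

    int size() {
        return cache.size();
    }

    private Templates compile(String xslText) {
        // TransformerFactory is not guaranteed to be thread-safe, so compilation is serialized.
        synchronized (factory) {
            try {
                return factory.newTemplates(new StreamSource(new StringReader(xslText)));
            } catch (TransformerConfigurationException e) {
                throw new IllegalArgumentException(e);
            }
        }
    }

    public static void main(String[] args) {
        String xsl = "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'/>";
        TemplatesCache cache = new TemplatesCache();
        Templates first = cache.get("empty", xsl);
        Templates second = cache.get("empty", xsl);
        System.out.println(first == second); // true: compiled only once
    }
}
```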
See, for example, the XSL taglib from the Coldtags suite:
You can provide XSL data as JSP code
With grid computing, the whole "don't have enough CPU" problem goes away for XSLT. It's not too hard or expensive to put together a teraflop grid of XSLT processing. We have customers using Coherence to do it, but you could also use JavaSpaces, RMI, or even a Tomcat farm behind a h/w load balancer!
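The property that makes spreading XSLT work across many CPUs straightforward is that a compiled Templates object can be shared while each worker uses its own Transformer. A single-JVM sketch with a thread pool follows; it does not use Coherence, JavaSpaces, or any grid product, and the stylesheet, documents, and class name are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ParallelXslt {

    // Hypothetical stylesheet: outputs twice the number in <n>.
    static final String DOUBLE_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='n * 2'/></xsl:template>"
      + "</xsl:stylesheet>";

    static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerConfigurationException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Transforms every document on a fixed-size pool; results keep the input order. */
    static List<String> transformAll(Templates templates, List<String> docs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String doc : docs) {
                tasks.add(() -> {
                    // Templates is shared; each task gets its own (non-thread-safe) Transformer.
                    Transformer t = templates.newTransformer();
                    StringWriter out = new StringWriter();
                    t.transform(new StreamSource(new StringReader(doc)), new StreamResult(out));
                    return out.toString();
                });
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("<n>1</n>", "<n>2</n>", "<n>3</n>");
        System.out.println(transformAll(compile(DOUBLE_XSL), docs, 2));
    }
}
```

Across machines the same pattern would presumably hold, with each node compiling its own Templates once and documents distributed by whatever mechanism the farm provides.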
Everybody interested in considering XML and XSLT as an alternative or
complement to JSP / Struts should consider OXF. The core of OXF is
XPL, an XML pipelining language that allows you to combine XML
transformations of any kind. XPL is very easy to learn and provides
built-in conditionals, aggregation, XPointer support, WXS and Relax NG
validation, caching, and more. In OXF, XPL is built on top of SAX for
maximum memory efficiency.
OXF can be used to build Web apps, standalone apps, or can be embedded
in any Java application that needs XML processing without the burden
of directly using JAXP. For Web apps, you can use OXF in conjunction
with Struts (an architecture we call Model 2X), or standalone, thanks
to a series of XML processors including XSLT, SQL, XForms, basic
serializers and generators, with many more coming up in the next
version. You can also write your own XML processors in Java.
One of the biggest problems users face when considering XML/XSLT for
their presentation layer is the lack of methodology. With OXF, we
provide a clear architecture, either when combining XML/XSLT with
Struts, or when using OXF standalone.
For more information: http://www.orbeon.com/oxf/
XSLT performance has rarely been an issue for us. As always it depends
on what type of application you want to build. Building a Web app for
20 users is not the same as building a public Web site with hundreds
of thousands of users.
See also the article about JavaServer Faces and XSLT integration that
we published on TheServerSide a couple of months ago:
Where can I find the source files for the examples?
I'll be bold and place another plug: we will have a JavaOne
session on the integration of XML/XSLT and JSP/JSF. More information is available here
Is it possible to get the source distribution for the Skin Emax article? Please
include a document describing the Jar files used so I can recreate this. Thanks.