Data Transformation & Processing with Smooks v1.0 (16 messages)
- Posted by: Tom Fennelly
- Posted on: May 08 2008 11:05 EDT
The Smooks team is proud to announce the release of Smooks v1.0.
The most commonly accepted definition of Smooks would be that it is a "Transformation Engine". However, at its core, Smooks is just a "Structured Data Event Stream Processor". The core code makes no mention of "data transformation". It is designed simply to support hooking of custom "Visitor logic" into an Event Stream produced by a data Source of some kind (XML, CSV, EDI, Java etc). Of course, the most common application of this will be in the creation of Transformation solutions, i.e. implementing Visitor logic that uses the Event Stream, produced from a Source of one type (XML, EDI, CSV, Java etc), to produce a Result of some other type (XML, EDI, CSV, Java etc).
These Event Stream Processing capabilities enable more than just Message Transformation. We have implemented a range of other solutions on top of this basic processing model:
1. Java Binding Framework: Use the Event Stream to create and populate a Java Object Model. It can actually create and populate multiple Object Models concurrently (i.e. in a single pass of the message), which can be very useful when splitting messages (see below). It can create and populate Object Models whose hierarchies don't “line up” with that of the Source message.
2. Java to Java Transforms: Transform between Java Object Models of different types.
3. Message Splitting & Routing: Split up a message and route the “split messages” to different destinations (with native support for File, JMS and Database destinations). Supports conditional routing (Content Based Routing) of each split message to multiple destinations of different types and in different formats (concurrently) e.g. XML1 to D1, Java1 to D2, Java2 to D3, EDI to D4 etc. Supports complex splits, where each split message contains data from different sub-hierarchies of the Source message, i.e. not just dumb XPath based fragment splitting.
4. Huge Message Processing: Process GB size messages through Transformation, Splitting & Routing, or Persistence.
5. Message Enrichment: Enrich messages with data from external sources (e.g. a Database). Using a Splitting & Routing example, imagine splitting Order-Item messages out of an Order message, where the Customer details in each Order-Item split-message need to be enriched with additional Customer info (e.g. addressing info) before routing the Order-Item split-message to a partner interface.
6. Fragment Based Transforms: Develop modular transformation logic, using a number of supported technologies (FreeMarker, XSLT, StringTemplate, Java, Groovy) and target them at message fragments. Avoid monolithic transformation solutions that are difficult to maintain. Avoid resource-hungry transformation pipelines. Supports mixing of technologies within the context of a single message transform.
Smooks is proving to be a very useful tool in the ESB/SOA world. It has been part of JBossESB since the early days and has, more recently, been made available in a number of other ESB Platforms (Mule and Apache Synapse/WSO2, with others to follow).
For more on Smooks and the features outlined above, visit the project pages and download the v1.0 distribution. There are lots of well-documented examples you can run out-of-the-box. Your feedback will be greatly appreciated!
Threaded Messages (16)
- Smooks project page by Daniel Bevenius on May 08 2008 11:53 EDT
- Re: Smooks project page by Joseph Ottinger on May 08 2008 12:14 EDT
- How does it compare with JIBX and JAXB by chetan sathya on May 11 2008 02:43 EDT
- Re: How does it compare with JIBX and JAXB by Maurice Zeijen on May 13 2008 03:12 EDT
- Performance by Marc Stock on May 08 2008 14:36 EDT
- Re: Performance by Tom Fennelly on May 08 2008 17:35 EDT
- Re: Performance by Mark Little on May 08 2008 17:36 EDT
- Re: Performance by Tom Fennelly on May 08 2008 17:46 EDT
- Re: Performance by Mark Little on May 08 2008 17:52 EDT
- XStream, java queues and db optimization by Sunil n Abinash - on May 08 2008 22:23 EDT
- Re: XStream, java queues and db optimization by Tom Fennelly on May 09 2008 02:51 EDT
- Re: XStream, java queues and db optimization by Maurice Zeijen on May 09 2008 03:41 EDT
- RE: Comparable tool by Sarwar Bhuiyan on May 09 2008 11:38 EDT
- Re: RE: Comparable tool by Tom Fennelly on May 09 2008 14:33 EDT
- Re: RE: Comparable tool by Tom Fennelly on May 10 2008 11:30 EDT
- RE: Comparable tool by Sarwar Bhuiyan on May 09 2008 11:40 EDT
-
Smooks project page
- Posted by: Daniel Bevenius
- Posted on: May 08 2008 11:53 EDT
- in response to Tom Fennelly
Looks like something went wrong with the links in the article... The Smooks project page can be found here: http://milyn.codehaus.org/Smooks
Regards,
Daniel
Re: Smooks project page
- Posted by: Joseph Ottinger
- Posted on: May 08 2008 12:14 EDT
- in response to Daniel Bevenius
Looks like something went wrong with the links in the article... The Smooks project page can be found here: http://milyn.codehaus.org/Smooks Regards, Daniel
Bleah. Know what happened? Those stupid &rquo; things happened: all of the links had the entities around the attributes instead of the regular ASCII quotes. Sorry about that. It's been fixed.
How does it compare with JIBX and JAXB
- Posted by: chetan sathya
- Posted on: May 11 2008 02:43 EDT
- in response to Daniel Bevenius
Hi. Smooks sounds like a great alternative to JIBX, which I am using in one of our projects. How does it compare with JIBX?
Re: How does it compare with JIBX and JAXB
- Posted by: Maurice Zeijen
- Posted on: May 13 2008 03:12 EDT
- in response to chetan sathya
Hi Smooks sounds like a great alternative to JIBX that I am using in one of our projects. How does it compare with JIBX?
We haven't done comparisons yet between the Javabean cartridge and other comparable libraries like JIBX and JAXB. You must know that the Javabean cartridge is only a part of Smooks, mainly used within a Transformation, for instance EDI -> Javabeans -> XML. We should however do some comparisons some day. I haven't actually worked with JIBX yet, but I took a quick look at their website. Here are some differences between JIBX and Smooks:
- JIBX uses a compiler to enhance the bean classes so that they can marshal and unmarshal XML. Smooks uses (cached) reflection to call the bean methods. In the next version we will probably use Byte Code Manipulation to generate classes that call the bean methods, which is a lot faster.
- With JIBX you can directly marshal objects to XML. With Smooks you need to create a template file, using FreeMarker or StringTemplate, to be able to write XML.
- I am not sure if JIBX can handle the flexible selection of nodes as Smooks can. What I mean is that I think Smooks can cope with a larger mismatch between the XML and the Javabeans.
- JIBX can only work with XML, whereas Smooks can work with just about anything structured.
The following advice is based on my gut feeling, because I haven't done an actual benchmark. If you simply want to marshal and unmarshal XML and you don't mind the compile step, then use JIBX (or JAXB). If you need more flexibility, more control, or need to process something other than XML, use Smooks.
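As a rough illustration of the "(cached) reflection" approach mentioned in the reply above, here is a minimal sketch in plain Java. It is not Smooks code: the `Customer` bean, the `CachedSetterBinder` helper and its methods are hypothetical names invented for this example, and it only handles `String` setters.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical bean, used only for this illustration.
class Customer {
    private String name;
    private String city;
    public void setName(String name) { this.name = name; }
    public void setCity(String city) { this.city = city; }
    public String getName() { return name; }
    public String getCity() { return city; }
}

// Hypothetical helper (not a Smooks class): resolves each setter once via
// reflection, then reuses the cached Method object on later bindings.
class CachedSetterBinder {
    private final Map<String, Method> setterCache = new ConcurrentHashMap<>();

    // Calls setXxx(String) on 'bean' for every property/value pair in 'values'.
    void bind(Object bean, Map<String, String> values) {
        try {
            for (Map.Entry<String, String> entry : values.entrySet()) {
                findSetter(bean.getClass(), entry.getKey()).invoke(bean, entry.getValue());
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not bind value to bean", e);
        }
    }

    // Looks the setter up in the cache first; reflection is only used on a miss.
    private Method findSetter(Class<?> type, String property) throws NoSuchMethodException {
        String key = type.getName() + "#" + property;
        Method setter = setterCache.get(key);
        if (setter == null) {
            String name = "set" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
            setter = type.getMethod(name, String.class);
            setterCache.put(key, setter);
        }
        return setter;
    }

    // Number of distinct setters resolved so far.
    int cachedSetterCount() {
        return setterCache.size();
    }
}
```

Binding a second `Customer` with the same properties reuses the two cached `Method` objects, so the lookup cost is paid once per property rather than once per message. The per-call cost of `Method.invoke` remains, which is the part a generated class would aim to remove.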
Performance
- Posted by: Marc Stock
- Posted on: May 08 2008 14:36 EDT
- in response to Tom Fennelly
Are there any performance metrics available for this framework?
Re: Performance
- Posted by: Tom Fennelly
- Posted on: May 08 2008 17:35 EDT
- in response to Marc Stock
We don't have a formally documented set of metrics. We've performed quite a few tests (of different kinds) in order to give ourselves an idea of the overhead. For example, comparing a very simple streamed XSL transform inside and outside Smooks, our tests show Smooks adding approx 5%. Two points to note here, however:
1. The XSL in question was very simple and the test was loaded in favor of the XSL processor. If the transformation becomes more complex, requiring more random access, or simply requiring transforms not easily performed with XSLT (e.g. date/string manipulation), Smooks offers more options for performing the transform in a more performant manner. In this case, we were basically interested in seeing the worst-case scenario from a Smooks perspective.
2. All non-commercial XSL Processors we tried broke down once the input message reached a certain size (< 100MB). Smooks was able to implement and perform the same transform up to 4GB (that was as far as we went) by implementing fragment based transforms. Greater than 100MB, you might say! Well sure, not many need this capability, but that's not the only point there IMO. Smooks' ability to offer alternatives and to keep going is in itself the interesting point (for me at least :-) ).
We implemented performance tests of other kinds too, but the problem is that we don't seem to have anything to compare against in order to make real sense of any numbers. We've also done extensive profiling on Smooks in an effort to eliminate memory leaks etc. The code is written with a close eye to performance (the whole visitor model is stateless etc). Please take a look!
There are of course many more tests we could run, e.g. comparing the Java Binding functionality with frameworks such as XStream, JAXB etc, but the bottom line is... Smooks is in use in quite a few mission critical envs now and performance is not something we've had any "complaints" about (quite the opposite in fact). This of course is a somewhat hollow statement, so one would really need to take Smooks and try it in their solution. I'm sure anyone that tries it will not be unhappy.
We're not claiming to be faster than X or Y. We're hoping people can look at Smooks and see that it has many quite cool capabilities that cannot be achieved easily elsewhere (at least in open source). We're convinced that the flexibility, maintainability etc that it offers will more than compensate for any performance related shortcomings it *might* have.
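The general idea behind fragment based processing in point 2 of the reply above can be sketched with the JDK's own SAX parser. This is an assumption-laden illustration, not the Smooks API: `FragmentHandler`, `FragmentStreaming` and the `order-item` element name are made up for the example, and each fragment is reduced to its text content for brevity.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: buffers the text of one target fragment at a time and
// hands it to a visitor as soon as the fragment's end tag is seen.
class FragmentHandler extends DefaultHandler {
    private final String fragmentName;
    private final Consumer<String> fragmentVisitor;
    private StringBuilder current; // non-null only while inside a target fragment

    FragmentHandler(String fragmentName, Consumer<String> fragmentVisitor) {
        this.fragmentName = fragmentName;
        this.fragmentVisitor = fragmentVisitor;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals(fragmentName)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals(fragmentName) && current != null) {
            fragmentVisitor.accept(current.toString().trim());
            current = null; // release the fragment before the next one starts
        }
    }
}

class FragmentStreaming {
    // Collects fragments into a list for demonstration only; a real visitor
    // would transform or route each fragment instead of keeping them all.
    static List<String> itemsOf(String xml) {
        List<String> items = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new FragmentHandler("order-item", items::add));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse message", e);
        }
        return items;
    }
}
```

Because only the fragment currently being visited is buffered, memory use depends on the size of a single fragment rather than the size of the whole message, which is what makes very large inputs tractable in this style of processing.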
Re: Performance
- Posted by: Mark Little
- Posted on: May 08 2008 17:36 EDT
- in response to Marc Stock
Tom did some performance comparisons with XSLT a year ago. Not sure if he has anything more recent. But I'm sure if he has he'll post here.
Re: Performance
- Posted by: Tom Fennelly
- Posted on: May 08 2008 17:46 EDT
- in response to Mark Little
Tom did some performance comparisons with XSLT a year ago. Not sure if he has anything more recent. But I'm sure if he has he'll post here.
Hey Mark... thanks :-) So those XSLT comparison tests were performed using the Smooks v0.9 codebase, which didn't support the SAX based processing model. Now that Smooks supports a SAX processing model, those figures have swung back in Smooks' favor (considerably :-) ).
Re: Performance
- Posted by: Mark Little
- Posted on: May 08 2008 17:52 EDT
- in response to Tom Fennelly
Yeah, I figured they'd now represent a worst-case scenario. But I knew you'd have something ready :-)
XStream, java queues and db optimization
- Posted by: Sunil n Abinash -
- Posted on: May 08 2008 22:23 EDT
- in response to Tom Fennelly
Looks good. It solves a very core processing issue in many systems, especially in the financial world. Data is transformed and enriched as it passes through multiple departments (ledger and subledgers) and reconciled across departments. Within a single application, data goes through multiple transformations, so support for Java queues as a destination would enhance throughput; transaction semantics and failover can be taken care of at the boundaries.
When a database is a destination, throughput becomes a concern if records are written one after another; batching is one solution. Some benchmarks we have done internally on db performance are at http://onelinejdbc.wiki.sourceforge.net/PerformanceComparison.
Finally, on comparing with tools like XStream: we use XStream in our product in places where the volume of data to be dealt with is low. Comparing XStream with a straight Java reflection based binding (public fields or setters), the reflection based binding is twice as fast as XStream (with XStream using XPP). So if Smooks is using reflection where field and setter method objects are cached, it should be fine.
Thanks,
Sunil & Abinash
http://sunilabinash.vox.com/
http://oneline.wiki.sourceforge.net/index.html
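The batching idea raised in the post above can be sketched independently of any database. The `BatchingWriter` class below is a hypothetical helper written for this illustration, not part of Smooks or of the poster's library; with JDBC, the sink would typically call `PreparedStatement.addBatch()` for each item and `executeBatch()` once per group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: buffers items and passes them to a sink in groups,
// so the destination sees one write per batch instead of one per item.
class BatchingWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    BatchingWriter(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Buffers one item and flushes automatically when the batch is full.
    void write(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is buffered; call once at the end to write the remainder.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // copy so the sink owns its batch
        buffer.clear();
    }
}
```

A caller writes items as they arrive and calls `flush()` once at the end of the message, so a partially filled final batch is not lost.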
Re: XStream, java queues and db optimization
- Posted by: Tom Fennelly
- Posted on: May 09 2008 02:51 EDT
- in response to Sunil n Abinash -
Finally on comparing with tools like XStream. We use XStream in our product in places where data to be dealt with is low. On comparing XStream with a straight java reflection based binding (public fields or setters) reflection based binding is twice as fast as XStream (with XStream using XPP). So if Smooks is using reflection where field and setter method objects are cached it should be fine.
Maurice Zeijen is currently looking at optimizing this using Javassist. We hope this will boost Java Binding performance considerably.
Re: XStream, java queues and db optimization
- Posted by: Maurice Zeijen
- Posted on: May 09 2008 03:41 EDT
- in response to Tom Fennelly
In Smooks 1.0 setter methods are being cached. I am currently working on a BCM enhanced setter mechanism. It looks like calling the setter methods with a Javassist generated class, instead of a reflective method, makes the invocation of the setter method 10x faster. But I am still at an early stage...
RE: Comparable tool
- Posted by: Sarwar Bhuiyan
- Posted on: May 09 2008 11:38 EDT
- in response to Tom Fennelly
Sounds a lot like what OpenAdaptor does (http://www.openadaptor.org)
Sarwar
Re: RE: Comparable tool
- Posted by: Tom Fennelly
- Posted on: May 09 2008 14:33 EDT
- in response to Sarwar Bhuiyan
Sounds a lot like what OpenAdaptor does (http://www.openadaptor.org)
Hey Sarwar. Looks like an interesting project and looks to be covering some of the same usecases. I'll download and have a look at it. Thanks for pointing it out :-)
Re: RE: Comparable tool
- Posted by: Tom Fennelly
- Posted on: May 10 2008 11:30 EDT
- in response to Tom Fennelly
Sounds a lot like what OpenAdaptor does (http://www.openadaptor.org)
Hey Sarwar. Looks like an interesting project and looks to be covering some of the same usecases. I'll download and have a look at it. Thanks for pointing it out :-)
I had a quick look at openadaptor and it looks to me as though it has more in common with an ESB, i.e. exposing endpoints of different types (Http, JMS...), with routing, and of course it has some conversion capabilities.
RE: Comparable tool
- Posted by: Sarwar Bhuiyan
- Posted on: May 09 2008 11:40 EDT
- in response to Tom Fennelly
OpenAdaptor has been rewritten recently to be a Spring application that relies on ApplicationContext.xml to declare all the connectors, processors, converters and the routing logic. There is also support for enriching data through a scriptprocessor bean using JavaScript. For most needs, the out-of-the-box components are sufficient and do not require any custom Java code to be written. However, the interfaces are there to be implemented if custom components are required.
Sarwar