Data Transformation & Processing with Smooks v1.0

News: Data Transformation & Processing with Smooks v1.0

  1. The Smooks team is proud to announce the release of Smooks v1.0.

    The most commonly accepted definition of Smooks is that it is a "Transformation Engine". However, at its core, Smooks is just a "Structured Data Event Stream Processor". The core code makes no mention of "data transformation". It is designed simply to support hooking custom "Visitor logic" into an Event Stream produced by a data Source of some kind (XML, CSV, EDI, Java etc).

    Of course, the most common application of this is in the creation of Transformation solutions, i.e. implementing Visitor logic that uses the Event Stream produced from a Source of one type (XML, EDI, CSV, Java etc) to produce a Result of some other type (XML, EDI, CSV, Java etc).

    These Event Stream Processing capabilities enable more than just Message Transformation. We have implemented a range of other solutions on top of this basic processing model:

    1. Java Binding Framework: Use the Event Stream to create and populate a Java Object Model. It can create and populate multiple Object Models concurrently (i.e. in a single pass of the message), which can be very useful when splitting messages (see below). It can also create and populate Object Models whose hierarchies don't “line up” with that of the Source message.
    2. Java to Java Transforms: Transform between Java Object Models of different types.
    3. Message Splitting & Routing: Split up a message and route the “split messages” to different destinations (with native support for File, JMS and Database destinations). Supports conditional routing (Content Based Routing) of each split message to multiple destinations of different types and in different formats (concurrently), e.g. XML1 to D1, Java1 to D2, Java2 to D3, EDI to D4 etc. Supports complex splits, where each split message contains data from different sub-hierarchies of the Source message, i.e. not just dumb XPath based fragment splitting.
    4. Huge Message Processing: Process GB-sized messages through Transformation, Splitting & Routing, or Persistence.
    5. Message Enrichment: Enrich messages with data from external sources (e.g. a Database). Using a Splitting & Routing example, imagine splitting Order-Item messages out of an Order message, where the Customer details in each Order-Item split-message need to be enriched with additional Customer info (e.g. addressing info) before routing the Order-Item split-message to a partner interface.
    6. Fragment Based Transforms: Develop modular transformation logic, using a number of supported technologies (FreeMarker, XSLT, StringTemplate, Java, Groovy), and target it at message fragments. Avoid monolithic transformation solutions that are difficult to maintain. Avoid resource-hungry transformation pipelines. Supports mixing of technologies within the context of a single message transform.

    Smooks is proving to be a very useful tool in the ESB/SOA world. It has been part of JBossESB since the early days and has, more recently, been made available in a number of other ESB Platforms (Mule and Apache Synapse/WSO2, with others to follow).

    For more on Smooks and the features outlined above, visit the project pages and download the v1.0 distribution. There are lots of well documented examples you can run out-of-the-box. Your feedback will be greatly appreciated!!
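
The idea of hooking "Visitor logic" into an event stream can be illustrated without Smooks itself. The following is a minimal sketch that uses only the JDK's SAX parser as the event source. The `ElementVisitor` interface, the `VisitorDispatcher` class and the `order-item` element name are hypothetical names made up for this illustration; they are not the Smooks API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical visitor contract (not the Smooks API): called when a
// targeted element ends, with the text collected inside it.
interface ElementVisitor {
    void visit(String elementName, String text);
}

// Dispatches SAX events to visitors registered per element name.
class VisitorDispatcher extends DefaultHandler {
    private final Map<String, List<ElementVisitor>> visitors = new HashMap<>();
    private final StringBuilder text = new StringBuilder();

    void register(String elementName, ElementVisitor visitor) {
        visitors.computeIfAbsent(elementName, k -> new ArrayList<>()).add(visitor);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        List<ElementVisitor> targets = visitors.get(qName);
        if (targets != null) {
            String value = text.toString().trim();
            for (ElementVisitor visitor : targets) {
                visitor.visit(qName, value);
            }
        }
        text.setLength(0);
    }

    // Parses the XML in a single pass, streaming events to the visitors.
    void process(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), this);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not process message", e);
        }
    }
}

class VisitorDemo {
    // Collects the text of every <order-item> element (assumed element name).
    static List<String> collectItems(String xml) {
        List<String> items = new ArrayList<>();
        VisitorDispatcher dispatcher = new VisitorDispatcher();
        dispatcher.register("order-item", (name, value) -> items.add(value));
        dispatcher.process(xml);
        return items;
    }

    public static void main(String[] args) {
        String xml = "<order><order-item>pen</order-item><order-item>ink</order-item></order>";
        System.out.println(collectItems(xml)); // prints [pen, ink]
    }
}
```

Because the visitor only sees events, the same dispatcher could in principle be fed by a non-XML source that emits equivalent events, which is the point the announcement makes about CSV, EDI and Java sources.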

    Threaded Messages (16)

  2. Smooks project page

    Looks like something went wrong with the links in the article...The Smooks project page can be found here: http://milyn.codehaus.org/Smooks Regards, Daniel
  3. Re: Smooks project page

    Looks like something went wrong with the links in the article...The Smooks project page can be found here:

    http://milyn.codehaus.org/Smooks

    Regards,

    Daniel
    Bleah. Know what happened? Those stupid &rquo; things happened: all of the links had the entities around the attributes instead of the regular ASCII quotes. Sorry about that. It's been fixed.
  4. Hi Smooks sounds like a great alternative to JIBX that I am using in one of our projects. How does it compare with JIBX?
  5. Hi

    Smooks sounds like a great alternative to JIBX that I am using in one of our projects.

    How does it compare with JIBX?
    We haven't done comparisons yet between the Javabean cartridge and comparable libraries like JIBX and JAXB. Note that the Javabean cartridge is only one part of Smooks, mainly used within a Transformation, for instance EDI -> Javabeans -> XML. We should, however, do some comparisons some day.

    I haven't actually worked with JIBX yet, but I took a quick look at their website. Here are some differences between JIBX and Smooks:

    - JIBX uses a compiler to enhance the bean classes so that they can marshal and unmarshal XML. Smooks uses (cached) reflection to call the bean methods. In the next version we will probably use Byte Code Manipulation to generate classes that call the bean methods, which is a lot faster.
    - With JIBX you can marshal objects directly to XML. With Smooks you need to create a template file, using FreeMarker or StringTemplate, to be able to write XML.
    - I am not sure if JIBX can handle the flexible selection of nodes that Smooks can. What I mean is that I think Smooks copes better with a mismatch between the structure of the XML and that of the Javabeans.
    - JIBX can only work with XML, whereas Smooks can work with just about anything structured.

    The following advice is based on my gut feeling, because I haven't done an actual benchmark. If you simply want to marshal and unmarshal XML and you don't mind the compile step, then use JIBX (or JAXB). But if you need more flexibility, more control, or need to process something other than XML, use Smooks.
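
To make the "(cached) reflection" point concrete, here is a minimal sketch of the general technique: look up a bean's setter once, keep the `Method` object in a map, and reuse it for later calls. This is not the Smooks Javabean cartridge's code; `CachedSetterInvoker` and the `Customer` bean are hypothetical names invented for the illustration.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: resolves each setter once and reuses the Method object.
class CachedSetterInvoker {
    private final Map<String, Method> cache = new ConcurrentHashMap<>();

    void set(Object bean, String property, Object value) {
        Class<?> type = bean.getClass();
        String key = type.getName() + "#" + property;
        // The reflective lookup runs only on the first call for this key.
        Method setter = cache.computeIfAbsent(key, k -> findSetter(type, property));
        try {
            setter.invoke(bean, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not set " + property, e);
        }
    }

    // Number of distinct setters resolved so far.
    int cachedSetters() {
        return cache.size();
    }

    private static Method findSetter(Class<?> type, String property) {
        String name = "set" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
        for (Method method : type.getMethods()) {
            if (method.getName().equals(name) && method.getParameterCount() == 1) {
                return method;
            }
        }
        throw new IllegalArgumentException("No setter for " + property + " on " + type.getName());
    }
}

// Hypothetical bean used only for this example.
class Customer {
    private String name;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}
```

Populating two `Customer` instances through the same invoker resolves `setName` once; the second call only pays for the map lookup and the reflective invocation.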
  6. Performance

    Are there any performance metrics available for this framework?
  7. Re: Performance

    We don't have a formally documented set of metrics. We've performed quite a few tests (of different kinds) to give ourselves an idea of the overhead. For example, comparing a very simple streamed XSL transform inside and outside Smooks, our tests show Smooks adding approx 5%. Two points to note here, however:

    1. The XSL in question was very simple and the test was loaded in favor of the XSL processor. If the transformation becomes more complex, requiring more random access, or simply requiring transforms not easily performed with XSLT (e.g. date/string manipulation), Smooks offers more options for performing the transform in a more performant manner. In this case, we were basically interested in seeing the worst-case scenario from a Smooks perspective.
    2. All non-commercial XSL Processors we tried broke down once the input message reached a certain size (< 100MB). Smooks was able to perform the same transform up to 4GB (which was as far as we went) by implementing fragment based transforms. Greater than 100MB, you might say! Well sure, not many need this capability, but that's not the only point there IMO. Smooks' ability to offer alternatives and to keep going is in itself the interesting point (for me at least :-) ).

    We implemented performance tests of other kinds too, but the problem is that we don't seem to have anything to compare against in order to make real sense of any numbers. We've also done extensive profiling on Smooks in an effort to eliminate memory leaks etc. The code is written with a close eye to performance (the whole visitor model is stateless etc). Please take a look!!

    There are of course many more tests we could run, e.g. comparing the Java Binding functionality with frameworks such as XStream, JAXB etc, but the bottom line is... Smooks is in use in quite a few mission critical envs now and performance is not something we've had any "complaints" about (quite the opposite in fact). This of course is a somewhat hollow statement, so one would really need to take Smooks and try it in their own solution. I'm sure anyone who tries it will not be unhappy.

    We're not claiming to be faster than X or Y. We're hoping people can look at Smooks and see that it has many quite cool capabilities that cannot be achieved easily elsewhere (at least in open source). We're convinced that the flexibility, maintainability etc that it offers will more than compensate for any performance related shortcomings it *might* have.
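
The reason fragment based processing keeps working on very large inputs is that only one fragment needs to be in memory at a time. The sketch below shows that general idea with the JDK's StAX reader; it is not how Smooks implements fragment based transforms. `FragmentStreamer` and the `item` element name are hypothetical, and the sketch assumes each fragment contains text only.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class FragmentStreamer {
    // Streams the input and hands the text of each matching element to the
    // handler as soon as it has been read. Returns the number of fragments.
    // Assumption: matching elements contain text only (no child elements).
    static int forEachFragment(Reader input, String fragmentName, Consumer<String> handler) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(input);
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(fragmentName)) {
                    handler.accept(reader.getElementText());
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not stream message", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<orders><item>a</item><item>b</item><item>c</item></orders>";
        // Each fragment is handled and then discarded; the whole document
        // is never materialized as a tree.
        forEachFragment(new StringReader(xml), "item", System.out::println);
    }
}
```

In a real setup the `Reader` would wrap a file or network stream rather than a `String`, and the handler would apply a small per-fragment transform.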
  8. Re: Performance

    Tom did some performance comparisons with XSLT a year ago. Not sure if he has anything more recent. But I'm sure if he has he'll post here.
  9. Re: Performance

    Tom did some performance comparisons with XSLT a year ago. Not sure if he has anything more recent. But I'm sure if he has he'll post here.
    Hey Mark... thanks :-) So those XSLT comparison tests were performed using the Smooks v0.9 codebase, which didn't support the SAX based processing model. Now that Smooks supports a SAX processing model, those figures have swung back in Smooks' favor (considerably :-) ).
  10. Re: Performance

    Yeah, I figured they'd now represent a worst-case scenario. But I knew you'd have something ready :-)
  11. Looks good. It solves a very core processing issue in many systems, especially in the financial world. Data is transformed and enriched as it passes through multiple departments (ledger and subledgers) and reconciled across departments. Within a single application, data undergoes multiple transformations. So support for Java queues as a destination will enhance the throughput; transaction semantics and failover can be taken care of at the boundaries.

    When a database is a destination, throughput becomes a concern if records are written one after another; batching is one solution. Some benchmarks we have done internally on db performance are at http://onelinejdbc.wiki.sourceforge.net/PerformanceComparison.

    Finally, on comparing with tools like XStream: we use XStream in our product in places where the volume of data to be dealt with is low. On comparing XStream with a straight Java reflection based binding (public fields or setters), reflection based binding is twice as fast as XStream (with XStream using XPP). So if Smooks is using reflection where field and setter method objects are cached, it should be fine.

    Thanks
    Sunil & Abinash
    http://sunilabinash.vox.com/
    http://oneline.wiki.sourceforge.net/index.html
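
The batching suggestion above can be sketched independently of any particular database. The class below buffers records and hands them to a sink in groups; `BatchingWriter` is a hypothetical name, and the sink is an assumption standing in for whatever performs the real write (with JDBC, that would typically be a `PreparedStatement` using `addBatch()` and `executeBatch()`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative only: collects records and flushes them to the sink in batches.
class BatchingWriter<T> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    BatchingWriter(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Buffers the record and flushes once the batch is full.
    void write(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends any buffered records to the sink as one batch.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // copy so the sink owns its list
        buffer.clear();
    }

    // Flushes the final, possibly partial, batch.
    @Override
    public void close() {
        flush();
    }
}
```

Writing three records with a batch size of two produces one full batch on the second write and one partial batch on `close()`, so the sink is called twice instead of three times.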
  12. Finally on comparing with tools like XStream. We use XStream in our product in places where data to be dealt with is low. On comparing XStream with a straight java reflection based binding (public fields or setters) reflection based binding is twice as fast as XStream (with XStream using XPP). So if Smooks is using reflection where field and setter method objects are cached it should be fine.
    Maurice Zeijen is currently looking at optimizing this using Javassist. We hope this will boost Java Binding performance considerably.
  13. In Smooks 1.0, setter methods are cached. I am currently working on a BCM enhanced setter mechanism. It looks like calling the setter methods through a Javassist generated class, instead of through a reflective method, makes the invocation of the setter method 10x faster. But I am still at an early stage...
  14. RE: Comparable tool

    Sounds a lot like what OpenAdaptor does (http://www.openadaptor.org) Sarwar
  15. Re: RE: Comparable tool

    Sounds a lot like what OpenAdaptor does (http://www.openadaptor.org)
    Hey Sarwar. Looks like an interesting project and looks to be covering some of the same use cases. I'll download and have a look at it. Thanks for pointing it out :-)
  16. Re: RE: Comparable tool

    Sounds a lot like what OpenAdaptor does (http://www.openadaptor.org)


    Hey Sarwar.

    Looks like an interesting project and looks to be covering some of the same usecases. I'll download and have a look at it. Thanks for pointing it out :-)
    I had a quick look at openadaptor and it looks to me as though it has more in common with ESB i.e. exposing endpoints of different types (Http, JMS...), with routing and of course it has some conversion capabilities.
  17. RE: Comparable tool

    OpenAdaptor has been rewritten recently to be a Spring application that relies on ApplicationContext.xml to declare all the connectors, processors, convertors and the routing logic. There is also support for enriching data through a scriptprocessor bean using JavaScript. For most needs, the out of the box components are sufficient and do not require any custom Java code to be written. However, the interfaces are there to be implemented if custom components are required. Sarwar