Processing EDI, XML, CSV and more with Smooks v1.2

  1. Processing EDI, CSV and other non-XML data (and XML) can often be a real headache. The Smooks project has added some very interesting enhancements in these and other areas in the v1.2 release:
    • EDIFACT Java Compiler (EJC): EJC greatly simplifies the process of binding EDI data to a Java object graph. EJC is similar to JAXB's XJC, except that it works on EDI messages. This is just the first of a number of developer optimization features we are in the process of adding in the area of EDI message processing.
    • Entity Persistence Framework Support: Reuse your Entity Persistence resources (like Hibernate, iBATIS or any JPA compatible resource) to persist or enrich messages of any format (EDI, XML etc). Access a database and use its query language, or CRUD methods, for reading and writing to the database.
    • Validation: Perform rule-based message fragment validation on messages of any format (EDI, XML etc). Supports data-field-level validation using regular expressions, or business rule compliance validation using MVEL expressions.
    • Simplified CSV Processing: Bind CSV records to Java objects in seconds using a much simplified XML configuration, or using a very simple programmatic API.
    • Improved Programmatic APIs: Prior to Smooks v1.2, programmatic configuration was not one of Smooks’ strengths. In Smooks v1.2 we’ve made significant improvements in this area. Most Smooks features can now be utilised through Java, without writing a line of XML.
    • As well as these new features, Smooks v1.2 includes numerous bug fixes. Please download Smooks v1.2 and let us know what you think. We hope you find it useful!!
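
    To give a feel for what "binding CSV records to Java objects" with field-level regular-expression validation means, here is a minimal hand-rolled sketch in plain Java. It does not use the Smooks API at all: the class names, the `Customer` model and the three-column layout are all assumptions made up for illustration, and Smooks would drive the equivalent work from its own configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only: none of these names come from the Smooks API.
class CsvBindingSketch {

    // Target object for each CSV record (hypothetical model).
    static final class Customer {
        final String firstName;
        final String lastName;
        final int age;

        Customer(String firstName, String lastName, int age) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.age = age;
        }
    }

    // Field-level rule expressed as a regular expression: age is 1 to 3 digits.
    private static final Pattern AGE_RULE = Pattern.compile("\\d{1,3}");

    // Binds each "firstName,lastName,age" line to a Customer, rejecting records
    // that have the wrong number of fields or fail the regex rule.
    static List<Customer> bind(String csv) {
        List<Customer> customers = new ArrayList<>();
        for (String line : csv.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",", -1);
            if (fields.length != 3) {
                throw new IllegalArgumentException("Expected 3 fields: " + line);
            }
            String age = fields[2].trim();
            if (!AGE_RULE.matcher(age).matches()) {
                throw new IllegalArgumentException("Invalid age field: " + line);
            }
            customers.add(new Customer(fields[0].trim(), fields[1].trim(), Integer.parseInt(age)));
        }
        return customers;
    }

    public static void main(String[] args) {
        for (Customer c : bind("Ada,Lovelace,36\nAlan,Turing,41")) {
            System.out.println(c.firstName + " " + c.lastName + " (" + c.age + ")");
        }
    }
}
```

    The point of the feature list above is that this kind of plumbing is declared rather than hand-written; the sketch only shows the shape of the work being taken off your hands.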

  2. Congratulations. Keep up the good work! --Kurt
  3. Looks good, esp. EDIFACT support!
  4. This looks cleaner than the usual BPEL transformation way with xpath. Anyone used Smooks with JBI?
  5. Not to take anything away from your accomplishments but you'd be much better off using JiBX (or hibernate, for that matter) as an example for object binding than XJC. XJC was and still is a terrible idea and teams that use it are generating themselves into a maintenance catastrophe. I was really interested when I saw something about EDI support but I can't stand the XJC approach.
  6. Not to take anything away from your accomplishments but you'd be much better off using JiBX (or hibernate, for that matter) as an example for object binding than XJC.

    XJC was and still is a terrible idea and teams that use it are generating themselves into a maintenance catastrophe.

    I was really interested when I saw something about EDI support but I can't stand the XJC approach.
    I hear you James. However, EJC (the "XJC approach") is just one option re binding EDI messages to Java using Smooks. It's not mandated in any way. Smooks does also support an approach more akin to JiBX and in fact, prior to Smooks v1.2, this was the only approach available. The JiBX type approach is often more appealing where the target object model is already in existence and, perhaps, doesn't "line up" with the source EDI message model. Having looked at some of the existing EDI message standards and how they are specified, I think a JiBX style approach (not picking on JiBX in any way - just that it was the example you used :) ) is often quite infeasible.
  7. Smooks does also support an approach more akin to JiBX and in fact, prior to Smooks v1.2, this was the only approach available.

    That's good to know. I'll keep Smooks in mind then.
    The JiBX type approach is often more appealing where the target object model is already in existence and, perhaps, doesn't "line up" with the source EDI message model. Having looked at some of the existing EDI message standards and how they are specified, I think a JiBX style approach (not picking on JiBX in any way - just that it was the example you used :) ) is often quite infeasible.
    Can you give an example of when the JiBX approach would be infeasible? EDI documents are structured hierarchically, are they not? The only format I really have had to deal with is X12 and I don't see how the JiBX approach wouldn't work.
  8. The JiBX type approach is often more appealing where the target object model is already in existence and, perhaps, doesn't "line up" with the source EDI message model. Having looked at some of the existing EDI message standards and how they are specified, I think a JiBX style approach (not picking on JiBX in any way - just that it was the example you used :) ) is often quite infeasible.


    Can you give an example of when the JiBX approach would be infeasible? EDI documents are structured hierarchically, are they not? The only format I really have had to deal with is X12 and I don't see how the JiBX approach wouldn't work.
    I'm not saying the JiBX approach wouldn't/couldn't work. I acknowledge that it can "work"... people have used the "traditional" Smooks approach (which is similar to that of JiBX) to define mappings for a range of different EDI formats to their own Java Object models and have been happy with that (HL7 and others). My feelings on a "generated model" approach (XJC or EJC) are that (suckie models aside), they can make sense in situations where:
    1. The message definition is very stable and so you're not as likely to run into the issues of supporting multiple versions of the message.
    2. The message definition is very complex, both in terms of how it is defined, and the sheer level of detail involved. Look at the UN/EDIFACT message definitions - unfamiliar/non-standard definition formats... not talking about a "commons" standards structured definition like an XSD (not that XSD is all that nice).
    This is the first Smooks release to contain EJC. We may be wrong and time will prove that one way or the other. I definitely think the EJC approach is worth pursuing for certain use cases for a release or two... see how users use it and how it works on well defined message formats. I totally agree with you re the crap models that can result from generated approaches. This is certainly a major challenge for EJC, but may be something that many people are willing to live with when offered the alternatives of unraveling complex and detailed message definitions that are defined in a very alien format.
  9. I totally agree with you re the crap models that can result from generated approaches. This is certainly a major challenge for EJC, but may be something that many people are willing to live with when offered the alternatives of unraveling complex and detailed message definitions that are defined in a very alien format.
    Even for stable message formats, I don't really think it makes sense to generate objects that match the message format. You basically end up with two options if you generate code from a message format: 1. Use the generated objects as your domain model. 2. Map from your real domain model to the generated objects. The first option is unappealing because unless you are lucky enough to have a message format that exactly matches your domain model in all cases, you end up with an artificial domain model that will be at best awkward to work with. It will make it almost impossible to use good OO design techniques. The second option renders the solution moot. If you are generating code and then manually mapping your domain model to those objects (which is what I usually see) then what has been gained by generating the objects? I will say that I find XML mapping specifications to be less than optimal. I think a very targeted DSL would be the best option but that also has trade-offs.
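
    To make option 2 concrete, here is a minimal Java sketch of the extra hop it implies. Everything in it is hypothetical: `OrderMessage` merely stands in for a class that a generator such as XJC or EJC might emit (it is not real generator output), and `Order` stands in for a hand-written domain class.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical names throughout: nothing here is actual XJC or EJC output.
class GeneratedVsDomainSketch {

    // Stand-in for a generated class: it mirrors the message layout,
    // so every field is a raw string named after its segment.
    static final class OrderMessage {
        String buyerNameSegment;
        String orderDateSegment;   // e.g. "20091116"
        String totalAmountSegment; // e.g. "59.97"
    }

    // Hand-written domain class: typed fields named for the business concept.
    static final class Order {
        final String buyer;
        final LocalDate placedOn;
        final BigDecimal total;

        Order(String buyer, LocalDate placedOn, BigDecimal total) {
            this.buyer = buyer;
            this.placedOn = placedOn;
            this.total = total;
        }
    }

    // Option 2: the hand-written hop from the generated model to the domain model.
    static Order toDomain(OrderMessage message) {
        return new Order(
                message.buyerNameSegment.trim(),
                LocalDate.parse(message.orderDateSegment, DateTimeFormatter.BASIC_ISO_DATE),
                new BigDecimal(message.totalAmountSegment));
    }

    public static void main(String[] args) {
        OrderMessage message = new OrderMessage();
        message.buyerNameSegment = "Acme Ltd ";
        message.orderDateSegment = "20091116";
        message.totalAmountSegment = "59.97";
        Order order = toDomain(message);
        System.out.println(order.buyer + " ordered on " + order.placedOn + " for " + order.total);
    }
}
```

    The `toDomain` method is exactly the code the post is questioning: with a direct mapping approach, the message-shaped class and this method would not need to exist.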
  10. I totally agree with you re the crap models that can result from generated approaches. This is certainly a major challenge for EJC, but may be something that many people are willing to live with when offered the alternatives of unraveling complex and detailed message definitions that are defined in a very alien format.


    Even for stable message formats, I don't really think it makes sense to generate objects that match the message format.

    You basically end up with two options if you generate code from a message format:

    1. Use the generated objects as your domain model.
    2. Map from your real domain model to the generated objects.
    If you don't think it makes sense, then you always have the "traditional" approach available with Smooks i.e. define your own model and manually define the bindings. For the most part... I agree that option #2 above would not make sense, but only if you have another "way" of getting the data out of the EDI message stream and into your home grown Java Object model (which you do with the "traditional" approach). If you don't have another way, what would you do? I've no doubt people using EJC will not be happy with parts of the generated model. I'd also be fairly sure there will also be people that will be perfectly happy to use the generated model (as their app domain model) when the alternative is a heavy slog manually defining and testing manual bindings to their perfectly handcrafted model. At the end of the day, the user can make the choice and I can see merit in both points of view.
  11. I totally agree with you re the crap models that can result from generated approaches. This is certainly a major challenge for EJC, but may be something that many people are willing to live with when offered the alternatives of unraveling complex and detailed message definitions that are defined in a very alien format.


    Even for stable message formats, I don't really think it makes sense to generate objects that match the message format.

    You basically end up with two options if you generate code from a message format:

    1. Use the generated objects as your domain model.
    2. Map from your real domain model to the generated objects.


    If you don't think it makes sense, then you always have the "traditional" approach available with Smooks i.e. define your own model and manually define the bindings. For the most part... I agree that option #2 above would not make sense, but only if you have another "way" of getting the data out of the EDI message stream and into your home grown Java Object model (which you do with the "traditional" approach). If you don't have another way, what would you do?
    There are innumerable ways to do it. The mapping approach has been the best option up to this point in my experience. I would turn it around and say: if you have the mapping option, why would you use generation?


    I've no doubt people using EJC will not be happy with parts of the generated model. I'd also be fairly sure there will also be people that will be perfectly happy to use the generated model (as their app domain model) when the alternative is a heavy slog manually defining and testing manual bindings to their perfectly handcrafted model. At the end of the day, the user can make the choice and I can see merit in both points of view.
    What heavy slog? The amount of work is basically the same. I've done both JAXB and JiBX and JiBX is by far less work and far easier to maintain (and it's faster at runtime, go figure.) It's just that one is in a mapping file and the other is in Java code. If your message format is stable, you won't need to do it again unless your domain model changes. It really makes no sense at all to use the message format to structure your object model. Would you base the structure of business objects in a web app on HTML? The other thing is that when you generate the objects based on the message, you often end up with even more work when doing things like DB persistence (unless you structure your database on the message format too, I guess.)
  12. The mapping approach has been the best option up to this point in my experience. I would turn it around and say: if you have the mapping option, why would you use generation?
    I thought I did that already, on at least 2 occasions :( Anyway, I think we could bat this one back and forth forever. Bottom line... you have both options with Smooks i.e. the manual mapping/JiBX approach, or EJC. If you don't like EJC... use the manual mapping approach. - An example of the manual mapping approach. - An example of the EJC approach. You can always give EJC a try in a given situation.... if you're happy with the results, then use it... if not you can scrap it and handcraft your preferred model, or tweak the EJC generated artifacts to something you're happier with etc. Options are not a bad thing, are they??
  13. The mapping approach has been the best option up to this point in my experience. I would turn it around and say: if you have the mapping option, why would you use generation?


    I thought I did that already, on at least 2 occasions :(
    Right and that's what I was speaking to. This was in response to your question. I really was trying to address the questions Nicolas had asked. The upshot is that I'm interested in the less manual process to mapping (a.k.a JiBX) style. You have that: awesome! If you want to provide an XJC style, that's really no concern of mine. I'm not going to choose to use it and I will continue to recommend not using that approach. I think maybe you feel I am attacking you for providing this. I do not at all. That Smooks provides more options is a good thing. That I don't like one option doesn't mean I think Smooks is bad. I just think that the generated code option feels safe and comfortable to novice-to-intermediate developers but really makes very little sense at a high-level.
  14. I think maybe you feel I am attacking you for providing this. I do not at all.
    Not at all James... I really appreciate you making the effort to share your thoughts on the subject. And I hope you don't think I was being over-defensive, because that's not my intention... I need to be open to all comment and I try to do that :) You have given me lots to think about, so it's not like I wasn't listening. Thanks for that!!
    I just think that the generated code option feels safe and comfortable to novice-to-intermediate developers but really makes very little sense at a high-level.
    I think everything you are saying makes total sense. I acknowledge the issues with the generated approach. As I said earlier... I was also burned badly by JAXB!! I also had the "pleasure" of digging through a fair bit of the JAXB source to create a hack workaround for a problem we had, which was fun. We are trying to implement EJC differently to JAXB/XJC (perhaps we should have never chosen the name "EJC", linking it with XJC/JAXB and all the baggage/connotations that carries lol). Hopefully it will be a lot easier to influence what is generated (no annotations or the other crap), but at the end of the day it will never be as clean as a hand crafted solution and I'd never try to suggest that it could be. But I do think there are plenty of people that will be happy with what it does for them (I hope :) ). Thanks again James and please give Smooks a try, provide more feedback etc. And remember... there's a lot more to it than binding EDI to Java! :)
  15. You basically end up with two options if you generate code from a message format:

    1. Use the generated objects as your domain model.
    2. Map from your real domain model to the generated objects.
    In most cases I end up with option 2: I define an XSD for the XML. So when I edit the XML, I have completion and validation. I can check the XML format at runtime too, so I do not have to code error detection. Then with JAXB, I have a Java model for free, and I can map it to my own business model in no time. I do not have to bother with the XML parser, DOM or things like that. Using JAXB and then mapping to your domain objects raises the same question as whether to use the objects mapped by Hibernate directly (or why not even JDBC directly) or to transform them into application objects, and afterwards into beans / value objects. I would say that, in a sense, using SQL directly (via JDBC) is not as bad as it appears, and it can save a lot of time in the end. After all, that is the design of many projects coded in PHP/MySQL. So it's a matter of what you do, what you need... The question is not what you do, but why.
  16. You basically end up with two options if you generate code from a message format:

    1. Use the generated objects as your domain model.
    2. Map from your real domain model to the generated objects.


    In most cases I end up with option 2:

    I define an XSD for the XML. So when I edit the XML, I have completion and validation. I can check the XML format at runtime too, so I do not have to code error detection.
    The problem with schema validation is that it tends to produce useless or misleading error messages. It's convenient only to the developer. If you are trying to produce high quality software, you should try to produce meaningful fault explanations.
    Then with JAXB, I have a Java model for free, and I can map it to my own business model in no time. I do not have to bother with the XML parser, DOM or things like that.

    Using JAXB and then mapping to your domain objects raises the same question as whether to use the objects mapped by Hibernate directly (or why not even JDBC directly) or to transform them into application objects, and afterwards into beans / value objects.
    If you use JiBX, you just define the mapping from the XML to your domain objects. It's similar to the way hibernate works. You don't have to write any code to do the mapping. JiBX doesn't make you think about DOM or SAX or any of that. You have nothing in your code related to XML or schemas at all (unlike XJC generated code).
  17. XJC is just JAXB, isn't it?
  18. @James Watson Hi James, I use JAXB in several projects and all is fine... Could you add some explanation of why XJC is not a good idea? And what XML solution do you use instead?
  19. @James Watson

    Hi James, I use JAXB in several projects and all is fine...

    Could you add some explanation of why XJC is not a good idea?

    And what XML solution do you use instead?
    I know I'm not James, but if I could give one perspective on this... Where I've seen JAXB become a maintenance nightmare is where it was used on data/message types that are quite fluid in nature and on an application that needs to be compatible with multiple versions of that data/message. In this case, we ended up with multiple versions of the model and it was a total mess. What we wanted in this situation was a single model that covered all versions, and a framework for binding to that model (from any version) that took care of the deltas between the versions in a clean way. JAXB (and XJC) were not that solution, at the time at least. I'm sure there are plenty of other use cases in which JAXB works very well e.g. with JAX-WS based solutions. I think the same will apply to Smooks EJC... there will be places where it will work well, and places where it's not such a good idea. Smooks offers approaches along the lines of JiBX (etc) if you feel EJC is not a good long-term solution. Where we hope EJC will be a big benefit will be on complex standards based messages (X12, SWIFT etc). IMO, manually defining the bindings for these messages would be a nightmare of a different kind. We hope to be able to build more tools on top of EJC that can consume the standard definitions for these message types. I'm hoping the generated model approach (XJC/EJC) is valid in this type of situation considering the complexity of the messages and the fact that they are not "as" fluid. At the end of the day... having choices is not a bad thing I think :)
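
    The "single model that covers all versions" idea can be sketched in a few lines of plain Java. This is only an illustration under made-up assumptions: the two message versions, their field names and the `Contact` model are all hypothetical, and the sketch says nothing about how Smooks or JAXB actually handle versioning.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: one domain model, with per-version binding
// logic that absorbs the deltas between message versions.
class VersionedBindingSketch {

    // The single model shared by every message version.
    static final class Contact {
        final String fullName;
        final String email; // null when the source version has no email field

        Contact(String fullName, String email) {
            this.fullName = fullName;
            this.email = email;
        }
    }

    // Assumed versions: v1 carries a single "name" field and no email;
    // v2 splits the name into "firstName"/"lastName" and adds "email".
    static Contact bind(int version, Map<String, String> fields) {
        switch (version) {
            case 1:
                return new Contact(fields.get("name"), null);
            case 2:
                return new Contact(
                        fields.get("firstName") + " " + fields.get("lastName"),
                        fields.get("email"));
            default:
                throw new IllegalArgumentException("Unsupported message version: " + version);
        }
    }

    public static void main(String[] args) {
        Map<String, String> v2 = new HashMap<>();
        v2.put("firstName", "Grace");
        v2.put("lastName", "Hopper");
        v2.put("email", "grace@example.org");
        Contact contact = bind(2, v2);
        System.out.println(contact.fullName + " <" + contact.email + ">");
    }
}
```

    With one generated model per schema version, the `switch` above would instead be spread across several parallel class hierarchies, which is the mess described in this post.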
  20. Tom, thanks for your reply. I really understand that having to support several XML versions of the same message may be complex in all cases and that JAXB may not be the solution. At least you can think about having an XSD and generated code for each version of the message. It can work if you have only a few versions and can extract the version without too much effort... Or simply if the new message format (defined by an XSD) supports all possible versions. I guess that if the format has many more versions and is very flexible, JAXB will not help you at all. I realise that in my case, we use XML as configuration files most of the time, with no data exchange with the rest of the world. We don't have to bother with old configuration file formats. When we change the format, we try to keep it as compatible as possible, but that's all.
  21. @James Watson

    Hi James, I use JAXB in several projects and all is fine...

    Could you add some explanation of why XJC is not a good idea?

    And what XML solution do you use instead?
    I agree with Tom up until the part about JAXB working for JAX-WS. The same problems exist there. In a nutshell, JAXB generates objects that line up with your schema i.e. your schema and object model are coupled together extremely tightly.

    1. The first problem with this is that good schema design and good OO design are very different things. It's very difficult to end up with a good schema and a good object model.
    2. The above leads to (in my experience) most everyone creating a second object model. The 'real' object model. Then you have to get the data from the 'real' object model to the JAXB object model. That's pretty pointless. You really haven't mapped anything. You've just shuffled the problem into a domain that Java programmers are more comfortable with. I work with contractors now that wrote their own mapper from the generated types into their domain model. Doesn't that seem silly to anyone else?

    Then there are the maintenance issues:

    1. You need to support multiple formats of XML. If you did number 2 above, you'll be OK but it's a lot of extra code that has no need to exist.
    2. You want to create a new version of your XML. Really this is the same as the previous one but comes up the most often.
    3. You want to convert from one format to another. If you use XJC and EJC you'll almost surely need two sets of generated classes and code to move the data around. Why not just create the object and use mappings to and from the various formats? You can have one type and map it to the DB, XML, EDI, JSON. Instead with tools like JAXB, this takes ages of development time because it doesn't really do anything of value.
    4. JAXB tends to produce crap code. I've never tried but I've read that trying to use XJC with one of Amazon's webservices produces more than 1000 classes. If you aren't careful about how you write your schemas and use namespaces, you will end up with lots of redundant and non-reusable code.
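
    The "one type, many mappings" idea from maintenance issue 3 can be sketched roughly as follows. This is a hand-rolled illustration, not JiBX or Smooks code: the `Item` class and the writer registry are hypothetical, the "edi" output is just an EDI-like line rather than a valid X12 or EDIFACT segment, and escaping of special characters is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration: a single domain type with one small mapping
// per output format, instead of one generated class set per format.
class OneModelManyFormatsSketch {

    // The only model class; it knows nothing about XML, JSON or EDI.
    static final class Item {
        final String sku;
        final int quantity;

        Item(String sku, int quantity) {
            this.sku = sku;
            this.quantity = quantity;
        }
    }

    // Format name -> mapping. Escaping is omitted to keep the sketch short.
    static final Map<String, Function<Item, String>> WRITERS = new LinkedHashMap<>();
    static {
        WRITERS.put("xml", i -> "<item sku=\"" + i.sku + "\" quantity=\"" + i.quantity + "\"/>");
        WRITERS.put("json", i -> "{\"sku\":\"" + i.sku + "\",\"quantity\":" + i.quantity + "}");
        WRITERS.put("edi", i -> "LIN*" + i.sku + "*" + i.quantity + "~"); // EDI-like, illustrative only
    }

    static String write(String format, Item item) {
        Function<Item, String> writer = WRITERS.get(format);
        if (writer == null) {
            throw new IllegalArgumentException("No mapping for format: " + format);
        }
        return writer.apply(item);
    }

    public static void main(String[] args) {
        Item item = new Item("A1", 3);
        for (String format : WRITERS.keySet()) {
            System.out.println(format + ": " + write(format, item));
        }
    }
}
```

    Adding a new format here means adding one mapping entry, while `Item` stays untouched; that is the property being argued for above.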
  22. Create EDI

    What is the best way to create EDI files from Java objects using Smooks? Are there any examples available?
  23. Re: Create EDI

    What is the best way to create EDI files from Java objects using Smooks? Are there any examples available?
    At the moment, an EDI Result can be generated by simply applying a template. We are planning on extending EJC to support serialization of the EJC generated object model. Hopefully that will be in the next release.
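
    As a rough idea of what "applying a template" to produce EDI output looks like, here is a minimal placeholder-substitution sketch in plain Java. It is not the Smooks templating support: the `${...}` placeholder handling, the class name and the EDIFACT-like segments are assumptions made for illustration, and the output is not a valid interchange.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of template-based output; not Smooks code.
class EdiTemplateSketch {

    // Replaces every ${key} placeholder with the matching model value and
    // fails if any placeholder is left unresolved.
    static String apply(String template, Map<String, String> model) {
        String result = template;
        for (Map.Entry<String, String> entry : model.entrySet()) {
            result = result.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        if (result.contains("${")) {
            throw new IllegalStateException("Unresolved placeholder in: " + result);
        }
        return result;
    }

    public static void main(String[] args) {
        // EDIFACT-like segments, illustrative only.
        String template = "NAD+BY+${buyer}'\nLIN+1++${sku}'\nQTY+21:${quantity}'";
        Map<String, String> model = new LinkedHashMap<>();
        model.put("buyer", "Acme");
        model.put("sku", "A1");
        model.put("quantity", "3");
        System.out.println(apply(template, model));
    }
}
```

    In practice the model values would come from your Java objects, and a real template engine would also take care of escaping the format's delimiter characters.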