Discussions

J2EE patterns: Data Binding Object Pattern (DBO)

  1. Data Binding Object Pattern (DBO) (32 messages)

    Context

    Data needs to be exchanged across tiers. In Java, binary serialization of Transfer Objects is essentially free. However, if we want text over HTTP, i.e. XML, then several possibilities exist. Accessing data from Transfer Objects varies depending on the Transfer Object. Serializing Transfer Objects to XML varies greatly depending on the approach (document model, code-generated data binding, mapping-based data binding, and so forth) and is further complicated by the API chosen.

    Problem

    Text serialization using XML over HTTP is often a consideration if your J2EE application is concerned with interoperability with other applications, among other reasons. Much like there are many different persistent storage mechanisms, there are also many different mechanisms for working with XML and marked differences in the way these APIs are utilized.

    The way your application will work with XML will vary depending on the approach chosen. For example, if you take a document model approach, then your application will be very concerned about the structure of the XML document. To get to the data you want you’ll need to navigate parent-child and sibling relationships up and down the document tree.

    If you take an XML binding approach then you are much less concerned about, and less tightly coupled to, document structure. Generally, the XML document is transparent in a conversation with this approach.

    Examples of APIs for a document model approach are the W3C standard interfaces like DOM, and simpler Java centric solutions such as JDOM. For XML data binding we have even greater possibilities, which further complicate the issue. We can take a code generated approach such as JAXB and JBind or a mapping style based approach such as Castor XML Mapping.

    The many approaches and APIs to choose from offer challenges to the application and, as is often the case, potentially create a dependency between the Transfer Object that is concerned with serialization and the XML serialization mechanism. If a Transfer Object needs to serialize or deserialize in XML, it can use the appropriate API. However, once we begin including the serialization code inside the Transfer Object, we establish a tight coupling between the application and the serialization mechanism.

    Forces

    • Transfer Objects (or Value Objects) want XML text serialization in addition to binary serialization.
    • Several approaches exist for working with XML such as data binding and document model.
    • APIs differ greatly depending on the approach taken to work with XML.
    • These APIs are not uniform.
    • Transfer Objects typically use proprietary APIs to perform serialization tasks.
    • Transfer Objects need to be transparent to the actual approach and API implementation to provide easy migration to other approaches and APIs.

    Solution

    Use a Data Binding Object (DBO) to abstract and encapsulate working with XML and handle the serialization tasks. The DBO manages working with the XML API to perform serialization and deserialization.

    The DBO implements the XML serialization (toXML) and deserialization (fromXML) tasks required to work with the XML API. This API could be a document model approach API such as JDOM, an XML Binding code generated approach such as JAXB, or an XML Binding mapping style approach such as Castor.

    The Transfer Object that relies on the DBO uses a simpler DBO interface that is exposed to its clients. The DBO will completely hide the implementation details from its clients since the interface exposed to the clients will not change when the underlying implementation changes. This pattern allows the DBO to adapt to different approaches and APIs without directly affecting the Transfer Object. The DBO is basically an adapter between Transfer Objects that need serialization and the XML API used to perform this serialization.

    (People generally haven’t been using UML, I suppose I could provide a link to some in the future)

    Structure

    A Transfer Object ‘uses’ a Data Binding Object that ‘encapsulates’ an XML Serialization API.

    Participants and Responsibilities

    Transfer Object: Also known as Value Object. This represents a Transfer Object used as a data carrier. It is the object, that for some useful purpose, we are concerned with serializing in XML text.

    Data Binding Object: The primary object in this pattern. Abstracts away the XML serialization approach and API utilized.

    XML Serialization API: Represents XML Serialization implementation. This could be JDOM, Castor, JAXB, or a host of others.

    Strategies
    Factory for Data Binding Objects strategy

    I tend to use the Abstract Factory pattern [Gang of Four] to make this pattern extremely seamless to the application. Create a base DBO factory that is implemented by concrete DBO factories, each supporting a different XML Serialization implementation. For example, a Transfer Object could obtain a JDOMDBOFactory and use it to obtain the concrete DBO for that object which works with JDOM.
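
    A minimal sketch of this factory strategy follows. It is an illustration under stated assumptions, not the author's actual factory code (which the post omits): the setInstance method and the PlainTextDBOFactory class are hypothetical, and the concrete factory writes the XML by hand so the sketch needs no third-party library. A JDOMDBOFactory or CastorDBOFactory would plug in the same way.

```java
import java.io.IOException;
import java.io.Writer;

// Simplified stand-in for the Customer Transfer Object.
class Customer {
    private String name;
    String getName() { return name; }
    void setName(String name) { this.name = name; }
}

// Simplified DBO interface (write side only).
interface CustomerDBO {
    void toXmlString(Customer customer, Writer output) throws IOException;
}

// Abstract factory: one concrete subclass per XML serialization API.
abstract class DBOFactory {
    private static DBOFactory instance = new PlainTextDBOFactory();

    static synchronized DBOFactory getInstance() { return instance; }

    // Hypothetical hook: switching the XML API means switching the factory.
    static synchronized void setInstance(DBOFactory factory) { instance = factory; }

    abstract CustomerDBO getCustomerDBO();
}

// Hypothetical concrete factory standing in for JDOMDBOFactory and friends.
class PlainTextDBOFactory extends DBOFactory {
    CustomerDBO getCustomerDBO() {
        return new CustomerDBO() {
            public void toXmlString(Customer customer, Writer output) throws IOException {
                output.write("<Customer name=\"" + escape(customer.getName()) + "\"/>");
            }
        };
    }

    // Escapes the characters that are unsafe inside a double-quoted attribute.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}
```

    Client code only ever sees DBOFactory and CustomerDBO, so replacing the concrete factory does not touch the Transfer Object.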

    Consequences

    • Text serialization transparency: TransferObjects can use the XML serialization API without knowing specific implementation details.
    • Easy migration to other APIs.
    • Reduces code complexity of Transfer Objects: All XML serialization is handled inside the implementing DBO.
    • Text serialization is centralized into another layer. If you’ve worked with some of the more complex serialization APIs you’ll appreciate this.
    • Adds extra layer. C’mon this is J2EE we’re in love with layers.

    Sample Code

    Finally some code… I’ll skip the DBO Factory stuff and assume you are familiar with Abstract Factories.

    For our sample we’ll examine a Transfer Object, Customer in Example 1 that wants to serialize itself into XML text. Barely more than a hello world type example, I’ll use some pretty trivial XML and Schema and some barely legal Java.

    Example 1

    import java.io.Reader;
    import java.io.Writer;

    public class Customer implements java.io.Serializable {
      // member variables
      String name;
      //
      // getter and setter methods...
      ...

      // performs XML text serialization
      public void toXmlString(Writer output) throws DataBindingException {
           CustomerDBO dbo
              = DBOFactory.getInstance().getCustomerDBO();
          dbo.toXmlString(this, output);
      }

      // performs XML deserialization into Customer object
      public static Customer readXmlString(Reader input)
          throws DataBindingException {
          CustomerDBO dbo
              = DBOFactory.getInstance().getCustomerDBO();
          return dbo.readXmlString(input);
      }
    }

    Simple sample customer.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <Customer name="Name"/>

    Simple sample customer.xsd (cause we all like schemas):

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
    <xs:element name="Customer">
    <xs:complexType>
    <xs:attribute name="name" type="xs:string" use="required"/>
    </xs:complexType>
    </xs:element>
    </xs:schema>

    The CustomerDBO interface shown in Example 2 defines the XML serialization methods that are implemented by all concrete DBO implementations, such as JDOMCustomerDBO, CastorCustomerDBO, and JAXBCustomerDBO.

    Example 2

    // interface that all CustomerDBOs must support
    public interface CustomerDBO {

        public void toXmlString(Customer customer, Writer output)
            throws DataBindingException;

        public Customer readXmlString(Reader input)
            throws DataBindingException;
    }

    The JDOMCustomerDBO implements the CustomerDBO as shown in Example 3.

    Example 3

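    import java.io.IOException;
    import java.io.Reader;
    import java.io.Writer;
    import java.net.MalformedURLException;
    import org.jdom.Document;
    import org.jdom.Element;
    import org.jdom.JDOMException;
    import org.jdom.input.SAXBuilder;
    import org.jdom.output.XMLOutputter;

    public class JDOMCustomerDBO implements CustomerDBO {
        // xsdURL (the schema location) is assumed to be defined elsewhere in this class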
        public void toXmlString(Customer customer, Writer output)
            throws DataBindingException {
            Element root = toXmlElement(customer);
            Document doc = new Document(root);
            XMLOutputter writer = new XMLOutputter();
            try {
                writer.output(doc, output);
            } catch (IOException e) {
                throw new DataBindingException("Error serializing " +
                    "customer", e);
            }
        }

        public Customer readXmlString(Reader input)
            throws DataBindingException {
            try {
                SAXBuilder builder = new SAXBuilder(
                    "org.apache.xerces.parsers.SAXParser", true );
                builder.setFeature(
                    "http://apache.org/xml/features/validation/schema", true );
                builder.setProperty(
                    "http://apache.org/xml/properties/schema/external-schemaLocation",
                    xsdURL );
                Document doc = builder.build(input);
                Element root = doc.getRootElement();
                Customer result = readXml(root);
                return result;
            } catch (MalformedURLException mue) {
                throw new DataBindingException("Could not get xsd url.", mue);
            } catch (IOException ioe) {
                throw new DataBindingException("Could not locate xsd.", ioe);
            } catch (JDOMException je) {
                je.printStackTrace();
                throw new DataBindingException("Error deserializing xml.", je);
            }
        }

        private Element toXmlElement(Customer customer) {
            Element root = new Element("Customer");
            root.setAttribute("name", customer.getName());
            return root;
        }

        private static Customer readXml(Element source) {
            Customer result = new Customer();
            result.setName(source.getAttributeValue("name"));
            return result;
        }
    }

    Now let us take a look at an XML data binding mapping style approach using Castor in Example 4. We’ll omit the creation of the map here for simplicity.

    Example 4

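    import java.io.Reader;
    import java.io.Writer;
    import org.exolab.castor.mapping.Mapping;
    import org.exolab.castor.xml.Marshaller;
    import org.exolab.castor.xml.Unmarshaller;
    import org.xml.sax.InputSource;

    public class CastorCustomerDBO implements CustomerDBO {
        private final Mapping mapping = new Mapping();

        public void toXmlString(Customer customer, Writer output)
            throws DataBindingException {
            try {
                // Load the mapping information from the file
                mapping.loadMapping("customer_mapping.xml");

                // Marshal the data
                Marshaller marshaller = new Marshaller(output);
                marshaller.setMapping(mapping);
                marshaller.marshal(customer);
            } catch (Exception e) {
                // Castor signals I/O, mapping, and marshalling failures
                // with checked exceptions; wrap them all
                throw new DataBindingException("Error serializing customer", e);
            }
        }

        public Customer readXmlString(Reader input)
            throws DataBindingException {
            try {
                // Load the mapping information from the file
                mapping.loadMapping("customer_mapping.xml");

                // Unmarshal the data
                Unmarshaller unmar = new Unmarshaller(mapping);
                return (Customer) unmar.unmarshal(new InputSource(input));
            } catch (Exception e) {
                throw new DataBindingException("Error deserializing xml.", e);
            }
        }
    }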

    Using the DBO is a snap. You’ll most likely do this in some business object. This could be a remote façade of some sort or even a session bean. Example 5 ignores your implementation choice and strictly deals with the code to use the DBO for serialization.

    Example 5

    …
    String xmlData = null;

    public String someMethodToGetXML(Customer customer) throws DataBindingException {
        // serialize the customer and get the XML text.
        Writer output = new StringWriter();
        customer.toXmlString(output);
        xmlData = output.toString();
        return xmlData;
    }
    …

    Related Patterns

    Transfer Object
    Adapter
    Abstract Factory
    DAO (in spirit)

    Threaded Messages (32)

  2. Roland,

    First of all, let me be the first to congratulate you on this well thought out, well-formed post. Its structure and organization are far better than what I'm used to seeing in the patterns section on TSS.

    As for the pattern itself, I have some notes:

    - I don't think having a toXML method in the value object is good OO design. Turning a value object into XML is not an inherent operation of the object, but rather an operation that can be carried out *on* the object. Your problem description section shows this problem, by showing that there may be several different strategies to performing this operation. I don't think there is a need to hard-code one of them in the object itself, even if that is done by loading a DBO. What if I want to serialize the same value object with a different DBO? Why does the value object need to know which DBO I want to use?
    Just like you don't have a toBinary() method in the value object that serializes the object and returns it, I don't think you should have a toXML method. To continue this analogy, just like you have an ObjectOutputStream that does serialization, I think you should have a DBO that does XML serialization. This is a sometimes-missed fact, but ObjectOutputStream is actually an implementation of an abstract interface, ObjectOutput. This interface would be the analog of the abstract DBO interface.
    The call sequence should be:
    client -> DBO(serialize V), not:
    client -> V (serialize yourself) -> DBO (serialize V).
    The client can still fetch the DBO object from a factory.

    - The way you implement your DBO, it's very hard to use the DBOs recursively, which can be an important thing if you're using a hand-built solution (e.g, in your terms, a "document model" solution). It's hard because you pass along readers rather than some XML source. Each DBO opens a new parser and parses an entire document. This can't work recursively. Also, it makes it very costly and hard to create an "XML pipeline" where the XML output of one component is used as the XML input of another. For instance, if you want to use XSLT to make an HTML page out of your objects, you'll have to write the XML to some writer, then have the XSLT engine read and parse it all over again and then apply the transform.
    I think it's better to choose SAX as a way of moving XML around. It's very easy to go from SAX to reader/writer using a parser/SAX writer, and same goes for DOM. SAX can be used directly as a source of XML for other components in the pipeline as well. The reason I prefer SAX to DOM as the default, despite the fact that you can move between one and the other easily, is that it's cheaper. If you can use SAX instead of DOM, you should. Putting DOM in the interface forces you to go through a DOM tree and waste memory even if you don't need DOM's random access.

    - As an optimization, I think it's worth considering writing thread-safe DBOs and having one instance of each rather than creating a new one for each invocation. In my experience these DBO objects don't need any member state that changes during their invocation, so they should have no problem with handling multiple threads at once. This is the case in your example DBOs.
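
    To make the last two notes concrete, here is a minimal sketch of a DBO whose interface outputs SAX events instead of text, and which holds no mutable state so a single instance can be shared across threads. The names SaxCustomerDBO, SimpleSaxCustomerDBO and SaxToText are hypothetical, and Customer is a simplified stand-in. Only standard JAXP and SAX classes are used.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Simplified stand-in for the Customer Transfer Object.
class Customer {
    private String name;
    String getName() { return name; }
    void setName(String name) { this.name = name; }
}

// Hypothetical DBO interface that outputs SAX events rather than text.
interface SaxCustomerDBO {
    void serialize(Customer customer, ContentHandler out) throws SAXException;
}

// No mutable fields, so one shared instance is safe to call from many threads.
final class SimpleSaxCustomerDBO implements SaxCustomerDBO {
    static final SaxCustomerDBO INSTANCE = new SimpleSaxCustomerDBO();

    private SimpleSaxCustomerDBO() {}

    public void serialize(Customer customer, ContentHandler out) throws SAXException {
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "name", "name", "CDATA", customer.getName());
        out.startElement("", "Customer", "Customer", attrs);
        out.endElement("", "Customer", "Customer");
    }
}

// Adapter for clients that still want text: SAX events go into an identity transformer.
final class SaxToText {
    static String toXml(Customer customer) throws Exception {
        SAXTransformerFactory factory =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter text = new StringWriter();
        handler.setResult(new StreamResult(text));
        handler.startDocument();
        SimpleSaxCustomerDBO.INSTANCE.serialize(customer, handler);
        handler.endDocument();
        return text.toString();
    }
}
```

    The caller owns startDocument and endDocument, so the DBO itself only ever emits a fragment.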

    About the "adds extra layer" consequence:
    My first note makes this a bit better I think. The invocation sequence is not as complex, and the client has full control of the next non-trivial link in the invocation sequence (the DBO call). One of the problems of too many layers is that they make it harder for the client to control what happens lower down the invocation sequence. Any parameter the client wants to pass has to be passed all the way down the sequence. My suggestion brings the DBO and the functionality right up to the client's level. Configurations of the DBO are much easier to make this way (and in my experience complex DBOs do tend to support quite a few different configuration options, and the values you want for each option change according to context).
    There is a saying, "every problem in software engineering can be solved by adding an extra layer of indirection". The cautious also add "except too many layers of indirection". Let's not fall too much in love with all these layers :)

    Gal
  3. Hi Gal,

    Thanks for taking the time to respond, I've benefited from several of your posts in the past and appreciate your input. Some additional comments:

    > - I don't think having a toXML method in the value object is good OO design. Turning a value object into XML is not an inherent operation of the object, but rather an operation that can be carried out *on* the object.

    In some respects I completely agree with you. However, in my experience I've seen a need for this sort of 'built-in' serialization. When I indicated that we get binary serialization for free, I failed to point out that other platforms have this sort of XML serialization pre-fitted. Take .NET record sets for example. Data Transfer Objects (DTOs) are generally simple objects typically made up of simple accessors. DBOs can aid in a scenario where you need to build this automated mechanism yourself but don't want to tie your DTOs to any specific API. Fowler's Data Transfer Object for instance suggests the use of this sort of built-in mechanism to encapsulate serialization. The XML Business Delegate article on TSS also suggests the use of such a design. DBOs can take these designs a step forward.

    But like I said, I'm also in agreement. A DBO certainly could be used in the scenario you describe: client -> DBO(serialize V). Maybe the pattern should be revised to have additional strategies outlining the issues we just described.

    > - The way you implement your DBO, it's very hard to use the DBOs recursively, which can be an important thing if you're using a hand-built solution (e.g, in your terms, a "document model" solution). It's hard because you pass along readers rather than some XML source.

    An excellent point. The devil is in the details isn't it? I struggled with the post in terms of the type of examples to give. I contemplated giving a pseudo code example much like they do on the Blueprints pages. Like:

    public class JDOMCustomerDBO {
    ...
        public void toXMLString(...) {
          // do some API specific stuff here
        }
    ...
    }

    What I'm trying to say is if you want to use an XML source other than a reader then that's cool. Recursion wasn't an issue for the implementation I copy and pasted from.

    > I think it's better to choose SAX as a way of moving XML around.

    I personally am leaning these days more towards the data binding approach myself. I use a document model approach in the initial stages via the DBO for spike solutions, etc. and can later easily migrate to the chosen binding approach. However, I do like JDOM because it is easy to use and fits into Java very nicely. It uses JAXP behind the scenes, and supports both SAX and DOM. Regardless of the API (SAX, DOM, JDOM, or XML Data Binding), and there are pros and cons for each implementation, a DBO gives you the flexibility of choice.
     
    > - As an optimization, I think it's worth considering writing thread-safe DBOs and having one instance of each rather than creating a new one for each invocation. In my experience these DBO objects don't need any member state that changes during their invocation, so they should have no problem with handling multiple threads at once. This is the case in your example DBOs.

    Completely agree. In fact my DBOFactory is a Singleton (see DBOFactory.getInstance()) that caches the DBO factory objects. If I have a chance to revise I'll include the factory strategy in the example next time.

    > The cautious also add "except too many layers of indirection". Let's not fall too much in love with all these layers :)

    Hahaha, you're absolutely right. Too true, Gal!

    Roland
  4. Roland,


    > In some respects I completely agree with you. However, in my experience I've seen a need for this sort of 'built-in' serialization. When I indicated that we get binary serialization for free, I failed to point out that other platforms have this sort of XML serialization pre-fitted. Take .NET record sets for example. Data Transfer Objects (DTOs) are generally simple objects typically made up of simple accessors. DBOs can aid in a scenario where you need to build this automated mechanism yourself but don't want to tie your DTOs to any specific API. Fowler's Data Transfer Object for instance suggests the use of this sort of built-in mechanism to encapsulate serialization. The XML Business Delegate article on TSS also suggests the use of such a design. DBOs can take these designs a step forward.


    I agree with the point you're making about automated DBOs, I just don't see why a "built-in" serialization is required in order to gain that. What can you do with a toXML method inside the DTO that you couldn't do with a separate DBO? Maybe I'm missing your point because we have different experiences. Can you please describe some specific example?


    > What I'm trying to say is if you want to use an XML source other than a reader then that's cool. Recursion wasn't an issue for the implementation I copy and pasted from.


    That's fine, but don't forget that we are talking about the DBO interface, not implementation. What you decide here will bind any implementation you make.


    > I personally am leaning these days more towards the data binding approach myself. I use a document model approach in the initial stages via the DBO for spike solutions, etc. and can later easily migrate to the chosen binding approach. However, I do like JDOM because it is easy to use and fits into Java very nicely. It uses JAXP behind the scenes, and supports both SAX and DOM. Regardless of the API (SAX, DOM, JDOM, or XML Data Binding), and there are pros and cons for each implementation, a DBO gives you the flexibility of choice.


    I use data binding as well whenever possible. However, when I said I prefer SAX as a way of moving XML around I wasn't referring to the implementation of the DBO, but to the way the DBO inputs/outputs data. You need to choose some form of data that will be given to the DBO and that the DBO will output. This format must be the same for all implementations, because it is part of the DBO interface. For this format, I've seen several choices in real life code: Strings, reader/writer, DOM, JDOM and SAX. You have to pick one to use in the interface. In my opinion the ideal choice is SAX, and I explained why: it's easy to translate between SAX and any of these formats if you have to, and if you don't have to then SAX is the quickest, least memory consuming, and offers the best support for XML pipelines. A data-binding solution can easily use SAX: any data binder I know will take SAX input and give SAX output, so data-binding and SAX are not mutually exclusive.

    Gal
  5. Hey Gal,

    This has turned into quite the thread... :-]

    > I agree with the point you're making about automated DBOs, I just don't see why a "built-in" serialization is required in order to gain that. What can you do with a toXML method inside the DTO that you couldn't do with a separate DBO?

    Perhaps the reason why other patterns suggest the use of built-in mechanisms is because they wish to simplify the client call. That is, if you don't control both sides of the connection it might be more straightforward for the client to call the to/fromXML method rather than use a DBO. This way the serialization is transparent to the client and they need to know nothing of DBOs. However, if you have control of both sides, using a DBO might make more sense and is certainly less OO ugly. I'm leaning more towards the latter myself in those situations.

    > I use data binding as well whenever possible. However, when I said I prefer SAX as a way of moving XML around I wasn't referring to the implementation of the DBO, but to the way the DBO inputs/outputs data.

    Point taken!

    Thanks,
    Roland
  6. Hey Roland, welcome back :)

    > Perhaps the reason why other patterns suggest the use of built-in mechanisms is because they wish to simplify the client call. That is, if you don't control both sides of the connection it might be more straightforward for the client to call the to/fromXML method rather than use a DBO. This way the serialization is transparent to the client and they need to know nothing of DBOs.

    I guess this is always a trade-off you should consider. But I think it's important to recognize these trade-offs and separate them from the pattern core. That way any developer can consider them separately and make a choice.
    I would like to suggest an alternative way of simplifying the client's code. Instead of placing read/write methods in each DTO, I write a Singleton class with "readXML" and "writeXML" methods that take a DTO and use DBOs under the covers. This simplifies things for the client, and I think it is still better than having methods directly in the DTO. For one thing, it promotes separation of concerns, which is an important OO design principle. Serialization of XML is separated into a DBO class in this pattern, so tying serialization back into the DTO class should be avoided. Also, by doing this, you really reduce the amount of work required to implement the pattern. As I said in another branch of this thread, with this approach, and using a mapper to do the XML work, you can implement this pattern very quickly without any boilerplate code, and still enjoy the consequences of this pattern. I also want to add, regarding this point, that I don't think you need a separate DBO for every DTO. If one DBO can handle all your DTOs (which is usually the case with a mapper-based DBO) then one DBO is enough. Same goes for interfaces. At the basic level, you need just one interface taking a common DTO base class (java.lang.Object if you don't have one). And same goes for factories. You can have the factory's "newInstance" method take a Class instance indicating the type of DTO. This strategy is not the only one, but I think it does provide a lot of functionality with little work and should be strongly considered. The only problem I see with it is that it has less compile-time safety, but I don't think this is an issue. Let me know what you think.
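
    A rough sketch of the strategy described above, assuming hypothetical names throughout (DBO, XmlSerializer, the register method, and a hand-written CustomerDBO standing in for a mapper-based one). It shows a single interface taking java.lang.Object, a factory keyed by Class, and a Singleton facade, with the loss of compile-time safety visible in the cast.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified DTO with no serialization methods of its own.
class Customer {
    final String name;
    Customer(String name) { this.name = name; }
}

// One interface for every DTO type; the parameter is Object, so type errors surface at run time.
interface DBO {
    void write(Object dto, Writer out) throws IOException;
}

// Hand-written DBO standing in for a mapper-based one.
class CustomerDBO implements DBO {
    public void write(Object dto, Writer out) throws IOException {
        Customer customer = (Customer) dto; // the cast is where compile-time safety is lost
        out.write("<Customer name=\"" + customer.name + "\"/>");
    }
}

// Factory whose newInstance method takes the DTO class, as suggested above.
final class DBOFactory {
    private static final Map<Class<?>, DBO> REGISTRY = new ConcurrentHashMap<>();

    static void register(Class<?> type, DBO dbo) { REGISTRY.put(type, dbo); }

    static DBO newInstance(Class<?> type) {
        DBO dbo = REGISTRY.get(type);
        if (dbo == null) {
            throw new IllegalArgumentException("No DBO registered for " + type.getName());
        }
        return dbo;
    }
}

// Singleton facade: the only class the client needs to know about.
final class XmlSerializer {
    private static final XmlSerializer INSTANCE = new XmlSerializer();

    private XmlSerializer() {}

    static XmlSerializer getInstance() { return INSTANCE; }

    String writeXML(Object dto) throws IOException {
        StringWriter out = new StringWriter();
        DBOFactory.newInstance(dto.getClass()).write(dto, out);
        return out.toString();
    }
}
```

    The sketch does not escape attribute values; a real DBO would delegate that to the XML API it wraps.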

    Regards
    Gal
  7. ... but I don't think this is an issue. ...


    Sorry, I should have said that I think this is usually not a big issue.

    Gal
  8. DBO

    Guys,

    This is an interesting thread.
    Quite an interesting argument about the Design.

    I agree with Gal in most of his points. TOs have no business of
    knowing how to be serialized to XML and from XML.

    I am not an object purist by any means, but I usually try to keep my
    objects simple and to assign a single type of responsibility, as much
    as possible, so it makes complete sense what Gal was mentioning with regards
    to call sequence of DBO.serialize(TO), as opposed to TO.serialize() -> DBO.serialize(TO).

    As to the comment of why not generate TOs automagically using some of the
    available tools like Castor. I agree that that is possible, the only problem
    that I see with this approach, is that you may not always have a benefit
    of starting a project from scratch. Several projects had a well established
    Domain Object Model in place, with all of the TOs in place, so generating
    of TOs from Schema or DTD is not an option.

    The only point on which I could disagree with Gal is that DBOs need to somehow support recursive invocation. Again, if the problem you are trying to solve is
    serialization to and from a datastore of concrete TOs, I think that Readers and
    Writers are fine; when you are trying to chain XML processing you are in a way going beyond serialization.

    Anyway, I love the post. In my opinion this definitely could be called a pattern, especially if you take away XML and particular data binding technologies. With XML in, it could be considered an idiom.

    Regards, Mikhail.
  9. DBO

    Mikhail,

    Thanks for your reply. Please let me give you two examples of what I mean by "recursive serialization" and "XML chaining".

    First, recursive serialization. Consider these two DTOs:

    Order:
      User user
      int copies
      ...

    User:
      String name
      String address
      ...

    Now suppose you want to serialize an Order instance as XML. It seems quite natural that you would want to say something like:

      DBOFactory.createDBO(User.class).serialize(user)
      serialize(copies)
      ...

    But if you pass writers you can't really do that, because each serialize method serializes an entire document, with the <?xml> directive etc. If it doesn't serialize the <?xml> directive then somebody else has to, and you need to call another method. That is feasible, I guess. With readers it's more complicated, because how would you parse the XML when it isn't a part of a complete document? JAXP doesn't support it in a standard way. It is generally problematic because the document could have declared some internal entities or namespace declarations that would not be available when you just pass readers.
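
    For illustration, here is a minimal sketch of the recursive case using SAX events as the exchange format. OrderDBO, UserDBO and TraceHandler are hypothetical names, and the field layout of Order and User is simplified from the outline above. Because the DBOs write into a shared ContentHandler, the nested DBO contributes a fragment and the outermost caller alone decides where the document starts and ends.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class User {
    final String name;
    final String address;
    User(String name, String address) { this.name = name; this.address = address; }
}

class Order {
    final User user;
    final int copies;
    Order(User user, int copies) { this.user = user; this.copies = copies; }
}

// Emits a single <User/> element into whatever event stream it is given.
final class UserDBO {
    void serialize(User user, ContentHandler out) throws SAXException {
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "name", "name", "CDATA", user.name);
        attrs.addAttribute("", "address", "address", "CDATA", user.address);
        out.startElement("", "User", "User", attrs);
        out.endElement("", "User", "User");
    }
}

// Delegates the nested User to UserDBO, passing along the same handler.
final class OrderDBO {
    private final UserDBO userDBO = new UserDBO();

    void serialize(Order order, ContentHandler out) throws SAXException {
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "copies", "copies", "CDATA", String.valueOf(order.copies));
        out.startElement("", "Order", "Order", attrs);
        userDBO.serialize(order.user, out); // nested fragment, no new document
        out.endElement("", "Order", "Order");
    }
}

// Records element boundaries so the nesting can be inspected.
final class TraceHandler extends DefaultHandler {
    final StringBuilder trace = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        trace.append('<').append(qName).append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        trace.append("</").append(qName).append('>');
    }
}
```

    Feeding an Order through OrderDBO into a TraceHandler records the User element nested inside the Order element, with one document boundary supplied by the caller.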

    About chaining, consider the following scenario. I want to use XSL to create a PDF file (/HTML/SVG/...) from my DTOs. I pass it my XML representation, and it uses it to create the document. With SAX, I would just pass it a SAX source and that's it. No memory is wasted, no XML is being parsed. With writers, you have to write the entire XML into the memory, which wastes memory, and then the XSL processor has to re-parse it into a SAX stream. That just seems like a waste to me. SAX is a far more convenient way to move XML around, IMO.
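
    The chaining scenario can be sketched with the standard JAXP TransformerHandler, which accepts SAX events directly as the input of an XSLT transform. CustomerSaxDBO, XsltPipeline and the stylesheet are made up for illustration. The customer data is never written out as text and re-parsed, although the XSLT processor may still build its own internal representation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical DBO that emits a <Customer name="..."/> element as SAX events.
final class CustomerSaxDBO {
    static void serialize(String name, ContentHandler out) throws SAXException {
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "name", "name", "CDATA", name);
        out.startElement("", "Customer", "Customer", attrs);
        out.endElement("", "Customer", "Customer");
    }
}

final class XsltPipeline {
    // Illustrative stylesheet: renders a Customer as an HTML paragraph.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/Customer'><p><xsl:value-of select='@name'/></p></xsl:template>"
      + "</xsl:stylesheet>";

    static String render(String customerName) throws Exception {
        SAXTransformerFactory factory =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        // The handler is a ContentHandler, so the DBO can write straight into the transform.
        TransformerHandler handler =
            factory.newTransformerHandler(new StreamSource(new StringReader(XSLT)));
        StringWriter html = new StringWriter();
        handler.setResult(new StreamResult(html));
        handler.startDocument();
        CustomerSaxDBO.serialize(customerName, handler);
        handler.endDocument();
        return html.toString();
    }
}
```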

    Gal
  10. Gal,

    > - The way you implement your DBO, it's very hard to use the DBOs recursively, which can be an important thing if you're using a hand-built solution (e.g, in your terms, a "document model" solution). It's hard because you pass along readers rather than some XML source. Each DBO opens a new parser and parses an entire document. This can't work recursively.

    In this case, I am not quite clear about recursive use of DBOs. Could you help me understand why DBO usage differs between SAX and DOM XML implementations? Thanks in advance.
  11. Luco,

    > In this case, I am not quite clear about recursive use of DBOs. Could you help me understand why DBO usage differs between SAX and DOM XML implementations? Thanks in advance.

    I'm not sure I understand your question.

    If you are asking what the difference is between SAX/DOM and a DBO, well, they just don't play the same role. You can handle the DBO implementation internally with SAX, DOM, or whatever else you want. The DBO encapsulates the functionality of generating XML from objects (DTOs for instance). However, you do need to choose one standard format that will be used to give input and receive output from the DBO. I think the best choice is SAX, for the reasons I outlined in the post you quoted.

    If you are asking why SAX is better than DOM for recursive use of DBOs, the answer is it isn't (at least as far as I can tell). I prefer SAX to DOM for different reasons. I only said that the author's original implementation didn't support recursive use of DBOs because it used Readers. If it had used DOM there would indeed be far fewer issues with recursive chaining of DBOs.

    If I still missed your question, please try to explain the question, because I'm not sure I understood what you meant.

    Gal
  12. Hmm, well, I don't wish to sound overly negative, but this is a specific implementation approach to a few very well known patterns applied to XML. It's hardly a pattern in and of itself.

    Some other comments...


     o " binary serialization of Transfer Objects is essentially free". Free here is only true in terms of development time. It's rather expensive compared to the Externalizable interface. Be careful to use a context when using a word like free :-)

     o Your only benefit here is isolating XML serialization/deserialization. This means you can swap out different implementations, and then only with a lot of work - you need new versions of DBOs for every TO class. There are several downsides. If you use a single factory class for DBO creation, you need a method per XML serialized class. This means your factory class is becoming a bottleneck to development. On top of all this, you need a _monster_ amount of code to implement this functionality, with the only benefit being isolation from the XML API.

     o In my experience, dev teams don't switch XML apis very often. And if they do, this approach will need _more_ work then just putting XML serialization into the TO to begin with, or just using a TO that extends the DBO and using that directly. I can't forsee a case where a project would use multiple DBO strategies, so there's no reason not to juse stick the XML serialization/deserialization into the TO or use an object with wraps/extends it and instantiate this directly.
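
    To make the Externalizable comparison in the first point concrete, here is a minimal sketch of a TO that takes over its own binary format instead of relying on default serialization. CustomerTO and ExternalizableDemo are hypothetical names, and the sketch only shows the mechanism; it doesn't measure anything.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical TO that writes its fields by hand.
class CustomerTO implements Externalizable {
    private String name;
    private int age;

    public CustomerTO() { } // Externalizable requires a public no-arg constructor

    CustomerTO(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(name);
        out.writeInt(age);
    }

    public void readExternal(ObjectInput in) throws IOException {
        name = in.readUTF(); // fields are read back in the order they were written
        age = in.readInt();
    }

    String getName() { return name; }
    int getAge() { return age; }
}

final class ExternalizableDemo {
    // Serializes the TO to a byte array and reads it back.
    static CustomerTO roundTrip(CustomerTO to) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(to);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CustomerTO) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

    The price is exactly the development time that "free" left out: every field has to be written and read by hand, in matching order.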

    Personally, for small projects I generally just use a class that wraps or extends the TOs and use those directly. For a project with many TOs, I'll typically write a code generator that uses XML files to define a mapping between TOs and XML, and spits out an implementation against a specific API like SAX (but which can be easily swapped out for something else).

    In general, I only use factories when there's a clear need to hide the concrete classes in use - like a widespread industry API which may have many implementations, or where using multiple implementations in one project is necessary. Using a factory in a case like this just leads to a code explosion with little realized benefit. In a nutshell, you're violating KISS here big time.

         -Mike
  13. Mike,

    I'm not the poster of the pattern, but I'll try to answer your post according to my understanding of the pattern,

    > Hmm, well, I don't wish to sound overly negative, but this is a specific implementation approach to a few very well known patterns applied to XML. It's hardly a pattern in and of itself.

    I'm usually sceptical about most of the patterns in the TSS patterns section, but in this case I happen to think the post really describes a general recurring practice. Good or not is for the readers to decide :) That's all a pattern is. It doesn't have to be complicated or hard-to-come-up-with. A lot of patterns are combinations of other patterns.

    >  o Your only benefit here is isolating XML serialization/deserialization. This means you can swap out different implementations, and then only with a lot of work - you need new versions of DBOs for every TO class. There are several downsides. If you use a single factory class for DBO creation, you need a method per XML serialized class. This means your factory class is becoming a bottleneck to development. On top of all this, you need a _monster_ amount of code to implement this functionality, with the only benefit being isolation from the XML API.

    First of all, you don't need a separate DBO for every class. If you are using a data binder you only need one (or a couple if you support different configurations). If you're using a reflection-based automatic solution you also need just one. If you have hand-written code for every DTO you need a separate DBO for every DTO; that's kind of the point of a hand-written solution, isn't it?
    The example given in the original post uses a separate factory for every DBO, so you don't get a huge factory. In some cases you can use a single factory: for instance if you load the implementation class names from some external config file. But anyway there's nothing to stop you from designing the factories in a reasonable way that won't create a bottleneck.
    I don't know why you think a lot of code is required. If you use a data binder you don't need a lot of code at all, at least in my experience. Of course if you use a hand-built solution you need a lot of hand-written code; that's the point of a hand-built solution. But choosing to go with a hand-built solution is a separate choice and goes outside the scope of this pattern.

    >  o In my experience, dev teams don't switch XML APIs very often. And if they do, this approach will need _more_ work than just putting XML serialization into the TO to begin with, or just using a TO that extends the DBO and using that directly. I can't foresee a case where a project would use multiple DBO strategies, so there's no reason not to just stick the XML serialization/deserialization into the TO or use an object which wraps/extends it and instantiate this directly.

    Our experiences on this are different. Cases I saw (in my project and others) include projects that start with some dirty hand-built solution and then switch to a binder, projects that start with a binder and then go to a hand-built solution for speed or flexibility, and projects that switch binders in the middle of development or between releases. Having the serialization code centralized in classes separate from your DTO classes allows you to change implementations easily and also keep supporting several different serialization modes at the same time (as you may need to be able to read old XML written by older versions). You need to remember that even if you switch between binders, the resulting XML would probably be a bit different. Many projects can't afford to just throw out the old format and stop supporting it.

    > Personally, for small projects I generally just use a class that wraps or extends the TOs and use those directly. For a project with many TOs, I'll typically write a code generator that uses XML files to define a mapping between TOs and XML, and spits out an implementation against a specific API like SAX (but which can be easily swapped out for something else).

    That's a particular strategy that may work well for some projects. You don't need to use the pattern in its most general form in all projects. You can skip the factories if you want, for instance. I still think you get an important benefit from separating serialization code from the DTO (I really don't think serialization code belongs there - see my comments in my first post). And of course the pattern doesn't necessarily work well in all cases. Examine the problem and the consequences and choose for yourself if your particular projects would benefit from it.

    > In general, I only use factories when there's a clear need to hide the concrete classes in use - like a widespread industry API which may have many implementations, or where using multiple implementations in one project is necessary. Using a factory in a case like this just leads to a code explosion with little realized benefit. In a nutshell, you're violating KISS here big time.

    I do think it is better to hide the implementation classes so they don't pop up in all sorts of places in the code. Again, maybe that's just my experience with projects that do change serialization methods. But like I said, if you don't need factories, don't use them. It's not the heart of the pattern, it's just an additional way to minimize dependencies between the client and the serialization system. If the overhead of writing these factories is bigger than their advantage, don't use them. It's that simple. A pattern is not something that is carved in stone. It's just a useful solution to a recurring problem.

    Gal
  14. \Gal\
    I'm usually sceptical about most of the patterns in the TSS patterns section, but in this case I happen to think the post really describes a general recurring practice. Good or not is for the readers to decide :) That's all a pattern is. It doesn't have to be complicated or hard-to-come-up-with. A lot of patterns are combinations of other patterns.
    \Gal\

    Well, if you take out the XML part, which is really an implementation detail, there's not much "pattern" left.

    I agree that patterns do not have to be complex - look at Singleton :-) But in this case, it's not the complexity or lack thereof that bothers me, it's that there's nothing new here.

    \Gal\
    First of all, you don't need a separate DBO for every class. If you are using a data binder you only need one (or a couple if you support different configurations). If you're using a reflection-based automatic solution you also need just one.
    \Gal\

    Generally, you do need a DBO for each class, to do the mechanics of the XML reading/writing and, more importantly, to take the proper TO object as input and return one as output. The read/write methods must be small, but because they each take/return a specific TO type, you usually need a specific method pair to do it. At minimum, you typically need one interface and one concrete class for each TO in this design. If you look at the examples, you'll see that this is true.

    \Gal\
    The example given in the original post uses a separate factory for every DBO, so you don't get a huge factory. In some cases you can use a single factory: for instance if you load the implementation class names from some external config file. But anyway there's nothing to stop you from designing the factories in a reasonable way that won't create a bottleneck.
    \Gal\

    Actually, the original does use a single factory for all DBO creation - "DBOFactory.getInstance().getCustomerDBO()". I agree you don't have to do it that way, but that's what the original shows.

     \Gal\
    I don't know why you think a lot of code is required. If you use a data binder you don't need a lot of code at all, at least in my experience. Of course if you use a hand-built solution you need a lot of hand-written code; that's the point of a hand-built solution. But choosing to go with a hand-built solution is a separate choice and goes outside the scope of this pattern.
    \Gal\

    Look at the explanation and examples again. You need read/write methods in the TO, which hit some factory, and an interface and a DBO class per TO. Most of this _can_ be boilerplate if you're using a data binder, but it's still there. And you need that boilerplate to avoid exposing the details of the binder. As I mentioned, I'd rather just expose the XML api and thereby drastically reduce the amount of code needed, or use a code generator tied to a mapping file and be done with it.

    \Gal\
    Our experiences on this are different. Cases I saw (in my project and others) include projects that start with some dirty hand-built solution and then switch to a binder, projects that start with a binder and then go to a hand-built solution for speed or flexibility, and projects that switch binders in the middle of development or between releases. Having the serialization code centralized in classes separate from your DTO classes allows you to change implementations easily and also keep supporting several different serialization modes at the same time (as you may need to be able to read old XML written by older versions). You need to remember that even if you switch between binders, the resulting XML would probably be a bit different. Many projects can't afford to just throw out the old format and stop supporting it.
    \Gal\

    If you do decide to switch APIs mid-project, this pattern involves the most work IMHO. The methods I describe let you switch APIs mid-stream a lot more easily.

    As for running multiple XML versions simultaneously - I honestly haven't seen much call for this. It's a maintenance nightmare that most project teams avoid if they can. I just don't see much call for having an app support multiple XML APIs simultaneously, and that's the only real value this pattern brings. And, certainly, your comment that "many projects can't afford to just throw out the old format and stop supporting it" is rather over the top. We're talking about XML here, and switching APIs into/out of it, not switching formats altogether. Incompatibilities between XML outputs here are more easily solved by tweaking the new code than by supporting multiple XML writers and readers simultaneously.

    \Gal\
    I do think it is better to hide the implementation classes so they don't pop up in all sorts of places in the code. Again, maybe that's just my experience with projects that do change serialization methods. But like I said, if you don't need factories, don't use them. It's not the heart of the pattern, it's just an additional way to minimize dependencies between the client and the serialization system.
    \Gal\

    If you implement readXml/writeXml directly in the TOs, you have identical encapsulation from the caller's perspective as using a "true" DTO. That's right - identical. And you get it with far less code. There certainly are cases where this approach is the only one feasible, but I don't think it's suitable for most applications that need to "serialize" out to XML and back again.

    \Gal\
    If the overhead of writing these factories is bigger than their advantage, don't use them. It's that simple. A pattern is not something that is carved in stone. It's just a useful solution to a recurring problem.
    \Gal\

    I hear what you're saying, but at the same time you're generalizing and trivializing patterns to the point of making them useless.

         -Mike
  15. > Well, if you take out the XML part, which is really an implementation detail, there's not much "pattern" left.

    > I agree that patterns do not have to be complex - look at Singleton :-) But in this case, it's not the complexity or lack thereof that bothers me, it's that there's nothing new here.


    I disagree. There is no point in arguing about whether or not this is a "pattern" when the term itself is so open to interpretation. But I will say that this pattern is no less or more complex than a DAO pattern. In fact it is the same concept, only applied to XML and dealing with some XML-related issues such as how to move XML around.

    > Generally, you do need a DBO for each class, to do the mechanics of the XML reading/writing and, more importantly, to take the proper TO object as input and return one as output. The read/write methods must be small, but because they each take/return a specific TO type, you usually need a specific method pair to do it. At minimum, you typically need one interface and one concrete class for each TO in this design. If you look at the examples, you'll see that this is true.

    Perhaps I misread or over-interpreted the pattern description to match my own ideas. I don't think that it is necessary to make more implementing classes than you need. If you are using Castor for instance, a single DBO would do, taking general objects rather than specific DTOs. This class can serve as a DBO for any object in your system. As for writing interfaces for each class, I guess that depends on the specific application, but here too I wouldn't have any problem with declaring just one interface taking Objects if I don't need specific stuff for every DTO. Even if you do write a separate interface, the single generic DBO can implement them all.

    I want to reiterate this: using a "generic" DBO implementation and interface, all you need is just two classes for the whole project. And once you've written them, you can cut-paste them into another project in 30 seconds. Writing them initially takes 5 minutes, and most of their code is binder-related code you had to write anyway. You still get the benefits described in the consequences of the pattern. If you switch to a hand-built solution you may want to make a whole hierarchy of classes and interfaces, but you don't have to, and if you stick with binders you don't even need to. Supporting a new mapping file with an existing binder or a new binder is still a matter of adding just one class (another DBO implementation).
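
    A rough sketch of the two-class idea, under stated assumptions: GenericDbo and ReflectiveDbo are hypothetical names, and plain reflection over public getters stands in for a real binder such as Castor so that the example runs on the JDK alone. Nested objects, collections and attributes are deliberately left out.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single interface for every DTO in the project.
interface GenericDbo {
    String toXml(Object dto);
}

// Hypothetical single implementation. Reflection stands in for a real binder here.
class ReflectiveDbo implements GenericDbo {
    public String toXml(Object dto) {
        Map<String, String> fields = new TreeMap<>(); // sorted, so output order is stable
        try {
            for (Method m : dto.getClass().getMethods()) {
                String n = m.getName();
                boolean getter = n.startsWith("get") && n.length() > 3
                        && m.getParameterCount() == 0 && !n.equals("getClass");
                if (getter) {
                    String field = Character.toLowerCase(n.charAt(3)) + n.substring(4);
                    fields.put(field, String.valueOf(m.invoke(dto)));
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        String root = dto.getClass().getSimpleName();
        StringBuilder xml = new StringBuilder("<" + root + ">");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            xml.append('<').append(f.getKey()).append('>')
               .append(escape(f.getValue()))
               .append("</").append(f.getKey()).append('>');
        }
        return xml.append("</").append(root).append('>').toString();
    }

    // Minimal escaping of element text only.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }
}

// Hypothetical DTO with no XML knowledge of its own.
class Customer {
    private final String name;
    private final String city;
    Customer(String name, String city) { this.name = name; this.city = city; }
    public String getName() { return name; }
    public String getCity() { return city; }
}
```

    Swapping in a real binder would mean one more GenericDbo implementation; the DTOs and their callers stay as they are.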

    > Actually, the original does use a single factory for all DBO creation - "DBOFactory.getInstance().getCustomerDBO()". I agree you don't have to do it that way, but that's what the original shows.

    My mistake. The example only shows one DTO, but the name DBOFactory does suggest a single factory for all classes. Like you said - this is not necessary. In context of my previous comment, I also wouldn't have a problem with having one factory with one method taking, say, a Class object and returning an appropriate DBO supporting some general DBO interface.
    You get the same level of decoupling with this, and require less boilerplate code. You get a little less compile time safety, but not dangerously so, IMO.
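
    For illustration, a single-method factory along those lines might look like the sketch below. DboFactory, Dbo, register and getDbo are made-up names (this is not the DBOFactory from the original post), and a real project might fill the registry from a config file instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical general DBO interface taking any object.
interface Dbo {
    String toXml(Object dto);
}

// Hypothetical factory: one lookup method for all DTO classes.
final class DboFactory {
    private final Map<Class<?>, Dbo> registry = new HashMap<>();
    private final Dbo fallback;

    DboFactory(Dbo fallback) {
        this.fallback = fallback;
    }

    // Registers a specific DBO for one DTO class (for example a hand-built one).
    void register(Class<?> type, Dbo dbo) {
        registry.put(type, dbo);
    }

    // Returns the DBO registered for the class, or the generic fallback.
    Dbo getDbo(Class<?> type) {
        return registry.getOrDefault(type, fallback);
    }
}
```

    The cost is visible in the signature: toXml(Object) accepts anything, so handing a DTO to the wrong DBO only fails at runtime.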

    > Look at the explanation and examples again. You need read/write methods in the TO, which hit some factory, and an interface and a DBO class per TO. Most of this _can_ be boilerplate if you're using a data binder, but it's still there. And you need that boilerplate to avoid exposing the details of the binder. As I mentioned, I'd rather just expose the XML api and thereby drastically reduce the amount of code needed, or use a code generator tied to a mapping file and be done with it.

    I generally reject the idea of a serialization method in the TO as I said in my first post. This, I agree, is just boilerplate code. As for exposing the XML api directly - that is what the pattern is trying to avoid. Whether or not you need to avoid that in your project is for you to decide.

    > If you do decide to switch APIs mid-project, this pattern involves the most work IMHO. The methods I describe let you switch APIs mid-stream a lot more easily.

    > As for running multiple XML versions simultaneously - I honestly haven't seen much call for this. It's a maintenance nightmare that most project teams avoid if they can. I just don't see much call for having an app support multiple XML APIs simultaneously, and that's the only real value this pattern brings. And, certainly, your comment that "many projects can't afford to just throw out the old format and stop supporting it" is rather over the top. We're talking about XML here, and switching APIs into/out of it, not switching formats altogether. Incompatibilities between XML outputs here are more easily solved by tweaking the new code than by supporting multiple XML writers and readers simultaneously.


    The way I implement this pattern did make it easier for me to switch and experiment with several binders. I don't know why you think it involves more work.

    Supporting old formats is quite a common requirement in many projects, regardless of whether or not XML is involved. Further, when you switch mapping-based binders the format can change even though you switched just the API. Also, when switching from hand-written code to a binder there can be some inconsistencies.
    I couldn't disagree more on your point about tweaking code in order to support old formats. In my experience that is about the most catastrophic action you can take in that situation. It pollutes new code with a lot of historical stuff that just sits there and is hard to maintain separately, it tends to create inconsistencies between formats because the "tweaking" is seldom perfect, it's harder to manage with code versioning systems (you can't check out a specific version of the old format code and compile it with the updated system), etc. Also, tweaking does take time. With this pattern, supporting multiple readers/writers simultaneously doesn't take any time. Just don't delete the old DBO implementations when you write the new ones :) You do need the initial investment of course to write the baseclass/es and factory/ies. Whether or not this tradeoff is worth it depends on the project at hand.

    Don't be so quick to think "I didn't need this, hence nobody ever needs it". Consider a project that communicates XML through HTTP interfaces (or their cooler brothers, webservices :)) with business partners. As the project develops it starts exposing new types of orders and options for orders that were not included at the start to make things simple. So it starts to issue new versions of the format with more detailed information. But the business partners only need a fraction of this so they never update their system, they use hand-written SAX-based code that relies on an extremely specific format, their XML dev team was cut in half and has no time, and their managers are afraid to switch out of their old tested system. Any one of these reasons can mean you have to support the old format, maybe even decide at runtime which format to use.
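
    A small sketch of what deciding at runtime can amount to when the old DBO implementation is simply kept next to the new one. Order, OrderDbo, the two versioned classes and the version strings are hypothetical, and the XML shapes are invented for the example.

```java
// Hypothetical DTO.
class Order {
    final int id;
    final String priority;
    Order(int id, String priority) { this.id = id; this.priority = priority; }
}

// Hypothetical DBO interface shared by all format versions.
interface OrderDbo {
    String toXml(Order order);
}

// Old format: kept untouched for partners that never upgraded.
class OrderDboV1 implements OrderDbo {
    public String toXml(Order o) {
        return "<order id=\"" + o.id + "\"/>";
    }
}

// New format: exposes the extra information.
class OrderDboV2 implements OrderDbo {
    public String toXml(Order o) {
        return "<order id=\"" + o.id + "\"><priority>" + o.priority + "</priority></order>";
    }
}

final class OrderDbos {
    // Picks the writer per partner at runtime. The version key is an assumption.
    static OrderDbo forVersion(String version) {
        if ("1".equals(version)) {
            return new OrderDboV1();
        }
        if ("2".equals(version)) {
            return new OrderDboV2();
        }
        throw new IllegalArgumentException("unknown format version: " + version);
    }
}
```

    The old class never has to be touched again, so new code doesn't accumulate format-history branches.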

    > If you implement readXml/writeXml directly in the TOs, you have identical encapsulation from the caller's perspective as using a "true" DTO. That's right - identical. And you get it with far less code. There certainly are cases where this approach is the only one feasible, but I don't think it's suitable for most applications that need to "serialize" out to XML and back again.

    Please review my comment, I was referring to the factory/no factory tradeoff. I don't think having readXml/writeXml directly in the TO is a good idea, but for different reasons that I already pointed out. This comment was strictly regarding your comment about why using factories here is overkill.

    > I hear what you're saying, but at the same time you're generalizing and trivializing patterns to the point of making them useless.

    I don't think a recurring best practice in OO design is something trivial or useless. It is the definition given in the GoF book. Anyway, we already have one block referring to this issue (the first one) so I won't repeat what I said there.

    Gal
  16. Gal, I think we're missing each other here and not accomplishing much :-/

    I generally understand what you're getting at, but what you're saying doesn't match the words & examples of the original, or its obvious intent. For starters, in the context of what's been presented here, if you ever wish to allow the possibility of a hand-coded DTO, you _must_ at least have an extra interface per TO class. In the instances you cite those extra classes are a complete waste of time, yet they're required to make it work when you don't have a binder doing all of your work for you.

    \Gal on format lock-in, somewhat elided...\
    But the business partners only need a fraction of this so they never update their system, they use hand-written SAX-based code that relies on an extremely specific format, their XML dev team was cut to half and has no time, and their managers are afraid to switch out of their old tested system. Any one of these reasons can mean you have to support the old format, maybe even decide at runtime which format to use.
    \Gal\

    Gal, this has no basis in reality as I know it. We're talking XML here. Hand-written SAX-based code will _not_ generally break the way you're describing. When you say "...an extremely specific format", I haven't the slightest idea what you mean here. We are talking XML, right? There is no "old format" - there's an XML DTD and data in that format. Accommodating a change in the DTD is generally a trivial task. What you're describing is more on par with completely different file formats with widely divergent semantics, not XML.

    Also, at the same time - the scenario you're describing is, in fact, the exception for most people in my experience, not the rule.

    \Gal\
    I don't think a recurring best practice in OO design is something trivial or useless. It is the definition given in the GoF book. Anyway, we already have one block referring to this issue (the first one) so I won't repeat what I said there.
    \Gal\

    The definition is far more specific than that :-)

    And someone posting a "pattern", such as this, on theserverside.com hardly qualifies it as "best practice". If that were true, then any practice you could think of could be deemed "best". :-/

         -Mike
  17. > I generally understand what you're getting at, but what you're saying doesn't match the words & examples of the original, or its obvious intent. For starters, in the context of what's been presented here, if you ever wish to allow the possibility of a hand-coded DTO, you _must_ at least have an extra interface per TO class. In the instances you cite those extra classes are a complete waste of time, yet they're required to make it work when you don't have a binder doing all of your work for you.


    I'm not sure why you think there is a real necessity for having an extra interface per TO. You can have one interface that takes a general TO baseclass or even java.lang.Object. Like I said, this somewhat compromises compile-time type safety, and in certain systems that may not be acceptable.

    > Gal, this has no basis in reality as I know it. We're talking XML here. Hand-written SAX-based code will _not_ generally break the way you're describing. When you say "...an extremely specific format", I haven't the slightest idea what you mean here. We are talking XML, right? There is no "old format" - there's an XML DTD and data in that format. Accommodating a change in the DTD is generally a trivial task. What you're describing is more on par with completely different file formats with widely divergent semantics, not XML.

    I don't know why you think XML is some sort of a magic cure that rids us of all format inconsistencies. What I mean when I say they rely on an extremely specific format is that if they happen to get another element that wasn't there before (which exposes information we didn't have in the format at the first version) they don't ignore it, they just signal an error. That sort of thing. Or maybe we had some unordered set of elements, but the first version put it out in some specific order, and the partner's code relied on that. Maybe some element name was corrected, or namespaces were added, or we switched from DTD to XML Schema, or we added an optional child element to one of the elements that confuses the hand-written SAX parser. There are hundreds of things that can change between format versions. Just because it's easy for the other side to fix doesn't mean the other side will fix it. These things are more often than not affected by business factors. They won't change it because they won't change it, in fact we're not even going to ask them. You figure it out, what are we paying you for?! ;)

    > Also, at the same time - the scenario you're describing is, in fact, the exception for most people in my experience, not the rule.

    Maybe that is the case, I don't know. But even if it is, the way I implement this pattern takes me exactly 30 seconds to put in a new project (since I only copy-paste now that I wrote it once). And in the odd event that I do need to support several serialization methods (or several versions of one method) it doesn't take me any time. It's still a tradeoff, like most things in life. But I don't think it's such a bad one. I'd do it just for the "clean" design even if I didn't think I'll need to switch binders. Just like Object doesn't have a writeBinary method to serialize itself, and defers it to the ObjectOutput interface and its implementation, ObjectOutputStream, I don't want a writeXml method in my object. I defer it to a DBO interface and a specific implementation. The rest is just talk.

    > And someone posting a "pattern", such as this, on theserverside.com hardly qualifies it as "best practice". If that was true that any practice you could think of could be deemed "best". :-/

    Whether or not this pattern is good or useful remains to be seen. I know it was for me, and the poster also seems to think so (although we may be talking about different versions of the pattern). If a lot of people find it useful it will become widely known. If not, it can still be useful for the few people who do happen to design projects that can benefit from it. I fail to see why you think that because it wasn't useful in your projects there is any problem with it. Every pattern has a specific context in which it is useful. The context of this pattern is that of a project where there may be a need to switch between serialization methods or support several serialization methods. I could argue that this context exists in most projects, because you can't know if you'll have to switch or not. But I won't, even though most of the classes in the world are designed for reusability and never get reused. That's not relevant. For the purpose of this discussion, what's relevant is whether or not the pattern is useful *in its context*. If you never came across a situation where you had the right context, you never should have applied the pattern. If you really intend to keep arguing that this context never happens, then like you said, we won't get anywhere. Because I know for a fact that it does, simply because I've seen it. So please, let's argue about the applicability of the pattern in its context, not about the context itself.

    Gal
  18. \Gal\
    I'm not sure why you think there is a real necessity for having an extra interface per TO. You can have one interface that takes a general TO baseclass or even java.lang.Object. Like I said, this somewhat compromises compile-time type safety, and in certain systems that may not be acceptable.
    \Gal\

    Yes, you can do that, but that's not what the original pattern is about at all - which is where my objections are coming from. If I personally needed a fully isolated XML in/out system, I'd use a generic XML "context" exposed as an interface, and retrieved via a factory, which is passed into individual objects, so you get chaining. This context would contain the API-specific XML stuff. The factory would take a String denoting the type of XML context (dynamic, as you say, but worthwhile in this context). Within TOs themselves, they'd use this context, and grab the DTO also via a single factory class & method which takes the DTO "type" via String, and the class for the TO, and have a single DTO interface which stores/retrieves Object. This seems almost identical to what you're describing, and is a nice little framework. But I'd only bother if I really, really needed multiple XML in/out codebases running simultaneously. And in any case, this doesn't match the original pattern description at all. It's a completely different beasty using very different patterns and a lot more reliance on dynamic behavior.

    \Gal\
    I don't know why you think XML is some sort of a magic cure that rids us of all format inconsistencies. What I mean when I say they rely on an extremely specific format is that if they happen to get another element that wasn't there before (which exposes information we didn't have in the format at the first version) they don't ignore it, they just signal an error. That sort of thing.
    \Gal\

    Oh, please. Ignoring unknown elements using a SAX parser is below trivial.
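
    A minimal sketch of such a tolerant handler, assuming a made-up order document in which only the id element matters. LenientOrderHandler and readId are hypothetical names. Whether a given partner's parser is actually written this way is the separate question being argued here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: picks out <id> and silently skips every other element.
class LenientOrderHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inId;
    private String id;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        inId = "id".equals(qName); // unknown elements just leave the flag off
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inId) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("id".equals(qName)) {
            id = text.toString();
        }
        inId = false;
    }

    // Parses the document and returns the text of <id>, or null if there is none.
    static String readId(String xml) {
        try {
            LenientOrderHandler handler = new LenientOrderHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.id;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

    This handler matches on element name alone, so it also tolerates reordering, but it would pick up an id nested inside some new element too.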

    \Gal\
     Or maybe we had some unordered set of elements, but the first version put it out in some specific order, and the partner's code relied on that. Maybe some element name was corrected, or namespaces were added, or we switched from DTD to XML Schema, or we added an optional child element to one of the elements that confuses the hand-written SAX parser. There are hundreds of things that can change between format versions. Just because it's easy for the other side to fix doesn't mean the other side will fix it. These things are more often than not affected by business factors. They won't change it because they won't change it, in fact we're not even going to ask them. You figure it out, what are we paying you for?! ;)
    \Gal\

    Where we differ is where you say "There are hundreds of things that can change...". In my experience, the number of changes is usually small and quite manageable, and XML is designed to make it very easy to accommodate "surprises". I think you're blowing this way out of proportion, and the bulk of your objection seems to rest on incredibly bad parsers that don't know how to ignore unknown tags.

    As for "they won't change because they won't change", tell me - how are they using XML in the first place? It hasn't been around all that long, and hardly anyone's XML code can be called "crusty" yet. Undoubtedly the company very recently changed _to_ XML, and will more than likely find it's much easier to accommodate unknown data using XML than whatever they used previously. And they'll find it in their best interest to write a parser that's at least bare-bones competent to avoid getting constant false errors on their end.

    In short - the types of issues you're talking about don't entail multi-million dollar development efforts. It's generally a tiny change.

    \Gal\
    Maybe that is the case, I don't know. But even if it is, the way I implement this pattern takes me exactly 30 seconds to put in a new project (since I only copy-paste now that I wrote it once). And in the odd event that I do need to support several serialization methods (or several versions of one method) it doesn't take me any time. It's still a tradeoff, like most things in life. But I don't think it's such a bad one. I'd do it just for the "clean" design even if I didn't think I'd need to switch binders. Just like Object doesn't have a writeBinary method to serialize itself, and defers it to the ObjectOutput interface and its implementation, ObjectOutputStream, I don't want a writeXml method in my object. I defer it to a DBO interface and a specific implementation. The rest is just talk.
    \Gal\

    Umn - this makes no sense on several levels. For starters, see also Externalizable.

    As for the rest - I'm afraid it's difficult to take your comments seriously when you say "30 seconds" and "it doesn't take me anytime". You XML-enable a whole project somewhere between zero time and 30 seconds - that's amazing!!!! And there are no development, speed, or space constraints - even better!!!

    Sorry for the sarcasm, but you've a) misread the original design completely, b) made claims with no code or specific examples to back it up, and c) keep showing how effortless this all is, when mere mortals do need _some_ effort in my experience.

        -Mike
  19. Yes, you can do that, but that's not what the original pattern is about at all - which is where my objections are coming from.


    I made a post after the pattern was posted where I described the type of pattern I use, and what I think needs to be different in the description given by the original author. This modified version is what I'm advocating. It wouldn't make much sense for me to advocate the original pattern, having posted a list of problems with it, would it?

    > If I personally needed a fully isolated XML in/out system, I'd use a generic XML "context" exposed as an interface, and retrieved via a factory, which is passed into individual objects, so you get chaining. This context would contain the API-specific XML stuff. The factory would take a String denoting the type of XML context (dynamic, as you say, but worthwhile in this context). Within TO's themselves, they'd use this context, and grab the DTO also via a single factory class & method which takes the DTO "type" via String, and the class for the TO, and have a single DTO interface which stores/retrieves Object. This seems almost identical to what you're describing, and is a nice little framework. But I'd only bother if I really, really needed multiple XML in/out codebases running simultaneously. And in any case, this doesn't match the original pattern description at all. It's a completely different beasty using very different patterns and a lot more reliance on dynamic behavior.

    That sounds ok, but far more complicated than what I was suggesting. Anyway, since this isn't directly related to the pattern at hand I will, with your permission, leave this part of the discussion for some other time, as this thread is already quite big.

    > Oh, please. Ignoring unknown elements using a Sax parser is below trivial.

    Maybe it is in some cases. In some cases, for instance where money is involved, the rule is usually that 100 false alarms are better than one error that goes with no alarm.

    > Where we differ is where you say "There are hundreds of things that can change...". In my experience, the number of changes is usually small and quite manageable, and XML is designed to make it very easy to accommodate "surprises". I think you're blowing this way out of proportion, and the bulk of your objection seems to be incredibly bad parsers that don't know how to ignore unknown tags.

    The number of changes is usually small. I didn't say "there are hundreds of changes", I said "there are hundreds of things that can change". I don't consider conservative parsers "incredibly bad". Sometimes you can ignore stuff you don't understand, sometimes it's better to signal an error. I didn't blow it into huge proportions, I'm just saying it happens.
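
    The two parser policies being argued over here can be put side by side in a minimal sketch. The class name `OrderHandler`, the element names, and the `strict` flag are all made up for illustration; only the SAX calls are standard JDK API. In lenient mode an unknown element is skipped, in strict (conservative) mode it raises an error.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records the names of known elements and either
// skips or rejects any element it does not recognize.
class OrderHandler extends DefaultHandler {
    // Assumed vocabulary of the "first version" of the format.
    private static final Set<String> KNOWN =
            new HashSet<>(Arrays.asList("order", "id", "amount"));

    private final boolean strict;
    final List<String> seen = new ArrayList<>();

    OrderHandler(boolean strict) {
        this.strict = strict;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (KNOWN.contains(qName)) {
            seen.add(qName);
        } else if (strict) {
            // Conservative policy: a false alarm is preferred to a silent miss.
            throw new SAXException("Unexpected element: " + qName);
        }
        // Lenient policy: fall through and ignore the unknown element.
    }

    // Parses the document and returns the known element names in document order.
    static List<String> parse(String xml, boolean strict) throws Exception {
        OrderHandler handler = new OrderHandler(strict);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.seen;
    }
}
```

    Which policy is right is a business decision rather than a technical one, which is the point both sides keep circling.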

    > As for "they won't change because they won't change", tell me - how are they using XML in the first place? It hasn't been around all that long, and no one's XML code can hardly be called "crusty" yet. Undoubtedly the company very recently changed _to_ XML, and will more than likely find it's much easier to accommodate unknown data using XML than whatever they used previously. And they'll find it in their best interest to write a parser that's at least bare-bones competent to avoid getting constant false errors on their end.

    In my experience your approach here, even though it may make sense, is not always reflected in real life. Companies interacting with your system will usually be reluctant to change their code, or do anything more than they have to, at least at the start, from the second the system is running smoothly. As for conservative parsers not being "bare-bones competent", I think that's just narrow-sightedness on your part. Besides, only one of my examples could be solved by a parser that ignores elements it doesn't know.

    > Umn - this makes no sense on several levels. For starters, see also Externalizable.

    Externalizable is merely one way to tell a particular implementation of ObjectOutput, ObjectOutputStream, that you want to override its default behaviour. I don't see what it has to do with my point.
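
    For readers who have not used it, here is a small sketch of what Externalizable looks like in practice. The class `GridPoint` and the helper `ExternalizableDemo` are invented for this example. The object takes over the layout of its own fields, but the caller still drives the whole process through ObjectOutputStream and ObjectInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical class that overrides the default field layout.
class GridPoint implements Externalizable {
    int x;
    int y;

    // Externalizable requires a public no-arg constructor for deserialization.
    public GridPoint() {
    }

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        x = in.readInt();
        y = in.readInt();
    }
}

class ExternalizableDemo {
    // The caller still talks to the stream classes, not to the object.
    static GridPoint roundTrip(GridPoint p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (GridPoint) in.readObject();
        }
    }
}
```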

    > As for the rest - I'm afraid it's difficult to take your comments seriously when you say "30 seconds" and "it doesn't take me anytime". You XML-enable a whole project somewhere between zero time and 30 seconds - that's amazing!!!! And there are no development, speed, or space constraints - even better!!!

    I'm not even sure if I should dignify that with an answer. I didn't say I XML-enable a project in 30 seconds. I said I implement the pattern in 30 seconds, with copy-paste. "Implementing the pattern" does not include walking my dog either, in case you were wondering. What it does include, as I have stated numerous times and even re-iterated in one post so it would be crystal clear, is:
    - Write a generic DBO interface with a couple of methods (or rather, copy it)
    - Give a generic DBO implementation using a binder or any other solution to read/write XML.
    - Write a 4-line factory to return it.
    I don't include the binder-related code in the 30 seconds since you have to write that anyway, but it's also around 5-10 lines and can probably be copied as well if you're using something like Castor.

    Adding another serialization method is trivial. Once you implement a second DBO, all you have to do is change the factory to return it. It doesn't take any time. I didn't include writing the serialization code or walking the dog here either, since you have to do those things anyway.
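
    The three steps listed above might look roughly like the following sketch. Everything named here (`DataBindingObject`, `PropertiesContactDbo`, `DboFactory`, `Contact`) is hypothetical and not taken from the poster's own library. `java.util.Properties` and its XML format stand in for a real binder such as Castor purely so that the example runs on a bare JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;

// Hypothetical transfer object; it knows nothing about XML.
class Contact {
    final String name;
    final String email;

    Contact(String name, String email) {
        this.name = name;
        this.email = email;
    }
}

// Step 1: a generic DBO interface with a couple of methods.
interface DataBindingObject {
    void write(Object obj, OutputStream out) throws IOException;

    Object read(InputStream in) throws IOException;
}

// Step 2: one implementation. Properties.storeToXML/loadFromXML is only a
// stand-in for whatever binder the project actually uses.
class PropertiesContactDbo implements DataBindingObject {
    @Override
    public void write(Object obj, OutputStream out) throws IOException {
        Contact c = (Contact) obj;
        Properties p = new Properties();
        p.setProperty("name", c.name);
        p.setProperty("email", c.email);
        p.storeToXML(out, null);
    }

    @Override
    public Object read(InputStream in) throws IOException {
        Properties p = new Properties();
        p.loadFromXML(in);
        return new Contact(p.getProperty("name"), p.getProperty("email"));
    }
}

// Step 3: the small factory. Switching serialization methods means
// returning a different implementation from this one method.
final class DboFactory {
    private DboFactory() {
    }

    static DataBindingObject getDbo() {
        return new PropertiesContactDbo();
    }
}
```

    Client code only ever sees `DboFactory.getDbo()` and the interface, so a second implementation can be swapped in without touching `Contact` or its callers.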

    > Sorry for the sarcasm, but you've a) misread the original design completely, b) made claims with no code or specific examples to back it up, and c) keep showing how effortless this all is, when mere mortals do need _some_ effort in my experience.

    a) I said before I may have misread the original post. But I also posted my own corrections, and when I talk about the pattern I'm assuming these corrections are in place.
    b) I gave you examples to back my claims. I didn't post any code, but if you think that would help I would happily post examples from my own libs as soon as I have the time (I don't have them here, sorry).
    c) The most generic implementation is very simple. I don't know why that surprises you so much. Is a singleton that hard to implement? All I'm suggesting is an ObjectOutput/ObjectOutputStream analog for XML which uses binders or whatever you want to do all the actual work. Why should that be so complicated? It's a pretty trivial wrapper. If you want to write all the XML serialization code yourself that's another matter, and you may need a separate DBO for every object and a whole lot of code. But that's not a part of the pattern itself. The pattern just says you need, for every object you want to serialize, a DBO that can serialize it.

    Gal
  20. Roland,

    Rather than explicitly manage the XML binding characteristics of your transfer objects, why not use a tool like JAXB to generate the transfer objects to begin with? You get the benefits of traditional DTOs and XML serialization comes along for "free".

    This is the approach I've taken with a reporting application. At the project outset I defined an XML schema that matches the structure and relationships of the data transfer objects used by the system. I used JAXB to create the DTOs with java.io.Serializable enabled for the generated objects. This gives me maximum flexibility: I can either work with the DTOs directly or I can marshall them into XML for transformation by XSLT, etc. I also like the fact that my enterprise beans don't have to deal with XML at all.

    There are a few potential issues with this approach, but I think they are reasonably mitigated:

    1) Use of a system like JAXB implies a certain amount of "vendor lock in" for your DTO solution. This is true to a certain extent, but the JAXB binding customizations give the developer complete control over the naming of the classes and other characteristics. I was able to retrofit JAXB into a manually maintained DTO solution without too much effort (JAXB object creation patterns are the big difference).

    2) You lose the ability to explicitly control the structure of your DTOs. Indeed, there is potentially a lot of cruft hidden in the DTOs for XML serialization purposes that you might not have otherwise used. But, that being said, there is a price to pay for simplicity. I believe the benefits of not having to maintain the XML serialization aspect outweigh the hidden costs of the JAXB implementation.

    3) Finally, enabling of traditional serialization on JAXB derived objects is not technically enshrined in the standard. Currently it relies on an extension in the Sun reference implementation. That being said, it is recognized to be an important issue and will likely be part of a future revision of the JAXB standard.

    I certainly don't advocate JAXB for all DTOs, but if XML serialization is a must, it simplifies the process considerably.

    Cheers,

    Merrick
  21. Merrick,

    The approach you describe is mentioned in the pattern description. It is referred to as "code generated approach". JAXB is specifically mentioned. This pattern does not advocate hand-written serialization code. Even if you write your own DTOs, you can still use a mapping-based binder. This pattern describes an additional layer that isolates serialization code from the DTOs. You can use it with JAXB just as you can with any other approach.

    In many projects you don't even have a choice on which type of mapper to use. Either you have some predefined schema you have to conform with, or you have an existing set of objects (for instance EJB value objects) that you have to serialize as XML.
    If I do have a choice, I ask which of the following is more true: the XML is a description of the objects, or the objects are representations of XML. If the first is true, then the design is object-centric and I use a mapping-based mapper. In these cases XML is usually used as an added-value to store/load objects in a readable form, transform object graphs and query the objects. If the second is true, the design is XML centric, so I start with a schema and then generate objects for it using JAXB or another code generator.

    Regardless of the choice of mapper, you can use this pattern to isolate it from the DTOs.

    Gal
  22. hi,

    I was thinking about additional features I would like to see in XML Data Binding, but never really found them, one of them is explained below.

    I have been using Data Binding for some time now, from jNerd's xml2java to Castor and the early JAXB releases, but for some reason I never really felt the implementation was complete, probably related to the specific area in which I needed to apply it. Consider the following requirements:

    1. a swing client
    2. initial-load is an XML Schema of choice (or DTD for that matter)
    3. user is presented an empty document (for example as a JTree) that is to be validated against the Schema
    4. user can add elements, by right-clicking with the mouse a popup is shown allowing the user to insert elements that are valid at the specified location in the document.
    5. the document can be saved to file, serializing the internal datastructures into XML

    There is nothing special about these requirements except #4: how does the application know what a valid element is at an arbitrary location in the document?
    When using DBO directly in the code I would like to be able to ask at any location in the datastructure: "Which of the elements would be valid here and now ?" I never found an implementation that provided an elegant solution to this issue, so I had to implement it myself.

    It's not too difficult to do, considering you take the time to think it over, but I think it would be a better solution if the Data Binding facility would have this kind of logic... (and as my business is not xml2Java serialization there are no doubt people better suited for this job)
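
    To make the question concrete, here is a deliberately simplified sketch of the kind of query being asked for. `ContentModel` is a made-up class, not part of any data binding product, and it only tracks which children a parent allows and how many times. Real schemas also constrain order, choice groups and nesting, all of which this ignores.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified content model.
class ContentModel {
    // parent element -> (child element -> maximum occurrences), in declaration order
    private final Map<String, Map<String, Integer>> rules = new HashMap<>();

    ContentModel allow(String parent, String child, int maxOccurs) {
        rules.computeIfAbsent(parent, k -> new LinkedHashMap<>()).put(child, maxOccurs);
        return this;
    }

    // Answers "which elements would be valid here and now?" for a parent,
    // given the children it already contains.
    List<String> validChildren(String parent, List<String> existingChildren) {
        Map<String, Integer> allowed =
                rules.getOrDefault(parent, Collections.<String, Integer>emptyMap());
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> rule : allowed.entrySet()) {
            if (Collections.frequency(existingChildren, rule.getKey()) < rule.getValue()) {
                result.add(rule.getKey());
            }
        }
        return result;
    }
}
```

    A popup menu in the Swing client could then be populated from `validChildren(...)` for the selected tree node.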

    What do you think about this? Has anybody ever had this issue? Are there alternatives to having to implement it yourself? Is it a bad idea to have this feature available in the Data Binding implementation?

    best regards
    Wouter.
  23. toXml, fromXml

    Hi All,

    This is a very interesting thread.

    /Gal/
    I don't think having a toXML method in the value object is good OO design. Turning a value object into XML is not an inherent operation of the object, but rather an operation that can be carried out *on* the object
    /Gal/

    I'd argue with that. toXml is basically exposing certain important attributes of the object (like a complex getter) and fromXml is generating an initialized object (like a constructor). These two methods are as tightly coupled to the current structure of the object as they can be. You can't change one (structure) without the other (xml binding).

    A class therefore should implement an interface like XMLizable that would force the class to have toXml and fromXml. The rest (how to do the XML binding) is an implementation detail of the DBO.

    interface XMLizable {
        String toXml();

        XMLizable fromXml(String xml);
    }

    public class ValueObject {

        public String toXml() {
            return ValueObjectDBO.getInstance().toXml(this);
        }

        public XMLizable fromXml(String xml) {
            return ValueObjectDBO.getInstance().fromXml(xml);
        }

    }
  24. toXml, fromXml correction

    Sorry. I was lazy. The above class should read

    public class ValueObject implements XMLizable{
    blah blah
    }
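
    Filled in, the corrected class might look like the sketch below. The `name` field, the singleton body of `ValueObjectDBO`, and the hand-rolled XML are assumptions added for illustration; the post only gives the method signatures. In a real project the DBO would delegate to a binder instead of building strings.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

interface XMLizable {
    String toXml();

    XMLizable fromXml(String xml);
}

// The value object only forwards to its DBO.
class ValueObject implements XMLizable {
    private final String name; // assumed example attribute

    ValueObject(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    @Override
    public String toXml() {
        return ValueObjectDBO.getInstance().toXml(this);
    }

    @Override
    public XMLizable fromXml(String xml) {
        return ValueObjectDBO.getInstance().fromXml(xml);
    }
}

// Singleton DBO holding all XML details (placeholder implementation).
final class ValueObjectDBO {
    private static final ValueObjectDBO INSTANCE = new ValueObjectDBO();

    private ValueObjectDBO() {
    }

    static ValueObjectDBO getInstance() {
        return INSTANCE;
    }

    String toXml(ValueObject vo) {
        return "<ValueObject name=\"" + escape(vo.getName()) + "\"/>";
    }

    ValueObject fromXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return new ValueObject(root.getAttribute("name"));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a valid ValueObject document", e);
        }
    }

    // Minimal escaping for an attribute value in double quotes.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}
```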
  25. toXml, fromXml

    <sandor>
    I'd argue with that. toXml is basically exposing certain important attributes of the object (like a complex getter) and fromXml is generating an initialized object (like a constructor). These two methods are as tightly coupled to the current structure of the object as they can be. You can't change one (structure) without the other (xml binding).
    </sandor>

    I guess we have a different idea of how "inherent" an XML representation of an object is to the object. I tend to think of it (in most cases) as a separate feature, kind of an "added-value" to the object.

    It is certainly true that the XML representation of an object is tightly bound to the object, but I don't think it should be tightly bound to the object's implementation. Therefore I don't think toXML and fromXML should necessarily be a part of the object. For instance, the GUI component that draws a table is coupled with the table, but that still doesn't mean that your table object should be a JComponent. Indeed in Swing JTable is separated from its model (like any other Swing component).

    As I explained in this thread, one of the main reasons why I think the XML representation is not "inherent" to the object is that a single object can be represented using XML in many different ways. The persistent storage for JavaBeans spec defines one way to transform an object into XML. Mappers like Castor or JXV define other ways. I see the object as that table and the XML as the GUI that represents it. Even though they are coupled they are not necessarily in a one-to-one correspondence (one table can be displayed in a variety of different ways, one object can be XMLized in a variety of different formats).
    Another analogy is that of ObjectOutputStream and serialization. Why did Java's designers put the serialization operations into a separate class rather than as a part of the Serializable interface? My guess is - for the same reason as I described above: to avoid coupling the object with one specific binary format, allow implementors to support different formats naturally (i.e., without having to define a whole new interface), and to provide more flexibility in the specification of the format (you can set all sorts of options in ObjectOutputStream to affect the format, and you can even subclass it to create your own "subformat").

    Gal
  26. toXml, fromXml

    /Gal/
    It is certainly true that the XML representation of an object is tightly bound to the object, but I don't think it should be tightly bound to the object's implementation. Therefore I don't think toXML and fromXML should necessarily be a part of the object. For instance, the GUI component that draws a table is coupled with the table, but that still doesn't mean that your table object should be a JComponent. Indeed in Swing JTable is separated from its model (like any other Swing component).
    /Gal/

    By doing

    public String toXml(){
    return ValueObjectDBO.getInstance().toXml(this);
    }

    you don’t actually bind it to the object’s implementation. You bind ValueObjectDBO to ValueObject. toXml is there because the interface you implement demands a contract from your class. There should be a very clean 1:1 relationship between the two. The XMLization details are the responsibility of the DBO. You can even call that DBO ValueObjectXMLDBO to indicate one specific task (to and from xml) for that DBO.

    /Gal/
    As I explained in this thread, one of the main reasons why I think the XML representation is not "inherent" to the object is that a single object can be represented using XML in many different ways. The persistent storage for JavaBeans spec defines one way to transform an object into XML. Mappers like Castor or JXV define other ways. I see the object as that table and the XML as the GUI that represents it. Even though they are coupled they are not necessarily in a one-to-one correspondence (one table can be displayed in a variety of different ways, one object can be XMLized in a variety of different formats).
    /Gal/


    “Castor can marshal "almost" any arbitrary Object to and from XML. When descriptors are not available for a specific Class, the marshalling framework uses reflection to gain information about the object”

    In general I don’t like reflection. It’s breaking OO encapsulation. Also, if you can create descriptors for your classes then pretty much you couple each descriptor to the class just like how you’d couple toXml to it.

    If you’d want to export your objects in different XML formats you have a very good business requirement to do so. By implementing an interface you clearly signal to anyone that may use your class that that class is to be XMLized in certain ways. You want to add new ways? Implement JDOMizable, SAXizable on demand and create corresponding DBOs like ValueObjectJDOMDBO, ValueObjectSAXDBO etc.

    /Gal/
    Another analogy is that of ObjectOutputStream and serialization. Why did Java's designers put the serialization operations into a separate class rather than as a part of the Serializable interface? My guess is - for the same reason as I described above: to avoid coupling the object with one specific binary format, allow implementors to support different formats naturally (i.e., without having to define a whole new interface), and to provide more flexibility in the specification of the format (you can set all sorts of options in ObjectOutputStream to affect the format, and you can even subclass it to create your own "subformat").
    /Gal/

    By implementing XMLizable on the object you don’t prevent implementors from supporting additional formats. You just clearly and explicitly signal to all the users of the class that it can safely be turned to/from xml.

    When you externalize toXml and fromXml from an object and your client wants to simply display the xml representation of a contact let’s say for manual editing, you’d have to create signatures for each layer for the xml service calls, so instead of doing in your JSP:

    <bean:write name="contact" property="toXml"/>

    You have to escalate your framework.

    In your action:

    String contactXml = XMLDelegate.getContactXML(contactId);

    In your delegate:

    public String getContactXML(String contactId){
    return contactManager.getContactXML(contactId);
    }

    You do the same for your remote and local interfaces and managers. And what happens if you want to display the xmls for each contact?

    Simple way (if the toXml is in the object):

    <logic:iterate id="contact" name="contacts">
        <bean:write name="contact" property="toXml"/>
    </logic:iterate>

    More complicated way (toXml is separate from the object):

    In your action:

    List contactXmls = XMLDelegate.getContactXMLs(contactIds);

    And it goes all the way down the layers.

    I think you should keep the operations of an object as close to the object as possible. In a case where it’s not clear cut (as you can argue with toXml and fromXml), keep it in the object too, because having a very concrete operation too close doesn’t violate any OO paradigm, and as the above example indicates it can simplify other things as well.
  27. toXml, fromXml

    <sandor>
    By doing

    public String toXml(){
    return ValueObjectDBO.getInstance().toXml(this);
    }

    you don’t actually bind it to the object’s implementation. You bind ValueObjectDBO to ValueObject. toXml is there because the interface you implement demands a contract from your class. There should be a very clean 1:1 relationship between the two. The XMLization details are the responsibility of the DBO. You can even call that DBO ValueObjectXMLDBO to indicate one specific task (to and from xml) for that DBO.
    </sandor>

    I did not mean to say that having the object implement a "toXML" method would bind the XMLization code to the object implementation. I meant that even though the XMLization process is tightly-coupled to the object it is not tightly-coupled to the implementation and therefore the "toXML" operation doesn't *necessarily* have to be placed in the Object's interface. I have different reasons for thinking that the "toXML" operation *shouldn't* be placed in the Object's interface that I outlined later in the post.

    <sandor>
    “Castor can marshal "almost" any arbitrary Object to and from XML. When descriptors are not available for a specific Class, the marshalling framework uses reflection to gain information about the object”

    In general I don’t like reflection. It’s breaking OO encapsulation. Also, if you can create descriptors for your classes then pretty much you couple each descriptor to the class just like how you’d couple toXml to it.

    If you’d want to export your objects in different XML formats you have a very good business requirement to do so. By implementing an interface you clearly signal to anyone that may use your class that that class is to be XMLized in certain ways. You want to add new ways? Implement JDOMizable, SAXizable on demand and create corresponding DBOs like ValueObjectJDOMDBO, ValueObjectSAXDBO etc.
    </sandor>

    I disagree with your comments on reflection and encapsulation. Misuse of reflection can hurt encapsulation, but not all uses do. Castor and JXV use the public contract of an object, particularly its public getter/setter pairs (and public fields if you explicitly state that you want them), and mutate the object through this public interface.
    There is one key difference between coupling descriptors to classes and coupling toXML methods to classes. If I want to support a new format I write a new config file with new descriptors. And then instead of writing (I demonstrate for JXV, same is true for Castor or Java Serialization):

    JXVConfig config = JXVConfig.getInstance("configFile1.xml");

    I write:

    JXVConfig config = JXVConfig.getInstance("configFile2.xml");

    From here on I don't need to change anything. A different serialization format is supported by a different instance of the serialization framework.
    In the case of "toXML" methods I would have to change the implementations of all my classes, and assuming I want to support both formats I would have to also add another interface to all my classes (what would the name be? XMLizable2? :)).
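
    The idea that a format lives in a descriptor instance rather than in the class can be shown with a small self-contained sketch. This is not JXV's or Castor's API: `XmlFormat`, `FormatWriter` and `UserAccount` are invented names, and the descriptor is built in code instead of being loaded from a config file. Two descriptor instances give two XML formats for the same unchanged class.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical plain object; it contains no XML knowledge at all.
class UserAccount {
    private final String login;
    private final String email;

    UserAccount(String login, String email) {
        this.login = login;
        this.email = email;
    }

    String getLogin() {
        return login;
    }

    String getEmail() {
        return email;
    }
}

// Hypothetical descriptor: one instance describes one XML format.
class XmlFormat<T> {
    final String rootTag;
    final Map<String, Function<T, String>> fields = new LinkedHashMap<>();

    XmlFormat(String rootTag) {
        this.rootTag = rootTag;
    }

    // Maps a tag name to the public getter that supplies its text.
    XmlFormat<T> field(String tag, Function<T, String> getter) {
        fields.put(tag, getter);
        return this;
    }
}

// Stand-in for the serialization framework; it is driven entirely by the descriptor.
class FormatWriter {
    static <T> String write(T obj, XmlFormat<T> format) {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(format.rootTag).append('>');
        for (Map.Entry<String, Function<T, String>> f : format.fields.entrySet()) {
            sb.append('<').append(f.getKey()).append('>')
              .append(escape(f.getValue().apply(obj)))
              .append("</").append(f.getKey()).append('>');
        }
        sb.append("</").append(format.rootTag).append('>');
        return sb.toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }
}
```

    Supporting a second format means creating a second `XmlFormat` instance, for example one with root tag `account` and a single `name` field, while `UserAccount` stays untouched.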

    I have no problem with implementing XXXizable as a marker interface as done in the Java serialization framework (although I don't consider it completely necessary, and indeed the Java serialization framework can serialize objects even if they do not implement Serializable as part of an object graph). My point is that the functionality having to do with XXXizing a particular object is usually not an inherent feature of the object, but rather a separate set of behaviors that are specific to one format. I don't think a User object needs to know the name of the tags used to mark up its content, or the binary codes used to represent each of its fields in a certain binary format. This is the business of a serialization framework. Accordingly, I think that a separate class representing the serialization framework should be used to serialize the object. The object may interact with this framework via naming conventions (as is the case in Castor and JXV), standard methods (readObject/writeObject in serialization) and interfaces (used in all aforementioned frameworks, for example the Externalizable interface in the serialization framework). But serializing/deserializing an object is an operation of a specific serialization framework and should be accessed through the framework's own interfaces, not directly on the object.

    <sandor>
    By implementing XMLizable on the object you don’t prevent implementors from supporting additional formats. You just clearly and explicitly signal to all the users of the class that it can safely be turned to/from xml.
    </sandor>

    You would need a new interface, and add changes to all of the objects that you intend to serialize, in order to support another format (even if you only want to make a tiny change). Of course I didn't mean implementors can't support additional formats. But they have to reimplement the whole thing. Again, if XXXizable is used just as a marker interface as I suggested, you don't have this problem.

    <sandor>
    When you externalize toXml and fromXml from an object and your client wants to simply display the xml representation of a contact let’s say for manual editing, you’d have to create signatures for each layer for the xml service calls, so instead of doing in your JSP:

    // code omitted for brevity
    </sandor>

    I don't think this issue has any theoretical or practical importance. All you are doing here is tailor-fitting your implementation so it can be used with a particular taglib. If you really need to display objects in XML that often you can spend 5 minutes and write a "toxml" taglib that just invokes the XML serialization framework you want to use. This will make things at least as simple as using <bean:write>.

    <sandor>
    I think you should keep the operations of an object as close to the object as possible. In a case where it’s not clear cut (as you can argue with toXml and fromXml), keep it in the object too, because having a very concrete operation too close doesn’t violate any OO paradigm, and as the above example indicates it can simplify other things as well.
    </sandor>

    I disagree. I think this is a case where the operation is clearly not inherent to the object. One object can be serialized into XML in many different ways. IMO writing an object as XML is an operation performed *on* the object, not *by* the object. Keeping such operations in the object runs contrary to the OO paradigm, because it creates poor separation of concerns and makes it harder to change or add one type of behavior (in this case the serialization format) without changing all the classes to which the behavior applies.
    I also don't see a reason why you should have to add boilerplate code to each class you want to be able to serialize. A marker interface is OK for documentation purposes, just like in Java serialization. But what would you think if in every serializable class you would have to write

    public void toBinary(OutputStream out) throws IOException {
       new ObjectOutputStream(out).writeObject(this);
    }

    Isn't this silly? ObjectOutputStream supports the serialization functionality. Why shouldn't the users talk to it directly? That way my object doesn't even have to know if the user wants to use ObjectOutputStream or some subclass with object replacement features/etc or any other format-specific features.
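
    As a small illustration of that last point, the sketch below lets the caller pick either a plain ObjectOutputStream or a subclass that replaces objects while writing. `Invoice`, `MaskingObjectOutputStream` and `BinaryDemo` are names invented for this example. The serialized class carries only the Serializable marker and has no idea which stream was chosen.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

// Hypothetical class: a marker interface only, no toBinary() boilerplate.
class Invoice implements Serializable {
    private static final long serialVersionUID = 1L;
    final String customer;

    Invoice(String customer) {
        this.customer = customer;
    }
}

// Caller-chosen "subformat": every String in the graph is written as "***".
class MaskingObjectOutputStream extends ObjectOutputStream {
    MaskingObjectOutputStream(OutputStream out) throws IOException {
        super(out);
        enableReplaceObject(true);
    }

    @Override
    protected Object replaceObject(Object obj) {
        return (obj instanceof String) ? "***" : obj;
    }
}

class BinaryDemo {
    // Writes with the stream the caller selected, then reads the result back.
    static Invoice roundTrip(Invoice invoice, boolean masked)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = masked
                ? new MaskingObjectOutputStream(bytes)
                : new ObjectOutputStream(bytes)) {
            out.writeObject(invoice);
        }
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Invoice) in.readObject();
        }
    }
}
```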

    As a side note, I have already mentioned in this thread several times that I think Strings are about the worst technique you can use to move XML around. See my previous posts for details. So even if I were to implement a toXML method I would return something like a SAXSource that can be used in an XML-pipeline without reparsing the XML or wasting memory.
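
    One way such a SAXSource could be produced is sketched below, under the assumption that the object's data is emitted directly as SAX events. `UserSaxReader` and `SaxPipelineDemo` are hypothetical names, and the identity transformer at the end is only there to make the events visible. In a real pipeline the source would be handed to an XSLT transformer or another SAX consumer without ever building a String.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical XMLReader that "parses" an object instead of a document:
// it fires SAX events for the object's data at whatever handler is attached.
class UserSaxReader extends XMLFilterImpl {
    private final String login;

    UserSaxReader(String login) {
        this.login = login;
    }

    @Override
    public void parse(InputSource ignored) throws SAXException {
        ContentHandler handler = getContentHandler();
        handler.startDocument();
        handler.startElement("", "user", "user", new AttributesImpl());
        handler.characters(login.toCharArray(), 0, login.length());
        handler.endElement("", "user", "user");
        handler.endDocument();
    }

    // The InputSource is a dummy; the reader never reads from it.
    static SAXSource asSource(String login) {
        return new SAXSource(new UserSaxReader(login), new InputSource());
    }
}

class SaxPipelineDemo {
    // Runs the events through an identity transform just to show them as text.
    static String render(SAXSource source) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }
}
```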

    Regards
    Gal
  28. toXML, fromXML

    Hi Gal,

    The deeper we discuss this toXML issue the more I think that toXML is not different at all from toString. It’s exactly like toString only in a different format. It belongs to the class you’d dump out as XML. On further thought it should belong to every class (along with fromXML) that has at least one attribute. The java compiler should generate these two methods for you so no extra coding would be needed. No framework would be needed. No dealing with different XML formats would be needed. They should be part of the language. Think about when you’d use toXML. It’s when you want to exchange your data with someone or send it over the wire or persist it. In any of these cases there shouldn’t be any coding at all on your part. It should be as simple as possible. Imagine if we had toXML and fromXML at our fingertips; then all we’d need to persist any object from our application is one table:

    MyObjects
    -------------
    id text (UUID)
    objectName text
    objectData text

    one record would look like:

    id: UUID generated by the persistence framework
    objectName: 'com.yourcompany.vo.YourObject'
    objectData: '<YourObject attribute1="a"/>'

    The persistence framework would simply call the object’s toXML to generate objectData when persisting and it would read objectName to get the name of the class whose fromXML to use to read a persisted object back. No O/R mapping framework would be needed.

    I disagree with you when you say that you’d want to serialize a class into XML in more than one way. If you serialize in two different ways then you have two business cases, so indicate it clearly in your application by separating the concerns into two different methods like toXML and toSGML. I think that you should serialize to XML in only one format, the one you’d call toXML. Representing an object in XML should be as clear and simple as possible. Since XML looks like a format that is here to stay, it should be adopted by the Java community as the de facto serialization standard and integrated into the language through automatically generated toXMLs and fromXMLs.

    The problem with ObjectOutputStream is that it supports the serialization functionality. It shouldn’t. The object itself should. Serializing an object shouldn’t be an external operation performed on the object. You’d have to change the implementation of the serialization every time you change your object anyways. Why not have these methods automatically created and not worry about anything else? It took several years until the Java community finally recognized the need for automatically generated getters and setters, as they will be implemented in Tiger. I would like to see the same thing for toXML and fromXML. No extra worry, just the ability to use toXML on any object you create without any effort on your part. Wouldn’t it be nice and simple?

    I think what comes out from our discussions is that you believe that toXML, fromXML, toBinary, writeObject etc. are something to leave to frameworks to deal with and I think that these should be an integral part of the language.

    Cheers
    Sandor
  29. toXML, fromXML

    Hi,

    <sandor>The deeper we discuss this toXML issue the more I think that toXML is not different at all from toString. It’s exactly like toString only in a different format. It belongs to the class you’d dump out as XML.</sandor>

    That really depends on the context, but on the whole I have to say I disagree. What you are describing may be true if you only want to output XML as a simple debug output or something like that. For general purpose XML IO I think a toXML method is completely insufficient. Again, I refer you to the Serialization analog. Why not have toBinary() in every class?

    <sandor>
    On further thought it should belong to every class (along with fromXML) that has at least one attribute. The java compiler should generate these two methods for you so no extra coding would be needed.
    </sandor>

    I know of no one standard way to encode objects using XML. Dictating one at the object implementation level doesn't seem right to me. Even if I were to do something like that, I certainly wouldn't put it in the hands of the compiler. If anything I would use a bytecode enhancer. The compiler shouldn't know about XML formats.

    <sandor>
    No framework would be needed. No dealing with different XML formats would be needed. They should be part of the language.
    </sandor>

    Of course a framework would be needed, only it would be embedded in the compiler, making it harder to choose between other frameworks.
    Dealing with different XML formats is not a burden that XML frameworks place on the user. XML frameworks are perfectly happy dealing with just one standard format. Dealing with different XML formats is a business requirement that arises from the fact that your system often communicates with other systems, which have their own formats which might not be the same as your "standard". XML frameworks support this requirement.

    <sandor>
    Think about when you’d use toXML. It’s when you want to exchange your data with someone or send it over the wire or persist it. In any of these cases there shouldn’t be any coding at all on your part. It should be as simple as possible. Imagine if we had toXML and fromXML at our fingertips; then all we’d need to persist any object from our application is one table:
    </sandor>

    It's just as simple to do with Castor or JXV. I see no compelling reason why this should be part of the compiler or why it should be embedded inside the Object (except some dubious claims about increased performance). I do agree that you shouldn't have to write the XML-related code yourself. That's why we have XML mapping frameworks.

    I also want to use an XML representation of my objects in order to apply things like XSLT and XPath to it. I don't even want to write it anywhere, just feed the SAX events to my XSLT/XPath processor.

    <sandor>
    MyObjects
    -------------
    id text (UUID)
    objectName text
    objectData text

    one record would look like:

    id: UUID generated by the persistence framework
    objectName: ‘com.yourcompany.vo.YourObject’
    objectData: ‘<YourObject attribute1=”a”/>’
    </sandor>

    That's one possible format, certainly not the one I would choose and quite probably not very useful for interaction with non-Java systems (what exactly are C++ parsers supposed to do with a persistence-framework-specific UUID and a Java Class name?). If you think this format is useful, why not make an XML persistence framework that supports it?
    The point is that there are many possible formats, there is no one-size-fits-all, and we are unlikely to agree on a standard. Why not just provide different frameworks (or one framework with different configurations) to support each useful format?

    <sandor>
    The persistence framework would simply call the object’s toXML to generate objectData when persisting and it would read objectName to get the name of the class whose fromXML to use to read a persisted object back. No O/R mapping framework would be needed.
    </sandor>

    There are various complicated issues of what exactly the input of toXML and fromXML should be, but since I don't like the idea of having one standard to/from XML I'll leave those out of the discussion.

    <sandor>
    I disagree with you when you say that you’d want to serialize a class into XML in more than one way.
    </sandor>

    I'm not saying I *would* want, I'm saying I *do* want. And I don't see how you can disagree, being that I'm usually the one deciding what *I* want :)

    <sandor>
    If you serialize in two different ways then you have two business cases so indicate it clearly in your application by separating the concerns into two different toXMLs like toXML and toSGML.
    </sandor>

    I would do that if I wanted to place those methods in my object, but I don't. Instead I have two persistence frameworks to support my two formats, just like two subclasses of ObjectOutputStream. But in your suggestion, how would I have those two methods if the compiler only generates one, and only using a standard format? Would I have to write them myself? If so why not just use a persistence framework as I suggest?

    <sandor>
    I think that you should serialize to XML in only one format, which you’d call toXML. Representing an object in XML should be as clear and simple as possible. Since XML looks like a format that is here to stay, it should be adopted by the java community as the de facto serialization standard and it should be integrated into the language by the automated toXMLs and fromXMLs.
    </sandor>

    You have still made no argument as to why this is better than using a standard XML serialization framework. I personally do not think that there is one perfect format and like to be able to configure my XML, but that's a matter of taste.
    toXML and fromXML are automated by XML persistence frameworks without requiring language level or (god forbid) compiler-internal support.

    <sandor>
    The problem with ObjectOutputStream is that it supports the serialization functionality. It shouldn’t. The object itself should.
    </sandor>

    I disagree, for various reasons. For one thing I think objects can be serialized in a variety of different ways, and shouldn't mandate just one. For another thing, ObjectOutputStream allows you to customize the format by subclassing it, which would be very hard without it. See for example the object-replacement features of the serialization protocol used in RMI. Without them RMI would have been impossible to implement. When I serialize an object during an RMI call (as a parameter or return value) I want references to Remote objects to be replaced at runtime with their Stubs. How do you propose I do that if the objects containing the references (which know nothing about RMI's specific needs) implement their own serialization format? How would they know that they should replace the Remote references with Stub references during RMI serialization, but not during normal serialization? Do you propose we add a "toRMIBinary" and "fromRMIBinary" that will also be supported by the compiler? And the same thing for every RMI-like system?
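    The object-replacement hook mentioned above can be sketched roughly as follows. This is only an illustration of the mechanism, not RMI's actual code: `Service`, `ServiceStub`, `Holder` and `ReplacingOutputStream` are hypothetical names invented for the example, while `enableReplaceObject` and `replaceObject` are the real protected methods of `java.io.ObjectOutputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

class ReplacementDemo {
    // Hypothetical stand-in for an exported server object; deliberately not Serializable.
    static class Service {
        final String name;
        Service(String name) { this.name = name; }
    }

    // Hypothetical stand-in for the stub that should travel instead of the server object.
    static class ServiceStub implements Serializable {
        private static final long serialVersionUID = 1L;
        final String remoteName;
        ServiceStub(String remoteName) { this.remoteName = remoteName; }
    }

    // An ordinary object holding a reference; it knows nothing about replacement.
    static class Holder implements Serializable {
        private static final long serialVersionUID = 1L;
        final Object ref;
        Holder(Object ref) { this.ref = ref; }
    }

    // The stream, not the object, decides that Service references become stubs.
    static class ReplacingOutputStream extends ObjectOutputStream {
        ReplacingOutputStream(OutputStream out) throws IOException {
            super(out);
            enableReplaceObject(true);
        }

        @Override
        protected Object replaceObject(Object obj) {
            if (obj instanceof Service) {
                return new ServiceStub(((Service) obj).name);
            }
            return obj;
        }
    }

    // Serializes with the replacing stream, then reads back with a plain ObjectInputStream.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ReplacingOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```

    Writing the same `Holder` through a plain `ObjectOutputStream` would instead fail with a `NotSerializableException`, which is the point: the choice of behaviour lives in the stream class and can differ per use.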

    <sandor>
    Serializing an object shouldn’t be an external operation performed on the object. You’d have to change the implementation of the serialization every time you change your object anyways.
    </sandor>

    No you don't. The serialization framework handles the serialization details.

    <sandor>
    Why not have these methods automatically created and not worry about anything else? It took several years until the Java community finally recognized the need for automatically generated getters and setters, as they will be implemented in Tiger.
    </sandor>

    They are not automatically generated - they are supported by persistence frameworks. You don't have to write them yourself anyway, so what's the problem?
    I don't know the Tiger getter/setter feature you are referring to. Can you give me a reference?

    <sandor>
    I would like to see the same thing for toXML and fromXML. No extra worry, just the ability to use toXML on any object you create without any effort on your part. Wouldn’t it be nice and simple?
    </sandor>

    It is that simple. Just pick up the latest and coolest XML mapper and enjoy.

    <sandor>
    I think what comes out from our discussions is that you believe that toXML, fromXML, toBinary, writeObject etc. are something to leave to frameworks to deal with and I think that these should be an integral part of the language.
    </sandor>

    I agree.

    Regards
    Gal
  30. toXML, fromXML

    Hello Gal,

    <gal>
    That really depends on the context, but on the whole I have to say I disagree. What you are describing may be true if you only want to output XML as a simple debug output or something like that. For general purpose XML IO I think a toXML method is completely insufficient. Again, I refer you to the Serialization analog. Why not have toBinary() in every class?
    </gal>
    I would say that if toXML was standardized and used properly in each class as explained in my previous posting it could make the toBinary in the class unnecessary. (Of course it has to be implemented first). My point is that every single class must be outputtable in the same standardized way to XML (or any other format -> maybe binary) for serialization/deserialization purposes.

    <gal>
    I know of no one standard way to encode objects using XML. Dictating one at the object implementation level doesn't seem right to me. Even if I were to do something like that, I certainly wouldn't put it in the hands of the compiler. If anything I would use a bytecode enhancer. The compiler shouldn't know about XML formats.
    </gal>
    There should be a standard way to encode objects using XML. If there is none yet it must be defined. Engineering must be based on standards. If XML is perceived as a language that is both human readable and flexible enough to handle all java objects (which to my best knowledge it is) then maybe one particular dialect of it (if there are more incompatible ones) should be adopted by the Java community as a standard. Call it JavaXML.

    <gal>
    Of course a framework would be needed, only it would be embedded in the compiler, making it harder to choose between other frameworks.
    Dealing with different XML formats is not a burden that XML frameworks place on the user. XML frameworks are perfectly happy dealing with just one standard format. Dealing with different XML formats is a business requirement that arises from the fact that your system often communicates with other systems, which have their own formats which might not be the same as your "standard". XML frameworks support this requirement.
    </gal>
    I disagree. You as a programmer could still externalize the whole XML binding concept if you wanted to. But if there was one built in that was flexible enough to use, how many times would you not consider using that first?

    <gal>
    It's just as simple to do with Castor or JXV. I see no compelling reason why this should be part of the compiler or why it should be embedded inside the Object (except some dubious claims about increased performance). I do agree that you shouldn't have to write the XML-related code yourself. That's why we have XML mapping frameworks.
    </gal>
    Then make Castor hidden from the programmers, built in so as to provide a default standard xml mapping framework at the language level. I can’t stress enough that (for me anyways) a .toXML and .fromXML (or something like it) would be the most straightforward and logical things to do for serialization.


    <gal>
    That's one possible format, certainly not the one I would choose and quite probably not very useful for interaction with non-Java systems (what exactly are C++ parsers supposed to do with a persistence-framework-specific UUID and a Java Class name?). If you think this format is useful, why not make an XML persistence framework that supports it?
    </gal>
    Would you like to be involved in an open source project for this? It sounds like a good idea to me. I’m not sure how many people would use it though.

    <gal>
    The point is that there are many possible formats, there is no one-size-fits-all, and we are unlikely to agree on a standard. Why not just provide different frameworks (or one framework with different configurations) to support each useful format?
    </gal>
    I disagree. Regardless of the number of the different possible formats there should be one common denominator to which you can export your objects by default. All other frameworks should be able to deal with that format.

    <gal>
    There are various complicated issues of what exactly the input of toXML and fromXML should be, but since I don't like the idea of having one standard to/from XML I'll leave those out of the discussion.
    </gal>
    I think that this is where our ideas depart most. This is more like a religious issue. I believe in standards (you do too but to a different extent), more clearly I believe that xml binding should be standardized for java.

    <gal>
    I would do that if I wanted to place those methods in my object, but I don't. Instead I have two persistence frameworks to support my two formats, just like two subclasses of ObjectOutputStream. But in your suggestion, how would I have those two methods if the compiler only generates one, and only using a standard format? Would I have to write them myself? If so why not just use a persistence framework as I suggest?
    </gal>
    I guess the reason why you have 2 formats is that there wasn’t a default, flexible enough built in framework there for you at the time you decided on the 2 formats.
    Do your 2 frameworks indicate incompatible formats? If so, you spend time on assuring that they don’t intermix at various levels of your application. One standard would solve this problem.
    Also, how many possible XML formats can you think of? If there is an upper limit (I guess there is: maybe 2 or 3) then those should be parameterized in the compiler.

    <gal>
    You have still made no argument as to why this is better than using a standard XML serialization framework. I personally do not think that there is one perfect format and like to be able to configure my XML, but that's a matter of taste.
    toXML and fromXML are automated by XML persistence frameworks without requiring language level or (god forbid) compiler-internal support.
    </gal>
    Because as you said there is no standard framework. All I’m trying to advocate is to use a standardized built in xml binding framework. I don’t say that I have the magic solution at my fingers and that I know which xml format is to be adopted, but I think there must be one. Can you explain why compiler-internal support is ‘god-forbid’?

    <gal>
    I disagree, for various reasons. For one thing I think objects can be serialized in a variety of different ways, and shouldn't mandate just one.
    </gal>
    I disagree. There should be one and exactly one standard and n perfectly valid non standard ways to serialize objects.

    <gal>
     For another thing, ObjectOutputStream allows you to customize the format by subclassing it, which would be very hard without it. See for example the object-replacement features of the serialization protocol used in RMI. Without them RMI would have been impossible to implement. When I serialize an object during an RMI call (as a parameter or return value) I want references to Remote objects to be replaced at runtime with their Stubs. How do you propose I do that if the objects containing the references (which know nothing about RMI's specific needs) implement their own serialization format? How would they know that they should replace the Remote references with Stub references during RMI serialization, but not during normal serialization? Do you propose we add a "toRMIBinary" and "fromRMIBinary" that will also be supported by the compiler? And the same thing for every RMI-like system?
    </gal>
    I think this is a different issue from xml binding. Objects wouldn’t have to explicitly implement their own serialization format. The one that would be standard would have to be dealt with by the RMI framework. The problem here is that Java doesn’t guarantee that an object serialized to byte stream would be deserializable in the following release of JVM.
    Don’t RMI-like systems behave similarly anyway? Don’t they have to conform to certain formats in order to communicate?

    <gal>
    They are not automatically generated - they are supported by persistence frameworks. You don't have to write them yourself anyway, so what's the problem?
    I don't know the Tiger getter/setter feature you are referring to. Can you give me a reference?
    </gal>
    I think it’s under metadata or automatic boilerplating.
    http://www.javasig.com/Archive/lectures/JavaSIG-Tiger.pdf



    <gal>
    It is that simple. Just pick up the latest and coolest XML mapper and enjoy.
    </gal>
    Not quite. Which one to use? Which is more standard than the other? Will I be able to process the xml content you send me with the framework that I picked arbitrarily over the other?

    Cheers,
    Sandor
  31. toXML, fromXML

    Hi Sandor,

    <sandor>
    I would say that if toXML was standardized and used properly in each class as explained in my previous posting it could make the toBinary in the class unnecessary. (Of course it has to be implemented first). My point is that every single class must be outputtable in the same standardized way to XML (or any other format -> maybe binary) for serialization/deserialization purposes.
    </sandor>

    You are missing my point. If you want to define a standard format for XML-to-object serialization that's fine. One example of such a format is the long-term persistence for JavaBeans specification. Just like Java Serialization is a standard binary serialization format for Java objects, you can define a standard XML serialization format for Java objects. Following the serialization analog, you would have an XMLObjectOutput interface (analogous to ObjectOutput) which defines an interface for object serialization frameworks, and an implementation (analogous to ObjectOutputStream) that provides the standard implementation. I'm not arguing against a standard XML format (although that wouldn't qualify as a "design pattern"). I'm arguing about how to design the framework to provide that standard format (or any other format).
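    To make the analogy concrete, here is a rough sketch of what such an interface and a default implementation might look like. Everything in it is hypothetical: `XmlObjectOutput` and `SimpleXmlObjectOutput` are invented for illustration and are not part of the JDK or of any mapper mentioned in this thread, and the one-element-per-field format is just one arbitrary choice.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class XmlObjectOutputDemo {
    /** Hypothetical analog of java.io.ObjectOutput for XML; not a real JDK interface. */
    interface XmlObjectOutput {
        void writeObject(Object obj) throws Exception;
    }

    /** Hypothetical default implementation: one child element per instance field. */
    static class SimpleXmlObjectOutput implements XmlObjectOutput {
        private final Appendable out;

        SimpleXmlObjectOutput(Appendable out) { this.out = out; }

        @Override
        public void writeObject(Object obj) throws Exception {
            String tag = obj.getClass().getSimpleName();
            out.append('<').append(tag).append('>');
            for (Field f : obj.getClass().getDeclaredFields()) {
                // Skip static and compiler-generated fields; only instance state is written.
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                    continue;
                }
                f.setAccessible(true);
                out.append('<').append(f.getName()).append('>')
                   .append(escape(String.valueOf(f.get(obj))))
                   .append("</").append(f.getName()).append('>');
            }
            out.append("</").append(tag).append('>');
        }

        // Minimal escaping for element content.
        private static String escape(String s) {
            return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        }
    }

    /** The domain class carries no XML code at all. */
    static class User {
        private final String firstName;
        User(String firstName) { this.firstName = firstName; }
    }
}
```

    Usage would mirror ObjectOutputStream: create a `SimpleXmlObjectOutput` over a `StringBuilder` or `Writer` and call `writeObject(user)`. Swapping in a different implementation of the interface changes the format without touching `User`.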

    <sandor>
    There should be a standard way to encode objects using XML. If there is none yet it must be defined.
    Engineering must be based on standards. If XML is perceived as a language that is both human readable and flexible enough to handle all java objects (which to my best knowledge it is) then maybe one particular dialect of it (if there are more incompatible ones) should be adopted by the Java community as a standard. Call it JavaXML.
    </sandor>

    You could define such a format. I believe, based on my experience with XML mapping, that it is highly unlikely that one format would become a de-facto standard "for all things XML". There are many different choices when designing the format. For instance, a simple User object may be serialized as:

    <object id="13424" class="mypackage.User">
      <field name="firstName">
        <object class="java.lang.String">...</object>
      </field>
      ...
    </object>

    It can also be serialized as:

    <user name="..." />

    And many other formats. Including id references makes the format more robust and enables it to deal with cyclic object graphs, but it also makes the format much harder to work with using tools such as XPath and XSLT.
    Even if you do design an ultra-generic format, and even if you do standardize it through the JCP or some similar mechanism, my point still stands. There is no reason to support that format within the compiler, and there is no reason to place the methods involved with that format in the object. Take the serialization spec for instance. It is standard, it is a formal part of the Java language. It is everything you speak of, and yet it doesn't require compiler-support or add methods to each object.

    <sandor>
    I disagree. You as a programmer could still externalize the whole XML binding concept if you wanted to. But if there was one built in that was flexible enough to use, how many times would you not consider using that first?
    </sandor>

    Of course you could externalize it, don't be silly. No design, no matter how bad and non-reusable it is, will make it impossible to implement another design. However, all the code that uses the toXML-based framework would have to be changed in order to use a different framework. Separating the serialization framework into a separate class makes it possible to switch frameworks or formats without *changing all affected code*. That is a highly important goal of object-oriented design.

    <sandor>
    Then make Castor hidden from the programmers, built in so as to provide a default standard xml mapping framework at the language level. I can’t stress enough that (for me anyways) a .toXML and .fromXML (or something like it) would be the most straightforward and logical things to do for serialization.
    </sandor>

    I don't know what you mean by "make Castor hidden from the programmers". If you just mean to provide a standard interface that will be implemented by Castor and all other XML binding frameworks then that is a fine idea, and actually this design pattern advocates exactly this idea: to wrap all binding frameworks (including hand-written ones) in a standard interface.
    Castor does come with a default XML mapping, and so does JXV and most XML mappers. There is no "standard" right now, but if there will be I'm sure most mappers will support it. You have still given absolutely no reason why .toXML and .fromXML are the most logical thing for serialization. I have repeatedly shown that it is a highly inflexible, inextensible and limiting design. We have the example of Java Serialization which has proven to be an appropriate solution for the varying requirements of different systems. I have shown you a specific example where a .toBinary solution would have been insufficient (in RMI).
    I have never heard a developer say that "creating an ObjectOutputStream is not straightforward enough for me". To me it seems perfectly straightforward. Why is creating an "XMLObjectOutput" class any harder?

    <sandor>
    Would you like to be involved in an open source project for this? It sounds like a good idea to me. I’m not sure how many people would use it though.
    </sandor>

    Like I said, I don't think that format is very useful. I don't want to open up another big discussion on relational databases, but it seems very clear to me that the point of storing your objects in tables is to represent them in a relational model, not to just stick them in a text-based BLOB field.
    I am the founder of an open-source XML mapper (JXV). If you want to implement this project yourself I would be happy to help you make use of JXV for the purpose of serializing objects as XML.

    <sandor>
    I disagree. Regardless of the number of the different possible formats there should be one common denominator to which you can export your objects by default. All other frameworks should be able to deal with that format.
    </sandor>

    You can decide on a standard format, you can provide a standard implementation, and you can urge other XML mappers to support that format. This has nothing to do with compiler-internal support, or a restrictive design (.toXML and .fromXML) that makes it impossible to switch implementations without rewriting the code or customize the format in any reasonable manner (see the Java Serialization object replacement features). As I have said before, you can define a standard format as done in the Java Object Serialization spec without crippling your system with all of these design flaws.

    <sandor>
    I think that this is where our ideas depart most. This is more like a religious issue. I believe in standards (you do too but to a different extent), more clearly I believe that xml binding should be standardized for java.
    </sandor>

    You are attacking a straw man. All I meant was that the parameters given to "toXML" and "fromXML" should be carefully considered. I have said before that a String is the absolute worst choice you can make. Readers and Writers are slightly better but also flawed. The best choice I can think of is SAX-based classes (InputSource, etc.).
    I don't know to what extent a single XML format can become a de-facto standard for Java. However I'm not denying (and have never denied) the usefulness of such a format. I'm arguing that the design you are advocating is seriously flawed, and that the "Java Serialization"-style design is superior by far (as well as widely tested, etc.). This has nothing to do with whether you support a standard XML format or not. I do think that the hypothesis of the design should be that even if there is a standard format (as with Java Serialization) some users may need to override it, customize it, or replace it (as with Java Serialization). Therefore the design must not be so restrictive that it would make it impossible to support anything other than the standard. Note that allowing the user to customize the generated methods is not sufficient. In the RMI case, for instance, you need to use the normal serialization format in some cases, and the modified format in others. Sometimes you have to be able to decide at runtime (for instance, in RMI, based on whether the object has been exported or not) which format to use.
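    As a rough illustration of the SAX-based option, the sketch below has a serializer emit SAX events into any `ContentHandler`, so the same code can feed a file, a string, or an XSLT processor without building an intermediate String. The `User` class and the `writeUser` and `toXmlString` helpers are hypothetical; the JAXP classes (`SAXTransformerFactory`, `TransformerHandler`, `AttributesImpl`) are the standard ones.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

class SaxOutputDemo {
    // Hypothetical domain class with no XML code of its own.
    static class User {
        final String name;
        User(String name) { this.name = name; }
    }

    // Hypothetical serializer: describes a User as SAX events to whatever handler is supplied.
    static void writeUser(User user, ContentHandler handler) throws SAXException {
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "name", "name", "CDATA", user.name);
        handler.startDocument();
        handler.startElement("", "user", "user", attrs);
        handler.endElement("", "user", "user");
        handler.endDocument();
    }

    // One possible consumer: an identity transformer that turns the events into text.
    static String toXmlString(User user) throws Exception {
        SAXTransformerFactory factory =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));
        writeUser(user, handler);
        return out.toString();
    }
}
```

    Passing a `TransformerHandler` built from a stylesheet (via `newTransformerHandler(Source)`) instead of the identity one would apply XSLT to the object directly; `writeUser` would not need to change.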

    <sandor>
    I guess the reason why you have 2 formats is that there wasn’t a default, flexible enough built in framework there for you at the time you decided on the 2 formats.
    Do your 2 frameworks indicate incompatible formats? If so, you spend time on assuring that they don’t intermix at various levels of your application. One standard would solve this problem.
    Also, how many possible XML formats can you think of? If there is an upper limit (I guess there is: maybe 2 or 3) then those should be parameterized in the compiler.
    </sandor>

    Two formats do not necessarily imply two frameworks. It's possible for one framework to support two formats (for instance, Java serialization with normal serialization and RMI serialization). When serialization is separated into a separate class you can create multiple serializers, configure each of them appropriately, and decide which one you want to use at runtime. Using the .toXML and .fromXML solution the only way to do this is to generate multiple .toXML and .fromXML methods for each object, which is completely unacceptable. I can't see how you can advocate such an approach. I don't even see a tradeoff here. You have not given a single argument for why .toXML and .fromXML is better than the Java Serialization approach. All of your arguments about a standard XML format are orthogonal to this point. A standard XML format can be equally supported by a standard serializer class like ObjectOutputStream.
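    A minimal sketch of that arrangement, assuming a hypothetical `UserSerializer` interface with two implementations that roughly follow the verbose and compact formats shown earlier in this post. None of these names come from Castor, JXV or the JDK; they only illustrate choosing a format at runtime while the `User` class stays untouched.

```java
class SerializerChoiceDemo {
    // Domain class: no toXML method, no knowledge of any format.
    static class User {
        final String name;
        User(String name) { this.name = name; }
    }

    /** Hypothetical serializer interface kept outside the domain class. */
    interface UserSerializer {
        String serialize(User user);
    }

    /** Generic object/field style format. */
    static class VerboseSerializer implements UserSerializer {
        @Override
        public String serialize(User u) {
            return "<object class=\"User\"><field name=\"name\">"
                 + escape(u.name) + "</field></object>";
        }
    }

    /** Compact, document-oriented format that is easier to query with XPath. */
    static class CompactSerializer implements UserSerializer {
        @Override
        public String serialize(User u) {
            return "<user name=\"" + escape(u.name) + "\"/>";
        }
    }

    // Minimal escaping that is safe for both element content and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // The format is decided at runtime by the caller, not by the object.
    static UserSerializer choose(boolean compact) {
        return compact ? new CompactSerializer() : new VerboseSerializer();
    }
}
```

    Supporting a third format would mean adding one more class that implements the interface; no method has to be added to `User` or regenerated by a compiler.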

    <sandor>
    Because as you said there is no standard framework. All I’m trying to advocate is to use a standardized built in xml binding framework. I don’t say that I have the magic solution at my fingers and that I know which xml format is to be adopted, but I think there must be one. Can you explain why compiler-internal support is ‘god-forbid’?
    </sandor>

    If by "built-in" you mean provided in the standard JDK release we have no argument. As was done for most XML applications (parsers, XPath, XSLT) the JCP can define a standard set of interfaces and the JDK can provide a default implementation.
    If by "built in" you mean built into the compiler I strongly disagree. First of all, I don't see why compiler-support is needed. It is only needed if you want .toXML and .fromXML, and you have given no arguments as to why that approach is preferable.
    Compiler support for any feature should only be used as a final resort. Adding a compiler feature means adding something to the definition of the language. It means classes built with older versions of the compiler have to be rebuilt (and sometimes you can't rebuild them because you don't have the sources). It means any update/bug-fix entails an update to the compiler, and compilers are rarely updated. This type of thing might work for ".equals" or ".toString" (even there Java opted not to do it) but XML serialization is *far* too complicated to support directly in the compiler, in terms of bugs and other updates that may be required, and in terms of the variety of different implementation details that the spec shouldn't mandate (and if it doesn't mandate them, classes built using different compilers would be incompatible).
    Finally, I think compiler support for this is 'god forbid' simply because we don't need it, just as we don't need it for Java Serialization.

    <sandor>
    I disagree. There should be one and exactly one standard and n perfectly valid non standard ways to serialize objects.
    </sandor>

    Again, you are attacking a straw man. I'm not arguing against a standard XML format.

    <sandor>
    I think this is a different issue from xml binding. Objects wouldn’t have to explicitly implement their own serialization format.
    </sandor>

    And this is different from XML binding how? Do objects have to implement their own XML format? Can't you use XML binding frameworks just like you use ObjectOutputStream?

    <sandor>
    The one that would be standard would have to be dealt with by the RMI framework.
    </sandor>

    You are completely missing the point. The RMI framework does "handle" the standard serialization format, but RMI has a rather bizarre requirement: when a Remote object that has been exported as a server object is sent from one computer to another, we don't want to serialize it. Instead, we want to serialize a "remote reference" so that the receiver of the serialized object gets a remote reference rather than a duplicate of the server object. In order to do that RMI must be able to override the standard serialization format. It does this by subclassing ObjectOutputStream and making use of its "object replacement" feature (see serialization documentation).
    I could have the same situations if I want to implement "XML-based RMI" (which is not such a bizarre idea). I think any reasonable designer would agree that we can't require the compiler to generate special "toRMIXML" and "fromRMIXML" methods. If that is what you are arguing then I'm afraid we don't have even a minimal common ground to carry out a discussion. To me the fact that the compiler shouldn't support every format required by every RMI-like system internally is completely self-evident and obvious.

    <sandor>
    The problem here is that Java doesn’t guarantee that an object serialized to byte stream would be deserializable in the following release of JVM.
    </sandor>

    That is not true. And anyway it has nothing to do with my point about object-replacement features.

    <sandor>
    Don’t RMI-like systems behave similarly anyway? Don’t they have to conform to certain formats in order to communicate?
    </sandor>

    Of course they do... Did I say that they don't? The format that they conform to is not exactly the same as the "standard" serialization format - remote server objects are not serialized directly but rather replaced with remote references at runtime. And yet RMI didn't have to rewrite the entire Java Serialization codebase or ask the compiler to support its own format internally. Behold the power of polymorphism.

    <sandor>
    I think it’s under metadata or automatic boilerplating.
    http://www.javasig.com/Archive/lectures/JavaSIG-Tiger.pdf
    </sandor>

    I don't know what "automatic boilerplating" is supposed to mean. I have read the metadata spec and haven't seen the feature you are referring to. I couldn't find any mention of the feature in the document you linked to.

    <sandor>
    Not quite. Which one to use? Which is more standard than the other? Will I be able to process your xml content you send me with the framework that I picked arbitrarily over the other?
    </sandor>

    Use whichever one you want. You will be able to process my XML content assuming you are using a mapper that supports the same format. Either make sure we use the same mapper or configure both mappers to produce equivalent XML.
    I'll say this again: I am not arguing against a standard XML format. I doubt one standard would suffice for all applications, but if you want to define one I can certainly see value in it. This thread is a discussion about a design pattern, and you are avoiding the topic by talking about the benefits of standard formats instead of talking about *the design you propose and the compiler-support it requires*. I argue that the design you propose is flawed, and even if we did have a standard format (as we do for binary serialization) the "Java Serialization"-style design is superior in every aspect.

    Regards
    Gal
  32. toXml, fromXml

    Hi Gal,

    I would like to summarize my argument on xml binding of Java objects as this would probably be my last posting on this topic. Thank you for the valuable discussions in the thread.

    My argument is based on xml binding of JAVA OBJECTS ONLY. And as such (for that purpose only: xml binding of JAVA objects) one standard universal format would be enough. Once that format is adopted there would really be no need for different frameworks for binding java objects. This could be an integral part of the language.

    In my opinion toXml means a complex getter, and fromXml means a complex constructor. They should belong to the class. toXml and fromXml are so coupled to each member of the object that if you move them away from the object, the binding framework either has to call getters/setters or use introspection to examine the state of each member. What if there are no public accessors/mutators? Is the framework going to tap into the class’s private members?

    If you still prefer using frameworks for xml binding, that’s fine. I have no problem with frameworks unless they’re overkill for simple problems. I think that’s the case with java object binding.

    The following class doesn’t have to be changed when switching to a different framework.

    class MyClass implements XMLizable{
    ..
        public String toXml(){
            return new XMLFrameworksFactory().getDefaultFramework().toXml(this);
        }

    …
    }

    Quote from http://java.sun.com/j2se/1.4.2/docs/api/java/io/Externalizable.html
     “Externalization allows a class to specify the methods to be used to write the object's contents to a stream and to read them back. The Externalizable interface's writeExternal and readExternal methods are implemented by a class to give the class complete control over the format and contents of the stream for an object and its supertypes. These methods must explicitly coordinate with the supertype to save its state.
    Object Serialization uses the Serializable and Externalizable interfaces. Object persistence mechanisms may use them also. Each object to be stored is tested for the Externalizable interface. If the object supports it, the writeExternal method is called. If the object does not support Externalizable and does implement Serializable the object should be saved using ObjectOutputStream.
    When an Externalizable object is to be reconstructed, an instance is created using the public no-arg constructor and the readExternal method called. Serializable objects are restored by reading them from an ObjectInputStream.”

    Isn’t the Externalizable concept similar to what I’m advocating with the XMLizable interface and the toXml and fromXml methods? If you see the analogy here, I would challenge your argument that the design I’m advocating is flawed. I think our two different ways of attacking the problem could be combined in the same way as in the java serialization framework: if a class implements the XMLizable interface, you’d have to provide toXml and fromXml (either coded manually or with a framework’s help, as in the example above). If the class doesn’t implement XMLizable but implements another marker interface like XMLSerializable, it would be serialized with XMLOutputStream. Do you see the connection between your approach and my approach here? If you do, we’re talking about the same thing from two perspectives.

    This is not a very important part of the topic, but I disagree with your argument on xml data exchange between your application and mine. With your way of doing it you would require me to use a mapper that supports the same format as yours. It’s nice to have open source libraries, but what if you use a proprietary mapper that costs money and I can’t justify the cost? Wouldn’t I then ask you to use a format that we can both exchange for free?

    Cheers,

    Sandor
  33. toXml, fromXml

    <sandor>
    My argument is based on xml binding of JAVA OBJECTS ONLY. And as such (for that purpose only: xml binding of JAVA objects) one standard universal format would be enough.
    </sandor>

    That is an unsubstantiated claim. I have mentioned RMI's need of object replacement in previous posts. You seem to be taking a very narrow interpretation of the term "XML binding", namely a very inflexible sense of reading and writing objects. "XML binding" generally refers to the process of binding a Java class to an XML format. XML binding can be used to apply XML technologies such as XSLT, XPath, XQuery, etc. to Java object graphs. It can be used to communicate with different systems (possibly developed in different languages), where the specific format can be affected by a variety of factors:
    1. Industry standard XML formats.
    2. Format expected by the system you are communicating with.
    3. Format features that are inherent to the protocol at hand: for instance remote reference replacement in RMI.

    XML binding has at least as many different requirements and factors as normal Java serialization (probably far more). You could say "For the purpose of serializing Java objects one standard universal format would be enough". And yet RMI has to specialize the format.
    I understand that in your particular applications you may have never seen the need for more than one binding format, but don't let that affect your perspective.

    <sandor>
    Once that format is adopted there would really be no need for different frameworks for binding java objects. This could be integral part of the language.
    </sandor>

    Java Serialization is one standard format, and yet RMI did need to specialize it and create its own format. Besides, I have made it abundantly clear that *I am not arguing against a standard XML format for Java objects*.

    <sandor>
    In my opinion toXml means a complex getter, and fromXml means a complex constructor. They should belong to the class.
    </sandor>

    Why isn't "toBinary" a so called "complex getter" by the same logic? You are saying that they should belong to the class, yet you make no argument about *why* they should belong to the class. You are blatantly ignoring the fact that there are *real-life situations* where having them belong to the class is simply insufficient. RMI is one of them, as I have explained in my last two messages.

    <sandor>
    toXml and fromXml are so coupled to each member of the object that if you move them away from the object, the binding framework either has to call getters/setters or use introspection to examine the state of each member. What if there are no public accessors/mutators? Is the framework going to tap into the class’s private members?
    </sandor>

    The framework may access the object in any way you see fit. Getter/setter access would probably be a common solution, but you could use standard interfaces or even private introspection (as Java serialization does). I see no problem in "tapping in to private members" provided that the code is generic and doesn't rely on the specifics of the private members - encapsulation is only broken when one part of the code *relies* on the implementation details of another part. The serialization code is generally generic except for one main problem: when the class' implementation changes the serialization format changes. To avoid this you *have* to resort to accessing the class through its public interface. Placing the code in the class *does not* solve this problem: the format will still be changed when the implementation is changed.
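
    As a rough illustration of generic private introspection, here is a minimal sketch, assuming a hypothetical ReflectiveXmlWriter and a hypothetical Point class with no accessors. It is not the code of any real binding framework; it skips escaping and nested objects, and the only JDK API it relies on is java.lang.reflect.

```java
import java.io.PrintWriter;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical class with private state and no public accessors.
class Point {
    private final int x;
    private final int y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

// Hypothetical generic writer: it knows nothing about Point and simply
// walks whatever instance fields the class declares.
class ReflectiveXmlWriter {
    static void write(Object obj, PrintWriter out) {
        Class<?> type = obj.getClass();
        out.print("<" + type.getSimpleName() + ">");
        for (Field field : type.getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue; // only instance state is written
            }
            field.setAccessible(true); // reads private members generically
            try {
                // Values are printed as-is; XML escaping is left out of this sketch.
                out.print("<" + field.getName() + ">" + field.get(obj)
                        + "</" + field.getName() + ">");
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        out.print("</" + type.getSimpleName() + ">");
    }
}
```

    Note that the element names come straight from the private field names, so renaming a field changes the XML. That is the format-follows-implementation problem described above, and it would be just as present if the same logic lived inside Point.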

    <sandor>
    If you still prefer using frameworks for xml binding, that’s fine. I have no problem with frameworks unless they’re overkill for simple problems. I think that’s the case with java object binding.
    </sandor>

    How can "a framework" be an overkill? A framework can be as simple or as complex as you want. As I said in a prior post, I have no problem with a binding-framework being provided with the standard JDK, so you don't have to even download it. An XML binding framework is no more an overkill than ObjectOutputStream.

    <sandor>
    The following class doesn’t have to be changed when switching to a different framework.

    class MyClass implements XMLizable{
    ..
        public String toXml(){
            return new XMLFrameworksFactory().getDefaultFramework().toXml(this);
        }

    …
    }
    </sandor>

    I have commented on this before (not to you, in a separate branch of this discussion). This is going about indirection the wrong way. This still wouldn't allow you to implement RMI. The user should be able to customize the format - not just the class. See my comments on that branch. And please, please stop writing "String toXml()". You are making this seem much simpler than it is. Returning a String from this method is downright incompetent IMO. It is the absolute worst way to return XML. Even a StringBuffer would be immeasurably better, but a SAX InputSource is probably best.

    <sandor>
    Quote from http://java.sun.com/j2se/1.4.2/docs/api/java/io/Externalizable.html
     “Externalization allows a class to specify the methods to be used to write the object's contents to a stream and to read them back. The Externalizable interface's writeExternal and readExternal methods are implemented by a class to give the class complete control over the format and contents of the stream for an object and its supertypes. These methods must explicitly coordinate with the supertype to save its state.
    Object Serialization uses the Serializable and Externalizable interfaces. Object persistence mechanisms may use them also. Each object to be stored is tested for the Externalizable interface. If the object supports it, the writeExternal method is called. If the object does not support Externalizable and does implement Serializable the object should be saved using ObjectOutputStream.
    When an Externalizable object is to be reconstructed, an instance is created using the public no-arg constructor and the readExternal method called. Serializable objects are restored by reading them from an ObjectInputStream.”

    Isn’t the Externalizable concept similar to what I’m advocating with the XMLizable interface and toXml and fromXML methods?
    </sandor>

    Indeed it is, but there is a huge difference: users go through ObjectOutputStream, which in turn provides a means for an object to externalize itself - not the other way around as you suggested above. The protocol can easily be refined and overridden by subclassing ObjectOutputStream. If I want to add a schema reference at the top of the document, I can override ObjectOutputStream (or XMLObjectOutput in this case) and make it output this data at the top. If users call the object and it directly externalizes itself I cannot customize the format at all.
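
    A minimal sketch of that arrangement follows. Every type in it (XmlExternalizable, XmlObjectOutput, SchemaAwareOutput, Order) is a hypothetical name invented for this post rather than an existing API, and a DOCTYPE line stands in for the "schema reference".

```java
import java.io.PrintWriter;

// Hypothetical XML analogue of java.io.Externalizable.
interface XmlExternalizable {
    void writeXml(XmlObjectOutput out);
}

// Hypothetical XML analogue of ObjectOutputStream: the entry point callers use.
class XmlObjectOutput {
    protected final PrintWriter writer;

    XmlObjectOutput(PrintWriter writer) { this.writer = writer; }

    // Callers go through the stream; the stream calls back into the object.
    final void writeObject(XmlExternalizable obj) {
        writeProlog();
        obj.writeXml(this);
        writer.flush();
    }

    // Hook that subclasses may override to change what precedes the root element.
    protected void writeProlog() {
        writer.print("<?xml version=\"1.0\"?>");
    }

    // Helper for simple text elements; escaping is left out of this sketch.
    void element(String name, String text) {
        writer.print("<" + name + ">" + text + "</" + name + ">");
    }
}

// Customizes the format without touching any serialized class.
class SchemaAwareOutput extends XmlObjectOutput {
    private final String rootName;
    private final String dtdLocation;

    SchemaAwareOutput(PrintWriter writer, String rootName, String dtdLocation) {
        super(writer);
        this.rootName = rootName;
        this.dtdLocation = dtdLocation;
    }

    @Override
    protected void writeProlog() {
        super.writeProlog();
        writer.print("<!DOCTYPE " + rootName + " SYSTEM \"" + dtdLocation + "\">");
    }
}

// The object only describes its own content.
class Order implements XmlExternalizable {
    private final String id;

    Order(String id) { this.id = id; }

    @Override
    public void writeXml(XmlObjectOutput out) {
        out.element("order", id);
    }
}
```

    Writing the same Order through a plain XmlObjectOutput or through a SchemaAwareOutput yields two different documents, and Order never changes. With a direct order.toXml() call there would be no place to hang that customization.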

    <sandor>
    If you see analogy here I would challenge your argument on the flawed design I’m advocating.
    </sandor>

    That's a pretty weird claim. Externalizable is part of the Java Serialization system. How can you challenge my argument by referring to it? You are just emphasizing my argument by advocating the Java Serialization design - exactly the design I am advocating. If the Java serialization design were to have users serialize classes by directly calling the methods of Externalizable then you would have a point. Luckily for us that isn't the case.

    <sandor>
    I think our 2 different ways of attacking the problem could be done similarly to the java serialization framework: if there is XMLizable interface implemented in a class, you’d have to provide the toXml and fromXml (either code manually yourself or with a framework’s help as I wrote in the above example).
    If the class doesn’t implement XMLizable but implements another marker interface like XMLSerializable it would be serialized with XMLOutputStream. Do you see connection between your approach and my approach here? If you do we’re talking about the same thing from two perspectives.
    </sandor>

    Of course I see a link, and here is what it is: XMLOutputStream should be used to serialize objects as XML. Objects that implement XMLExternalizable would be able to externalize their own format. You seem to think that Externalizable is an "alternative" to the Java Serialization design. It is not. Externalizable is part of the Java Serialization design: a way to customize the serialization format in the object. These are not "different perspectives". Externalizable is not designed to be used "independently". Serialization should always go through ObjectOutputStream. Externalizable is just another means for the object to communicate with the serialization system: just like getter/setters, introspection, etc.

    <sandor>
    As not very important part of this topic but I disagree with your argument on xml data exchange between your application and mine. With your way of doing it you would require me to use a mapper that would support the same format as yours.
    </sandor>

    Well, obviously... Would you expect to communicate with my system when our formats do not match?

    <sandor>
    It’s nice to have open source libraries but what if you use a proprietary mapper that costs money and I can’t justify the cost? Wouldn’t then I ask you to try to use a format that we both can exchange for free?
    </sandor>

    Of course, that's why a good mapper (especially one that costs money) should be able to support the widest range of formats to enable interoperation with as many systems as possible - including, of course, any industry standards if there are any. I want to clarify for the one-hundredth time: *I am not arguing against a standard XML format for Java objects*. Nor am I arguing against providing a standard implementation within the JDK, just as is done for parsers, XSLT transformers, etc.

    Thanks for the good discussion.
    Regards
    Gal