Discussions

News: WS-Duck Typing - tips for flexible web services

  1. Arjen Poutsma has posted "WS-Duck Typing," which is (thankfully!) not yet another spec, but is instead a way to get more flexible type support from web services. The tips are:
    Don’t validate incoming messages!
    Not only is XSD-based validation slow, it also requires strict schema conformance from the other party, thus creating a strictly typed service. Such a service breaks Postel’s Law: be conservative in what you do; be liberal in what you accept from others. If you really want to do validation, do it on server-side outgoing messages only. After all, you should adhere to your own schema. Also, Schematron is the exception to the rule, since it is not based on grammars, but on finding tree patterns in the parsed document. Which brings us to:
    Use XPath!
    XPath is an excellent way to extract information from an XML document. Code written with XML APIs like DOM, SAX, or StAX is typically quite fragile when it comes to element ordering, nesting, or unexpected elements. And XML marshalling isn’t much better: some of these APIs throw exceptions in these cases. Not with XPath. When using XPath, you don’t care whether the element is the first or the second child of <person>; the /person/lastname expression grabs it anyway. And if you really don’t know where to find the last name, you can always resort to //lastname, which finds it anywhere in the document. In the past, XPath has been dismissed as being too slow. With modern XPath libraries which support XPath pre-compilation, this is less of an issue.
    Don’t create stubs or skeletons!
    This is perhaps the most controversial tip. If you create client-side stubs or server-side skeletons in a strongly-typed language like Java, you throw away any option of being liberal about the XML messages. Instead, you create a strongly-typed API that is strongly-coupled to the contract, and that passes or expects parameters of a certain kind. If they are of any other kind, or if they are simply not there, your code will never be invoked. Even if you didn’t need the parameter in the first place. If you treat Web services like XML messaging, rather than RPC, you could have handled the message gracefully: let’s see if I can find the first name under the person element, and if it’s not there, I’ll try and find it anywhere in the document. Still not there? Perhaps it’s an older message: I’ll just apply this stylesheet, and see if I can find the first name then. Et cetera, et cetera.
    Odd, I always thought the alternative to WS-"Death Star" was "WS-X-Wing," not "WS-Quack..."
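For readers who want to see the "Use XPath!" tip in concrete form, here is a minimal sketch using the JDK's javax.xml.xpath API. The class name LastNameReader and the fallback order are assumptions made for illustration; they are not taken from the article. The two expressions are compiled once, which is the pre-compilation the tip refers to.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, named for this example only.
final class LastNameReader {
    // Compiled once and reused. XPathExpression objects are not
    // thread-safe, so each reader instance keeps its own pair.
    private final XPathExpression underPerson;
    private final XPathExpression anywhere;

    LastNameReader() {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            underPerson = xpath.compile("/person/lastname");
            anywhere = xpath.compile("//lastname");
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the last name, or null if the document has none.
    String lastName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // First try the expected location; child order does not matter.
            String name = underPerson.evaluate(doc).trim();
            if (name.isEmpty()) {
                // Fall back to the first <lastname> in document order.
                name = anywhere.evaluate(doc).trim();
            }
            return name.isEmpty() ? null : name;
        } catch (Exception e) {
            throw new IllegalArgumentException("message is not well-formed XML", e);
        }
    }
}
```

The same reader accepts the element as first or second child of the person element, and also when it sits somewhere else in the document. Whether that last fallback is wise is exactly what the replies below argue about.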
  2. I agree that using XPath to extract message data is a very robust approach, although the most fragile way to work with XML in Java is not mentioned: JAXB/Castor style schema -> class generation. The points on validation are a little iffy, however. Can't find the name where it's supposed to be so just grab a name element from a random spot in the document? This reminds me of the nonsense about how XML is self-describing. I've worked with XML documents that contain dozens of distinct names that could not be used interchangeably. I will agree that validating the message based on the schema up front can be unfriendly to clients of your service. But at some point you must validate that the message is, well, valid. Think if you placed an order for millions of dollars of goods that did not have a shipping address and the server didn't reject it but just grabbed another address off the document and shipped it there. That's not just grossly irresponsible. It's plain stupid. I for one would feel really uneasy about using a service written this way. It puts too much onus on the client to get the message right. Rejected messages are a minor problem that can be easily fixed. Putting expensive resources in motion doing the wrong things is a really big problem. But I will say that validation should be very minimal. Unless you really need something, it should not be required by the schema. Of course, known future needs should be considered.
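The "very minimal validation" argued for above could look roughly like the following sketch: check only the fields the operation cannot proceed without, at their expected location, and reject the message instead of guessing. The element names (order, shipTo, street, city) and the class name OrderCheck are hypothetical and chosen only for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical check, named for this example only.
final class OrderCheck {
    // Only what is truly needed to ship; everything else stays optional.
    private static final String[] REQUIRED = {
        "/order/shipTo/street",
        "/order/shipTo/city"
    };

    // Returns the required paths that are absent or blank.
    // An empty list means the order may be processed.
    static List<String> missingFields(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            List<String> missing = new ArrayList<>();
            for (String path : REQUIRED) {
                // Deliberately no "//street" fallback here: a billing
                // address must never be mistaken for a shipping address.
                if (xpath.evaluate(path, doc).trim().isEmpty()) {
                    missing.add(path);
                }
            }
            return missing;
        } catch (Exception e) {
            throw new IllegalArgumentException("message is not well-formed XML", e);
        }
    }
}
```

Unknown extra elements and different child order pass untouched, so the check stays friendly to clients while still refusing an order that has only a billing address.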
  3. although the most fragile way to work with XML in Java is not mentioned: JAXB/Castor style schema -> class generation.
    Actually it is mentioned as 'marshalling', I guess. But it's said to be 'not much better' when it is actually much worse.
  4. Another thing. If you use SAX for parsing messages and your code is sensitive to element order, you are probably not using SAX effectively (excepting cases where you actually want to care about element order.) That's another point I disagree with in this article.
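As an illustration of the SAX point above, here is a small sketch of a handler that keys on element names rather than on the sequence of events, so element order and unexpected elements do not matter. The class name PersonHandler and the firstname/lastname elements are assumptions for the example.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, named for this example only.
final class PersonHandler extends DefaultHandler {
    private final Map<String, String> fields = new HashMap<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Pick out the elements we care about by name; ignore the rest.
        if (qName.equals("firstname") || qName.equals("lastname")) {
            fields.put(qName, text.toString().trim());
        }
        text.setLength(0);
    }

    // Parses a message and returns the recognized fields by element name.
    static Map<String, String> parse(String xml) {
        try {
            PersonHandler handler = new PersonHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("message is not well-formed XML", e);
        }
    }
}
```

The handler never assumes which element arrives first, which is the difference between fragile and effective SAX code described in the post.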
  5. Another tip is to incorporate a random key for a unique URI per client. This allows you to easily mount transformers or do additional massaging of the message before processing it without first having to interpret the message. This works great for B2B scenarios where you have different customers using different 'versions' of documents and you can simply align the data before processing by the one WS endpoint.
  6. Flexibility turns into a mess

    But at some point you must validate that the message is, well, valid. [...] I for one would feel really uneasy about using a service written this way. It puts too much onus on the client to get the message right. [...] Putting expensive resources in motion doing the wrong things is a really big problem.
    I totally agree with the points above. I think there's a common misconception that flexibility equals power. And it can when applied in the right places -- but extreme flexibility in service contracts turns into a mess, I think. You will end up with several different clients sending you several different flavors of requests, and you must support them all. The variations that you DO support end up being very quirky and implementation-specific, making your interface a bit fragile since clients are never quite sure if they're crafting the request correctly, or when they're not, it makes it harder to figure out why. Trying to document a loose nebulous set of rules is very difficult, and that documentation will probably become rapidly outdated. If the size of a message is very large, using xpaths with arbitrary depth (such as "//foo") performs poorly since it must check every node in the tree. Using xpath will always be more expensive than calling getter-style methods and standard collections in an object tree. XPath is a nifty syntax, and I'm not saying to not use it entirely, but I still think there are performance concerns that we should be aware of there. I think a perfect example of how loose validation leads to a mess is HTML rendering engines. *shudders involuntarily* If you want your services to all be that quirky and impossible to fully understand, go right ahead. I'll stick with my hard but simple-to-understand contracts.
  7. I think there's a common misconception that flexibility equals power. And it can when applied in the right places -- but extreme flexibility in service contracts turns into a mess, I think. You will end up with several different clients sending you several different flavors of requests, and you must support them all. The variations that you DO support end up being very quirky and implementation-specific, making your interface a bit fragile since clients are never quite sure if they're crafting the request correctly, or when they're not, it makes it harder to figure out why. Trying to document a loose nebulous set of rules is very difficult, and that documentation will probably become rapidly outdated.
    We get around this though by having a single, strict endpoint, but by having unique URIs per client, we can customize those special cases as filters/transformers and we only maintain one set of rules under a strict WS endpoint. It works fantastic for us in otherwise extremely complex/loose XML specs.

    Customer A hits: /app/4839jfjei23n43/xml/po
    Customer B hits: /app/4380f3ingo3nxc/xml/po

    There's only one po WS endpoint, but we can separately maintain customizations specific to Customer A via filter/transformer mounts without convoluting any of our core business logic or expectations within the po WS endpoint.
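One way the per-client URI idea could be wired up is sketched below with the JDK's javax.xml.transform API: a stylesheet is mounted under a client key, and any message arriving on that client's URI is transformed before it reaches the single strict endpoint. This is only a guess at the shape of such a setup, not the poster's actual implementation; the class name ClientNormalizer, the mount method, and the assumption that the key is the second path segment are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical pre-processing step, named for this example only.
final class ClientNormalizer {
    private final TransformerFactory factory = TransformerFactory.newInstance();
    // Compiled stylesheets, keyed by the random per-client URI segment.
    private final Map<String, Templates> byClientKey = new HashMap<>();

    // Mounts a client-specific XSLT stylesheet under the client's key.
    void mount(String clientKey, String xslt) {
        try {
            byClientKey.put(clientKey,
                    factory.newTemplates(new StreamSource(new StringReader(xslt))));
        } catch (Exception e) {
            throw new IllegalArgumentException("stylesheet does not compile", e);
        }
    }

    // Assumes paths shaped like /app/{clientKey}/xml/po.
    // Messages from clients without a mount pass through unchanged.
    String normalize(String requestPath, String xml) {
        String[] parts = requestPath.split("/"); // ["", "app", key, "xml", "po"]
        String key = parts.length > 2 ? parts[2] : "";
        Templates templates = byClientKey.get(key);
        if (templates == null) {
            return xml;
        }
        try {
            Transformer transformer = templates.newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("message could not be transformed", e);
        }
    }
}
```

The endpoint behind normalize can then stay strict, because the routing decision is made from the URI alone and never requires interpreting the message first.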
  8. Ah, that approach seems sensible.
  9. Re: Flexibility turns into a mess

    We get around this though by having a single, strict endpoint, but by having unique URIs per client, we can customize those special cases as filters/transformers and we only maintain one set of rules under a strict WS endpoint. It works fantastic for us in otherwise extremely complex/loose XML specs.

    Customer A hits: /app/4839jfjei23n43/xml/po
    Customer B hits: /app/4380f3ingo3nxc/xml/po

    There's only one po WS endpoint, but we can separately maintain customizations specific to Customer A via filter/transformer mounts without convoluting any of our core business logic or expectations within the po WS endpoint.
    This is a great point and anyone building a B2B infrastructure should assume that they will need to do this kind of thing at some point in the (near) future. Start with this as a requirement. There will always be somebody that can't fix this or that and business people breathing down your neck to get things running regardless of whose fault it is. This is especially true if the partner is large.
  10. XML data "duck typing"

    Last year I wrote on this topic, but used a better name:
    Oct 12, 2006: XML data duck typing
    and also in my blog:
    Wednesday, December 27, 2006: XML data "duck typing"

    My motivation for this approach stemmed from the fact that it became too burdensome to upgrade all relevant clients when XML-based message formats were enhanced. By having the clients use this XML data duck typing approach I was able to leave some client message consumers in place.

    Has been a satisfying approach and not particularly any more involved to code than JAXB-based object serialization.
  11. Re: XML data "duck typing"

    My motivation for this approach stemmed from the fact that it became too burdensome to upgrade all relevant clients when XML-based message formats were enhanced. By having the clients use this XML data duck typing approach I was able to leave some client message consumers in place.

    Has been a satisfying approach and not particularly any more involved to code than JAXB-based object serialization.
    How does supporting different formats for different consumers not solve this issue? It should, and it doesn't have the pitfalls of a fuzzy logic approach.
  12. Re: XML data "duck typing"

    How does supporting different formats for different consumers not solve this issue? It should, and it doesn't have the pitfalls of a fuzzy logic approach.
    In messaging-centric environments one favors context-complete document messages. Multiple message consumer nodes may be consuming the same message, each with a different emphasis on some subset of the information. (Intermediate message processors are very frequently introduced to do various transformations, archiving, monitoring, filtering, etc., as well as new or supplemental business logic. Messaging is a whole different world from point-to-point communication techniques.) So if message schema is evolved to suit the needs of some particular consumer, then XML data duck typing assists as it enables one to avoid the need to upgrade other consumers for which the schema change is not relevant. By contrast, brittle object serialization techniques are more problematic in this regard.
  13. Re: XML data "duck typing"

    So if message schema is evolved to suit the needs of some particular consumer, then XML data duck typing assists as it enables one to avoid the need to upgrade other consumers for which the schema change is not relevant. By contrast, brittle object serialization techniques are more problematic in this regard.
    I'm familiar with messaging and you didn't answer the question. Allowing each consumer and producer to use a different message format for the same kind of message also avoids the issue you mention. I worked on a system with hundreds of consumers and producers and trying to create a single parser for all idiosyncrasies between the producers and consumers was not possible/feasible. There were often incompatible requirements and in other cases it was too difficult to prove that changing the parsing/building for one producer or consumer did not break the parsing/building for another. What did work was having different versions of the same message funneling into and out of canonical forms. The other option was embedding lots of complex logic in the xpath transformations. This was attempted for a while but failed miserably. After 2 or 3 different formats had to be supported, it turned to spaghetti. Because many consumers had completely different message layouts we already had to do this anyway. Trying to mix similar but different messages into a monolithic mapping was pointless.
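A rough sketch of the "funnel into canonical forms" idea: each message format gets its own small adapter, and business logic only ever sees the canonical type. To keep the example short, incoming messages are represented as already-extracted field maps; the Order type, the format names, and the field names are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical funnel, named for this example only.
final class CanonicalFunnel {
    // The canonical form that all business logic works against.
    static final class Order {
        final String sku;
        final int quantity;

        Order(String sku, int quantity) {
            this.sku = sku;
            this.quantity = quantity;
        }
    }

    // One independent adapter per message format, so changing one
    // producer's mapping cannot break the mapping of another.
    private final Map<String, Function<Map<String, String>, Order>> adapters = new HashMap<>();

    void register(String format, Function<Map<String, String>, Order> adapter) {
        adapters.put(format, adapter);
    }

    // Converts a message of a known format to the canonical form.
    // Unknown formats are rejected rather than guessed at.
    Order accept(String format, Map<String, String> fields) {
        Function<Map<String, String>, Order> adapter = adapters.get(format);
        if (adapter == null) {
            throw new IllegalArgumentException("unknown message format: " + format);
        }
        return adapter.apply(fields);
    }
}
```

Compared with a single mapping that tries to cover every variant, each adapter here can be tested and changed in isolation, which is the property the post says the monolithic XPath approach lost after two or three formats.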
  14. Very interesting. I can see the pros and cons of strict typing; I think it really just depends on the situation. So my biggest question is, what tools, frameworks, and technologies are folks using for this kind of approach? I've used a variety of tools in the past such as XStream and a custom XPath mapping engine, but would like to see something that's "fully baked", robust, tested, scalable, etc.
  15. Arjen Poutsma has posted "WS-Duck Typing," which is (thankfully!) not yet another spec, but is instead a way to get more flexible type support from web services.

    The tips are:
    Don’t validate incoming messages!

    Not only is XSD-based validation slow, it also requires strict schema conformance from the other party, thus creating a strictly typed service. Such a service breaks Postel’s Law: be conservative in what you do; be liberal in what you accept from others.
    I don't think Postel's aim was anarchy. I think the law must be read carefully as: be liberal in what you accept, provided that you can detect the intent of the requestor beyond any reasonable doubt, even if the request is not formally correct. Do XPath solutions conform to this revised law? Please post a public warning for any service implemented in such a way, so that users can give informed consent before dangerous operations.
    Guido
  16. Fail fast

    Just a very quick comment: "Use XPath" is a good idea, but I disagree with "Don’t validate incoming messages" and "Don’t create stubs or skeletons!" "Postel's Law" sounds good, but once you know the "Fail Fast" law, you realize "Postel's Law" isn't so good after all. See http://www.martinfowler.com/ieeeSoftware/failFast.pdf for more information.