XML & Web services: semantic validation in JAXB
Hi,

When a user imports an XML file into our application, I need to detect errors and report them along with their locations in the document. The import is done via JAXB. I know how to validate against the schema and get error info back, but how do I catch errors that can't be constrained by a schema, such as domain-level (semantic) errors?

Example 1: I'm parsing an XML document in which some elements contain values that are foreign keys into data stored in a database. While the document is being parsed, I want to make sure that records with those keys exist in the database. If they don't, I want the unmarshaller to generate an error containing the location (line + position) where the invalid value was found in the XML document. Any hints on how to implement this?

Example 2: Some of the data being imported is in a key/value format. I need to make sure that these key/value combinations are unique within the document. E.g.

1 title 1
1 title 1
1 title 2

Parsing of the third object should raise an exception because there is already an object with id=1 associated with a different title (notice that object 2 is valid). Again, I want to catch these errors during unmarshalling so that I can report their document locations.

thanks
dmitry
- Posted by: Dmitry Beransky
- Posted on: October 06 2006 11:27 EDT
Well! I don't have much knowledge of JAXB, but consider the following: if you want to validate data integrity, such as your object identity, you can parse the XML file with a SAX parser and register an ErrorHandler that will output the errors. cheers, http://www.javaicillusion.blogspot.com/
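A minimal sketch of the SAX idea for Example 2, not a definitive implementation. It uses only JDK classes and collects errors in a list rather than going through an ErrorHandler, to keep the example short. The element names `objects`, `object`, `id` and `title` are assumptions, since the markup of the original example did not survive in the post. The SAX `Locator` reports the position just after the event that is being processed, so the location here is the end of the offending `</object>` tag and should be treated as approximate.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: flags an <object> whose id was already seen with a different title.
final class UniqueTitleChecker extends DefaultHandler {
    private final Map<String, String> titleById = new HashMap<>();
    private final List<String> errors = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private Locator locator;
    private String id;
    private String title;

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator; // supplied by the parser before startDocument
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (qName) {
            case "id":
                id = text.toString().trim();
                break;
            case "title":
                title = text.toString().trim();
                break;
            case "object":
                String previous = titleById.putIfAbsent(id, title);
                if (previous != null && !previous.equals(title)) {
                    int line = locator == null ? -1 : locator.getLineNumber();
                    int column = locator == null ? -1 : locator.getColumnNumber();
                    errors.add("line " + line + ", column " + column + ": id '" + id
                            + "' already has title '" + previous + "', found '" + title + "'");
                }
                id = null;
                title = null;
                break;
            default:
                break;
        }
    }

    // Parses the document and returns all uniqueness violations found.
    static List<String> check(String xml) throws Exception {
        UniqueTitleChecker handler = new UniqueTitleChecker();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.errors;
    }
}
```

Calling `UniqueTitleChecker.check(xml)` on a document containing the three objects from Example 2 should yield a single error for the third object, while the second (an exact repeat) passes.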
I recommend that you take a layered approach to the validation. The first layer consists of the schema rules that JAXB checks. The second layer consists of the other business rules, coded in Java (either directly or using a rules engine). There is no need to do all of the validation at unmarshalling time. Cheers, Tony.
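The two layers might be sketched as follows, using only JDK classes. This is an illustration under assumptions, not the poster's code: the `orders`/`customerId` document shape, the inline schema, and the `knownCustomerIds` set (standing in for a database lookup) are all hypothetical, and a DOM parse stands in for the JAXB unmarshalling step. Note that the second layer in this form no longer has line and column information available.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class LayeredValidation {
    // Hypothetical schema: <orders> holding one or more integer <customerId> elements.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='orders'><xs:complexType><xs:sequence>"
          + "<xs:element name='customerId' type='xs:int' maxOccurs='unbounded'/>"
          + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    static List<String> validate(String xml, Set<Integer> knownCustomerIds) throws Exception {
        List<String> errors = new ArrayList<>();

        // Layer 1: structural rules expressed in the schema.
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            errors.add("schema error at line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber() + ": " + e.getMessage());
            return errors; // do not run business rules on a structurally invalid document
        }

        // Layer 2: business rules in plain Java, run after the document is known to be well-formed.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList ids = doc.getElementsByTagName("customerId");
        for (int i = 0; i < ids.getLength(); i++) {
            int id = Integer.parseInt(ids.item(i).getTextContent().trim());
            if (!knownCustomerIds.contains(id)) {
                errors.add("unknown customerId " + id);
            }
        }
        return errors;
    }
}
```

Stopping after the first layer fails keeps the business rules simple, since they can assume that every value already has the type the schema promises.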
I have a similar situation to handle, and I will be taking the layered approach. But consider a scenario where I just want to check whether a section near the beginning of the incoming file, say <organization></organization>, contains valid data. The incoming XML file will be huge and I don't want to invest in reading all of it. In fact, in some scenarios we will have to store it in the DB as a CLOB for offline processing. If the data in <organization>someuniqueidentifier</organization> is not valid or is not found in my DB, I just want to discard the whole file and return a meaningful error to the user. Any thoughts about this?
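One possible way to do such a pre-check is a streaming (StAX) read that stops as soon as the first `<organization>` element has been seen, so the rest of the file is never parsed. The sketch below uses only JDK classes. The class and method names are hypothetical, and the `knownOrganizations` set is an assumption standing in for the database lookup.

```java
import java.io.Reader;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class OrganizationPrecheck {
    // Returns the text of the first <organization> element, or null if there is none.
    static String readOrganization(Reader in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Uploaded files are untrusted input, so DTDs and external entities are switched off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "organization".equals(reader.getLocalName())) {
                    return reader.getElementText().trim(); // stop here, skip the rest
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    // Returns an error message for the user, or null if the file may be accepted.
    static String precheck(Reader in, Set<String> knownOrganizations) throws XMLStreamException {
        String organization = readOrganization(in);
        if (organization == null) {
            return "no <organization> element found";
        }
        if (!knownOrganizations.contains(organization)) {
            return "unknown organization '" + organization + "'";
        }
        return null;
    }
}
```

If `precheck` returns null, the file can then be stored as a CLOB or handed to the full JAXB import. Otherwise it can be rejected without being read any further.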