News: A Modular Approach to Data Validation in Web Applications

  1. Data that is not validated or poorly validated is the root cause of a number of serious security vulnerabilities affecting applications, such as Cross Site Scripting and SQL Injection. A paper entitled "A Modular Approach to Data Validation in Web Applications" presents an approach to performing thorough data validation in modern web applications so that the benefits of modular component based design (extensibility, portability and re-use) can be realised.

    It starts with an explanation of the vulnerabilities introduced through poor validation and then goes on to discuss the merits and drawbacks of a number of common data validation strategies such as:
    • Validation in an external Web Application Firewall
    • Validation performed in the web tier (e.g. Struts)
    • Validation performed in the domain model
    Finally, a modular approach is introduced together with practical examples of how to implement such a scheme in a web application, including such strategies as transforming data to a canonical format, detecting attacks based on likely vectors, accepting only valid data (e.g., a name shouldn't contain angle brackets), and escaping meta-characters that have special meaning in specific contexts (e.g., characters that might execute something unexpected in a SQL engine or an LDAP directory).
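
    The first and third of those strategies (canonical format, then accepting only valid data) might be sketched as follows. This is a minimal illustration rather than the paper's implementation: the NameValidator class, its whitelist pattern and the 64-character limit are all assumptions made for the example.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

/** Hypothetical helper: canonicalise first, then accept only known-good data. */
final class NameValidator {
    // Assumed whitelist: letters, combining marks, spaces, periods,
    // apostrophes and hyphens; between 1 and 64 characters.
    private static final Pattern VALID_NAME =
            Pattern.compile("[\\p{L}\\p{M} .'-]{1,64}");

    private NameValidator() {}

    /** Brings the input to one canonical form so later checks see a single representation. */
    static String canonicalize(String raw) {
        return Normalizer.normalize(raw.trim(), Normalizer.Form.NFKC);
    }

    /** Returns the canonical name, or throws if it does not match the whitelist. */
    static String requireValidName(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("name is required");
        }
        String name = canonicalize(raw);
        if (!VALID_NAME.matcher(name).matches()) {
            throw new IllegalArgumentException("name contains characters that are not allowed");
        }
        return name;
    }
}
```

    Note that the input is rejected rather than repaired, and that the apostrophe stays legal: escaping it for SQL or HTML is a separate step that belongs where the data leaves the application.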

    What other possibilities can you think of to modularly test data-related vulnerabilities in web applications?

    Threaded Messages (17)

  2. Validation in setters

    I'm curious as to what people think of putting validation (i.e. business logic) in the domain model, in setters specifically.

    I've always thought that this crosses the boundary between model and business rules/services.
  3. Validation in setters

    I'm curious as to what people think of putting validation (i.e. business logic) in the domain model, in setters specifically.

    I've always thought that this crosses the boundary between model and business rules/services.

    Everything I've read says that putting validation in your domain model is fine, but only if done correctly. In most cases, within a series of interactions, you can't assert that your objects are valid in every context, only that they are validatable. So the domain model shouldn't actively enforce validation of business rules unless specifically requested. As with transactions, let the state be applied/manipulated, and then validate when you 'commit'. This can be supported by something like Hibernate's Validation Annotations or some other custom Validatable contract on your domain objects.
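
    A custom Validatable contract of that kind could look roughly like this. It is only a sketch: the Validatable interface, the Customer class and the CustomerStore stand-in for a DAO are hypothetical names, and the two rules are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical contract: an object reports its own problems when asked. */
interface Validatable {
    List<String> validate();
}

/** Setters accept any state; validity is only checked on request. */
final class Customer implements Validatable {
    private String name;
    private String email;

    void setName(String name) { this.name = name; }
    void setEmail(String email) { this.email = email; }

    @Override
    public List<String> validate() {
        List<String> errors = new ArrayList<>();
        if (name == null || name.trim().isEmpty()) {
            errors.add("name is required");
        }
        if (email == null || !email.contains("@")) {
            errors.add("email must contain '@'");
        }
        return errors;
    }
}

/** Stand-in for a DAO or repository: validates at the "commit" point. */
final class CustomerStore {
    private final List<Customer> saved = new ArrayList<>();

    void save(Customer customer) {
        List<String> errors = customer.validate();
        if (!errors.isEmpty()) {
            throw new IllegalStateException("invalid customer: " + errors);
        }
        saved.add(customer);
    }

    int size() { return saved.size(); }
}
```

    The object may be half-populated for as long as it likes; only save() insists on a valid state.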
  4. Re: Validation in setters

    Everything I've read says that putting validation in your domain model is fine, but only if done correctly. In most cases, within a series of interactions, you can't assert that your objects are valid in every context, only that they are validatable. So the domain model shouldn't actively enforce validation of business rules unless specifically requested. As with transactions, let the state be applied/manipulated, and then validate when you 'commit'. This can be supported by something like Hibernate's Validation Annotations or some other custom Validatable contract on your domain objects.
    I couldn't agree more. I find that asserting that domain objects are always valid is counter-productive. Sometimes (often) you have to pass through invalid states before reaching a valid one. This can be worked around with constructors and setters that take combinations of values, but in the end it makes things less toolable and less productive. I like the Hibernate annotations/validator way of doing things: you express your constraints and check them prior to saving, to ensure the state is valid before it is persisted.

    That said, I think there are a bunch of validations that are specific to web applications, primarily those involved in converting from Strings back into rich types, but often enough there's also some contextual validation. This shouldn't end up in a domain object, but it needs to be very easy to use, otherwise people will continue to roll their own.

    -Tim Fennell
    Stripes: Because web development should just be easier.
  5. Validation in setters

    I've always thought that this crosses the boundary between model and business rules/services.

    This statement highlights a common OO anti-pattern, the Anemic Domain Model (often seen in J2EE apps, since J2EE encourages it). I recommend reading Martin Fowler's Anemic Domain Model:

    http://www.martinfowler.com/bliki/AnemicDomainModel.html

    From Fowler's article:
    The fundamental horror of this anti-pattern is that it's so contrary to the basic idea of object-oriented design; which is to combine data and process together. The anemic domain model is really just a procedural style design, exactly the kind of thing that object bigots like me (and Eric) have been fighting since our early days in Smalltalk. What's worse, many people think that anemic objects are real objects, and thus completely miss the point of what object-oriented design is all about.

    Domain Objects are business objects and should by all means encapsulate business rules pertaining to that domain object.
  6. Validation in setters

    Fowler does not go that far. He has qualified his remarks in response to some folks who took them to mean you should load up domain objects with application specific rules.

    What Fowler says and what is good practice is to have domain objects validate their own domains. If they don't then what is the difference between a String, for example, and a domain object that merely gets and sets its value? However, he does not suggest and nor do I that you add application specific validation to the domain object. (He might go further than I in some cases though.)

    What is the difference? The domain for a U.S. zip code might be a two-part field of numeric characters, each part of fixed length. The first part is mandatory and has a fixed length of 5, and the second either is null or has a fixed length of 4. You might add more rules if you knew which combinations were valid zip codes and which were not. That would be a domain. If you only deliver products to certain zip codes, on the other hand, that would be a business rule that should not be part of the domain object's validation. Certainly this restriction would not apply to the zip codes of your employees or suppliers.

    I also don't extend domain objects in order to add business rules. I prefer to delegate to them instead.
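
    In code, that split might look something like the sketch below. ZipCode holds only the domain rule (5 digits, optionally plus 4), while the delivery restriction lives in a separate class that delegates to it. DeliveryPolicy and its three-digit prefix check are invented for illustration.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Domain type: knows only what makes a U.S. zip code well formed. */
final class ZipCode {
    private static final Pattern FORMAT = Pattern.compile("(\\d{5})(?:-(\\d{4}))?");

    private final String base;
    private final String plusFour; // null when the optional part is absent

    ZipCode(String text) {
        Matcher m = FORMAT.matcher(text == null ? "" : text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a zip code: " + text);
        }
        this.base = m.group(1);
        this.plusFour = m.group(2);
    }

    String base() { return base; }
    String plusFour() { return plusFour; }
}

/** Application rule (hypothetical): delegates to ZipCode instead of extending it. */
final class DeliveryPolicy {
    private final Set<String> servedPrefixes;

    DeliveryPolicy(Set<String> servedPrefixes) {
        this.servedPrefixes = servedPrefixes;
    }

    /** True if the first three digits of the zip code are in the served set. */
    boolean delivers(ZipCode zip) {
        return servedPrefixes.contains(zip.base().substring(0, 3));
    }
}
```

    An employee's or supplier's address can use the same ZipCode class without ever touching DeliveryPolicy.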
  7. The problem with your analogy lies in the changing nature of business rules as they relate to the application.

    I like the way you make a distinction between a rule intrinsic to a zip code and a rule that is only relevant to the application at hand. A zip code is 5 digits and maybe plus 4. That's a rule about zip codes. A valid zip code for the application may only lie in Florida.

    The problem is that application requirements keep wandering between those two rules. If your shipper requires you to have a full 9 digit zip, that should be an application rule but it conflicts with the domain rule (or overrides it).

    Perhaps you can have an address without a zip code and perhaps you can't. That depends on the application. But it interacts with the domain rule.

    Perhaps you have a module in your system that corrects the address and fills in the zip code given the street and city. Then the application rule is that a missing zip code is ok if you have a city, state and street before you send the address over for correction and not-ok afterwards. Again, the application rule overrides the domain rule.

    My experience is that such domain rules have to be very shallow to be useful and that almost all validation is application based.

    Even the rule that zip codes have to have 5 digits minimum and maybe 9 quits working as soon as you want to sell something from the US into Canada. Then you typically expand the concept of zip code to be postal code (whether or not you rename it in your application) and rework all the validations.

    My point? The distinction between domain rules and application rules is useful when you are designing and thinking about the application but isn't very practical in real-life programming of applications. That's one man's opinion.
  8. My point is that if you look beyond the individual applications, the separation between domain rules and application rules is very important.

    Domain rules are intended to fill in the gaps between real-world data types and the limited data types that computers, databases and languages can support. Domain objects and domain rules sort of complete the type definition.

    The problem with mixing domain rules and application rules comes when you are trying to reuse domains and data and when you persist data. I can store an address that is valid for some applications and not others. I don't want to store the address multiple times in different places. What I want is some domain (or at least enterprise-wide) rules that everyone must always follow, without having to repeat those rules (or their structures) everywhere. So I want separation between the domain rules and the application-specific rules.

    I agree that in a number of applications it seems hard to justify separating the two because one or both might be pretty thin. If you are working at the enterprise level and are concerned with reusing data, XML schemas and structures, it's worth doing. Of course, domain objects are just one approach to accomplishing this. I'm always open to a better solution that 1) is still OO (or close), 2) maintains the separation at some point for ease of maintenance and administration, and 3) allows me to choose which combinations of rules to apply in any circumstances.
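
    One way to get point 3, choosing which combination of rules applies in a given circumstance, is to make each rule a small object and let the caller pick the set. The sketch below assumes a hypothetical Rule interface and two example rules; it is not taken from any particular framework.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical rule abstraction: returns an error message, or null when the value passes. */
interface Rule<T> {
    String check(T value);
}

final class Rules {
    private Rules() {}

    /** Applies the chosen rules in order and collects every failure. */
    @SafeVarargs
    static <T> List<String> apply(T value, Rule<T>... rules) {
        List<String> errors = new ArrayList<>();
        for (Rule<T> rule : rules) {
            String error = rule.check(value);
            if (error != null) {
                errors.add(error);
            }
        }
        return errors;
    }
}

final class PostalRules {
    private PostalRules() {}

    /** Domain rule shared by every application: 5 digits, optionally plus 4. */
    static final Rule<String> WELL_FORMED = zip ->
            zip != null && zip.matches("\\d{5}(-\\d{4})?") ? null : "zip code is malformed";

    /** Application rule assumed for one shipper only: the plus-4 part is mandatory. */
    static final Rule<String> NINE_DIGITS = zip ->
            zip != null && zip.length() == 10 ? null : "shipper requires ZIP+4";
}
```

    The HR application would call Rules.apply(zip, PostalRules.WELL_FORMED), while the shipping module would pass both rules; the stored data and the domain rule stay the same in both cases.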

    Wouldn't it be nice if the database vendors supported user-definable domain types? But that still wouldn't solve the problem on the front end, where you really want to catch the invalid data. If everything on the front end were XML and if XML validation were faster, we could do this with libraries of XML schemas and type restrictions.
  9. I'm always open to a better solution that 1) is still OO (or close), 2) maintains the separation at some point for ease of maintenance and administration, and 3) allows me to choose which combinations of rules to apply in any circumstances.
    I don't mean to constantly plug our stuff on TSS (though it might look like that). This just seems, once more, pertinent to mention. What you describe above is exactly what we tried to solve with RIFE's automated meta data merging and constraints. You can read the TSS story about it here:
    http://www.theserverside.com/news/thread.tss?thread_id=39308

    And find more details in our release notes and documentation:
    http://rifers.org/blogs/gbevin/2006/3/2/rife_1_4_released#1_4_highlight01
    http://rifers.org/wiki/display/RIFE/Constraints
  10. Domain Rules vs. Application Rules

    I don't mean to constantly plug our stuff on TSS (though it might look like that). This just seems, once more, pertinent to mention.

    It's not your fault that people (including myself) are constantly wondering about ways of solving certain types of problems. If you have solutions that address those types of problems, then they are certainly worth mentioning.

    Frederic
  11. Validation in setters

    Fowler does not go that far.

    Nor do I ;-)

    I'm not advocating "loading up your objects with every application-specific rule". All I'm saying is that Domain Models should be more than just data structures.

    There are many other design patterns that make sense for various validation needs, including decorators, proxies/delegation, adapters, etc. I tend to use something akin to validation strategies, and/or state + visitors.

    As you stated, William, there are many different levels of validation requirements, including actor input (application) validation, domain model (business) validation, transaction validation, system requirements, state (transition conditions), etc. There's no one-size-fits-all design pattern.

    My original response was *only* addressing the statement about Domain Models as a simple Java Bean (only simple getters/setters) and all functionality in a Service Layer. Maybe I misread Derek's post, but it sounded like he was leaning on the ol' EJB Stateless patterns with DAOs / plain Data beans, and stateless functions.
  12. Validation in setters

    Maybe I misread Derek's post, but it sounded like he was leaning on the ol' EJB Stateless patterns with DAOs / plain Data beans, and stateless functions.
    Close, but replace EJB with Spring. I was thinking in terms of an architecture with DAO, POJO services, with a Struts client.
    If I put validation in the setter, then I start to get errors when I build the domain object from Struts to pass to the service, which may or may not be accurate.

    For example, I can't assume Struts will give me a perfect object, since I would leave it to the service to do validations on dependent objects. I wouldn't want the domain object to hit the database to retrieve dependent objects. And a partially-populated object passed from Struts may be just fine, as it could be up to the service to build the rest of it before passing it to the DAO for creation.

    Reading through the thread, the opinion seems to be that perhaps some "light" validation, such as string lengths, would be fine.

    To circle back to the original article, I can see why validating data is such a security hazard. There is no single way to implement it! :-)

    Derek
  13. Validation in setters

    Maybe I misread Derek's post, but it sounded like he was leaning on the ol' EJB Stateless patterns with DAOs / plain Data beans, and stateless functions.
    Close, but replace EJB with Spring. I was thinking in terms of an architecture with DAO, POJO services, with a Struts client. If I put validation in the setter, then I start to get errors when I build the domain object from Struts to pass to the service, which may or may not be accurate.

    For example, I can't assume Struts will give me a perfect object, since I would leave it to the service to do validations on dependent objects. I wouldn't want the domain object to hit the database to retrieve dependent objects. And a partially-populated object passed from Struts may be just fine, as it could be up to the service to build the rest of it before passing it to the DAO for creation.

    Reading through the thread, the opinion seems to be that perhaps some "light" validation, such as string lengths, would be fine.

    To circle back to the original article, I can see why validating data is such a security hazard. There is no single way to implement it! :-)

    Derek


    Using Spring and AOP, I was able to inject business validations on service method calls. For example, if an action calls a service method, say saveOrder(...), passing an object graph, the validator is configured to be called any time a method matching the pattern save* is called on the target object, which in this case is the service object.
    This way, the service objects don't have to worry about changing business rules, since you can swap in a new validator through XML.
    Furthermore, I do agree that basic data validations should be part of domain objects, whereas business rules should be pluggable.
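
    The idea can be illustrated without Spring by using a plain JDK dynamic proxy. To be clear, this is a swapped-in technique, not the Spring AOP configuration described above: OrderService, ArgumentValidator and ValidatingProxy are hypothetical names, and the save* match is a simple startsWith check.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

/** Hypothetical service interface. */
interface OrderService {
    String saveOrder(String order);
    String findOrder(String id);
}

/** Hypothetical pluggable validator; expected to throw IllegalArgumentException on failure. */
interface ArgumentValidator {
    void validate(Object[] args);
}

final class ValidatingProxy {
    private ValidatingProxy() {}

    /** Wraps target so that every method whose name starts with "save" is validated first. */
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> type, ArgumentValidator validator) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().startsWith("save")) {
                validator.validate(args);
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the service's own exception unwrapped
            }
        };
        return (T) Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler);
    }
}
```

    The service implementation contains no validation code at all; swapping the business rule means passing a different ArgumentValidator to wrap(), which is roughly what replacing the validator bean in the XML achieves.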
  14. Validation in setters

    Using Spring and AOP, I was able to inject business validations on service method calls. For example, if an action calls a service method, say saveOrder(...), passing an object graph, the validator is configured to be called any time a method matching the pattern save* is called on the target object, which in this case is the service object.

    I would be very interested to see how you did this. Can you show an example of a Spring xml file using these techniques?

    TIA
  15. Validation in setters

    You can put some validation in setters, but not all:

    - Consider a date entered in a text field that should end up in a java.util.Date property of a JavaBean. You have to validate the string and convert it to a Date before invoking the setter.

    - Validating only in setters, in apps where the domain and the UI are on separate tiers, requires a network round trip, which may not be acceptable for usability and user experience reasons.
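
    The first point can be sketched as below. The yyyy-MM-dd pattern and the DateInput and Event classes are assumptions for the example; a real application would use whatever format its form actually accepts.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;

final class DateInput {
    private DateInput() {}

    /** Converts "yyyy-MM-dd" text to a Date, or returns null if the text is not a valid date. */
    static Date parseOrNull(String text) {
        if (text == null) {
            return null;
        }
        String trimmed = text.trim();
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        format.setLenient(false); // reject impossible dates such as 2006-02-30
        ParsePosition position = new ParsePosition(0);
        Date date = format.parse(trimmed, position);
        // Require that the whole string was consumed, not just a valid prefix.
        if (date == null || position.getIndex() != trimmed.length()) {
            return null;
        }
        return date;
    }
}

/** The bean itself only ever sees a Date, never the raw string. */
final class Event {
    private Date start;

    void setStart(Date start) { this.start = start; }
    Date getStart() { return start; }
}
```

    The web tier calls DateInput.parseOrNull(...) and reports an error on null; only a successfully converted value reaches event.setStart(...).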
  16. Since everything is ultimately stored in the database, the database has the last say anyway ;-)
  17. As a simple example consider the name "Conan O'Brian". This is a valid name, and should thus be accepted by the domain object. But when sent to SQL, this name might create an SQL injection problem if the SQL statement uses single-quoted strings (' as the string begin/end marker). The same holds for JavaScript.

    The problem here is meta character handling. Meta characters must be handled when data is leaving your code. And data leaves your code when you're storing data in your database/subsystem, or when you're sending data to the user.

    Don't get me wrong. I like the idea of validating input in the domain objects, but I'd rather go for a "yes, both please" approach. This is more in line with defence in depth.

    My approach is rather:
    - Handle simple input validation (string, number, length etc.) when input enters your system, either from the user or a subsystem
    - Never massage input to make it valid; throw it away and give an error message
    - Handle domain-specific input validation in your domain objects, as this is domain knowledge
    - Handle meta characters when data is leaving your system (for example, always use HTML escaping unless there is a specific reason not to, in which case manually escape the characters that should not occur)
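
    For the last point, handling meta characters on the way out might look like the sketch below. The escaping table covers the five characters usually escaped for HTML text and attribute values; the OutputEncoding class and the person table are hypothetical, and the JDBC method simply shows parameter binding with the standard PreparedStatement API.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class OutputEncoding {
    private OutputEncoding() {}

    /** Escapes the characters that have special meaning in HTML text and attribute values. */
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /**
     * Binds the name as a parameter, so a quote inside it cannot terminate
     * the SQL string. Table and column names are made up for the example.
     */
    static void insertName(Connection connection, String name) throws SQLException {
        try (PreparedStatement statement =
                     connection.prepareStatement("INSERT INTO person (name) VALUES (?)")) {
            statement.setString(1, name);
            statement.executeUpdate();
        }
    }
}
```

    The name "Conan O'Brian" stays untouched inside the domain object; it is only encoded for HTML, or bound as a parameter for SQL, at the point where it leaves the application.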
  18. Exactly

    I agree 100%, and this is exactly what the paper recommends!