HTML Screen Scraping in JSP/HTML pages

  1. HTML Screen Scraping in JSP/HTML pages (5 messages)

    Hi

    We are about to develop an Internet facing J2EE application. It is planned to use JSPs/HTML for the presentation tier. Some client side validations would be developed in JavaScript.

    One of the possible requirements is that the end users would use HTML screen scrapers to access the application.

    We would like to know the following:

    * Can HTML screen scrapers modify the request data (and hence bypass the client side validations)?

    * Are there any specific guidelines to be followed? e.g. server side validation?

    Any kind of inputs on this will be highly appreciated.

    Thanks in advance,
    Lala
  2. If you are already considering screen scrapers, you might as well create a solution that renders a view that is friendlier to programs than HTML. Using Struts or any other MVC framework, you can read a request parameter and return an XML view of the same model.
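
    A minimal sketch of that idea, written as plain Java so it runs without a servlet container. The class ViewSelector, the "format" parameter, and the <account> element are all hypothetical names chosen for illustration; in a real action or servlet the format value would come from request.getParameter("format"). The sketch also assumes the model keys are valid XML element names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one model, two views, chosen by a request parameter.
class ViewSelector {

    // Escape the characters that are special in both HTML and XML text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Machine-friendly view: each model entry becomes one element.
    // Assumes the keys are valid XML element names.
    static String renderXml(Map<String, String> model) {
        StringBuilder sb = new StringBuilder("<account>");
        for (Map.Entry<String, String> e : model.entrySet()) {
            sb.append('<').append(e.getKey()).append('>')
              .append(escape(e.getValue()))
              .append("</").append(e.getKey()).append('>');
        }
        return sb.append("</account>").toString();
    }

    // Browser-friendly view of the same model.
    static String renderHtml(Map<String, String> model) {
        StringBuilder sb = new StringBuilder("<table>");
        for (Map.Entry<String, String> e : model.entrySet()) {
            sb.append("<tr><th>").append(escape(e.getKey())).append("</th><td>")
              .append(escape(e.getValue())).append("</td></tr>");
        }
        return sb.append("</table>").toString();
    }

    // "format" stands in for the value of a request parameter; anything
    // other than "xml" (including null) falls back to the HTML view.
    static String render(String format, Map<String, String> model) {
        return "xml".equalsIgnoreCase(format) ? renderXml(model) : renderHtml(model);
    }

    public static void main(String[] args) {
        Map<String, String> model = new LinkedHashMap<>();
        model.put("owner", "A&B");
        model.put("balance", "10");
        System.out.println(render("xml", model));
        System.out.println(render(null, model));
    }
}
```

    A scraper that requests the XML view no longer depends on the page layout, so HTML changes made for human users do not break it.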
  3. You should perform server-side validations in addition to client-side validations no matter what kind of web-app you are building. It is always possible to circumvent client-side validations.

    Utilities like Jakarta's Validator framework can make this process easier: http://jakarta.apache.org/commons/validator/

    Here is my general approach:

    1. Client-side validations to make the user experience better.

    2. Server-side validations to ensure data is valid. Often identical to client-side validations.

    3. Where possible, use some generation tool to ensure client-side and server-side validation logic is the same. Regular expressions are great here, because both Java and JavaScript support them.
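
    The approach above can be sketched roughly as follows. This is not the Validator framework's API; ServerSideValidator, the RULES table, and the field names "zip" and "quantity" are hypothetical. The point is only that the server re-checks every expected field against a pattern, whatever the client did or did not validate.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of server-side validation driven by regular expressions.
class ServerSideValidator {

    // Field name -> pattern. The same pattern strings could also be written
    // into the page for the JavaScript checks, keeping both sides identical.
    static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("zip", "[0-9]{5}");
        RULES.put("quantity", "[1-9][0-9]{0,2}");
    }

    // Returns one message per failed rule; an empty list means the input is valid.
    static List<String> validate(Map<String, String> params) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            String value = params.get(rule.getKey());
            if (value == null) {
                // A scraper or hand-built request may omit fields entirely.
                errors.add(rule.getKey() + " is missing");
            } else if (!Pattern.matches(rule.getValue(), value)) {
                // Pattern.matches requires the whole value to match.
                errors.add(rule.getKey() + " is invalid");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("zip", "12345");
        params.put("quantity", "0");
        System.out.println(validate(params));
    }
}
```

    One detail to watch when sharing patterns: Pattern.matches in Java tests the entire string, while JavaScript's RegExp test() accepts a match anywhere in the string, so the JavaScript version needs ^ and $ around the same pattern.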
  4. Paul,

    Thanks a lot for the valuable inputs !

    I take your point that the validations need to be built on the server side, whether or not they also exist on the client side. Given this, would you be able to shed some light on the following?

    * Is there any generic guideline that describes how to implement validations that must exist on both the client and the server side? Using the Validator framework is one way of implementing this; I am interested in more generic guidelines.

    * Is there any standard method/tool for "testing" the system when validations exist on both the client and the server side?

    * In your third point you mentioned a "generation tool" and "regular expressions". Could you please elaborate on these a bit? What exactly are they?

    * While validating the data on the server side, apart from validating the data contents, should we add extra checks, such as whether the number of data fields is correct?

    Thanks and Regards,
    Lala
  5. Hmm. I am afraid I don't know of any articles specifically on this topic. It is a tough problem.

    I do know how this kind of problem is handled in many web frameworks though. Both Struts and Tapestry provide mechanisms for defining validation logic on the server side, and automatically generating JavaScript validation logic on the client side.

    For this to work, it would be best to have some sort of component (e.g. a JSP custom tag) generate all your fields. The custom tag could examine a configuration file, and generate the necessary HTML and JavaScript logic for each field.
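
    As a rough illustration of what such a component might emit, the sketch below builds the HTML and the JavaScript check for one field from a single rule. FieldRenderer and its parameters are hypothetical, not part of Struts, Tapestry, or the JSP tag API; a real custom tag would call something like this from its tag handler and read the rule from configuration. It assumes the label and pattern come from trusted configuration and contain no quotes or slashes.

```java
// Hypothetical sketch: generate a form field and its client-side check
// from the same pattern string that the server uses for validation.
class FieldRenderer {

    static String render(String name, String label, String regex) {
        // Anchor the pattern with ^ and $ because JavaScript's test()
        // would otherwise accept a match anywhere in the value.
        String js = "if(!/^(?:" + regex + ")$/.test(this.value))"
                  + "alert('" + label + " is invalid')";
        return "<label>" + label
             + " <input type=\"text\" name=\"" + name + "\""
             + " onblur=\"" + js + "\"></label>";
    }

    public static void main(String[] args) {
        System.out.println(render("zip", "ZIP code", "[0-9]{5}"));
    }
}
```

    Because the pattern lives in one place, changing the rule changes the generated JavaScript and the server-side check together.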

    Regular expressions are a syntax for matching strings. For example, the regular expression "[0-9]{2}-[0-9]{2}-[0-9]{4}" will match strings like "##-##-####", and can be used to verify that a string is formatted like a US-style date. Regular expressions are a good generic validation mechanism, because they are supported by many programming languages, including both Java and JavaScript.
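
    For instance, the date pattern above could be applied in Java roughly like this. DateFormatCheck and looksLikeUsDate are hypothetical names; java.util.regex.Pattern is the standard library class.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: check that a string has the ##-##-#### shape.
class DateFormatCheck {

    static final Pattern US_DATE = Pattern.compile("[0-9]{2}-[0-9]{2}-[0-9]{4}");

    // matches() succeeds only if the entire string fits the pattern.
    static boolean looksLikeUsDate(String s) {
        return s != null && US_DATE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeUsDate("12-31-2004"));
        System.out.println(looksLikeUsDate("12/31/2004"));
    }
}
```

    Note that the pattern checks only the format: a string such as "99-99-9999" also matches, so a real date field still needs a range check after the format check. The JavaScript counterpart would be /^[0-9]{2}-[0-9]{2}-[0-9]{4}$/.test(value).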

    All of these are very difficult problems to solve. Rather than trying to build them from scratch, I suggest you take a look at the existing web frameworks to see how they are supported there. Struts is the most popular framework. Other frameworks (such as Tapestry and my own framework, Chrysalis) handle it as well.

    Even if you do decide to build your own code to handle this, I suggest you look at one or more of those frameworks to see how they handle the problem.
  6. Paul,

    Very valuable inputs !! Thanks a lot mate..

    Cheers,
    Lala