Solution :- Process-Display-Process Design.
In this design we adopt an approach that processes a small set of data first and then displays the results to the user. While the user is looking at those results, we process the remaining records in the background and store them for later use.
Suppose we have a requirement where we need to query a table with a million records and fetch ten thousand of them to display to the user. We obviously cannot display ten thousand records at once, so we divide the records into batches of 100 each and let the user scroll through the batches.
The normal design would use a servlet and a stateless session bean. The session bean queries the database based on the query criteria and passes the result set back to the servlet, bound in pass-by-value (PBV) objects. The servlet then displays the first batch and subsequent batches to the user from this query data.
In this design (the PDP pattern) we use servlets and a stateful session bean. The stateful session bean has two methods: one that selects the first batch, and another that selects the remaining records. When the client enters query criteria, they are passed to the first method of the stateful session bean (typically getFirstBatchResults()). This method processes the first batch and sends it back to the servlet, which displays the results to the user, using response.flushBuffer() to push them out immediately. The query for the first batch can be made faster by using database features that let you select the "top n" records of the result.
After this, control is still in the servlet, so we can call the second method of the stateful session bean. This method (typically getAllResults()) fetches all the records from the database and stores them in a private ArrayList in the bean. When the user wants to scroll through the records, we simply fetch the requested batch from the stateful session bean. getAllResults() needs to omit the first batch when adding results to the ArrayList, since those records have already been sent.
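A minimal plain-Java sketch of the two-method idea (the in-memory "database" list, the batch size of 100, and the class name are illustrative assumptions; in a real deployment this would be the stateful session bean running JDBC queries):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the PDP idea as plain Java. In a real deployment this would be
// a stateful session bean; here a List<String> stands in for the database.
public class PdpBean {
    private static final int BATCH_SIZE = 100;
    private final List<String> cache = new ArrayList<>(); // the bean's private ArrayList

    // Step 1: fetch only the first batch ("top n") and return it immediately.
    public List<String> getFirstBatchResults(List<String> database) {
        int n = Math.min(BATCH_SIZE, database.size());
        List<String> first = new ArrayList<>(database.subList(0, n));
        cache.addAll(first); // keep it so later batch indexes line up
        return first;
    }

    // Step 2: fetch everything else in the background, omitting the first
    // batch that has already been sent to the servlet.
    public void getAllResults(List<String> database) {
        int from = Math.min(BATCH_SIZE, database.size());
        cache.addAll(database.subList(from, database.size()));
    }

    // Scrolling: serve batch 'index' (0-based) straight from the cache.
    public List<String> getBatch(int index) {
        int from = index * BATCH_SIZE;
        if (from >= cache.size()) return new ArrayList<>();
        int to = Math.min(from + BATCH_SIZE, cache.size());
        return new ArrayList<>(cache.subList(from, to));
    }
}
```

The servlet would call getFirstBatchResults(), flush the response, then call getAllResults() and serve later scroll requests from getBatch().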
Advantages :-
1. It gives the user the impression that the response is very fast.
2. Since most of the data is stored in the stateful session bean, it reduces the size of the HttpSession.
Disadvantages / limitations :-
1. It cannot be used where the user needs to see a summary of all the results. In that case we must process all the records before we can show anything to the user.
2. It results in two network calls for the same query criteria.
3. Thread safety of the methods needs to be checked, since we are operating on a private variable of the stateful session bean.
4. If the user needs to sort the records on some particular column, this cannot be implemented.
First, for an argument as to why a stateful session bean should absolutely not be used in this situation, I'll point you here: http://www.onjava.com/pub/a/onjava/2001/10/02/ejb.html
Second, this is just a page-by-page iterator, and you can find it in the Sun blueprints.
The way you should set this up is with a stateless session bean with a single method, getRows(int startIndex, int pageSize,...).
If you're lucky enough to have JDBC 2.0, you can query the entire set and return only the window of rows requested, using ResultSet.setFetchSize(pageSize) and ResultSet.absolute(startIndex). You can also use prepared statements here to make things more efficient. If you want to sort by a particular column, just pass it as a parameter to your session bean and generate the ORDER BY clause on the fly.
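A sketch of the getRows(...) contract (with a JDBC 2.0 driver the body would call rs.setFetchSize(pageSize) and rs.absolute(startIndex) as described above; since that needs a live database, the windowing arithmetic is shown here over an in-memory list so it can stand alone):

```java
import java.util.ArrayList;
import java.util.List;

// Stateless page-by-page iterator sketch. A real bean would open a
// connection, run the query, call rs.setFetchSize(pageSize) and
// rs.absolute(startIndex), then read pageSize rows. The window
// arithmetic is the same either way.
public class PageIterator {
    // startIndex is 1-based, like ResultSet.absolute().
    public static List<String> getRows(List<String> allRows, int startIndex, int pageSize) {
        List<String> page = new ArrayList<>();
        int from = startIndex - 1;
        if (from < 0 || from >= allRows.size()) return page;
        int to = Math.min(from + pageSize, allRows.size());
        page.addAll(allRows.subList(from, to));
        return page;
    }
}
```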
If you’re stuck with older drivers, you can use database-specific parameters to retrieve the results a page at a time.
Thanks for analysing the design.
The link you have given suggests that stateful session beans should generally be avoided, but it does not rule them out entirely; in some cases their use is even recommended.
Also, the JDBC 2.0 features are quite promising, but with them we end up doing multiple database queries. When the database is on a remote machine, this certainly has some overhead. Moreover, using resultSet.absolute() and resultSet.setFetchSize() means we are always querying the database for all the records and only retrieving a part of them. So we are putting a large, unnecessary load on the database server, and the response of every page is slowed throughout by the time it takes to run the full query. Even if we use stored procedures (which are faster than prepared statements), the time taken to query 50,000 records is almost 10 seconds.
The main aim of the PDP pattern is to improve the response time of all pages throughout, without putting a higher load on the database server.
Correct me if I have misinterpreted anything.
I have a real problem with loading all the data into a session bean. And it’s for all the reasons listed at that URL above.
However, I take it your app is a totally read-only data source and you desperately want to minimize the user's wait. You have decided that database caching and retrieval are too slow, so you load everything into memory and trawl that, only returning pages over the network.
So you have optimized performance by:
- creating a Java cache in memory in the middle tier at startup - i.e. the performance hit is at servlet init;
- minimizing network traffic by only returning pages from the middle tier to the web tier.
But, I'd argue, for a read-only source such as yours you may as well move the cache to the web tier and really show your disdain for the database. I personally would not do this, as every large read-only data set I have ever worked with eventually becomes read-write. But if you truly have a read-only set that you can hold in memory all at once, then stick it in a singleton on the web server.
In fact, go the whole way and pre-render the pages.... The speed will be amazing.
kind of serious.
Do you have a link to the page-by-page iterator in the Sun blueprints?
I'm using the JDBC 2 driver from Oracle 8.1.7. I tried ResultSet.setFetchSize and ResultSet.absolute. It works while I'm getting through the first 10,000 rows of my table (it has 100,000 rows, row size 400 bytes). Above that, however, I always get an out-of-memory error. I have a 1 GB Pentium III 866 MHz machine. In fact, the JVM where the EJB server is running never gets above 120 MB, yet it gives me out-of-memory errors with only one client. Is there some point I'm missing? Is there some kind of max-memory parameter that sets the maximum memory the JVM can use?
You get an out-of-memory exception because, even though your code asks the driver to jump straight to a given row, Oracle's JDBC thin client reads all the preceding (9,999) rows into memory. That would explain the out-of-memory exception.
You can use rownum to restrict the number of rows you are retrieving.
Suppose you want to retrieve rows 11000 to 12000; you can say:
select * from (select a.*, rownum as num from a order by a.id) where num > 11000 and num < 12000
This does not prevent your database from throwing an out-of-memory error, though :-).
And rs.setFetchSize(x) simply makes the client retrieve x records each time it talks to the DB and cache them locally (on the client side), so that the next call to rs.next() does not go back to the server. So it will not help you solve the out-of-memory problem!
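A small helper that builds a ROWNUM-windowed query of the shape shown above (the table alias and exclusive bounds are assumptions following the example; in production you would bind the bounds as PreparedStatement parameters rather than concatenate them, and double-check how ROWNUM interacts with ORDER BY on your Oracle version):

```java
// Builds a ROWNUM-windowed query like the one in the post above.
// Bounds are exclusive: rows strictly between lo and hi are returned.
public class RownumQuery {
    public static String windowQuery(String table, String orderColumn, int lo, int hi) {
        return "select * from (select t.*, rownum as num from " + table
             + " t order by t." + orderColumn + ") where num > " + lo
             + " and num < " + hi;
    }
}
```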
When the JVM starts, the following optional parameters can be set:
-ms<size> (also written -Xms<size>) sets the initial Java heap size
-mx<size> (also written -Xmx<size>) sets the maximum Java heap size
I don't think the database will throw an out-of-memory error, since it can use temporary files and other mechanisms.
Anyway, thank you. I have been trying to use ROWNUM since the beginning of this project, but I could not make it work using rownum > <some value>. Surprisingly, your command works!
Thanks for the hint about the JVM too.
Oracle does not support a query for rownum > xxxx. We have achieved the above filtering by using stored procs returning an array of the data. The stored proc defines a cursor, positions it at the required rownum and returns all the data up to the required rownum in an array... it works fine.
I was trying to use resultSet.absolute() on a REF CURSOR returned from an Oracle stored proc. It gives me
java.sql.SQLException: Invalid operation for forward only resultset : absolute
I did use TYPE_SCROLL_INSENSITIVE or TYPE_SCROLL_SENSITIVE in the prepareCall(). I tried the same query using prepareStatement() with the scroll-enable params, and it works.
Is it something special with Oracle stored procs, or something else? Thanks.
Well, that's a good design pattern. The only problem is that if you have a clustered application where the EJB and web tiers are on separate machines, they become two network calls. In the clustered case we could cache the data at the web tier instead of the EJB tier; load balancing is taken care of by the app server anyway, and also fault tolerance if your app server supports it.
Umm, network calls, failover etc. are all irrelevant. He does his cache load on init. The performance hit is all at startup, which is fine.
If you want to optimise, stick it all in the web tier. Fine, there is large n/w traffic at each web server startup, but zero n/w traffic from then on.
From my understanding of the pattern, there is no real caching going on: the servlet makes a request for some data, and the stateful session bean serves the request in two stages, first retrieving enough data for the servlet to display one page and then retrieving the rest of the data in case the client requests more pages. The so-called cache is created when the client makes the first request, so the data will not be out of date at the point of the request.
If you are only dealing with data that will not change, then creating a cache is a good idea; but if the data is dynamic, you are creating problems for yourself in keeping the cache synchronised.
I have a question that's somewhat related to this pattern. It is more a DB question than an EJB question, but I'll try my luck. In a large query, what is more time-consuming: executing the SQL statement or fetching the rows over the network?
From what I've seen during the development of three web apps, with MS SQL, Oracle and MySQL, when querying about 10,000 records, transferring the result data over the network to the EJB/web tier is definitely the most expensive part.
So we always did just fine by scrolling the ResultSet up to the relevant page, retrieving the data that needs to be displayed and then closing the ResultSet.
An optimization would certainly be using the absolute() and setFetchSize() methods that someone previously suggested.
If I got the "pattern" right, what it says is "get the relevant data, show it to the user, then get the rest of it in case it is needed later", which is almost never the case, since I don't see many users browsing a 10,000-record result set page by page up to the last one.
BTW, is anyone still moderating these "patterns" posts? They have become really lame lately, IMHO...
Oh, and I forgot: use a CONCUR_READ_ONLY and TYPE_FORWARD_ONLY result set. This should prevent the driver from keeping the records you don't need in memory.
We use data caches for our static data - as described above the data is retrieved at start-up which does make start-up a bit slower but significantly improves the performance of the app. This has been fine up until now but we now have some data that is more volatile. Would I be right in thinking that entity EJBs would be a good solution for the more volatile data or is this a question for another thread?
Using entity beans will ensure that the data you are using is up to date, but I have found that they incur an overhead where the app server checks that the data is valid. If you know there can be only one way of modifying the data, you could write your own caching system that also handles the writes to the database, or you could use features such as WebLogic's isModified to let the app server know when the data has changed. I have used both methods in projects, and the choice comes down to the kind of data you are dealing with: entity beans for simple data, a bespoke cache for advanced data. If you are dealing with clustered servers or other ways the data can be modified, you'll need to add synchronisation of the data into the solution. All in all, a bit of a pain.
Hi Jonathan, when we are talking about n/w traffic, load balancing becomes relevant; and when we are talking about load balancing, fault tolerance becomes relevant.
Hi there. Actually we have a lot of places in our application where we could implement this pattern, but since ours is a standalone application we cannot use a stateful session bean, because it gets timed out after 10 minutes even if I specify the idle-timeout-seconds parameter in the WebLogic-specific deployment descriptor. Can you suggest how to implement this pattern for standalone applications?
Thanks and regards
A common pattern for dealing with the stateful session bean time-out issue is to re-run the database query and use absolute(index) to move back to the row being pointed at before the bean instance gets passivated. This pattern is OK if most of the queries in your system are not too expensive. Otherwise you need to consider using a cache, I think, like a result pool in the background that keeps reading all the data (then I don't think the stateful bean will be passivated). But overusing caches can also cause performance problems; that's why we only use a forward-only ResultSet.
Hope this helps.
Why not just use old-fashioned SQL statements? Split the data into batches, e.g. 1-50, 51-100, etc. Read the max record number in your table; dividing MAXNUMBER by 50 gives you a fair idea of how many batches you will have. Now on the client side use the range bounds, e.g. 1 and 50, as parameters to the session bean, which executes an SQL call to find all records between 1 and 50.
Though the JDBC 2.0 stuff is pretty cool, it's still inefficient, as the DB will fetch records that might never get used.
Hope this makes sense.
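The batch arithmetic above can be sketched as follows (50 is the batch size from the post; the SQL in the comment, with a BETWEEN on a sequential record-number column, is an assumption about the schema):

```java
import java.util.ArrayList;
import java.util.List;

// Computes the [low, high] record-number ranges for fixed-size batches,
// e.g. 1-50, 51-100, ... The session bean would then run something like
//   SELECT ... WHERE record_num BETWEEN ? AND ?
// for the chosen range (the column name here is hypothetical).
public class BatchRanges {
    public static List<int[]> ranges(int maxNumber, int batchSize) {
        List<int[]> out = new ArrayList<>();
        for (int low = 1; low <= maxNumber; low += batchSize) {
            out.add(new int[] { low, Math.min(low + batchSize - 1, maxNumber) });
        }
        return out;
    }
}
```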
Would anyone be able to point me to an example of using SQL to bidirectionally access rows in batches of 50?
I agree that the user should get the first batch of results immediately. After that, the server side has no control over when the next batch is sent; based on the user's scrolling, the next set of records is sent to the browser.
But suppose the user is not willing to view the remaining records and leaves the page: when does clean-up occur for the remaining records held by the stateful session bean? If I set the stateful session bean's idle timeout to 20, won't that affect performance and memory usage? And if the user requests the second set of records after the stateful session bean has expired (> 20), what will the server send to the client?
For solving this particular problem, I prefer to use both a stateful session bean and the HttpSession:
let the stateful session bean maintain all the records while the HttpSession maintains the batch index, and set the HttpSession timeout so it is always higher than the stateful session bean's timeout.
Looking forward to your reply if any bottleneck still exists in this solution.