Queries that return bulk results are a normal operation in a web application. Here I want to discuss how to deal with this problem:
Sun proposes the so-called "ValueList Handler" pattern for this problem, and Floyd (middleware-company.com) suggests using JDBC and RowSets for reading. Both share the same idea: run the query in a session bean (with a helper object) and return value objects (or a RowSet) to the client rather than EJB references. The difference is that Sun caches the result in a stateful session bean, while Floyd returns the whole result at once.
I think both of them ignore an inevitable problem:
1) Sun's approach eliminates the numerous database queries caused by ejbFinder calls and reduces network traffic, since it returns only a small subset of the result, like a cursor. But it may run into a fatal problem if 1,000 users concurrently make the same query. Suppose a query returns 1,000,000 records and 1,000 users run it, each with their own stateful session bean caching the result. What happens?
2) Floyd's approach has no caching problem, but it returns all 1,000,000 records at once to a user who may be interested only in the first 30 of them.
My idea is this:
Use the ValueList Handler pattern, but with no buffer in a stateful session bean (I think this pattern fits the programming-to-an-interface idea better). To do so:
1) Every query must limit the maximum number of rows returned (the limit can be defined in config files).
2) Every query returns an iterator that keeps a reference to the stateless session bean and is able to create the next query event.
3) Once the local copy has been iterated over, the iterator generates the next query event and sends it to the remote session bean to retrieve the next small subset.
4) The client only needs to use the iterator to get the next ValueObject, regardless of whether it comes from the local copy or from a remote method invocation.
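The steps above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: `PageSource` is a hypothetical stand-in for the remote stateless session bean, and `fetch(offset, maxRows)` is an assumed method name, not part of any EJB API. Paging by offset is also an assumption made here to keep the example short.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the remote stateless session bean's query method.
interface PageSource<T> {
    // Returns at most maxRows items starting at offset; fewer means the end was reached.
    List<T> fetch(int offset, int maxRows);
}

// Client-side iterator: serves items from the local block and asks the
// source for the next block only when the local copy is used up.
class PagedIterator<T> implements Iterator<T> {
    private final PageSource<T> source;
    private final int maxRows;          // per-query row limit (step 1)
    private List<T> block = new ArrayList<>();
    private int posInBlock = 0;
    private int nextOffset = 0;
    private boolean exhausted = false;

    PagedIterator(PageSource<T> source, int maxRows) {
        if (maxRows <= 0) {
            throw new IllegalArgumentException("maxRows must be positive");
        }
        this.source = source;
        this.maxRows = maxRows;
    }

    @Override
    public boolean hasNext() {
        if (posInBlock < block.size()) {
            return true;                // still serving the local copy
        }
        if (exhausted) {
            return false;
        }
        // Local copy used up: issue the next "query event" (step 3).
        block = source.fetch(nextOffset, maxRows);
        posInBlock = 0;
        nextOffset += block.size();
        if (block.size() < maxRows) {
            exhausted = true;           // a short block means no more data
        }
        return !block.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return block.get(posInBlock++);
    }
}
```

The client just loops with `hasNext()`/`next()` and never sees whether an item came from the local block or from a new remote call. The server holds no state between calls, which is the point of the proposal.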
Doing so has these benefits:
1) Uniform query handling
2) No high cache overhead on the server side
3) No network traffic congestion
Looking forward to your reply.
We have been considering a similar solution for our applications. As I understand it from step 3 in your description:
> If local copy have been iterated over then iterator will generate a next query Event and send it to remote session bean to retrieve next small subset.
You mean that the session bean will re-execute the query to get the next block of data. This is OK if the data does not change very often; otherwise you may miss rows or retrieve rows twice.
We have also been considering caching the result set in a type of entity bean. The session bean executes its query and caches the results in a BMP entity bean. The bean is declared as using BMP but does not actually have any persistence. The primary key identifies the query and any arguments to it. If another session bean needs to execute a query, it can first do a findByPrimaryKey to see whether a cache bean has already been created for that query. That way multiple clients can access the same cached data. Again, this is most useful for data that will hardly ever change, but that is true for most caching strategies.
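To make the idea of a primary key that identifies the query and its arguments concrete, here is a minimal sketch. It uses a plain in-JVM map rather than an entity bean, so it only illustrates the keying, not the container behaviour. `QueryKey` and `QueryCache` are hypothetical names, and rows are simplified to strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical cache key: the query text plus its arguments.
final class QueryKey {
    private final String sql;
    private final Object[] args;

    QueryKey(String sql, Object... args) {
        this.sql = sql;
        this.args = args.clone();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof QueryKey)) return false;
        QueryKey other = (QueryKey) o;
        // Two keys match only if the query text AND every argument match.
        return sql.equals(other.sql) && Arrays.equals(args, other.args);
    }

    @Override
    public int hashCode() {
        return 31 * sql.hashCode() + Arrays.hashCode(args);
    }
}

// Hypothetical cache: runs the loader only the first time a key is seen.
final class QueryCache {
    private final Map<QueryKey, List<String>> cache = new ConcurrentHashMap<>();

    List<String> get(QueryKey key, Supplier<List<String>> loader) {
        return cache.computeIfAbsent(key,
            k -> Collections.unmodifiableList(new ArrayList<>(loader.get())));
    }
}
```

With a key like this, two clients share a cached result only when their query text and arguments are exactly equal. Any difference in the where-clause produces a separate cache entry.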
Thanks for your reply. I have two questions about it:
1) Is a BMP bean suitable for caching? Caching a ResultSet in a BMP bean means you hold an implicit DB cursor for a pending bulk query, and you locate this bean using the findByPrimaryKey method. I have doubts about "multiple clients can access the same cached data", since the BMP shared-access mechanism is not a single bean shared by multiple clients but multiple bean instances, representing the same resource, shared by multiple clients. If multiple clients concurrently dispatch the same query, you will have multiple caches in memory.
2) Is a BMP bean suitable for a large number of clients? Suppose 1,000 clients concurrently dispatch the same bulk query returning, say, 1,000,000 records (I think this is quite a normal case). The entity bean pool will be overloaded because resources are limited. Furthermore, DB cursors are another resource drain if we use more of them than the limits allow.
My point is: don't do any caching for bulk queries on the server, since maintaining many bulk caches will severely hurt performance.
I am thinking of doing batched queries that use some special value to locate the next block instead of caching. For example, the OID (object identifier) is a good candidate.
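A rough sketch of locating the next block by OID, under some assumptions: rows have a unique, ordered OID, and the in-memory list stands in for the table. Against a real database the equivalent would be a query of the form `WHERE oid > ? ORDER BY oid` with a row limit. `OidRow` and `OidPager` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row type: only the OID matters for paging.
final class OidRow {
    final long oid;
    final String name;

    OidRow(long oid, String name) {
        this.oid = oid;
        this.name = name;
    }
}

final class OidPager {
    // Returns up to maxRows rows whose OID is greater than lastSeenOid.
    // rowsSortedByOid must be sorted ascending by OID, which mirrors
    // "WHERE oid > ? ORDER BY oid" plus a max-rows limit in SQL.
    static List<OidRow> nextBlock(List<OidRow> rowsSortedByOid,
                                  long lastSeenOid, int maxRows) {
        if (maxRows <= 0) {
            throw new IllegalArgumentException("maxRows must be positive");
        }
        List<OidRow> block = new ArrayList<>();
        for (OidRow r : rowsSortedByOid) {
            if (r.oid > lastSeenOid) {
                block.add(r);
                if (block.size() == maxRows) {
                    break;
                }
            }
        }
        return block;
    }
}
```

The client remembers only the last OID it received and passes it with the next request. A row inserted behind that position does not shift later blocks, so already-delivered rows are not returned twice, although the client will not see the newly inserted row either.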
We don't hold on to the ResultSet. We cache the data from the result set, so the connection to the database is not held. I know different app servers may implement things differently, but I am sure that Weblogic (which we are using) has only one instance of the entity bean accessed by multiple clients. This means that calls to the entity bean are queued, but in this case they are very quick calls. Also, we would not have anything like 1,000 users hitting the same query at the same time. If I am wrong and there are lots of instances of the entity bean, then we may have to rethink our design. At the moment we are still in the process of evaluating different solutions to a number of design problems.
There seem to be a number of ways of handling large queries at the moment, but nobody seems to be able to come up with the definitive solution.
I think, as far as I understand your postings, that you will both run into trouble with the approaches you describe.
For Gabriel's approach:
The ValueList Handler in a stateless session bean doesn't solve the problem, because stateless session beans don't share the ValueList data. Every stateless session bean instance would cache its own copy of the data. The client is not guaranteed to get the same instance of the stateless session bean, even if it has cached the stub. What your approach does let you do is put a better limit on the number of instances on the server. If you configure that number to equal the number of available threads in the server's thread pool, you have your solution (if the system doesn't crash).
For Jonathan's approach:
I think you have the same problem as Gabriel: you will not have only one instance of your entity bean. AFAIK it makes no difference whether you use Weblogic or another server.
We have had the same problem and couldn't solve it to our full satisfaction. What we all need is a singleton that is managed by the container, but there is no EJB type that fits this. So we have chosen a JVM-controlled singleton class and have to live with the consequences for scalability and failover (manual copying). A possible solution is perhaps to implement the singleton as a CORBA class. Any more proposals or comments?
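For readers unfamiliar with the term, a JVM-controlled singleton cache could look roughly like the sketch below. This is an assumption about the general shape, not the class described in this post. `ResultCacheSingleton` is a hypothetical name and rows are simplified to strings.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical singleton cache: one instance per JVM (strictly, per class loader).
final class ResultCacheSingleton {
    private final Map<String, List<String>> results = new ConcurrentHashMap<>();

    private ResultCacheSingleton() {
        // Prevent outside instantiation.
    }

    // Initialization-on-demand holder: the JVM creates INSTANCE lazily
    // and thread-safely when Holder is first loaded.
    private static final class Holder {
        static final ResultCacheSingleton INSTANCE = new ResultCacheSingleton();
    }

    static ResultCacheSingleton getInstance() {
        return Holder.INSTANCE;
    }

    void put(String queryId, List<String> rows) {
        results.put(queryId, rows);
    }

    // Returns null when nothing is cached under queryId.
    List<String> get(String queryId) {
        return results.get(queryId);
    }
}
```

Because the instance lives inside one JVM, each server in a cluster gets its own copy, which is where the scalability and failover cost comes from.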
When you say that several clients will use the result set from the query, and that this result set is accessed by findByPrimaryKey, doesn't that mean that the whole query, including the where-clause, must match exactly between the clients?
How do you find out the ID of the query? Are the IDs predefined?