I would like to ask for some expert advice. We have a J2EE application running under Weblogic Server. The application is in the area of workforce optimization (e.g. optimizing employee's schedules, etc.) that uses a lot of stateless EJBs that perform database access via JDBC. All the EJBs use BMB. Almost all of the EJB calls involve transactions.
There are a few cases in our application when we have to deal with very large scenarios. For example, a single EJB call, such as
void ScheduleEJB.saveSchedules(Collection collSchedule)
could need to save the schedules of 5000 employees in the database. This could mean that tens of thousands of records need to be inserted into database tables. We are thinking about a couple of issues that have to do with this scenario:
1) EJB transcation timeout. I believe that the EJB transcation timeout is typically set to some value between 1-5 minute. With the amount of data we need to insert into the database, we are bound to exceed that value sooner rather than later. The caller then would get an exception.
2) A large (and long) transaction like that uses server resources while it is running, potentially preventing other users from accessing the same tables whil it is in progress.
Now keep in mind that that this Schedule EJB I referred to ha a lot of functionality (over 50 methods). Most of the methods do not deal with large amounts of data. There are only a couple of methods that could potentially cause long trasactions.
I would be very grateful if people could share some advice as far as how to deal with these issues.
Thanks a lot
We do not use any use entity beans - we only use statess session beans in our aplication that perform database access via JDBC calls
I reckon you have quite an unstable design. You should escape large write transactions under any means - in any other case you'll have a lot of performance problems in your application hence databases usually lock the rows when they perform write operations.
If you deal with READ_COMMITED transaction isolation level (which is default for most of the databases), you'll always have problems while reading your data - your database will wait for your write transactions to commit and just imagine the user waiting for several minutes to get the response back. And I assume that you read your data much more often then your write.
In that case, I recommend you to use asynchronous strategy for your transactions - you'd better perform hundred of short transaction rather than one long transaction. You can create a special MDB, that will execute several update transactions in equal portions (for example, 1 transaction of 5 updates per second or even per minute (if the data you're updating are not very precise by nature)), thus giving your database a chance to execute queries more often, hence increasing performance in times! You MDB can be activated from your session bean easily.
thanks for your reply. It appears you're correct in that the readers will wait for the writer to commit changes before they can proceed with READ_COMMITED isolation level. At least this is the case with MS SQL Server 2000 I was testing in-house. My database guys tell me that with Oracle it is not the case - the writer does not block readers from reading the data. They will just get the data as if the writer was not even there, and after the writer commits, they will get the new data.
Apparently SQL Server is not as sophisticated. It looks like it quickly escalates the row-level locking to page-level locking, probably because there are many inserts to adjacent rows on the same page(s). This makes the readers wait for the locks to be released.
How unfortunate :(
As far as the solution you proposed: There is one big reason I don't like it:
The updates happen asynchronously, so there is no good indication to the user when/if the updates successfully complete.
How about another solution: I write another (facade) EJB that has a single method like this
void createSchedules(Collection largeCollection)
This method will be marked as 'NotRequired'. In the implementation of this method I will perform the update in reasonably small chunks, calling another EJB method like
void createScheduleChunk(Collection smallChunk)
several times (as many times as it is necessary to perform the whole update)
This method, in turn, will be marked 'RequiresNew'. So each chunk will be saved in a separate transaction.
The only trouble is with the atomicity of the whole "createSchedule" operation. That is when chunk #46 fails, how do I roll back the updates caused by whole "createSchedule" operation (i.e. chunks #1 through #45) ?
What do you think?
You're right about Oracle as long as its READ_COMMITED isolation level is the same as REPEATABLE_READ on most of the other database. And I really love that they've achieved the same performance for READ_COMMITED+REPEATABLE_READS as other databases have on REPEATABLE_COMMITED.
You "chunk" idea is almost the same that I've offered - there're many ways of reporting back to the client that update has been completed in asynchronous messaging. But in your approach the speed will be the same as with long transactions, so your client will wait for a lot of time before he/she gets updated data. In my approach, he/she can get do something else while the data being updated. Are you sure that you definitely need all updates to be rollbacked if one fails? If so, than you've to create a special array that will contain the statements that should be executed in order to perform rollback - this operations should be executed in batch for performance.
That's how I see it.
We think it's okay for the writer to wait - it only affects a single user, who's creating a large schedule after all. Making the readers wait is not okay: it affects many many users. With my design, I don't think the readers will be forced to wait longer than it takes perform a transaction for a single chunk. For example: if the writer wants to create schedules for employees with IDs 1-15, he/she will call createSchedules() method, which in turn might create schedule in chunks for 3 employees at a time by calling another method createScheduleChunk() 5 times. If the reader with ID 1 want to see their shedule, they will only have to wait for the first chunk to complete.
Regarding your question, yes, we do have a requirement that the whole schedule needs to be saved successfully or not at all.
Like I said before though, the only trouble I see with my approach is if the reader sees their new schedule, and at some point later on the writer decides that the whole thing needs to be rolled back. If the whole thing is rolled back then the schedule the reader saw is no longer valid.
I am not quite sure how to address that.
Just talked with someone, and it looks like with chunking we don't really have to rollback the whole thing if some chunk fails. It is really really hard to implement and it is of questionable value. So maybe the chunking approach will work after all.