OK, I'm relatively new to Java, and I have a question about how to scale our web application. We currently have one app server, and our Lead Engineer has written a caching object that keeps plain Java objects up to date, loads them only once, etc. We now need to add multiple application servers. Rather than rewriting everything with Entity beans, could I have a Session Bean simply act as a middleman between all the servers and the caching object? (That way only one server would be hitting the database.) I may be totally off base, so let me know if there is an established pattern for something like this. Thanks in advance.
First, determine whether it is a bad thing to have duplicate copies of cached data on your multiple servers.
1. If the answer is "no" (for example, your application is primarily read-only), just replicate your existing application on the multiple servers and let each server have its own cache (which will give the best performance anyway).
2. If the answer is "yes" (for example, your application performs a lot of write operations and you want to avoid data corruption from concurrent users), you will probably need to re-architect the application. This is a very tricky problem to solve, and there are no easy outs. In fact, solving this very problem is why Entity beans were created, so switching to them is not a bad idea.
3. If option (2) is too painful, move your data-caching logic to its own application layer (which could be Session Beans or RMI), hosted on a single machine, and replicate your user interface layer (the web applications). Bear in mind that individual requests will be slower than they are now, because cache lookups become remote calls, but you will be able to scale to more users. You may still need to rewrite some code, because the current caching layer assumes it runs on the same machine as the UI.
4. Look for alternative products that tackle these issues, like the Coherence product that Cameron Purdy is always trying to sell :)
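To make option 3 concrete, here is a minimal sketch that assumes plain RMI as the transport rather than a session bean. The names CacheService, CentralCache, and the loader function (a stand-in for the real database query) are hypothetical, and exporting the object and binding it in a registry are left out. Over real RMI the cached values would also need to be Serializable.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical remote contract for the single cache host (option 3).
// Web servers call through this interface; only the cache host talks to the database.
interface CacheService extends Remote {
    Object get(String key) throws RemoteException;
    void invalidate(String key) throws RemoteException;
}

// Runs on the cache host. Exporting it (e.g. UnicastRemoteObject.exportObject) and binding
// it in an RMI registry, or wrapping it in a session bean, is deliberately omitted here.
class CentralCache implements CacheService {
    private final Map<String, Object> entries = new ConcurrentHashMap<>();
    private final Function<String, Object> loader; // hypothetical stand-in for the DB query
    private final AtomicInteger loads = new AtomicInteger();

    CentralCache(Function<String, Object> loader) {
        this.loader = loader;
    }

    @Override
    public Object get(String key) {
        // Each key is loaded at most once until it is invalidated.
        return entries.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    @Override
    public void invalidate(String key) {
        entries.remove(key);
    }

    // Number of times the loader (i.e. the database) has been hit.
    int loadCount() {
        return loads.get();
    }
}
```

Because only this one object ever calls the loader, a write on any web server just needs to call invalidate() on the cache host; there are no peer caches to keep in sync.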
Thanks for the advice. We don't have a *lot* of writes compared to reads, but when someone updates the database, if we had replicated the cache, we would need a way to notify each cache to refresh. That's why I think we'll go with option 3. Now, does this mean that reading from the Session Bean will be slower even when the web front end (JSPs) is on the same server? Also, if I go with this approach, won't the main server that actually hosts the caching mechanism be more heavily loaded than the other servers? (Which is bad if we're just doing round-robin load balancing.) Thanks again. Feel free to keep it coming!
It depends on your environment. Some application servers on the market optimize calls when the caller and the bean are in the same VM/classloader. You can of course switch to weight-based load balancing to avoid overloading your caching server.
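Weight-based balancing can be sketched as below. This is one common scheme ("smooth" weighted round-robin), not necessarily what any particular load balancer uses; the server names and the weights 1/3/3 are hypothetical, chosen so the caching host receives fewer web requests than the others.

```java
// Hypothetical smooth weighted round-robin selector. Starting from a fresh selector,
// each cycle of totalWeight picks chooses every server exactly `weight` times, and
// heavy servers are interleaved with light ones instead of being picked in bursts.
class SmoothWeightedRoundRobin {
    private final String[] servers;
    private final int[] weights;
    private final int[] current;
    private final int totalWeight;

    SmoothWeightedRoundRobin(String[] servers, int[] weights) {
        if (servers.length == 0 || servers.length != weights.length) {
            throw new IllegalArgumentException("need one weight per server");
        }
        int total = 0;
        for (int w : weights) {
            if (w <= 0) throw new IllegalArgumentException("weights must be positive");
            total += w;
        }
        this.servers = servers.clone();
        this.weights = weights.clone();
        this.current = new int[servers.length];
        this.totalWeight = total;
    }

    synchronized String pick() {
        int best = 0;
        for (int i = 0; i < servers.length; i++) {
            current[i] += weights[i];          // every server earns its weight
            if (current[i] > current[best]) best = i;
        }
        current[best] -= totalWeight;          // the chosen server pays back the total
        return servers[best];
    }
}
```

With weights {1, 3, 3} for {"cache-host", "web1", "web2"}, seven picks produce web1, web2, cache-host, web1, web2, web1, web2, so the cache host gets roughly one request in seven.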
If you move the caching control into a session bean, the performance on the server hosting the cache probably won't go down enough to worry about. This server will be (much) more heavily laden than the rest of the servers in the cluster, though. In fact, you may get better performance for the entire cluster if you move the caching logic to a separate, dedicated server.
Frankly, clustered caching logic is very, very difficult to do correctly, and it is not something you can easily retrofit into your system. I suggest you do some experiments to see whether clustering the web logic while leaving the caching logic centralized really improves your scalability; you may do a lot of work and get no real benefit.
I guess the alternative is to have each app server keep its own cache, use its own database connections, etc., but keep all the caches up to date with each other? It doesn't seem like we need EJBs if we do it that way.
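A bare-bones version of "each server has its own cache, and servers keep each other up to date" might look like the sketch below. It runs in one JVM: PeerCache and the shared map standing in for the database are hypothetical, and between real servers the invalidate() call would have to travel over some messaging channel (for example a JMS topic), which is not shown. It also ignores the races the replies below warn about, such as a peer reloading an old value while an invalidation is still in flight.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical per-server cache that tells its peers to drop a key after a write.
// Peers are plain object references here; across machines this would be a message.
class PeerCache {
    private final Map<String, String> database; // shared stand-in for the real database
    private final Map<String, String> local = new ConcurrentHashMap<>();
    private final List<PeerCache> peers = new CopyOnWriteArrayList<>();

    PeerCache(Map<String, String> database) {
        this.database = database;
    }

    void addPeer(PeerCache peer) {
        peers.add(peer);
    }

    // Serve from the local cache, loading from the database on a miss.
    String read(String key) {
        return local.computeIfAbsent(key, database::get);
    }

    void write(String key, String value) {
        database.put(key, value);            // write through to the database first
        local.remove(key);                   // drop our own copy
        for (PeerCache p : peers) {
            p.invalidate(key);               // then tell every peer to drop theirs
        }
    }

    void invalidate(String key) {
        local.remove(key);
    }
}
```

Even this simplified version needs every server to know about every other server, which is part of why the multi-server case gets complicated quickly.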
The problem with each server having its own cache is that it raises a lot of questions about when each cache needs to synchronize with the database. If you synchronize on every read, you lose all the benefits of caching. If you synchronize periodically (say, once an hour), you risk serving stale data.
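The "synchronize periodically" option amounts to a time-to-live on each entry. The sketch below makes the trade-off visible: a larger maxAgeMillis means fewer database hits but a longer window in which a server can return stale data. TtlCache is a hypothetical name, and the clock is injected only so the expiry can be tested.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical time-based cache: an entry is reused until it is maxAgeMillis old,
// then reloaded from the database on the next read.
class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long loadedAt;

        Entry(T value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // stand-in for the database query
    private final long maxAgeMillis;       // the staleness window you accept
    private final LongSupplier clock;      // e.g. System::currentTimeMillis

    TtlCache(Function<K, V> loader, long maxAgeMillis, LongSupplier clock) {
        this.loader = loader;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> e = entries.get(key);
        if (e == null || now - e.loadedAt >= maxAgeMillis) {
            // Missing or too old: go back to the database.
            e = new Entry<>(loader.apply(key), now);
            entries.put(key, e);
        }
        return e.value;
    }
}
```

Any update made in the database during the window is invisible to this server until the entry expires; that is exactly the stale-data risk described above.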
If you try to build a more sophisticated cache-refresh mechanism, things get very complicated very quickly. It is much more complicated to build a multi-server cache than a single-server cache. In general, you are better off using Entity beans or buying a caching product rather than trying to build your own multi-server distributed caching system (unless you have engineers with nothing better to do for a couple of months).
If you use read-only entity beans with some sort of cache invalidation mechanism, you could still get the benefit of caching. Have you checked out the Read Mostly pattern?
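The idea behind the read-only/read-write split can be illustrated in plain Java. This is only an analogue of the pattern, not the container-managed version (which relies on the application server's own descriptors and invalidation support); ReadOnlyPrices and PriceUpdater are hypothetical names. Reads go through a cached view that has no update methods, and every update goes through a separate writer that invalidates each server's view afterwards.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Read side: hypothetical read-only view (one per server). It only caches and can be invalidated.
class ReadOnlyPrices {
    private final Map<String, Integer> db; // stand-in for the database
    private final Map<String, Integer> cache = new ConcurrentHashMap<>();

    ReadOnlyPrices(Map<String, Integer> db) {
        this.db = db;
    }

    Integer price(String sku) {
        return cache.computeIfAbsent(sku, db::get);
    }

    void invalidate(String sku) {
        cache.remove(sku);
    }
}

// Write side: hypothetical read-write component. Every update is written to the
// database and then invalidates the cached entry in each read-only view.
class PriceUpdater {
    private final Map<String, Integer> db;
    private final List<ReadOnlyPrices> views;

    PriceUpdater(Map<String, Integer> db, List<ReadOnlyPrices> views) {
        this.db = db;
        this.views = views;
    }

    void setPrice(String sku, int price) {
        db.put(sku, price);
        for (ReadOnlyPrices v : views) {
            v.invalidate(sku);
        }
    }
}
```

Since writes are rare in your application, routing them through one narrow path keeps the invalidation logic in a single place while reads stay local and cached.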