Hi, my application contains some read-only data (potentially a large volume) that will be requested by a huge number of clients concurrently.
The problem is that this read-only data WILL be updated by another Java process in the background, and the refresh rate is very high.
Should I use a read-only entity bean? I use JBoss 3.X and I don't want to have to care about caching and pooling of the data myself. Since the data will be accessed by a lot of concurrent requests, a reliable, highly efficient caching and pooling method is a must.
Or should I write a POJO singleton to hold the data and do the caching manually?
Any help would be appreciated.
First of all, don't use entity beans if the data volume is high and we are talking about read-mostly data. DO NOT!
I have implemented caches as a variant of Singleton. However, the data volume was small at that time, and the refresh rate was also low. In that project, we refreshed the cache every minute or so without any problems.
If you have more data than you want in memory, the cache is much more complex, and it is hard to give any recommendations without knowing details about the data. If a record is accessed, is that record likely to be accessed again within a short timeframe? Or is the access more random?
If the refresh rate is high, is it acceptable to use old data for a while? Or do you need really fresh data? If so, you should not have a cache at all. (Caching always carries the risk of using old data.)
So, what to do? My proposal is: Write the application without cache. But encapsulate the data access in a Singleton. If you run into performance problems, rewrite the Singleton to include a cache too. (Never solve performance problems until you have them. But always reserve a well encapsulated space for the solution.) Then you don't have to rewrite the client code accessing data.
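A minimal sketch of that proposal, written in current Java syntax for brevity. All names here (`DataStore`, `find`, `setLoader`, `enableCache`) are hypothetical, and the default loader is a placeholder for the real database read. The point is only that clients call `find` and never learn whether a cache sits behind it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical singleton that hides where the data comes from.
final class DataStore {
    private static final DataStore INSTANCE = new DataStore();

    // Placeholder loader; in a real application this would run the JDBC query.
    private volatile Function<String, String> loader = key -> "value-for-" + key;

    // Starts without a cache: every call goes straight to the loader.
    private volatile boolean cacheEnabled = false;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    private DataStore() {}

    static DataStore getInstance() {
        return INSTANCE;
    }

    // The only method client code uses; its signature does not change
    // when the cache is switched on later.
    String find(String key) {
        if (!cacheEnabled) {
            return loader.apply(key);
        }
        return cache.computeIfAbsent(key, loader);
    }

    void setLoader(Function<String, String> newLoader) {
        loader = newLoader;
        cache.clear();
    }

    // Added later, only once a performance problem has actually shown up.
    void enableCache(boolean enabled) {
        cacheEnabled = enabled;
        if (!enabled) {
            cache.clear();
        }
    }
}
```

Client code stays as `DataStore.getInstance().find("someKey")` both before and after the cache is enabled.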
Also note that you should use PreparedStatement (in JDBC), so that the database can reuse the parsed statement instead of parsing every query again.
Thanks for the help!
The nature of the data is:
1. high volume
2. the same piece of data may be requested by a large number of clients concurrently
3. the refresh rate can be very high in peak periods
4. clients getting old data isn't a problem
5. but clients should get the newest data within an acceptable time period (e.g. if the database has been updated, clients should see the newest data within no more than 15 sec.)
6. NO transactions are involved; all updates are done by an external Java background process
If I implement a singleton (say DAO1) to perform data access, can my application scale up to a cluster in an easy way? In a cluster environment, when the data is updated in the DB, how can I inform all DAO1(s) in the cluster to refresh?
I have a draft design that places a message-driven bean (say msgBean) in front of each DAO1. When the background process updates some data in the DB, it sends a message (JMS) to all msgBean(s). Each msgBean will then invoke the corresponding DAO1 to refresh.
Does this sound good? Can anyone give me some comments?
I have been on a nice vacation for a week, so sorry for this late reply. With this new information, I have some more comments.
1. I hope you have enough memory in the production environment. This simplifies a lot, since you can load everything every 15 seconds (or so) and then just throw the old data away when new data is loaded.
2. To me this sounds like an in-memory cache! I suppose it is a Web application, and then you have to keep the cache in the Web container.
3. Then you have to refresh the cache quite often. No problem, except for possible bottlenecks when it comes to performance.
4. This is great! This is a prerequisite for a cache since a cache can never guarantee 100 % updated data all the time.
5. Making a refresh now and then is not a problem. Every 15 seconds does not sound very often to me.
6. Great! Then you definitely don't have to involve the EJB container in the cache solution.
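The "load everything every 15 seconds and throw the old data away" idea from points 1 and 5 could look roughly like the sketch below. It assumes the whole data set fits in memory and uses current Java syntax. `SnapshotCache` and the `loadAll` supplier are hypothetical names; the supplier stands in for the real query against the database.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical in-memory cache that is replaced as a whole on every refresh.
final class SnapshotCache {
    private final Supplier<Map<String, String>> loadAll; // stands in for the DB read
    private volatile Map<String, String> snapshot = Collections.emptyMap();
    private ScheduledExecutorService scheduler;

    SnapshotCache(Supplier<Map<String, String>> loadAll) {
        this.loadAll = loadAll;
        refresh(); // initial load so readers never see an empty cache
    }

    // Build the new map off to the side, then swap the reference in one step.
    // Readers see either the old or the new snapshot, never a half-loaded one.
    void refresh() {
        Map<String, String> fresh = new HashMap<>(loadAll.get());
        snapshot = Collections.unmodifiableMap(fresh);
    }

    // Lock-free read, safe for many concurrent clients.
    String get(String key) {
        return snapshot.get(key);
    }

    synchronized void start(long periodSeconds) {
        if (scheduler != null) {
            return; // already running
        }
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cache-refresh");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                refresh();
            } catch (RuntimeException e) {
                // Keep serving the old snapshot; an uncaught exception
                // would cancel all further scheduled runs.
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    synchronized void stop() {
        if (scheduler != null) {
            scheduler.shutdownNow();
            scheduler = null;
        }
    }
}
```

With `start(15)`, data in this sketch is stale for at most about 15 seconds plus the time one load takes.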
The MDB solution will not work in a cluster. The reason is: If you send one message to a cluster, this message will invoke the MDB on one of the nodes in the cluster. (This is the idea behind MDBs since they are business messages, not technical messages.) The cache will be updated in only one JVM on that node. (A J2EE server is free to start many JVMs and sometimes also starts several class loaders.) For a cache, you need some sort of technical triggering of the refresh operation where the trigger has the same scope as the cached objects/data.
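One way to get a trigger with the same scope as the cache is to let each cache instance poll a cheap change marker itself. This is a swapped-in illustration, not the implementation mentioned in this post. `VersionPollingTrigger` is a hypothetical name, and the `currentVersion` supplier is an assumption: it could be a counter or timestamp that the background process bumps on every update.

```java
import java.util.function.LongSupplier;

// Hypothetical trigger owned by one cache instance, so it lives in the same
// JVM and class loader as the data it refreshes.
final class VersionPollingTrigger {
    private final LongSupplier currentVersion; // assumed cheap read of a change marker
    private final Runnable refreshAction;      // reloads the cache next to this trigger
    private long lastSeen = Long.MIN_VALUE;

    VersionPollingTrigger(LongSupplier currentVersion, Runnable refreshAction) {
        this.currentVersion = currentVersion;
        this.refreshAction = refreshAction;
    }

    // Call this periodically. Returns true when a refresh was triggered.
    synchronized boolean poll() {
        long version = currentVersion.getAsLong();
        if (version == lastSeen) {
            return false; // nothing changed since the last refresh
        }
        refreshAction.run();
        lastSeen = version; // recorded only after the refresh succeeded
        return true;
    }
}
```

Because every cache instance owns its own trigger, no instance depends on a message being delivered to its particular JVM or class loader.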
I have made one successful implementation of this with small data volumes. If your data volumes are high (that is, higher than the memory of the machine you are running on), I think this solution can be modified. Unfortunately, I cannot share any code, since I wrote it in a commercial project. However, I can share some of the ideas behind it if you are interested. (I have plans to write a paper on this, but time
Hi, did you have a nice holiday? ^^
After some thinking, I came up with a design like this:
The DAO(s) WON'T be clustered, because it simply isn't needed.
The system would have 10 DAOs, each DAO representing a different category of data,
so DAO1 holds the data for Hong Kong, DAO2 holds the data for USA, DAO3 holds the data for UK ....
But the structure of all the DAOs is the same.
Then nodeA would contain DAO 1 -- 3, nodeB would contain DAO 4 -- 6, etc...
and each DAO would have an MDB to handle the refresh.
The reason for this design is that each DAO does not contain much data, but the data becomes huge when all these DAOs are counted together.
Is this still a form of cluster??
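To make the layout concrete, here is a rough sketch of the routing such a design implies. `CategoryRouter` and its methods are hypothetical names, and the node and category names are taken from the description above. Each category maps to exactly one node, so all requests for that category end up on the same machine.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lookup table: which node holds the DAO for which category.
final class CategoryRouter {
    private final Map<String, String> nodeByCategory = new LinkedHashMap<>();

    // Register the categories whose DAOs live on the given node.
    void assign(String node, String... categories) {
        for (String category : categories) {
            nodeByCategory.put(category, node);
        }
    }

    // Every request for a category goes to the single node that holds it.
    String nodeFor(String category) {
        String node = nodeByCategory.get(category);
        if (node == null) {
            throw new IllegalArgumentException("No node holds category: " + category);
        }
        return node;
    }
}
```

For example, after `router.assign("nodeA", "Hong Kong", "USA", "UK")`, the call `router.nodeFor("USA")` returns `"nodeA"`.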
Sorry for another week of vacation :-)
The proposed form of cluster should work on most application servers, but you will lose the load-balancing part of the cluster functionality.
However, your app server is still allowed to run with many JVMs and class loaders on every node, and the message will be handled by one class loader only. This means that if you have many caches on one node, only one of them will be updated. However, I don't expect any application server to work this way, although they are allowed to do it.
Good luck with the project!