Performance and scalability: Distributed object cache
We are building a system where almost all the beans will follow a read-mostly pattern; that is, reads will take place hundreds if not thousands of times more often than writes. Currently we are developing on Weblogic 5.1 and will most likely be deploying on Weblogic 6 in a clustered environment a number of months from now. We have run into the problem that Weblogic does not manage a distributed object cache as we had previously believed it did. I've heard that there are some products on the market that provide a pluggable distributed cache (Object Design's Javlin and Secant Extreme Persistent Object Services). The docs and other information I could find on these products were very sparse, so I was wondering if anyone else has experience deploying these along with Weblogic or knows of other similar products on the market.
Our second option is to implement a combination of read and read-write beans, since read beans are cached by Weblogic in a clustered system. Any reads that do not need to be involved in a transaction would take place through the read beans, while updates would be done through the read-write beans. Instead of using the built-in method of updating the read beans at a specific time interval, a write to a read-write bean would send a message to all the read beans telling them to reload on the next method call. The messaging could be done either through JMS, which will probably prove too heavyweight for this application, or by implementing our own very lightweight messaging service specifically for this purpose. Has anybody else out there implemented something similar, or does anyone have thoughts on this design?
- Distributed object cache by Andrew Johnson on November 20 2000 05:45 EST
- Distributed object cache by mickey hsieh on November 23 2000 19:35 EST
- WebLogic distributed cache? by carl sjoquist on June 02 2003 17:16 EDT
- Distributed object cache by Gad Barnea on June 14 2005 11:54 EDT
- Read-Mostly Pattern by Owen Taylor on June 14 2005 12:37 EDT
- Check out GemFire by Vincent Lee on July 03 2005 13:32 EDT
- Tangosol Distributed object cache by Cameron Purdy on November 03 2006 06:11 EST
I would be wary of creating two types of beans to represent the same entity: I think you will be storing up maintenance problems for the future.
In fact this problem sounds like a classic for the use of BMP. If you write the code to grab the data you can effectively control when that is done. Keep a 'dirty' flag with each bean so that when an update is issued by the server your code can decide whether or not the update should actually occur. There will still be some transaction overhead involved (because the server will still request load/writes) but its impact on performance should be minimal as your code will filter these out. I would expect that network latency would cancel out any perceived performance loss at the client.
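To make the flag idea concrete, here is a minimal plain-Java sketch of it. It is only an illustration under assumptions: `FakeAccountStore`, `AccountBean`, `markStale` and the two separate flags (`stale` for loads, `modified` for stores) are hypothetical names, and the class does not use the real EJB API. In a real BMP bean the container would call `ejbLoad` and `ejbStore`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the database; it counts calls so the filtering is visible.
class FakeAccountStore {
    private final Map<String, Integer> rows = new HashMap<>();
    int loadCount = 0;
    int storeCount = 0;

    int read(String id) {
        loadCount++;
        return rows.getOrDefault(id, 0);
    }

    void write(String id, int balance) {
        storeCount++;
        rows.put(id, balance);
    }
}

// Plain-Java imitation of a BMP bean. In a real bean the container, not the
// caller, would invoke ejbLoad() and ejbStore() around each transaction.
class AccountBean {
    private final String id;
    private final FakeAccountStore store;
    private int balance;
    private boolean stale = true;      // cached state must be re-read before use
    private boolean modified = false;  // cached state has unwritten changes

    AccountBean(String id, FakeAccountStore store) {
        this.id = id;
        this.store = store;
    }

    void ejbLoad() {
        if (!stale) {
            return; // cached state is still good, skip the database read
        }
        balance = store.read(id);
        stale = false;
    }

    void ejbStore() {
        if (!modified) {
            return; // nothing changed, skip the database write
        }
        store.write(id, balance);
        modified = false;
    }

    int getBalance() {
        return balance;
    }

    void setBalance(int newBalance) {
        balance = newBalance;
        modified = true;
    }

    // Called when something outside this instance changed the row.
    void markStale() {
        stale = true;
    }
}
```

With this shape, repeated `ejbLoad` calls cost one database read until `markStale` is called, and `ejbStore` only writes after a setter has run.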
In a clustered environment the issue is ensuring that when a bean is marked dirty all instances of it in the cluster know. JMS messaging is a solution but there is the possibility of synchronisation issues raising their head because of the inherently asynchronous way this occurs - if a bean changes and sends a message onto the queue, another client could read the old data from elsewhere in the cluster before that message is processed. An alternative is to use JNDI to hold 'version' data for a bean (as a singleton stateless session bean?). There is still a chance for synchronisation problems but these should be smaller than messaging: in either case you would need to judge the importance and implications of this issue in your system. Be careful of contention problems with singletons: use one per bean type or group.
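The 'version' alternative could look roughly like the sketch below. Everything here is an assumption for illustration: `VersionRegistry` is an in-process stand-in for whatever shared object would be bound in JNDI, and `VersionedEntry` plays the part of a cached bean instance on one node. A cached copy is reloaded only when the registry's version for its key differs from the version it was loaded at.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical shared registry holding one version counter per bean key.
class VersionRegistry {
    private final ConcurrentMap<String, AtomicLong> versions = new ConcurrentHashMap<>();

    long current(String key) {
        AtomicLong v = versions.get(key);
        return v == null ? 0L : v.get();
    }

    // Writers call this after a successful update.
    long bump(String key) {
        return versions.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }
}

// One node's cached copy of a bean's state.
class VersionedEntry {
    private final String key;
    private final VersionRegistry registry;
    private long loadedVersion = -1L; // never matches, so the first get() loads
    private String value;
    int reloads = 0;

    VersionedEntry(String key, VersionRegistry registry) {
        this.key = key;
        this.registry = registry;
    }

    String get(Supplier<String> loader) {
        // Read the version before loading: if a write slips in between, the
        // entry records the older version and simply reloads once more later.
        long latest = registry.current(key);
        if (latest != loadedVersion) {
            value = loader.get();
            loadedVersion = latest;
            reloads++;
        }
        return value;
    }
}
```

The cost is one registry lookup per read, which is where the contention warning about singletons applies.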
Just some ideas,
In a NON clustered environment Weblogic allows you to specify that the database is not shared, so the EJB can always be assumed to be up to date and it will not call ejbLoad on every method call.
In a clustered environment this is what we're planning as a possible solution.
1. Create read-write and read EJBs for the entities. We must have both because in a clustered environment, no matter what options are set, read-write will always call ejbLoad at the start of a transaction (and all method calls must be in a transaction according to the Weblogic implementation). Updates will be made through the read-write EJBs and non-transactional reads will be made through the read EJBs.
2. The read-write entities will send out messages through either JMS or an alternative messaging system at the end of ejbStore.
3. The messages will mark the other EJBs in the cluster as dirty so they will reload from the DB when a method is called on them.
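The three steps above can be sketched in plain Java as follows. This is a single-process illustration under assumptions, not a WebLogic or JMS implementation: `InvalidationBus` stands in for the JMS topic or custom messaging layer, `NodeReadCache` stands in for the read beans on one cluster node, and "marking dirty" is done here by evicting the cached value so the next read reloads it.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical stand-in for the messaging layer (JMS topic or a lighter custom one).
class InvalidationBus {
    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<String> listener) {
        listeners.add(listener);
    }

    // Step 2: the read-write side calls this at the end of its store.
    void publish(String key) {
        for (Consumer<String> listener : listeners) {
            listener.accept(key);
        }
    }
}

// Hypothetical stand-in for the read beans cached on one node.
class NodeReadCache {
    private final Map<String, String> cached = new ConcurrentHashMap<>();
    private final Function<String, String> loader;
    int dbReads = 0;

    NodeReadCache(InvalidationBus bus, Function<String, String> loader) {
        this.loader = loader;
        // Step 3: an invalidation message evicts the entry on this node.
        bus.subscribe(key -> cached.remove(key));
    }

    // Non-transactional read: served from memory unless the entry was evicted.
    String read(String key) {
        String value = cached.get(key);
        if (value == null) {
            dbReads++;
            value = loader.apply(key);
            if (value != null) {
                cached.put(key, value);
            }
        }
        return value;
    }
}
```

Because delivery here is synchronous, the sketch hides the window in which a node can still serve the old value; with real asynchronous messaging that window exists and its size depends on delivery latency.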
Keep a 'dirty' flag with each bean
How do we implement this 'dirty' flag for BMP? If I understand this correctly, there can be multiple active EntityBeans at the same time for multiple requests.
If one EntityBean has updated a set of records (since it is BMP), how will the others know of the change?
a. There needs to be some kind of messaging/callback involved for the EntityBean to call on all active EntityBeans to set their 'dirty' flag.
b. If so, then how do we know how many EntityBeans exist and how to invoke them?
Excuse me, I am a novice, so any info would be encouraging. Thanks.
Has anyone worked with SolarMetric JDO in a distributed environment? Especially with the Tangosol distributed cache product?
Well, I have. Of course, I work at SolarMetric and was involved in the integration of Kodo and Tangosol.
Do you have any particular questions about how the two products interact?
Regarding the initial topic of this thread -- Tangosol's Coherence product is designed to do exactly what Matt described in the first part of his original post, among other things.
Two products will handle your problems.
2. Persistence PowerTier
Both products support object caching and cluster synchronization.
They also support concurrent access to shared objects (not serialized) and a transactional view of objects.
And what about CocoBase?
I know it supports caching, and since they use their own server, I thought it might apply to distributed environments...
You mention a distributed cache managed by WebLogic. Is that the same kind of thing that you'd get in a Tangosol distributed cache? Can you point me at any details on WebLogic's implementation - I haven't been able to find anything - only references to other caching products in conjunction with WLS.
There is no distributed cache built into WebLogic.
For certain types of caching you can use read-only or read-mostly entity beans, but that's for non-transactional (the data can be dirty) entity EJB caching.
Coherence: Easily share live data across a cluster!
Here's a good article on BEA dev2dev about caching using WebLogic sessions.
Tangosol Coherence: Clustered Shared Memory for Java
As Cameron said WebLogic does not provide distributed caching. We have 2 whitepapers on our website (http://www.gigaspaces.com/whitepaper.htm) which can give you an idea on what to consider when choosing a distributed cache solution - one regarding distributed caching in general and the other on integration with WebLogic.
Gad Barnea (GigaSpaces, Inc.)
A couple of years ago I wrote an xdoclet task that produces read-only and read-write CMP Beans that are deployed in parallel. It seemed elegant at the time because I was able to reuse the code contained in the actual beans and simply wrap it with the specific weblogic stubs/skeletons that handled the transactional (or not) interactions with the outside world. In addition to different wrapper code, they also had different names so that client code could distinguish between them.
This works great (extremely fast read-only beans), unless you ever want to update based on the data read from the read-only bean.
Using a versioned DTO along with the read-mostly design can help to formalize addressing this sticky situation, but you end up trying to make a silk purse out of a sow's ear.
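For readers who have not seen the versioned DTO idea, here is a minimal sketch of it, assuming hypothetical names throughout (`PriceDto`, `PriceService`, `withPrice`). The DTO carries the version it was read at, and the write path rejects the update if the stored version has moved on in the meantime.

```java
import java.util.HashMap;
import java.util.Map;

// Immutable DTO that remembers the version of the row it was read from.
final class PriceDto {
    final String id;
    final int price;
    final long version;

    PriceDto(String id, int price, long version) {
        this.id = id;
        this.price = price;
        this.version = version;
    }

    // Returns an edited copy that still carries the original read version.
    PriceDto withPrice(int newPrice) {
        return new PriceDto(id, newPrice, version);
    }
}

// Hypothetical stand-in for the read-write side; the map plays the database.
class PriceService {
    private final Map<String, PriceDto> rows = new HashMap<>();

    synchronized void create(String id, int price) {
        rows.put(id, new PriceDto(id, price, 1L));
    }

    synchronized PriceDto read(String id) {
        return rows.get(id);
    }

    // Succeeds only if nobody else has updated the row since the DTO was read.
    synchronized boolean update(PriceDto edited) {
        PriceDto current = rows.get(edited.id);
        if (current == null || current.version != edited.version) {
            return false; // the caller worked from stale data
        }
        rows.put(edited.id, new PriceDto(edited.id, edited.price, current.version + 1));
        return true;
    }
}
```

A caller whose update is rejected has to re-read and retry, which is the extra ceremony that makes updating from read-only data awkward.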
I honestly wouldn't recommend going in that direction as it has proven itself to me to be problematic:
Things to be aware of with the Read-Mostly CMP include:
1) vendor and version-specific solution
2) CMP beans are limited in what they offer (relational/Object mapping/queries, etc)
3) Maintenance headaches (you need distinct xml descriptor tags for both sets of the beans)
4) Hard to test as you are forced to use the full EJB stack every time you want data
7) Error prone: looking up the wrong name in JNDI
I personally believe that at this time your choices are much richer and you should have no trouble finding a wealth of options. Is this a situation where you will have a choice or do you have to use EJB?
If you have a choice - check out some of the newer options listed in response to your query.
For distributed caching as well as data virtualization, check out GemFire from GemStone Systems.
GemFire was selected at one of the largest broker-dealers, Bear Stearns in NY. Incidentally, they beat out GigaSpaces. Giga was dropping packets.
GemFire was selected [..] they beat out GigaSpaces. Giga was dropping packets.
I take it this means you no longer work for Gigaspaces?
Some additional links:
- Cluster your objects and your data using Tangosol Coherence
- Provide a Tangosol Coherence Queryable Data Fabric
- Provide a Tangosol Coherence Data Grid
- Real Time Desktop
Peace,
Cameron Purdy
Tangosol Coherence: Java Data Grid and Clustered Cache
Check out NCache (www.alachisoft.com/download.html), a distributed cache for Java. NCache lets you choose a cache topology that suits your needs. It has many important features like expiration, eviction, DB sync and more.
Try NCache for free with the free edition, NCache Express, or download a full-featured version to try for 2 months.
Download from http://www.alachisoft.com