Performance and scalability: DBMS clustering for J2EE applications
- Posted by: Claudio Morgia
- Posted on: June 18 2003 05:08 EDT
I'm facing a problem trying to architect a complex clustered J2EE application.
I think I understand the complexities of the Web tier and the EJB tier and their clustering issues, but it's not clear to me how to implement a DBMS cluster that would be able to:
- maintain a distributed caching layer for the EJBs,
- maintain coherence between caches (as a layer) and DBMS (as another layer), maybe using some synchronization/replication feature
I don't know whether such a solution already exists, but in every book, article, and forum I've read, the DBMS layer is always a single node, so database clustering always seems to be implemented using proprietary solutions.
Please, help me!
- DBMS clustering for J2EE applications by Cameron Purdy on June 18 2003 10:44 EDT
- DBMS clustering for J2EE applications by Jason McKerr on June 18 2003 12:09 EDT
- DBMS clustering for J2EE applications by SAF . on June 24 2003 15:40 EDT
For clustering the database, you need to look at Oracle RAC.
For the app tier, see Tangosol Coherence. It provides true coherent clustered caches for J2EE.
Coherence: Easily share live data across a cluster!
Thank you Cameron for your answer,
I've looked around the Oracle site (and others), but my question is slightly different: is there a reason why I should (or should not) design a DBMS cluster behind an application cluster (maybe using Coherence)?
Another question: I've looked at the Tangosol site, and Coherence seems to be a pretty nice system that provides a very good distributed cache, but how can I use it with a layer of CMP Entity beans without having to convert them to BMP?
Maybe I'm missing some part of the picture ...
Thank you again,
> I've looked around the Oracle site (and others), but my question is slightly different: is there a reason why I should (or should not) design a DBMS cluster behind an application cluster (maybe using Coherence)?
There are a number of reasons you would (or would not) cluster a database.
Most often they have to do with uptime, failover and physical proximity for high-volume network access. The typical challenges (and tradeoffs) there are along the axes of data ownership (how many 'locations' can update the data at the same time), data concurrency (whether all updates are real-time and, if not, how often they are synchronized), scalable performance (how long each operation will take and how it degrades under load) and, last but not least, cost (hardware, licensing and bandwidth).
> Another question: I've looked at the Tangosol site, and Coherence seems to be a pretty nice system that provides a very good distributed cache, but how can I use it with a layer of CMP Entity beans without having to convert them to BMP?
You are correct in your conclusions with regard to needing to use BMPs to access the cache. While we have automatic CMP integration in our plans, that is not something available in the current release. In the meantime, if you wanted automatic database integration and mapping (and at the risk of starting a flame storm here with my suggestion ;-) with clustered distributed caching, there are some JDO vendors (Solarmetric KODO JDO for example) that support Coherence distributed caching in their products.
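To make the BMP-plus-cache idea concrete, here is a minimal sketch of the read-through pattern a BMP bean could follow: check the cache first, fall back to the database on a miss, and write to the database before refreshing the cache. This is not Coherence's API. `ReadThroughCache`, `loader` and `writer` are hypothetical names, a plain `ConcurrentHashMap` stands in for the clustered cache, and the sketch uses current Java syntax rather than anything J2EE-specific.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical helper, not part of any product: a read-through cache in front of a data store.
class ReadThroughCache<K, V> {
    private final Map<K, V> cache;          // stand-in for a clustered cache (assumption)
    private final Function<K, V> loader;    // stand-in for the SELECT a BMP bean runs in ejbLoad()
    private final BiConsumer<K, V> writer;  // stand-in for the UPDATE a BMP bean runs in ejbStore()

    ReadThroughCache(Map<K, V> cache, Function<K, V> loader, BiConsumer<K, V> writer) {
        this.cache = cache;
        this.loader = loader;
        this.writer = writer;
    }

    // Return the cached value, loading it from the store only on a cache miss.
    V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    // Write to the store first, then refresh the cache so readers see the new value.
    void update(K key, V value) {
        writer.accept(key, value);
        cache.put(key, value);
    }
}

class ReadThroughCacheDemo {
    public static void main(String[] args) {
        Map<Integer, String> database = new ConcurrentHashMap<>(); // fake database table
        database.put(1, "Alice");
        AtomicInteger databaseReads = new AtomicInteger();

        ReadThroughCache<Integer, String> customers = new ReadThroughCache<Integer, String>(
                new ConcurrentHashMap<>(),
                id -> { databaseReads.incrementAndGet(); return database.get(id); },
                database::put);

        customers.get(1); // miss: goes to the fake database
        customers.get(1); // hit: served from the cache
        System.out.println("database reads: " + databaseReads.get());
    }
}
```

In a real cluster the interesting part is what this sketch leaves out: how the other nodes learn that an entry changed, which is the job of the distributed cache itself.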
Coherence: Cluster your Work. Work your Cluster.
You might try "C-JDBC" from ObjectWeb.org. I don't know much about it, nor have I tried it. But it looks pretty cool, and I've been meaning to check it out.
Northwest Alliance for Computational Science and Engineering
I'm not sure what your system requirements are with regard to cost and time constraints, but I can tell you right now that, according to what I have read, implementing a database cluster requires a great deal of time and money, and may also require a group of specially trained DBAs who are familiar with the configuration and monitoring of a cluster.
It is important to understand the fundamental differences between the types of clustering technology offered by major players in the DBMS world, specifically, IBM and Oracle.
IBM uses a "shared nothing" approach, where the data is partitioned equally across each physical server in the cluster. A master server determines the task to be serviced and delegates the work to the server that holds the data related to the task. The downside to this option is the potential failure of a server in the cluster, which ultimately makes the data on that partition unavailable. IBM's technology is tightly integrated with IBM mainframe (no surprise), so if your company maintains a vast amount of EI on an IBM mainframe, this could be a viable option.
Oracle uses a different approach called "shared disk", where all physical servers participating in the cluster have equal access to the data. The processing work, not the data, is divided amongst the servers, and if a server fails, the remaining servers in the cluster can pick up the load.
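The difference between the two approaches can be illustrated with a toy routing model. This is only a sketch under stated assumptions: `ClusterModel` and its methods are hypothetical names, hash partitioning is assumed for the shared-nothing case, and no vendor's actual routing or recovery features are modelled.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Toy model (hypothetical): decides which node can serve a request for a given key.
class ClusterModel {
    private final int nodeCount;
    private final Set<Integer> failedNodes = new HashSet<>();

    ClusterModel(int nodeCount) {
        this.nodeCount = nodeCount;
    }

    void fail(int node) {
        failedNodes.add(node);
    }

    // The node that owns the key's partition (hash partitioning is an assumption).
    int ownerOf(String key) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    // Shared nothing: only the owning node holds the data, so its failure makes the key unavailable.
    Optional<Integer> routeSharedNothing(String key) {
        int owner = ownerOf(key);
        return failedNodes.contains(owner) ? Optional.empty() : Optional.of(owner);
    }

    // Shared disk: every node can reach the data, so any surviving node can serve the key.
    Optional<Integer> routeSharedDisk(String key) {
        int preferred = ownerOf(key);
        for (int i = 0; i < nodeCount; i++) {
            int candidate = (preferred + i) % nodeCount;
            if (!failedNodes.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty(); // every node is down
    }
}

class ClusterModelDemo {
    public static void main(String[] args) {
        ClusterModel cluster = new ClusterModel(3);
        String key = "customer-42";
        cluster.fail(cluster.ownerOf(key)); // take down the node that owns this key

        System.out.println("shared nothing can serve key: " + cluster.routeSharedNothing(key).isPresent());
        System.out.println("shared disk can serve key: " + cluster.routeSharedDisk(key).isPresent());
    }
}
```

With the owning node down, the shared-nothing route comes back empty while the shared-disk route falls through to a surviving node, which is the tradeoff described above.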
If your goals are high availability and fault tolerance on your DBMS tier, the next best alternative to database clustering is database replication. Database replication methods are well known, and they are far less costly and quicker to adopt than a full-scale clustered solution.
Database clusters are not without their drawbacks. Despite the fact that clustering offers high availability, keep in mind that physical servers participating in a cluster are required to be in close proximity to each other because they communicate using SCSI cables, and if your data center is destroyed by a fire or some other event, your "availability" just took a crap; not a very robust solution in my opinion. In contrast, a database replication solution can be configured across a LAN or WAN, which promotes the isolation of physical servers and increases system availability in the event of some failure or catastrophe.
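From the application's side, a replicated setup might be used roughly as follows: try the primary first and fall back to a replica at another site if it is unreachable. This is a minimal sketch, not a feature of any particular driver or replication product. `ReplicaFailover` and `queryWithFailover` are hypothetical names, the JDBC URLs are made-up placeholders, and the query is simulated with a lambda instead of a real connection.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: runs a query against the first reachable database in an ordered list.
class ReplicaFailover {
    static <R> R queryWithFailover(List<String> databaseUrls, Function<String, R> query) {
        RuntimeException lastFailure = null;
        for (String url : databaseUrls) {
            try {
                return query.apply(url);   // first database that answers wins
            } catch (RuntimeException e) {
                lastFailure = e;           // remember the failure and try the next one
            }
        }
        throw new IllegalStateException("no database reachable", lastFailure);
    }
}

class ReplicaFailoverDemo {
    public static void main(String[] args) {
        // Placeholder URLs (assumptions): a primary in one data center, a replica in another.
        List<String> urls = Arrays.asList(
                "jdbc:db2://primary.example.com:50000/APPDB",
                "jdbc:db2://replica.example.org:50000/APPDB");

        String answer = ReplicaFailover.queryWithFailover(urls, url -> {
            if (url.contains("primary")) {
                throw new RuntimeException("primary site is down"); // simulated outage
            }
            return "served by " + url;
        });
        System.out.println(answer);
    }
}
```

Note that reads are easy to redirect this way, but writes are not: which copies may accept updates, and how stale a replica is allowed to be, depend entirely on the replication scheme chosen.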
You may want to weigh your options when considering these alternatives in your architecture.
Thank you Raffi,
from your discussion I gather that what I probably need is some sort of database replication system.
Currently, I'm using a DB2-based system as a backend for a WebSphere application, and I would like to expand the application to a clustered environment.
My idea would be to use a database replication system and use that system as the single entry point.
Maybe you can point me to a replication system, possibly open source.
Thank you again,
I have a problem similar to Claudio's, and I'd really appreciate it if you could direct me to some available database replication systems that might fit the needs of a medium-sized corporation.