Why should you combine Reliable Messaging with Distributed Caching?

Java Development News:

Why should you combine Reliable Messaging with Distributed Caching?

By Jags Ramnarayan

01 Apr 2008 | TheServerSide.com

In Event driven architectures, applications react to events being sent to them - most of which originate internally from other closely cooperating applications. Event distribution is performed using messaging systems such as JMS servers, MicroSoft MSMQ, IBM MQ Series, Tibco Rv, etc (these messaging platforms are also the underpinnings for most SOA/ESB platforms). At the very core of these messaging systems, they enable sharing of data with other applications by pushing events to other distributed processes through an asynchronous delivery channel. The sender is not affected either by the availability or the speed at which the receiver operates.

When you have several closely cooperating applications, what they typically want is sharing of information along with events. For instance, the application that takes in customer trade orders notifies the application that r outes the order to a trading exchange on the arrival of the order, but also has to share related information such as customer credit or delivery information. The common architectural pattern is to combine traditional messaging, used to notify events between applications with common databases to share the required contextual information.

There are numerous challenges/issues with this approach. Some are listed below:

  • Access related data for each incoming event from an existing centralized database and your application only can consume events at the rate the database can handle accesses. This ultimately causes either the sender to throttle or the messaging queues to build up dramatically when the message rates are high or bursty.
  • When messages carry identifiers for data that is stored in a database, the receiving application may not be able to interpret the message properly. By the time the message arrives, the related information referred to by the identifiers may have actually changed in the underlying database. Essentially, the loss of data consistency can result in wrong decisions.
  • The lack of message ordering across destinations can cause exception conditions. For instance, JMS does not define order of message receipt across destinations or across a destination's messages sent from multiple sessions. For some applications ordering can be crucial particularly if a customer submits a purchase order version 1, then amends it and sends version 2. In this case, you don't want to process the first version last (so that you loose the update). To avoid this situation, you have to build in sufficient intelligence to deal with such conditions in your application.
  • The combination of message creation, interpretation, fetching related data from a relational database, etc can be quite cumbersome and prone to problems. Programmers often have to deal with items like correlation identifiers, serial number, headers, converting object data and relationships to some neutral form such as XML, etc

Therefore, "closely cooperating" applications are better enabled by a new class of middleware data management infrastructure offering - technology that manages data in distributed main memory and combines this with the event management semantics found in high performance messaging. We refer to this class of middleware software as the 'data fabric' (also referred to as Data Grid).

Essentially, the data fabric manages large quantities of fast moving data along with steady state or historical information primarily in distributed main memory. Applications can register interest in what looks like a database rather than on messaging destinations (topics) and the applications are guaranteed to receive asynchronous notifications with the same reliability semantics like a messaging system. This allows applications to provide complex and sophisticated interest registrations like through the use of a complex query expression that joins multiple domain data types. Depending on the application usage, the data fabric can be configured to make multiple copies of data resident close to the application that is consuming it or spread the data across many nodes or a combination of the two. When data is updated it merely calculates the "delta" - how the data compared to what is already stored has changed and merely ships the change to consumers dramatically saving on distribution costs that would otherwise be incurred. As the publisher and consumer both operate on a data model that enforces constraints and relationships, data consistency is always preserved.

We would like to provide the motivation for architects and developers alike to take a deeper look at a new data/event distribution development model – one that is information centric instead of being message centric.