Discussions

Performance and scalability: Information Centric vs Message Centric architecture

  1. Information Centric vs Message Centric architecture (1 messages)

    I have been reading the thread posted by John Davies about the use of Data Fabric and messaging/middleware technologies at investment banks (http://www.theserverside.com/news/thread.tss?thread_id=42563#220125). As a financial-industry architect who has struggled for many years with the operational and development pitfalls of MOM, with database performance constraints, and with the difficulty of integrating relational and object models for different end users, I am starting to see a new paradigm emerging that targets some of these issues.

    Across several projects at GemStone, we are seeing a strong shift towards an information-centric rather than a message-centric architecture. As you may know, the various Data Fabric vendors (whom John finds so adept at buying rounds at the pub) all let you register for event notification callbacks when data is created, modified, or destroyed. The potential of this model was stymied until recently by inadequate notification delivery guarantees, particularly in failure/failover scenarios.

    An information-centric architecture must go beyond simple cache coherence; that simply isn't good enough anymore for demanding use cases. For data fabric technology to become more useful, you have to guarantee not just cache consistency, but that each and every delta to the cache is also propagated to every critically interested application. GemFire 5.0 now incorporates strong event notification guarantees, including store-and-forward event queues and durable subscribers, so that even an application that fails to start before transaction activity begins is still assured of later receiving all relevant events.

    So, instead of the older model of performing a transaction and then packaging up a message to notify other systems, the new model is to join a cache server cluster (or become a client to one) and then register interest in notifications in one of several common ways: (a) being a full mirror; (b) registering for a set of keys of interest; (c) letting your "interest list" evolve naturally from what you have previously accessed or created (via get(), put(), or create()); or (d) supplying simple or complex filter expressions. The owner of any particular piece of data within the cache server cluster automatically becomes responsible for tracking who is interested in what, pushing the update notifications, and tracking notification completion status, while the backup owner of the data mirrors all relevant interest registrations and delivery queues to provide fast, seamless failover.

    The picture that emerges turns the old model of data storage plus messaging middleware inside out. Messaging and notifications become intrinsic duties of the operational datastore, which is quite natural since it is already the logical hub of data activity. This is conceptually similar to using database triggers and built-in database queues to push data updates to interested parties directly from the source, except far more efficient (thanks to the data fabric's distributed nature), much easier to configure, and much easier to code and maintain.
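    To make that concrete, here is a rough sketch of the client side of this model. It uses the present-day Apache Geode / GemFire client API, which differs in detail from GemFire 5.0, and the locator address, region name, and durable-client-id are purely illustrative.

        import org.apache.geode.cache.EntryEvent;
        import org.apache.geode.cache.InterestResultPolicy;
        import org.apache.geode.cache.Region;
        import org.apache.geode.cache.client.ClientCache;
        import org.apache.geode.cache.client.ClientCacheFactory;
        import org.apache.geode.cache.client.ClientRegionShortcut;
        import org.apache.geode.cache.util.CacheListenerAdapter;

        public class TradeSubscriber {
            public static void main(String[] args) {
                // Connect as a durable client so that events queued while we are down
                // are delivered once we reconnect (the "durable subscriber" idea).
                ClientCache cache = new ClientCacheFactory()
                    .addPoolLocator("locator-host", 10334)        // illustrative address
                    .setPoolSubscriptionEnabled(true)
                    .set("durable-client-id", "trade-blotter-1")  // illustrative id
                    .create();

                // Client view of the "trades" region held by the cache server cluster.
                Region<String, Object> trades = cache
                    .<String, Object>createClientRegionFactory(ClientRegionShortcut.CACHING_PROXY)
                    .addCacheListener(new CacheListenerAdapter<String, Object>() {
                        @Override
                        public void afterCreate(EntryEvent<String, Object> e) {
                            System.out.println("created " + e.getKey());
                        }
                        @Override
                        public void afterUpdate(EntryEvent<String, Object> e) {
                            System.out.println("updated " + e.getKey());
                        }
                    })
                    .create("trades");

                // Register durable interest in every key; a key set, regex, or
                // filter/continuous query could narrow this (options (b) and (d) above).
                trades.registerInterest("ALL_KEYS", InterestResultPolicy.KEYS_VALUES, true);

                // Tell the servers we are ready to receive any events queued for us.
                cache.readyForEvents();
            }
        }

    Mirroring everything (option (a)) and filter-based registration (option (d)) are configured differently, but the delivery guarantees described above apply in the same way.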
    For most of our projects, customers are still reluctant to eliminate the relational database as a long-term storage and archival mechanism. After all, the analytics and compliance ecosystems that have grown up around the RDBMS are massive and quite mature. Because of this, guaranteed database write-behind queues have become critical technology in many trading-system projects, preserving the operational efficiency of the data fabric while still ensuring that the RDBMS eventually catches up.

    Cheers,
    Gideon
    GemFire - The Enterprise Data Fabric
    http://www.gemstone.com
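    For illustration, a minimal sketch of such a guaranteed write-behind queue, expressed with the AsyncEventQueue / AsyncEventListener API found in current Apache Geode / GemFire releases (GemFire 5.0 configured this differently); the queue name, batch settings, and region are illustrative, and the actual JDBC mapping is only hinted at in a comment.

        import java.util.List;
        import org.apache.geode.cache.Cache;
        import org.apache.geode.cache.CacheFactory;
        import org.apache.geode.cache.RegionShortcut;
        import org.apache.geode.cache.asyncqueue.AsyncEvent;
        import org.apache.geode.cache.asyncqueue.AsyncEventListener;

        public class WriteBehindSetup {
            // Drains batches of cache deltas into the relational database.
            static class DbWriter implements AsyncEventListener {
                @Override
                public boolean processEvents(List<AsyncEvent> events) {
                    for (AsyncEvent e : events) {
                        // Map e.getOperation(), e.getKey() and e.getDeserializedValue()
                        // to the appropriate INSERT/UPDATE/DELETE against the RDBMS here.
                    }
                    return true; // batch handled; remove it from the queue
                }
                @Override
                public void close() {}
            }

            public static void main(String[] args) {
                Cache cache = new CacheFactory().create();

                // Persistent, batched queue so the database still catches up after a failover.
                cache.createAsyncEventQueueFactory()
                    .setPersistent(true)
                    .setBatchSize(500)
                    .setBatchTimeInterval(1000)   // milliseconds
                    .create("db-write-behind", new DbWriter());

                // Every delta to this region is queued for the database writer.
                cache.createRegionFactory(RegionShortcut.PARTITION)
                    .addAsyncEventQueueId("db-write-behind")
                    .create("trades");
            }
        }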
  2. .. customers are still reluctant to eliminate the relational database as a long-term storage and archival mechanism. After all, the analytics and compliance ecosystems that have grown up around the RDBMS are massive and quite mature. Because of this, guaranteed database write-behind queues have become critical technology in many trading-system projects, preserving the operational efficiency of the data fabric while still ensuring that the RDBMS eventually catches up.
    Exactly. Trading, order management, execution/fill, reconciliation ... anything that has high data rates. For more information, see:
      - Cluster your objects and your data
      - Provide a Queryable Data Fabric
      - Read-Through, Write-Through, Refresh-Ahead and Write-Behind Caching

    Peace,
    Cameron Purdy
    Tangosol Coherence: The Java Data Grid