GemFire 5.0, distributed operational data store for scalability

  1. GemStone Systems has announced the release of GemFire 5.0 Beta 1. GemFire provides distributed data caching, transactional support, object queries, multiple language bindings, distributed messaging capabilities, and more. GemFire's cache is Map-based: an application connects to the repository of regions by supplying a set of properties. As a Map, it implements get() and put() semantics, as well as custom operations for invalidating entries and running object queries. A 30-day, five-node trial license is available from the product web site after registration. The final release of GemFire 5.0 is expected in October 2006.
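To make the Map-style semantics concrete, here is a minimal single-JVM sketch of a region offering get() and put() plus an invalidate operation. The classes SimpleRegion and SimpleRegionDemo are hypothetical illustrations, not GemFire's API. The sketch also assumes that invalidating an entry keeps its key but discards its value, while destroying removes the entry altogether.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical single-JVM "region": a named Map with extra invalidate()/destroy() operations.
// Not GemFire's API; distribution, queries and properties-based connection are omitted.
class SimpleRegion<K, V> {
    // ConcurrentHashMap rejects null values, so an invalidated entry is stored as Optional.empty().
    private final Map<K, Optional<V>> entries = new ConcurrentHashMap<>();
    private final String name;

    SimpleRegion(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Map-style put: stores the value and returns the previous one (null if absent or invalidated).
    V put(K key, V value) {
        Optional<V> old = entries.put(key, Optional.of(value));
        return old == null ? null : old.orElse(null);
    }

    // Returns the current value, or null if the key is absent or has been invalidated.
    V get(K key) {
        Optional<V> current = entries.get(key);
        return current == null ? null : current.orElse(null);
    }

    // Keeps the key but drops its value; a later get() returns null until the next put().
    void invalidate(K key) {
        entries.computeIfPresent(key, (k, v) -> Optional.empty());
    }

    // Removes the key and its value entirely.
    void destroy(K key) {
        entries.remove(key);
    }

    boolean containsKey(K key) {
        return entries.containsKey(key);
    }
}

class SimpleRegionDemo {
    public static void main(String[] args) {
        SimpleRegion<String, Integer> prices = new SimpleRegion<>("prices");
        prices.put("IBM", 82);
        prices.invalidate("IBM");
        System.out.println(prices.containsKey("IBM")); // true: the key survives invalidation
        System.out.println(prices.get("IBM"));         // null: the value was discarded
        prices.destroy("IBM");
        System.out.println(prices.containsKey("IBM")); // false: the entry is gone
    }
}
```

Separating "invalidate" from "destroy" lets other members keep knowing that a key exists while forcing the next reader to fetch a fresh value.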

  2. The complete press release can be found here: http://www.gemstone.com/news/pr-091206.php
    The 5.0 feature descriptions can be found here: http://developer.gemstone.com/pages/viewpage.action?pageId=1490
    A downloadable version is available at http://www.gemstone.com/download/
    Cheers,
    Bharath
    http://www.enterprisedatafabric.com
  3. You should take a look at NCache 3.0 if you want distributed object caching for .NET. NCache is a clustered object cache that lets you boost application performance by up to 10 times. Learn more from http://www.alachisoft.com/ncache/index.html
  4. GemFire offers .NET APIs

    if you want distributed object caching for .NET.
    All of the GemFire APIs are also offered through C# and C++. GemFire also has a 100% pure C++ distributed object caching product with performance far above any alternative in the industry, and it even provides interoperability between the native C++ and Java products. I would certainly encourage anyone to compare the features and performance of GemFire and any competing product. There are 25 years of distributed object management history behind the GemFire product line, and many of our original engineers remain key players on our development and architecture teams. Cheers, Gideon
  5. Coherence?

    How does GemFire compare with Tangosol Coherence? Can I plug GemFire as a Hibernate cache?
  6. Re: Coherence?

    How does GemFire compare with Tangosol Coherence? Can I plug GemFire as a Hibernate cache?
    At a high level there are similarities between Coherence and GemFire when it comes to distributed caching: in-memory topologies, Map-like APIs, data partitioning, replication, in-process and out-of-process caching, etc. Our goal with GemFire (driven primarily by several leading customers on Wall Street, in the federal government, in logistics, etc.) has been to enhance the base distributed caching platform with advanced features such as the following, which differentiate us from most offerings in the market, not just Coherence:
    1. Role-based reliable distribution, mapping application dependencies at the infrastructure level: http://developer.gemstone.com/pages/viewpage.action?pageId=1490#What%27sNewinGemFire5.0%3F-Rolebasedreliabledatadistribution
    2. A robust query language for querying the distributed caches, with support for joins, inner joins, etc., rather than being restricted to a variety of filters.
    3. Unlimited scalability through loose coupling of multiple distributed systems: http://developer.gemstone.com/pages/viewpage.action?pageId=1490#What%27sNewinGemFire5.0%3F-Looselycoupleddistributedsystemsforunboundedscalability
    4. Special logic to handle slow receivers or unresponsive caches: http://developer.gemstone.com/pages/viewpage.action?pageId=1490#What%27sNewinGemFire5.0%3F-Handlingslowandunresponsiveapplications
    5. Native XML data management capabilities with XPath querying and indexing: http://www.gemstone.com/solutions/webservices.php
    6. Fine-grained policy control. Policies such as expiry, eviction, persistence and replication are all defined at the region level, not just at the cache level (in GemFire parlance, a cache consists of multiple regions). This gives you the flexibility to model different behavior for different data subsets, even within the same node/JVM.
    7. Comprehensive monitoring and tuning support via a console and visual statistics display, in addition to comprehensive JMX support.
    As far as integration with Hibernate goes, yes, GemFire can be used in conjunction with Hibernate (or any other O/R or JDO tool) for query and key caching. Here is a document that explains it: http://developer.gemstone.com/display/gfedev/Using+Hibernate+with+GemFire
    Cheers,
    Bharath
    http://www.gemstone.com/products/gemfire/enterprise.php
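Point 6, policies attached to individual regions rather than to the whole cache, can be illustrated with a small sketch in which each region carries its own LRU entry limit and time-to-live. RegionPolicy and PolicyRegion are hypothetical names, and the two policies are a simplified stand-in for GemFire's real expiry and eviction settings. The clock is injected only so that the behaviour can be tested deterministically.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical per-region policy: an LRU entry limit plus a time-to-live for each entry.
final class RegionPolicy {
    final int maxEntries; // LRU eviction threshold, applied to this region only
    final long ttlMillis; // entries older than this are treated as expired

    RegionPolicy(int maxEntries, long ttlMillis) {
        this.maxEntries = maxEntries;
        this.ttlMillis = ttlMillis;
    }
}

// A region whose eviction and expiry come from its own policy rather than a cache-wide one.
// Not thread-safe; a sketch only.
final class PolicyRegion<K, V> {
    // A value plus the time it was stored, used for TTL checks.
    private static final class Stamped<V> {
        final V value;
        final long storedAt;

        Stamped(V value, long storedAt) {
            this.value = value;
            this.storedAt = storedAt;
        }
    }

    private final RegionPolicy policy;
    private final LongSupplier clock; // injectable clock, in milliseconds
    private final LinkedHashMap<K, Stamped<V>> map;

    PolicyRegion(RegionPolicy policy, LongSupplier clock) {
        this.policy = policy;
        this.clock = clock;
        // accessOrder = true keeps entries in least-recently-used order;
        // removeEldestEntry evicts the LRU entry once the region exceeds its own limit.
        this.map = new LinkedHashMap<K, Stamped<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Stamped<V>> eldest) {
                return size() > policy.maxEntries;
            }
        };
    }

    void put(K key, V value) {
        map.put(key, new Stamped<>(value, clock.getAsLong()));
    }

    // Returns null for missing or expired entries; expiry is checked lazily on read.
    V get(K key) {
        Stamped<V> s = map.get(key);
        if (s == null) {
            return null;
        }
        if (clock.getAsLong() - s.storedAt > policy.ttlMillis) {
            map.remove(key);
            return null;
        }
        return s.value;
    }

    int size() {
        return map.size();
    }
}
```

Two regions in the same JVM can then behave differently, for example a small, short-lived region for quotes next to a large reference-data region whose TTL is effectively infinite.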
  7. Re: Coherence?

    Another differentiating feature for large-scale distributed systems supported in GemFire Enterprise 5.0 is automatic handling of, and recovery from, network failures and network partitioning: http://developer.gemstone.com/pages/viewpage.action?pageId=1490#What%27sNewinGemFire5.0%3F-Sustainabilityundernetworkfailures%2Fsegmentation
  8. You also mention reliable multicasting over UDP. How are you achieving this, and how do you recover from network failures and network partitioning?
  9. You also mention reliable multicasting over UDP. How are you achieving this, and how do you recover from network failures and network partitioning?
    We have two multicast protocols, both of which offer reliable delivery of messages. The first uses negative acknowledgement and the second uses positive acknowledgement. With negative ack, our receiving code tracks which sequence numbers it has seen and periodically finds out what other receivers have seen. Based on that, it can tell when messages have been lost and request retransmission. A flow-control algorithm ensures that receivers aren't overwhelmed and also averts retransmission storms. With positive ack, the receiving code sends a small acknowledgement message for every message received. The sender blocks until it receives all acks or knows that a missing ack is from a process that is no longer a member of the system. If the wait goes on too long, the message is periodically retransmitted point-to-point.
    There are two selectable failure detection algorithms. The first uses TCP/IP socket connections and periodic heartbeats in a ring configuration to detect failures. The second uses UDP datagram heartbeats. The datagram algorithm is recommended in configurations that might suffer network problems, because TCP/IP sockets are known to hang for long periods on network failure (subject to the OS-level keepAliveTimeout setting).
    The most interesting thing we have introduced to manage reliability is declarative roles: individual members in a distributed system declare the roles they play, and publishing members declare the roles they need in order to ensure reliability and consistency. The system ensures that messages can always be delivered to the roles declared by publishers. A policy configuration controls how a publisher behaves when a declared role is missing. For instance, in a distributed cache deployment, you might designate two members to asynchronously send all updates to the database. They could play a role called, say, "DB writer". The other members simply publish to the cache and don't have to worry about their updates making it to the database.
    So even if the network partitions, as long as the publishing node can still see at least one "DB writer", everything works without a hitch. If both members playing "DB writer" are gone, a policy configuration dictates whether the cache should go into read-only mode, continue as if nothing happened, shut down the member, or start queuing events for later delivery. Policy also dictates how members merge back into the system. For instance, members can be configured to periodically attempt to reconnect to the distributed system. If and when a member can confirm that the critical members are visible ("are all the roles I declared as required visible?"), it automatically re-initializes its local state to become consistent with the rest of the system and notifies the application. Cheers! -- Jags Ramnarayan
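The negative-acknowledgement scheme described in this reply can be sketched from the receiver's side: buffer out-of-order arrivals, deliver contiguous runs in sender order, and compute which sequence numbers to request again. This is a hypothetical illustration (NackReceiver is not a GemFire or JGroups class). It assumes a single sender numbering its messages consecutively from 1, and it leaves out flow control, retransmission timers and the periodic exchange of "what I've seen" digests with other receivers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical receiver for ONE sender's stream in a NACK-based reliable multicast.
final class NackReceiver {
    private long nextExpected = 1;                                 // lowest sequence number not yet delivered
    private final TreeMap<Long, String> pending = new TreeMap<>(); // arrived early, waiting for a gap to fill
    private final List<String> delivered = new ArrayList<>();

    // Called for every datagram received, whether original or retransmitted.
    void onMessage(long seq, String payload) {
        if (seq < nextExpected || pending.containsKey(seq)) {
            return; // duplicate: already delivered or already buffered
        }
        pending.put(seq, payload);
        // Deliver the contiguous run starting at nextExpected, preserving sender order.
        while (pending.containsKey(nextExpected)) {
            delivered.add(pending.remove(nextExpected));
            nextExpected++;
        }
    }

    // Sequence numbers not yet received, up to the highest one the sender reports having sent
    // (e.g. learned from a periodic digest). These would be listed in a NACK for retransmission.
    List<Long> missing(long highestSent) {
        List<Long> gaps = new ArrayList<>();
        for (long s = nextExpected; s <= highestSent; s++) {
            if (!pending.containsKey(s)) {
                gaps.add(s);
            }
        }
        return gaps;
    }

    List<String> delivered() {
        return delivered;
    }
}
```

Only losses generate traffic in this design: a receiver that sees every message in order never sends anything back, which is why NACK schemes scale well with many receivers but need flow control to keep a burst of losses from triggering a retransmission storm.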
  10. I like the idea of publishers and non-publishers, but it might not work in all situations, for example HTTP session replication, where anyone can be a publisher. Reading your post, it seems like you might be using JGroups to provide some of the functionality (i.e., failure detection protocols and the different acknowledgement schemes that make UDP reliable). Am I correct?
  11. I like the idea of publishers and non-publishers, but it might not work in all situations, for example HTTP session replication, where anyone can be a publisher.

    Reading your post, it seems like you might be using JGroups to provide some of the functionality (i.e., failure detection protocols and the different acknowledgement schemes that make UDP reliable). Am I correct?
    The whole design around declarative roles is meant to ensure that true reliable messaging semantics are naturally incorporated into the distributed data management framework. Nine times out of ten, our customers use cache listeners to receive notifications and take action, and an unpredictable change in the membership of the system has undesirable consequences. Declarative roles are designed for multiple publishers and multiple consumers. Whether they are applicable to HTTP session replication will depend on the implementation. If, say, you designate some nodes as backup nodes for sessions, then every publisher in the cluster can ensure that replication to the backup "role" happens at all times.
    Having said this, roles are really designed for messaging scenarios, a new paradigm if you will: operate on an object domain model, make changes to objects and relationships, and subscribers are notified of these changes. There is no need to create explicit messages, to pack all the contextual information into the message, or to talk to a database before taking action. Upon receiving a notification, all the related data you need to make decisions is available where you need it, with data consistency guaranteed. The end result is very fast messaging that uses fewer resources.
    You are right about JGroups. Some of our engineers like the flexible and extensible protocol stack offered by JGroups, and I believe we do use some of the failure detection protocols built into it. Cheers! -- Jags Ramnarayan
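The declarative-role idea, together with the loss policies listed earlier in this thread (read-only, continue, shut down, queue), can be sketched as a publisher that checks whether every role it requires is currently played by some visible member before distributing an update. RolePublisher and RoleLossAction are hypothetical names, not GemFire's API. The membership view is passed in as a plain set of role names, and a List stands in for the network.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical behaviours when a required role (e.g. "DB writer") has no visible member.
enum RoleLossAction { READ_ONLY, CONTINUE, SHUT_DOWN, QUEUE }

// Hypothetical publisher that declares the roles it needs before distributing an update.
final class RolePublisher {
    private final Set<String> requiredRoles;
    private final RoleLossAction onLoss;
    private final List<String> network; // stands in for actual distribution
    private final Deque<String> backlog = new ArrayDeque<>();
    private boolean shutDown = false;

    RolePublisher(Set<String> requiredRoles, RoleLossAction onLoss, List<String> network) {
        this.requiredRoles = new HashSet<>(requiredRoles);
        this.onLoss = onLoss;
        this.network = network;
    }

    // liveRoles: union of the roles declared by the members this publisher can currently see.
    void publish(String update, Set<String> liveRoles) {
        if (shutDown) {
            throw new IllegalStateException("member has shut down");
        }
        if (liveRoles.containsAll(requiredRoles)) {
            // Every required role is visible: flush anything queued earlier, then send.
            while (!backlog.isEmpty()) {
                network.add(backlog.poll());
            }
            network.add(update);
            return;
        }
        switch (onLoss) {
            case READ_ONLY:
                throw new IllegalStateException("required role missing: region is read-only");
            case CONTINUE:
                network.add(update); // carry on as if nothing happened
                break;
            case SHUT_DOWN:
                shutDown = true;
                throw new IllegalStateException("required role missing: shutting down member");
            case QUEUE:
                backlog.add(update); // hold for delivery once the role reappears
                break;
        }
    }

    int queuedCount() {
        return backlog.size();
    }
}
```

In this sketch the backlog is flushed only on the next publish; a real system would more likely react to a membership-change event so that queued updates go out as soon as a "DB writer" rejoins.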
  12. Re: Coherence?

    How does GemFire compare with Tangosol Coherence?
    I have never seen GemFire, so I can't personally describe any pros and cons in reference to it. However, it is clear that the various products in this space each have their own "genetic roots". In the case of GemStone, they have been in business for 25 years or so, and they focused on a Smalltalk object database (Object Oriented Database Management System, or OODBMS), and I have heard that the Smalltalk product is pretty good. AFAIK, GemStone ported their object database to C++ in the 1990s, and used that as the basis for an interesting J2EE server back in 1999 or so, which in turn evolved into GemFire. Tangosol Coherence was designed to be as easy to install, develop with and deploy as possible. It was designed from the ground up to run _within_ Java application server environments, even embedded within a J2EE application, and (as a side-effect of that requirement) uses a true peer-to-peer architecture, and is pure Java. Deploying Coherence as part of an application is as simple as dropping the Coherence library into the application WAR/EAR file. Coherence is also widely used, with over a thousand production applications, and is in production at hundreds of direct customers, including most of the world's major financial, telecom, insurance, logistics, travel and e-commerce companies. Although I cannot compare the products directly, I'd be glad to put you in touch with customers who have. Peace, Cameron Purdy Tangosol Coherence: The Java Data Grid
  13. Re: Coherence?

    How does GemFire compare with Tangosol Coherence?


    I have never seen GemFire, so I can't personally describe any pros and cons in reference to it.

    However, it is clear that the various products in this space each have their own "genetic roots". In the case of GemStone, they have been in business for 25 years or so, and they focused on a Smalltalk object database (Object Oriented Database Management System, or OODBMS), and I have heard that the Smalltalk product is pretty good. AFAIK, GemStone ported their object database to C++ in the 1990s, and used that as the basis for an interesting J2EE server back in 1999 or so, which in turn evolved into GemFire.

    Cameron, in the interest of keeping this thread factually accurate, I would suggest you stick to just Tangosol/Coherence marketing and not venture into GemStone's pedigree or the origins of GemFire (which, BTW, was not a transformation of our app server but was built from the ground up using constructs and principles from our 25 years of technology heritage). Cheers, Bharath http://www.gemstone.com
  14. Re: Coherence?

    AFAIK, GemStone ported their object database to C++ in the 1990s, and used that as the basis for an interesting J2EE server back in 1999 or so, which in turn evolved into GemFire.



    Cameron,
    In the interest of keeping this thread factually accurate, I would suggest you stick to just Tangosol/Coherence marketing and not venture into GemStone's pedigree or the origins of GemFire (which, BTW, was not a transformation of our app server but was built from the ground up using constructs and principles from our 25 years of technology heritage).

    Cheers,
    Bharath
    http://www.gemstone.com
    You're both egotistical jerks. :) Make your software and we'll decide if you deserve praise. Coherence is nice and way overpriced. GemStone is nice and way overpriced. BTW, distributed Maps, no matter how many partitioning features you provide, aren't really that awesome. You're only among the first few to market, and it won't be long before open source eats both of you alive. Just... FYI...
  15. Make your software and we'll decide if you deserve praise.

    Coherence is nice and way overpriced.

    GemStone is nice and way overpriced.

    BTW, distributed Maps, no matter how many partitioning features you provide, aren't really that awesome. You're only among the first few to market, and it won't be long before open source eats both of you alive. Just... FYI...
    Christopher, well, I guess somebody always has to bring the conversation down a level. Egos and opinions are like . . . what do they say, "Everybody's got one"? ;-) These folks work 80-hour weeks for years building cutting-edge technology that's extremely useful to a large segment of the Java community, so maybe we should give them a little credit, don't you think? Open source solutions are great for the technology world, as they reduce costs and lower the barrier to entry for common infrastructure requirements. Contrary to your assertions, however, they will not "eat us alive." The energy, focus, talent, and organization required to push the leading edge of technological innovation are clearly in the vendor camp when it comes to Data Fabric technology. Properly combining the operational data management capabilities of distributed caching, relational or object databases, middleware messaging, and continuous querying is a hugely complex task. After years of work by a large and cohesive team, this dream is finally becoming a reality with GemFire 5.0. A good (though I believe ultimately less broad and complex) comparison might be relational database technology. There are great open-source solutions in common use, but the major database vendors aren't exactly hurting for cash, are they? Cheers, Gideon GemFire--The Enterprise Data Fabric http://www.gemstone.com
  16. Re: Coherence?

    In the interest of keeping this thread factually accurate, I would suggest you stick to just Tangosol/Coherence marketing and not venture into GemStone's pedigree or the origins of GemFire
    If I had known that being so nice was going to elicit such a response from you guys, I would have tried it sooner. ;-) I also don't mind the increased awareness and attention that this market is receiving, and I assume that part of the attention we get is caused by companies like yours and all the other entrants beating their drums. The amount of interest in what we do seems to be going off the scale, and we continue to more than double in size every year, so keep it up! Peace, Cameron Purdy Tangosol Coherence: Clustered Shared Memory for Java
  17. In the case of GemStone, they have been in business for 25 years or so, and they focused on a Smalltalk object database (Object Oriented Database Management System, or OODBMS), and I have heard that the Smalltalk product is pretty good. AFAIK, GemStone ported their object database to C++ in the 1990s, and used that as the basis for an interesting J2EE server back in 1999 or so, which in turn evolved into GemFire
    Now, taking a J2EE server and evolving it into a distributed data management platform would be rather difficult, wouldn't it? All we did was take our extensive experience with distributed computing algorithms and our strong focus on building rugged system software, and apply them to GemFire.
    Tangosol Coherence was designed to be as easy to install, develop with and deploy as possible. It was designed from the ground up to run _within_ Java application server environments, even embedded within a J2EE application, and (as a side-effect of that requirement) uses a true peer-to-peer architecture, and is pure Java. To deploy Coherence as part of an application, it is as simple as dropping the Coherence library into the application WAR/EAR file
    Same here with GemFire: deploy into any JEE application simply with a dependency on a JAR. You also get Eclipse plug-ins, automatic participation in JEE server-coordinated transactions through JTA compliance, and JMX MBeans that can be exposed directly through the management consoles of any popular JEE vendor implementation. Anyone seriously considering a scalable distributed data management product has to think about the difficult part of getting it right in a distributed system. What happens in the face of network partitions? When data is distributed to 100 nodes, how do you manage data consistency? When someone uses cache notifications across a cluster, what techniques do you employ to detect problems? This is one area where GemStone did leverage what we have built and supported for more than a decade: a visual analysis tool that can monitor and chart real-time statistics across many distributed nodes to understand cache behaviour, CPU utilization, application stats, etc. Customers who need to think beyond the performance boost from the cache have to pay a lot of attention to operational readiness.
    Coherence also is widely used, with over a thousand production applications, and in production at hundreds of direct customers including most of the world's major financial, telecom, insurance, logistics, travel and e-commerce companies.

    Although I cannot compare the products directly, I'd be glad to put you in touch with customers who have.
    This is a technical forum where folks can learn about the capabilities and unique aspects of products and technologies. Let us keep it that way. Of course, anyone serious should initiate discussions with the vendors and battle it out, but TheServerSide is no place for that, no? Cheers! -- Jags Ramnarayan
  18. Re: Coherence?

    How does GemFire compare with Tangosol Coherence? Can I plug GemFire as a Hibernate cache?
    Can you plug GemFire in as a Hibernate cache? I don't see why not, though you would need to do some work on Hibernate to add support for GemFire as a Hibernate cache, as I don't believe it is a supported cache out of the box. Since I have worked with both products, and work for neither GemStone nor Tangosol, I would like to offer the following lightweight comparison:
    Tangosol
    Tangosol's main differentiator is its ability to split data across multiple caches using a very clever hashing algorithm, along with its ability to provide a configurable guarantee on availability. The upshot is that you can store a lot more data in memory than fits in a single JVM. This is done by distributing data across multiple cache servers; duplicate copies are additionally stored on backup nodes. This is similar in concept to a RAID disk array: the idea is that if one cache server fails, you do not lose any data. The number of failures tolerated can be configured. Additionally, when you commit an item to the cache, you are assured that the cache as a whole (including replicas, etc.) is kept 100% coherent. These two core concepts make Tangosol a suitable replacement for a database (you can still trickle-feed a database for long-term persistence). This can easily increase a data-centric application's performance 10 to 100 times, depending on the exact nature of the application. Tangosol's API is extremely simple to pick up and use, and is extremely compact.
    GemFire
    GemFire's approach has been to give the developer more control over how the cache is partitioned. The upshot is that you have a lot of control over how the cache is structured, the nature of the caching, reliability, etc. The downside is that the API set is large and can be complicated to use, especially when starting out on your first development with the tool. It is also very easy to do things wrong: my first attempt at using the GemFire API produced a poorly performing result.
    However, after using the API for a while, once you grasp the concepts properly, you can make GemFire sit up and do tricks. I have not yet worked with the new release of GemFire, though I believe it has some features that will see GemFire taking on more of the messaging space as well. A lot of GemFire's focus is on the performance and latency of event distribution, as well as on providing cache solutions in languages such as C/C++ and C# in addition to Java. GemFire also provides a shared memory cache, which can serve C/C++ and Java applications at the same time.
    Both products are excellent pieces of engineering and are backed by helpful teams of people. Both products provide (at a high level) the same basic functionality of object caching. Choosing between them is a matter of identifying the requirements of your organisation and then evaluating both products in a realistic POC scenario. Make sure to involve the product engineers so that you don't use the product inappropriately.
    Lastly, there was another note in one of the threads about open source killing the marketplace. To this I would like to note the following: open source can be very powerful, but one should never underestimate the amount of effort and energy required not only to develop a good technical solution, but also to support and market it. It has taken about 15 years for Linux to make any serious inroads into corporates, and even then only when the 'free' aspects of the product are removed. Using Linux in most large organisations can often incur more cost than a Solaris solution due to the approach these corporates take to doing things. The area where open source software really makes a difference is the SME market, and that is a space that neither Tangosol nor GemStone is playing in (at least from my perspective; Jags, Cameron, feel welcome to correct me if I am wrong).
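The idea of spreading data across cache servers with backup copies on other nodes can be illustrated with a deliberately simple placement rule: hash the key to pick a primary server and place backups on the next servers in a fixed ring order. This is a hypothetical sketch (PartitionPlacement is an illustrative name), not Tangosol's or GemFire's actual algorithm, neither of which is specified in the thread. It shows only why, with one backup, losing any single server still leaves a copy of every entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical placement rule: hash the key to a primary server, then put backups on the
// next servers in a fixed ring order. Not Tangosol's or GemFire's actual algorithm.
final class PartitionPlacement {
    private final List<String> servers;
    private final int backupCount;

    PartitionPlacement(List<String> servers, int backupCount) {
        if (backupCount < 0 || backupCount >= servers.size()) {
            throw new IllegalArgumentException("need more servers than backup copies");
        }
        this.servers = List.copyOf(servers);
        this.backupCount = backupCount;
    }

    // Returns the primary owner followed by backupCount distinct backup owners.
    List<String> ownersOf(Object key) {
        int n = servers.size();
        int primary = Math.floorMod(Objects.hashCode(key), n);
        List<String> owners = new ArrayList<>();
        for (int i = 0; i <= backupCount; i++) {
            owners.add(servers.get((primary + i) % n));
        }
        return owners;
    }

    // True if at least one copy of the key survives after the given server fails.
    boolean survivesLossOf(Object key, String failedServer) {
        for (String owner : ownersOf(key)) {
            if (!owner.equals(failedServer)) {
                return true;
            }
        }
        return false;
    }
}
```

A plain modulo like this remaps most keys whenever a server joins or leaves, so partitioned caches commonly hash keys into a fixed number of partitions and move whole partitions between servers instead.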
  19. GemFire & Partitioned Caching, etc.

    split data across multiple caches using a very clever hashing algorithm along with its ability to provide a configurable guarantee on availability. The upshot of this is that you can store a lot more data in memory than you can place in a single JVM.
    Howard, thanks for the overall kind words on GemFire! I just wanted to point out that GemFire does indeed provide nearly the exact same functionality you attribute above to Tangosol. What you state is a common misconception in the larger community, due to the fact that Tangosol has placed much more focus on marketing its partitioned cache, while GemStone has been hard at work on a much larger variety of enterprise-class functional capabilities. As you properly mention, GemFire's API is indeed more expansive than what I've heard about Tangosol's, but that is (for the most part) because there are many more options available for a more diverse set of use cases. One example is the creation of new data in our "Region" interface rather than just a Map. With Region, you have the choice of using the create() method instead of put(), which will throw an EntryExistsException if the key already exists anywhere in the distributed system and thus gives you better data integrity protection. Another problem with Map is that its contract stipulates that a put() to update an entry must return the old value. What if that value has been paged out to disk by our capacity-control overflow manager? Would you really want to incur the cost of retrieving the old value from disk when you have no intention of using it? Nevertheless, with GemFire release 5.0 we have endeavored to give developers the choice of a lighter Map-based API, which should now allow developers to more easily retrofit GemFire into existing projects through dependency injection or other means. Cheers! Gideon GemFire--The Enterprise Data Fabric http://www.gemstone.com
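The create()-versus-put() distinction can be approximated within a single JVM with ConcurrentMap.putIfAbsent, which inserts only when the key is absent, as one atomic step. This is a hypothetical local sketch: CreateAwareRegion is an illustrative name, and the EntryExistsException below is a stand-in class defined here, not GemFire's. The "anywhere in the distributed system" guarantee would require distributed coordination that this code does not attempt.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Local stand-in for the exception named in the post; NOT GemFire's class.
class EntryExistsException extends RuntimeException {
    EntryExistsException(Object key) {
        super("entry already exists for key " + key);
    }
}

// Hypothetical single-JVM region contrasting create() with Map-style put().
final class CreateAwareRegion<K, V> {
    private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>();

    // Inserts only if the key is absent. The check and the insert happen as one atomic step,
    // so two threads racing to create the same key cannot both succeed.
    void create(K key, V value) {
        if (map.putIfAbsent(key, value) != null) {
            throw new EntryExistsException(key);
        }
    }

    // Follows the java.util.Map contract: returns the previous value, or null if there was none.
    V put(K key, V value) {
        return map.put(key, value);
    }

    V get(K key) {
        return map.get(key);
    }
}
```

By contrast, put() here follows the java.util.Map contract and hands back the previous value, which is exactly the cost Gideon points out when that value has been overflowed to disk.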
  20. Re: GemFire & Partitioned Caching, etc.

    GemFire's API is indeed more expansive than what I've heard about Tangosol's, but the reason is (for the most part) because there are many more options available for a more diverse set of use-cases. One example is the creation of new data in our "Region" interface rather than just a Map. With Region, you have the choice of using the create() method instead of put(), which will throw an EntryExistsException if the key already exists anywhere in the distributed system and thus provides you with better data integrity protection.

    Another problem with Map is that its contract stipulates that a put() to update an Entry must return the old value. What if that value has been paged out to disk through our capacity control overflow manager? Would you really want to incur the cost of retrieving the old value from disk when you have no intention of using it?
    Gideon, you are sooooooo 2002 ;-) Coherence provides all those capabilities and many more, without a convoluted API, allowing the application developer to define any number of API points using agents that carry application-specified information to and from any point within the data grid, and that carry processing functionality to the points where it belongs and runs most efficiently (including in parallel). Peace, Cameron Purdy Tangosol Coherence: The Java Data Grid
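The idea of carrying processing to where the data lives, in parallel, can be sketched as sending a function to each partition and collecting one result per partition. AgentGrid is a hypothetical name and this is not Coherence's agent API. Here the "partitions" are just in-memory maps in one JVM, and a thread pool stands in for the cluster nodes that would own them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical "grid": data stays in its partition and the agent (a function) is shipped to it.
// Not Coherence's API; a thread pool stands in for the nodes that own each partition.
final class AgentGrid<K, V> {
    private final List<Map<K, V>> partitions;

    AgentGrid(List<Map<K, V>> partitions) {
        this.partitions = partitions;
    }

    // Runs the agent against every partition in parallel; returns one result per partition, in order.
    <R> List<R> invokeAll(Function<Map<K, V>, R> agent) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (Map<K, V> partition : partitions) {
                futures.add(pool.submit(() -> agent.apply(partition)));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for agents", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("agent failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Only the function and the small per-partition results cross the (simulated) network; the entries themselves never move, which is the point of shipping processing to the data.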
  21. For those who have no money

    Occasionally you see startups with no money and exceptional application requirements; e.g., Hotmail consisted of two guys who set up mail servers for six million users. If you are in such a situation and you need to build a special-purpose OLTP system with high uptime and low latency, you can use my Apache-licensed implementation of the Totem single-ring membership and ordering protocol: http://www.bway.net/~lichtner/evs4j.html
    Highlights:
    - Fast, fast, fast.
    - Predictable throughput and latency.
    - Ideal for fully replicated, low-latency applications.
    - Total ordering and optional safe delivery of messages make the programming model extremely simple.
    - Already used in at least one commercial product.
    Guglielmo
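The core idea behind Totem's total ordering can be sketched briefly: a token circulates around a logical ring, only the current holder may multicast, and it stamps each message with the next global sequence number carried on the token, so every receiver can deliver in the same order regardless of arrival order. This is a hypothetical single-process simulation (TokenRing is an illustrative name), not the evs4j API. Retransmission requests carried on the token, flow control, membership changes and safe delivery are all omitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.TreeMap;

// Hypothetical single-process simulation of token-based total ordering on a logical ring.
final class TokenRing {
    // A broadcast stamped with the global sequence number taken from the token.
    static final class Msg {
        final long seq;
        final String sender;
        final String body;

        Msg(long seq, String sender, String body) {
            this.seq = seq;
            this.sender = sender;
            this.body = body;
        }

        @Override
        public String toString() {
            return seq + ":" + sender + ":" + body;
        }
    }

    private final List<String> members;
    private final List<Queue<String>> outboxes = new ArrayList<>();
    private final List<Msg> broadcast = new ArrayList<>(); // every member receives these
    private long tokenSeq = 0;                             // the token carries the last number issued

    TokenRing(List<String> members) {
        this.members = List.copyOf(members);
        for (int i = 0; i < members.size(); i++) {
            outboxes.add(new ArrayDeque<>());
        }
    }

    // Queues a message at a member; it cannot be sent until that member holds the token.
    void send(int member, String body) {
        outboxes.get(member).add(body);
    }

    // Passes the token once around the ring; only the holder stamps and multicasts.
    void rotateToken() {
        for (int i = 0; i < members.size(); i++) {
            Queue<String> out = outboxes.get(i);
            while (!out.isEmpty()) {
                broadcast.add(new Msg(++tokenSeq, members.get(i), out.poll()));
            }
        }
    }

    // Any receiver, even one that saw datagrams out of order, delivers by sequence number.
    static List<String> deliver(List<Msg> received) {
        TreeMap<Long, Msg> bySeq = new TreeMap<>();
        for (Msg m : received) {
            bySeq.put(m.seq, m);
        }
        List<String> out = new ArrayList<>();
        for (Msg m : bySeq.values()) {
            out.add(m.toString());
        }
        return out;
    }

    List<Msg> broadcast() {
        return broadcast;
    }
}
```

The order is fixed by where the token is when a message is stamped, not by when it was queued: a message queued first at the last member of the ring is still ordered after messages stamped earlier in the same rotation.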