Discussions

Performance and scalability: DataGrid - Gigaspaces Vs Tangosol Coherence

  1. DataGrid - Gigaspaces Vs Tangosol Coherence (11 messages)

    Hi there, At present I am evaluating both products and seeing if they might fit in our architecture. We are moving our batch systems to a computational grid, and the big challenge is to solve the single point of bottleneck (SPOB) that the database is. I will explain the scenario, and I would like to hear from people who have worked with both data grid solutions, in order to decide which one might fit best. The computational grid software decides where a task is going to run and how many instances (parallel executions) to use, depending on the application requirements. Executions are scheduled. Every task will process a chunk of data; some tasks have to process 1,000,000 x n records, so the payload will be very high. The idea is to have extremely fast data access and avoid the data-access overhead. There is another problem: we have online services that have to access the database, and the batch processes must not interfere with the online performance. I have had a look at Tangosol and Gigaspaces. Tangosol says that they solve the data latency by moving the data closer to the application... but I do not see it. I mean, I would like to know how we can achieve this. So when the grid computing software decides that a task is going to run on n commodity computers, those tasks have the data close to them and the chunk of data they need. Any ideas/comments? Thanks
  2. Talk directly to the vendors?

    Hi Alfonso, Unfortunately, without a lot more information on what you are trying to achieve, including the object model, indicative access patterns, transactional requirements, hardware, network specifications etc, it's difficult to give a quality response in a single post. While we could discuss the relative theoretical benefits/advantages of the different approaches from the various vendors, I'm not sure this forum is where you'll want to disclose your corporate architecture - especially as the solutions to such challenges are often seen as competitive advantage in the financial world. I appreciate it's sometimes a challenge to differentiate at the API and marketing levels between some of these solutions, so I can only suggest, in all seriousness, that you directly contact the vendors and have them compete head-to-head in a series of acceptance tests (designed by you) that include both the typical and abnormal performance and scale-out load you expect, using real object models and processing. It sounds like a lot of work, but in the end it's only through solid proof-of-concepts and destructive, continuous soak testing that you will find the platform that will satisfy your production SLA. You're obviously involved in a large architectural decision, and working directly with the vendors is probably the best course of action (I assume the vendors will agree on this). Most vendors are partners with the likes of IBM, Dell, Sun, Intel, AMD, Platform and DataSynapse, so providing you with access to a large production-like Grid for your testing shouldn't be an issue. I'd also recommend that you pay particular attention to how each of the solutions behaves during intermittent network outages, garbage collection pauses and server loss (this happens more than people think), because if you're making trading, compliance or business decisions on top of *any* platform, you'll want to know where the edge conditions are - and how much development you'll need to do to recover from them.
Right? Assuming your organization approves and the vendors agree, you could then publish your specific use cases, tests and results. I'm sure the community would be interested. Ultimately it's going to come down to your individual use cases and your existing architecture lock-in. Each vendor has particular ways of dealing with the sizes of data and processing you're talking about. However, from a Tangosol perspective, I generally don't see the number of objects you're talking about being an issue (depending on your hardware, network etc). Having implemented similar solutions and POCs using Coherence, it shouldn't be an issue. Additionally, I'm sure Tangosol can provide you with references to customers that are already working in production with the types of systems and scale you're thinking about. If you're seriously considering Tangosol Coherence, it's not going to be an issue. Hope this helps. -- Brian
  3. Hi Alfonso -
    The computational grid software decides where a task is going to run and how many instances (parallel executions) to use, depending on the application requirements. Executions are scheduled. Every task will process a chunk of data; some tasks have to process 1,000,000 x n records, so the payload will be very high.
    I would certainly welcome a discussion about your requirements, and we may be able to suggest an architecture that would fit those requirements well. However, the batch approach that you have suggested is not one that we would encourage the use of Coherence itself for (although we would certainly be able to address the data management portion of your requirements). With your proposed architecture, you would be much better off looking at products from DataSynapse or Platform Computing for that type of scheduled batch workload. I hope this helps. Peace, Cameron Purdy Tangosol Coherence: The Java Data Grid
  4. Hi Alfonso
    I mean, I would like to know how we can achieve this. So when the grid computing software decides that a task is going to run on n commodity computers, those tasks have the data close to them and the chunk of data they need.
    There are two main approaches to achieve that: 1) Bring the data to the process - obviously moving data is expensive, and therefore probably not a good option. 2) Bring the process to the data - this can be achieved through the master-worker pattern. In this case, when you write the task to the space, it can be routed to the appropriate destination based on a specific key on that task. You can use the @SpaceRouting annotation to mark the property that will be used to determine the location of the task. Our Space Based Spring Remoting provides an abstraction which will basically do the same thing, except that the interaction with the space is done implicitly when you invoke a method on your service stub. Since the space-based remoting uses the space as the underlying delivery mechanism of the method call, it will leverage the space clustering to route that method to the "right" destination. If you're already using a Grid product, you could very easily apply that pattern in those environments. In this case your job wouldn't be the actual job but a command for executing the job. From that point, the routing and execution will be done in the same way as outlined above. From a Grid scheduler perspective, we provide the ability to control which machine each data instance (partition) runs on. In this case the grid can be made aware of the location of the data instances. When a new job is submitted to the grid scheduler, it can declare the data instance it belongs to, and the grid will use that information to route it to the appropriate machine running that instance. Anyway, I know that it is probably not that simple to understand specifically how the model works. Unfortunately, there is only so much I can share in a thread-based discussion. You can find more information on our blog http://www.gigaspacesblog.com/ and website. In any event, I'll be happy to provide you with more specific information and a paper I'm working on these days that addresses this topic.
Just send me a direct email and I'll send that information over to you. Nati S GigaSpaces
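    The key-based routing idea described above can be sketched in plain Java. This is a minimal illustration with hypothetical names (Task, TaskRouter, partitionFor); in GigaSpaces itself the @SpaceRouting annotation on a property plays the role of the routing key, and the space performs the routing implicitly:

```java
// Minimal sketch of key-based task routing, assuming a fixed number of
// partitions. Not the GigaSpaces implementation - only an illustration of
// how a routing key ties a task to the partition that holds its data.
public class TaskRouter {

    // A task carries a routing key that associates it with a data partition.
    public record Task(String routingKey, String payload) {}

    private final int partitionCount;

    public TaskRouter(int partitionCount) {
        this.partitionCount = partitionCount;
    }

    // Route deterministically: the same key always maps to the same
    // partition, so the task executes next to its chunk of data.
    public int partitionFor(Task task) {
        return Math.floorMod(task.routingKey().hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        TaskRouter router = new TaskRouter(4);
        Task a = new Task("customer-42", "recalculate balance");
        Task b = new Task("customer-42", "audit trail");
        // Same routing key -> same partition, hence data/process affinity.
        System.out.println(router.partitionFor(a) == router.partitionFor(b)); // true
    }
}
```

    With this shape, a grid scheduler that knows which machine hosts which partition can dispatch the task (or a command that executes the job) straight to the node that already holds the data.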
  5. Hi Alfonso,
    We are moving our batch systems to a computational grid, and the big challenge is to solve the single point of bottleneck (SPOB) that the database is.
    This is the issue that we directly address with our partitioned caching topology, where Coherence dynamically, automatically and transparently partitions the data across all (or a subset) of the JVMs participating in the Data Grid, thus providing unlimited data capacity in memory in the application tier. I say unlimited because, by partitioning the data set in this fashion, you can add JVM instances to the grid at any time, without changing any configuration or code, and the storage capacity of the Data Grid is increased by the size of that JVM's heap. Think of this as "water finding its level" -- if a node is added we load balance the data evenly across to it. Coherence does this while also maintaining the fault tolerance of the data at all times. If a node fails, Coherence will load balance the data (and its backups) evenly across all of the remaining nodes without any "application intervention." The Coherence Data Grid is a true peer-to-peer system, in that each node within the grid is equally responsible for producing and consuming the services of the cluster, including the data services. Therefore, there are no single points of failure or single points of bottleneck. Also, it is built on a finite state machine, meaning that all nodes within the grid are always aware of the state of the entire grid. This matters for one very important reason: it is the only way to guarantee data integrity with 100% certainty. There is also data source integration for reads, writes and asynchronous writes.
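    The "water finding its level" effect can be sketched in a few lines of plain Java. This is a toy model, not Coherence's actual partition-assignment protocol (which also maintains backup copies): it only illustrates that spreading entries by hash over whatever nodes are present levels the load out automatically when a node is added:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of hash-based data distribution across grid nodes.
// Adding a node simply grows capacity; the same keys level out over it.
public class RebalanceSketch {

    // Assign each key to a node by hashing over the current node count.
    public static Map<Integer, List<String>> distribute(List<String> keys, int nodes) {
        Map<Integer, List<String>> byNode = new HashMap<>();
        for (int n = 0; n < nodes; n++) byNode.put(n, new ArrayList<>());
        for (String key : keys) {
            byNode.get(Math.floorMod(key.hashCode(), nodes)).add(key);
        }
        return byNode;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) keys.add("key-" + i);

        // With 4 nodes, each holds roughly a quarter of the data...
        Map<Integer, List<String>> four = distribute(keys, 4);
        // ...add a fifth node and the same data levels out over 5 nodes.
        Map<Integer, List<String>> five = distribute(keys, 5);

        four.forEach((n, k) -> System.out.println("4-node grid, node " + n + ": " + k.size()));
        five.forEach((n, k) -> System.out.println("5-node grid, node " + n + ": " + k.size()));
    }
}
```

    A real product additionally migrates only the affected partitions (rather than rehashing everything) and keeps backup copies on other nodes for fault tolerance.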
    Tangosol says that they solve the data latency by moving the data closer to the application... but I do not see it. I mean, I would like to know how we can achieve this. So when the grid computing software decides that a task is going to run on n commodity computers, those tasks have the data close to them and the chunk of data they need.
    Actually, there are two approaches to solving the data latency issue. 1) By using our NearCache technology, Coherence will keep the MRU/MFU data locally, in object form. This is best in an environment where there is repetitive data access. 2) Instead of incurring any data latency at all, why not send the processing to where the data already is in the data grid? This allows for massively parallel data manipulation, processing, aggregation, etc. This is extremely easy to use, with a very straightforward API called the InvocableMap (JavaDoc: requires registration). Just to give you an idea of how easy it is to perform query-based parallel processing across the data grid, here is an example: cache.invokeAll(Filter filter, EntryProcessor agent); This approach has been proven in both testing and production deployments to provide linear scalability of parallel processing in large-scale grid deployments. Take a look at these links: "Scaling Out Your Data Grid Aggregations Linearly", "Tangosol Announces Results of Data Grid Scalability Benchmark Effort with Intel and IBM", "Provide a Data Grid". Hope this helps. Later, Rob Misek Tangosol Coherence: The Java Data Grid
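    The shape of that invokeAll(filter, processor) call can be imitated in a single JVM with standard Java. This is only a sketch (the class and method names here are hypothetical): in Coherence the filter and EntryProcessor are shipped to the nodes that own each partition and execute there; here the parallelism is faked with a parallel stream over a local map:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Single-JVM sketch of the "send the processing to the data" style:
// apply a processor to every entry whose value matches a filter,
// in parallel, and collect the results.
public class InvokeAllSketch {

    public static <K, V, R> Map<K, R> invokeAll(Map<K, V> cache,
                                                Predicate<V> filter,
                                                Function<V, R> processor) {
        return cache.entrySet().parallelStream()
                .filter(e -> filter.test(e.getValue()))
                .collect(Collectors.toConcurrentMap(Map.Entry::getKey,
                                                    e -> processor.apply(e.getValue())));
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = new ConcurrentHashMap<>();
        cache.put("a", 10);
        cache.put("b", 25);
        cache.put("c", 40);

        // Query-based parallel processing: double every value over 20.
        Map<String, Integer> results = invokeAll(cache, v -> v > 20, v -> v * 2);
        System.out.println(results); // b=50 and c=80, in some order
    }
}
```

    In a real data grid the win comes from the fact that each node processes only the entries it already holds, so no data crosses the network except the (small) results.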
  6. Hi Alfonso, If you would like to have a discussion with one of our architects about your data-aware grid implementation, I can make the connection for you. We can also work together with the data vendor to solve your data problem. Thank you. Jingwen Wang Platform Computing
  7. Alfonso, Global Markets Technology at Bank of America had a similar problem. We had an issue where applications in trading environments, requiring millisecond response times, found that data access was causing bottlenecks and poor performance. After an extensive evaluation of several products in this space - including Tangosol and GigaSpaces - we found GigaSpaces not only to give the best performance, but also to support additional functionality that was required by our development teams. I cannot post specifics about the testing on this site, but feel free to email me if you'd like to discuss this in detail. I hope I can be of help. Kevin
  8. Alfonso, What my colleagues are pointing out is that data affinity - co-locating data and work - is paramount in any distributed system. Without it you will never reach the full potential a cluster or grid can offer. You can cache data in advance on the machines where the work is anticipated to be executed, route work to data by taking data locality into account in the task scheduling logic, or use a combination of both. There are many paths to getting there; each vendor's product has its own way, including our own GemFire. Which is best for you? Well, there is no short answer. A proper analysis of your application's work and data access patterns, in combination with your grid environment, must be done and then applied to each vendor's product to determine which yields the optimum result. Many of our clients do this, and I can cite use cases that sound similar to yours where a 40-times improvement in processing time was achieved by applying a data grid (distributed cache) product to a compute grid deployment. While this may not be the typical result when applied to your use case, I would however expect a minimum 2x performance boost. I discuss this topic extensively in my book, "Distributed Data Management for Grid Computing" (John Wiley 2005), and welcome the opportunity to speak with you in further detail about your specific application. Regards Michael Di Stefano VP Architecture Financial Services GemStone Systems, Inc
  9. After an extensive evaluation of several products in this space - including Tangosol and GigaSpaces - we found GigaSpaces not only to give the best performance, but also to support additional functionality that was required by our development teams.
    Kevin, while your post violates the agreements that Bank of America has with Tangosol, I am more offended by your suggestion, which is largely contradicted by the [assumedly confidential] emails that I have from Bank of America. Nonetheless, I do wish you the best with the technology choices your group has made, and I hope that you will consider using Coherence in the future if you have technology needs that we could help to address. Please give my regards to Nathan. Peace, Cameron Purdy Tangosol Coherence: The Java Data Grid
  10. Another sometimes overlooked aspect of optimizing grid performance is the delivery of computational results to consumers (both human and systems). One of our major areas of focus in GemFire that's applicable to both batch and real-time risk grid processing is highly efficient and tunable notifications - somewhat analogous to what you might use a message bus for, but with additional features that account for WAN distribution, slow receivers mixed with fast receivers, network partitioning faults, and HA in-memory queues for extremely fast AND reliable event notifications in the most difficult failure boundary conditions. Please feel free to contact Michael or me directly (gideon.low--at--gemstone.com) for more information on how this might be important to your use case. Cheers, Gideon http://www.gemstone.com
  11. We are moving our batch systems to a computational grid, and the big challenge is to solve the single point of bottleneck (SPOB) that the database is.
    I thought this link was also relevant:
    Got together with some ex-colleagues, who were marvelling at the London consulting market. The hot areas are Grid Computing, with the prevalent stack being DataSynapse and Tangosol. Also, demand is picking up for WPF, with Morgan Stanley leading the way.
    Interestingly, one of the banks mentioned is one of the major drivers behind our "Coherence for .NET" and "Coherence for C++" products. ;-) Peace, Cameron Purdy Tangosol Coherence: The Java Data Grid
  12. I highly recommend you try them all... Oracle Coherence, Gigaspaces, Hazelcast...