News: In memory data grid (IMDG) vendors chase 'big data'

  1. In memory data grid (IMDG) vendors chase 'big data' (13 messages)

    There's a new batch of IMDG products for big data out, according to SearchSOA.com. A recent article cites ScaleOut Software, GigaSpaces and Terracotta as major players. Other vendors are also hitting this space, like Oracle and GridGain. There are also open source options (Hazelcast and Ehcache, maybe more).

    Each of SearchSOA.com's big three for big data has its own slant. ScaleOut provides a "virtual" data grid. GigaSpaces touts not only cloud support (noting its own PaaS offering), but also event processing support. Terracotta boasts tenfold improved capacity and the ability to scale linearly with added servers. The article provides much more detail.

    IMDGs are not a completely new concept. Faithful readers of TheServerSide.com may remember a thread on defining in-memory data grids and their use cases. Paul Colmer, a technology consultant with Computer Sciences Corporation (CSC), said there were only three products in the market at that time that he considered true IMDGs. These included IBM, Terracotta and JBoss. Helpful commenters also mentioned Ehcache and Tibco ActiveSpaces as possible options.

    Threaded Messages (13)

  2. In memory data grids (IMDG)

    IMDGs are not a completely new concept. Faithful readers of TheServerSide.com may remember a thread on defining in-memory data grids and their use cases.

    There's also an older thread defining In Memory Data Grids: http://www.theserverside.com/news/thread.tss?thread_id=48114

    Paul Colmer, a technology consultant with Computer Sciences Corporation (CSC), said there were only three products in the market at that time that he considered true IMDGs. These included IBM, Terracotta and JBoss.

    That article was a bit confusing -- and it appears to have confused you ;-). Paul listed six products (including Oracle Coherence) that he considered to be In Memory Data Grids, and then listed out three newcomers (the ones you mention above).

    He did miss a few vendors (I'm sure Nikita will point that out ;-), and Terracotta is not a data grid, nor is Ehcache, although they're moving in that direction.

    Regarding this news item, it is fair to say that a number of the solutions in this space are being applied to BigData problems. I think most if not all of the IMDG products have Map/Reduce or similar capabilities (in the case of Coherence, since 2003). More importantly, there's a lot of investment going on in this area, so it's not surprising to see vendors chasing that.

    One of the things I'm surprised wasn't mentioned is how the IMDG products were instrumental in the early adoption of "NoSQL" concepts like scale-out key/value storage, transparent elasticity, dynamic HA across multiple servers, parallelized queries and data processing with secondary index support, etc.
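
    To make the terms above concrete, here is a minimal single-process sketch of hash-partitioned key/value storage with a secondary index, a parallel query and a map/reduce-style aggregation. It is only an illustration under stated assumptions: `ToyGrid`, `Partition` and every method name are hypothetical and do not correspond to Coherence or any other product's API, and threads stand in for the separate servers a real grid would use. Elasticity and HA (backups, rebalancing) are left out.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


class Partition:
    """One in-memory shard: a primary key/value map plus a secondary index."""

    def __init__(self):
        self.data = {}                     # key -> record (a dict)
        self.by_field = defaultdict(set)   # indexed field value -> set of keys

    def put(self, key, record, index_field):
        old = self.data.get(key)
        if old is not None:
            # Keep the index consistent when a record is overwritten.
            self.by_field[old[index_field]].discard(key)
        self.data[key] = record
        self.by_field[record[index_field]].add(key)

    def query(self, value):
        # Index lookup instead of scanning every record in the shard.
        return [self.data[k] for k in self.by_field.get(value, ())]


class ToyGrid:
    """Hypothetical toy grid: routes each key to one partition by hash."""

    def __init__(self, partition_count=4, index_field="city"):
        self.index_field = index_field
        self.partitions = [Partition() for _ in range(partition_count)]

    def _owner(self, key):
        return self.partitions[hash(key) % len(self.partitions)]

    def put(self, key, record):
        self._owner(key).put(key, record, self.index_field)

    def get(self, key):
        return self._owner(key).data.get(key)

    def parallel_query(self, value):
        # Every partition answers from its own index; results are merged.
        with ThreadPoolExecutor(max_workers=len(self.partitions)) as pool:
            parts = list(pool.map(lambda p: p.query(value), self.partitions))
        return [rec for part in parts for rec in part]

    def aggregate(self, map_fn, reduce_fn):
        # "Map" runs next to each partition's data; only the small
        # partial results are combined in the "reduce" step.
        with ThreadPoolExecutor(max_workers=len(self.partitions)) as pool:
            partials = list(
                pool.map(lambda p: map_fn(list(p.data.values())), self.partitions)
            )
        return reduce(reduce_fn, partials)
```

    For example, after `grid = ToyGrid()` and a few `grid.put(key, {"name": ..., "city": ..., "balance": ...})` calls, `grid.parallel_query("Boston")` returns the matching records from all partitions, and `grid.aggregate(lambda recs: sum(r["balance"] for r in recs), lambda a, b: a + b)` sums balances without moving the records themselves.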

    The IMDG market continues to grow quite rapidly and there continue to be new entrants in the market. It's a pretty exciting place to be.

    Peace,

    Cameron Purdy | Oracle

    http://coherence.oracle.com/

  3. I wouldn't call it "chase"

    In-memory processing fits perfectly with the current narrative that Big Data should actually provide some speed when being processed. So, in many ways it is Big Data "chasing" in-memory processing technology to get away from its current and embarrassing batch/offline roots.

    I would disagree with Cameron on MapReduce availability. Without going into too much detail - there is a dramatic difference between RMI/RPC on steroids and a full-blown In-Memory Compute Grid. It's like comparing an In-Memory Data Grid with a local hashmap. Yeah - kind of the same but rather massively different in real life. Here are some key features that GridGain provides for in-memory compute grid just to support my argument: http://www.gridgain.com/features/compute-grid/

    Some companies are in the process of introducing some of these features (thanks to our open source edition and extensive documentation :) but it will still take more months and years to get to a serious level: zero deployment anyone, distributed task sessions, adaptive load balancing with job stealing, etc.

    100% agree with Cameron on NoSQL products. Most of them (if not outright all) are technologically years behind the more established IMDG crowd. Look at the MongoDB or CouchDB technical debacles... Look at the “advances” many of the vendors are touting - something that was done years ago in the IMDG market.

    The reason for that is pretty obvious: 90% of NoSQL usage comes from the same crowd as typical memcached users: non-critical, “moms-n-pops” websites. 90% of IMDG/IMCG usage comes from mission critical systems. Different customers, different requirements, different products...

  4. In memory data grids (IMDG)

    Cameron, are you seeing IMDG being used in healthcare? IT in healthcare seems to be in the stone ages. They, for the most part, are not using messaging (HL7/TCPIP does not count), ESB, SOA, caching, let alone IMDG. One of the biggest EMR vendors is using Caché (supposedly SQL Server too). We have one of the largest groups using their EMR. Supposedly we are having performance issues. My first thought is that if we used caching and something like an IMDG, it would help with performance.

  5. In memory data grids (IMDG)

    Mark -

    I see it being used in health-care, but not very widely. I don't think the need for scalability is there as a general requirement, except for a few areas, e.g.

    * Health care web sites and SaaS apps? Yes. (One recent SaaS example: Drug trials.)

    * Genetics? Yes. (DNA crunching is big money .. but strangely enough, more so for plants than humans.)

    * EMR? Yes -- but the one I'm aware of is still in development.

    Peace,

    Cameron Purdy | Oracle

  6. In memory data grids (IMDG)

    Thanks Cameron, that is pretty much where I thought it would be needed.

    How about "data warehouse" type things? We have tons of data and while some questions can be asked "ex post facto", some analysis should be "real-time" (e.g. fraud - people getting prescriptions from more than one doc). I was looking at S4 and Storm last night and I am wondering where each, including IMDG, might fit in. FYI - we have MS Amalga.
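
    As a rough illustration of the "real-time" fraud check mentioned above, the sketch below keeps a small sliding window of recent prescriptions per (patient, drug) in memory and flags a new prescription when more than one distinct doctor appears in the window. This is only a sketch under assumptions: the class name, the field names, the 30-day window and the one-doctor threshold are all hypothetical, events are assumed to arrive in day order, and nothing here reflects how S4, Storm, Amalga or any IMDG product actually models this.

```python
from collections import defaultdict, deque


class MultiPrescriberDetector:
    """Flags a patient who gets the same drug from too many doctors
    within a sliding window. Assumes events arrive in day order."""

    def __init__(self, window_days=30, max_doctors=1):
        self.window_days = window_days
        self.max_doctors = max_doctors
        # (patient, drug) -> deque of (day, doctor), oldest first.
        # In a grid, this per-key state would live in the partition
        # that owns the key, so each event is checked locally.
        self.recent = defaultdict(deque)

    def on_prescription(self, patient, drug, doctor, day):
        events = self.recent[(patient, drug)]
        events.append((day, doctor))
        # Evict events that have fallen out of the sliding window.
        while events and events[0][0] <= day - self.window_days:
            events.popleft()
        distinct_doctors = {doc for _, doc in events}
        return len(distinct_doctors) > self.max_doctors


detector = MultiPrescriberDetector(window_days=30)
detector.on_prescription("patient-1", "drug-x", "dr-a", day=1)             # not flagged
flagged = detector.on_prescription("patient-1", "drug-x", "dr-b", day=10)  # flagged
```

    The design choice worth noting is that the check needs only the recent events for one key, which is why it fits an in-memory, per-key store; questions that span all history are the "ex post facto" kind better left to the warehouse.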

    Also, do you think healthcare for this sort of thing is an "opportunity"? Is it worth the effort ($)?

    I am curious as to who is creating an EMR using an IMDG. I hope they do well, but there are some established vendors, and most healthcare IT might not understand the benefit of an IMDG.

  7. In memory data grids (IMDG)

    Mark -

    The company I know of building with this technology in EMR is a pretty large company, but the use cases for the technology are generally simple (performance, scalability and HA), and not so much about advanced IMDG features (parallelization of searching / crunching, etc.). However, I'm not directly involved, so there may be more going on than I'm aware of.

    Peace,

    Cameron.

  8. In memory data grids (IMDG)

    Cameron, thanks for the replies. 

    FYI - the EMR I am talking about is Epic. My current employer is using it (both in ambulatory and [soon] acute). The other big healthcare organization in town is too. Look me up on LinkedIn if you would like to know who I work for (I am in your connections).

  9. It's important to note that while many IMDGs take a Big Data position in the market, there is still a difference in the details behind the level of integration.

    Most, from what I could see, are taking what I would refer to as an alternative approach to Hadoop and some of the other NoSQL solutions out there.

    With GigaSpaces we took more of a complementary approach, which basically integrates with Hadoop, MongoDB, and other NoSQL/Big Data solutions in various areas:

    1. As a front end real-time processing engine.

    2. As a cloud management system 

    As I pointed out in this series of posts.

    @Nikita I agree with your comment - a glorified RMI is not Map/Reduce. Map/Reduce has to be much more data-centric and include support for asynchronous invocation, implicit transactional support, workflow, etc. This concept stood at the heart of the Space Based Architecture.

    I personally believe that the entire data processing space is going through a tectonic change, and what used to be a niche play for IMDG is becoming a more mainstream play in the context of Big Data.

    There isn't a single technology that would cover it all. Integrating all the pieces together and making the data flow from the source through the IMDG to the batch system becomes the next big challenge, IMO.

    Nati S.

    http://www.gigaspaces.com/datagrid

  10. Nati -

    Sorry, but you lost me with all of that link spam .. ;-)

    Peace,

    Cameron.

  11. Ah I missed that good old Cameron

    Certainly there are certain things that never change...

     

  12. Nati -

    Ah I missed that good old Cameron

    Certainly there are certain things that never change...

    I aim to please ;-)

    I'll be at JavaOne next week, so if you're there, I'm sure we'll get a chance to meet up. Until then ..

    Peace,

    Cameron.

     

  13. Would love to - unfortunately this JavaOne falls on a Jewish holiday so I'm going to skip this one.

    I'll ping you next time I'm in the Boston area - I'm sure there's a lot to catch up on.

  14. Including GemFire/SQLFire

    If you are looking at key vendors in this space, it is worth your time to look at VMware's GemFire and SQLFire.  See coverage on highscalability.com and a post on InfoQ.

    Kindly,

    The vFabric Blog Team