
News: The backlash against the data grid JSR and JCP reform

  1. The recent backlash against Red Hat's data grid JSR proposal sparked my interest, as I know the JCP is undergoing reform right now.

    For those of you who have missed the debates, here's some background reading for you: 

    So, here are my thoughts on the subject!

    Technical merits

    On technical merit, the JSR proposal certainly seems sound enough to be a solid starting point for discussion (I'm no expert, mind you).

    Open standards are good

    It's a laudable goal to standardise this space, and Red Hat have got my support on that front. Just as vendor lock-in was bad for databases and app servers, it's also bad for developers and users of data grids.

    This area is potentially worth billions, so I can understand why some vendors may be reluctant to form an open standard around it, but really, it's a good thing! I believe vendors should be competing on performance and other factors, not on basic get and put type API calls.

    Where the proposal went wrong

    Politically, Red Hat went about this in an unusual way, hence the backlash. Sadly, that sort of backlash can be enough to sink a JSR before it even sets sail.

    I was a little surprised that an organisation with so much JCP experience presented this JSR without the usual pre-collaboration that typically goes on in these cases. So I dug a little deeper to find out why Red Hat had gone about it this way.

    The root cause

    I don't know all of the ins and outs, but my understanding is that the new data grid JSR was proposed partly because of Red Hat's frustration with trying to get JSR-107 (data caching) back to an active state (it has been an inactive JSR for some time).

    A standard caching API (JSR-107) would be a natural base for any agreement on standardising further data grid APIs on top of it.

    Red Hat (and others?) had tried to revitalise the JSR-107 Expert Group to get the JSR re-opened, ratified and released. That would've been a great start, as it would have meant collaboration amongst the same vendors that are then needed for a subsequent data grid JSR.

    Why did that attempt fail?

    We don't really know. Unfortunately nobody on the outside can confirm what was/is going on as the JSR-107 mailing list is closed to the public!

    This is the crux of the problem with the existing JCP rules: 'open standards' are being decided behind closed doors. Thankfully, Patrick Curran is working on changing this (see! Oracle isn't always evil ;p).

    So what happens next?

    By raising the new JSR, Red Hat has achieved its desired result of getting JSR-107 moving again to complete the caching work. It's a shame that they were seemingly forced into this stance. We'd all much rather see deliberate community collaboration; this is certainly not a model for how we want to see inactive JSRs get moving again!

    Red Hat's intentions are almost certainly completely honourable, but as some of the other vendors stated, the new data grid JSR came as a great surprise and was therefore not as welcome as it could have been.

    So, JSR-107 will go ahead, but it'll take some bridge mending before the data grid JSR gets off the ground.

    Let's avoid this in the future and support Patrick Curran's JCP reforms!

     

    Cheers,

    Martijn


  2. Few thoughts...

    First of all, I'll let the JSR-107 leads (Greg, Cameron) respond on why it "failed" or moved so slowly.

    As for the overall idea of two JSRs (one for local caching, another for distributed), I get it and support it in general. My biggest concern, however, is the chasm in implementations, designs and approaches between the different popular vendors.

    This is what gets many people confused:

    > ...basic get and put type API calls.

    Yeah... if data grids were that simple, we would not be having these discussions. In real life the differences are dramatic. Just take a cursory look at Coherence, GigaSpaces, GridGain, Infinispan, Terracotta and GemStone, some of the major Data Grid vendors. About the only thing these products have in common is that they perform some type of in-memory caching, and that's it. Every product derives 70-80% of its appeal from specific and unique features and designs that make it attractive to a particular user group.

    Standardizing data grids (at least today) is like trying to standardize on a one-size-fits-all JVM language...

    And that's the source of the backlash. Short of taking one particular implementation (Infinispan, in Red Hat's case) and making the standard essentially a carbon copy of it, this new JSR has little merit, IMHO. But that would create understandable tensions among the other vendors.

    Food for thought...

     

    Nikita Ivanov

    GridGain Systems

    JSR-107 member

     

     

  3. Few thoughts...

    Hi Nikita,

    Thanks for your insider comments. Admittedly, I have only been a light user of data grids and was genuinely not aware that there was so little common functionality.

    I do apologise for not having investigated that properly (poor on my part).

    For my learning (and that of others), are you able to list one or two features that are unique to each vendor's product that make them so compelling/unique?

    I know this is very much something I should've investigated properly myself, but I am genuinely curious.

    Cheers,

    Martijn (who has much to learn about data grids)

  4. Couple of items

    Martijn,

    I can't speak for other vendors, but I can list a few features of GridGain (just off the top of my head) that are absolutely essential to our users and unique to us:

    • Zero deployment/provisioning
    • FP-based Java APIs and native Scala support
    • Full integration with our Compute Grid
    • Full SQL queries w/ local & remote filters/reducers

    Most of the other guys don't have these features (just as we don't have some of theirs), and they are not simple (or in most cases even possible) to implement in their products. So, if we strip all the unique features from each product and make the standard just the lowest common denominator, it'll have next to zero value for the end user.

    --

    Nikita Ivanov

    GridGain Systems.

  5. Couple of items

    I have to admit, it'd be funny to see every distributed cache vendor support full integration with GridGain's data grid.

    Incidentally, LOTS of vendors, including GridGain, support what JCache would have specified. The concept of key/value stores, distributed or not, isn't hard; Lord knows GigaSpaces has it, because it almost fell out of the sky based on our data grid mechanism.

    The difficult thing would be developing a TCK that actually worked and actually required a data grid. Distribution isn't a problem for the *specification*; it's a problem that needs a finely tuned approach based on what the user needs.

    Personally, I don't see the need for EITHER JSR (JCache or Infinispan). It's a map, based on key/value. The map should be able to expire entries explicitly and implicitly. If you need services like write-through or write-behind, well, that's where the vendors come in. If you make a specification that mandates how write-behind or transaction sync works, well, I know that GigaSpaces would resent it, because the specification would require that we dumb down our product to, oh, GridGain's level.

    :) :) :)
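Joe's minimal contract, a key/value map whose entries can be expired explicitly or implicitly, is small enough to sketch. The Java below is a hypothetical illustration, not JCache or any vendor's API: `ExpiringMap`, its methods and the injectable millisecond clock are assumptions, the clock being there only so expiry can be shown deterministically. Expired entries are evicted lazily on `get`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch, not the JCache API: a key/value map whose entries
// can be expired explicitly (expire) or implicitly (per-entry time-to-live).
final class ExpiringMap<K, V> {
    // One stored value plus its absolute expiry time.
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis; // Long.MAX_VALUE means "never expires"

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final LongSupplier clock; // injected so callers (and tests) control time

    ExpiringMap(LongSupplier clock) {
        this.clock = clock;
    }

    // Stores a value with no implicit expiry.
    void put(K key, V value) {
        entries.put(key, new Entry<>(value, Long.MAX_VALUE));
    }

    // Stores a value that expires implicitly ttlMillis from now.
    void put(K key, V value, long ttlMillis) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    // Returns the value, or null if it is absent or has expired.
    V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null) {
            return null;
        }
        if (clock.getAsLong() >= e.expiresAtMillis) {
            entries.remove(key); // lazy eviction of the expired entry
            return null;
        }
        return e.value;
    }

    // Explicit expiry: drop the entry immediately.
    void expire(K key) {
        entries.remove(key);
    }
}
```

Write-through, write-behind and transaction sync are deliberately absent here; that is the part Joe argues should be left to the vendors.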

  6. C'mon

    > I know that GigaSpaces would resent it because the specification would require that we dumb down our product to, oh, GridGain's level.

    Well Joe,

    We all know about your (and your employer GigaSpaces') insecurities, but perhaps that is what makes you twitch so badly: http://www.globes.co.il/serveen/globes/docview.asp?did=1000634028&fid=1725

     

    Btw, send us your resume - we are hiring!

    --

    Nikita Ivanov.

    GridGain Systems.

  7. Couple of items

    Standardizing a subset of the features of all products is exactly what the JCP is about. Java EE servers must support servlets, but nothing stops them from supporting clustering, great management features, etc. There will be great value for end users because we won't have to deal with 10 different ways to bootstrap libraries, and we'll be able to work with well-defined features even if the implementations support additional things.

    I can see why the GridGain guys may not support this, since they can't even keep compatibility between versions of their own product... HUGE API changes in SPIs, method renaming for no good reason (PeerClassLoadingEnabled to P2PClassLoadingEnabled).

    See:

    http://www.gridgainsystems.com/jiveforums/thread.jspa?threadID=1341&tstart=30

    http://www.gridgainsystems.com/jiveforums/thread.jspa?threadID=1351&tstart=30

  8. We've fixed several naming inconsistencies: a two-second change. Zero complaints from our customers (we help each and every one of them migrate on major releases).

     

    Best,

    Nikita Ivanov.

    GridGain Systems.

  9. Couple of items

    Nikita,

    I disagree with this. JSRs are not about limiting features; they are about standardising APIs so that developers do not have to make substantial changes to move their application from one vendor's implementation to another. JSRs don't mandate how a product implements an API, or what additional functionality or optimisations a vendor offers.
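Steve's point, that a standard fixes the API while leaving implementation and extras to each vendor, can be illustrated with a small hypothetical Java sketch. `KeyValueCache`, `VendorACache`, `VendorBCache` and `PriceService` are invented names, not taken from any JSR or product. The application class depends only on the interface, so switching vendors is a one-line change at construction time, while vendor B still offers an extension (`hitCount`) outside the standard.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical "standard" API: the only type application code depends on.
interface KeyValueCache<K, V> {
    V get(K key);
    void put(K key, V value);
    V remove(K key);
}

// Hypothetical vendor A: a plain in-process map.
final class VendorACache<K, V> implements KeyValueCache<K, V> {
    private final Map<K, V> store = new HashMap<>();
    @Override public V get(K key) { return store.get(key); }
    @Override public void put(K key, V value) { store.put(key, value); }
    @Override public V remove(K key) { return store.remove(key); }
}

// Hypothetical vendor B: same contract plus a vendor-specific extra.
final class VendorBCache<K, V> implements KeyValueCache<K, V> {
    private final Map<K, V> store = new HashMap<>();
    private long hits; // vendor extension state, outside the standard

    @Override public V get(K key) {
        V value = store.get(key);
        if (value != null) {
            hits++;
        }
        return value;
    }
    @Override public void put(K key, V value) { store.put(key, value); }
    @Override public V remove(K key) { return store.remove(key); }

    long hitCount() { return hits; } // only usable by code that opts into vendor B
}

// Application code written purely against the standard interface.
final class PriceService {
    private final KeyValueCache<String, Double> cache;

    PriceService(KeyValueCache<String, Double> cache) {
        this.cache = cache;
    }

    double priceOf(String sku) {
        Double cached = cache.get(sku);
        if (cached != null) {
            return cached;
        }
        double computed = sku.length() * 1.5; // placeholder for an expensive lookup
        cache.put(sku, computed);
        return computed;
    }
}
```

Moving from vendor A to vendor B means changing only `new VendorACache<>()` to `new VendorBCache<>()`; `PriceService` itself is untouched. Whether a real data grid standard could stay this thin is the point Nikita disputes.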

    This has huge advantages for the industry as a whole, as it enables rapid adoption of technologies by removing the fear of vendor lock-in. For most companies we talk to about data grids, the first problem is which vendor to choose, as there are many in the market. This is a big deal at this stage, because you have to build your application to the vendor's API, which means that if, a few months into the project, you find their technology does not cut it for your use case, you are hosed. There's no easy way to port across to an alternative. Therefore the perception among end users is that the market is fragmented and immature, that the technology will lead to vendor lock-in, that developers with API knowledge are rare (and therefore expensive), and that implementation is consequently high risk. To paraphrase one user:

    "You are trying to convince us to use a data grid as a system of record but there are no standards, sheesh that's a big ask!"

    API standardisation alleviates these problems and will lead to greater adoption in the market, while leaving you free to innovate on scalability, high availability, elasticity, monitoring, manageability, performance, etc. It also leaves us users in the field able to talk about data grids on a solid foundation of standards, with a choice of vendor offerings, so we can pick the best price/performance combination for our customers.

    In my view no standardisation = niche market!

     

    Steve Millidge
    Director
    C2B2
    www.c2b2.co.uk

  10. Agree, in general

    Steve,

    I would LOVE to see standards around Data Grid and Compute Grid. Our sales would probably grow tenfold if that happened. The problems I see are:

    - The JCP is an awfully broken process

    - JEE has lost almost all its momentum; innovation really happens elsewhere nowadays

    - Designs, APIs and approaches are too dramatically different between vendors to be easily reconciled (nothing to do with internal implementations)

    - A stripped-down common denominator is of little or no use to end users

     

    I'm all for Manik's and Greg's efforts and will gladly provide my thoughts to the EG (unless we are pushing a specific implementation as a standard, which by all accounts may very well be happening).

     

    Nikita Ivanov

    GridGain Systems.

  11. Compare to NoSQL crowd

    Another observation I would like to make is that we don't see a standardization drive from NoSQL vendors. Yet their products are an order of magnitude simpler and smaller, and you would think standardization would come a lot easier and cheaper for them...

    But even at the level of document-based storage (NoSQL), the differences between the solutions are big enough that most products provide unique (not commoditized) value.

    Think about it...

     

    Nikita Ivanov.

    GridGain Systems.

  12. Compare to NoSQL crowd

    > Another observation I would like to make is that we don't see a standardization drive from NoSQL vendors. Yet their products are an order of magnitude simpler and smaller, and you would think standardization would come a lot easier and cheaper for them...
    >
    > But even at the level of document-based storage (NoSQL), the differences between the solutions are big enough that most products provide unique (not commoditized) value.
    >
    > Think about it...

    Indeed. My thoughts exactly.

  13. Compare to NoSQL crowd

    > Another observation I would like to make is that we don't see a standardization drive from NoSQL vendors. Yet their products are an order of magnitude simpler and smaller, and you would think standardization would come a lot easier and cheaper for them...
    >
    > But even at the level of document-based storage (NoSQL), the differences between the solutions are big enough that most products provide unique (not commoditized) value.
    >
    > Think about it...

    Are you actually suggesting that building a standard around the most popular NoSQL projects out there, on different platforms (Riak, CouchDB: Erlang; Redis, MongoDB, Membase, TokyoCabinet: C/C++; Hadoop, Cassandra, Voldemort: Java) and with different storage models (K/V, document, column), network models (replicated/distributed/local) and persistence (disk vs memory), is easier than building a standard for Java-based, in-memory, Map-like (K/V) data grids with optional support for distributing data?  ;-)

    - Manik

     


  14. Exactly...

    You nailed it. Needless to say, most serious Data Grid vendors support different distribution modes (full or partial replication, invalidation, local, etc.), different persistence modes, different storage models (KV, document, free-text), complex SQL-based querying with FP-based logic, and A LOT more than that.

    So NoSQL seems to be a narrower and simpler space to standardize an API for. Most of them have derivatives of Memcache REST APIs, for example.

    Optional distribution of data?!?! Manik, can you show me one Data Grid without data distribution? LOL.
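To make two of the distribution modes discussed above a little more concrete, here is a hypothetical Java sketch of how a grid might decide which nodes hold a key under full replication versus hash partitioning with backups. `Placement` and `ownersOf` are invented for illustration; real products typically use consistent hashing and rebalancing rather than the simple modulo shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of replicated vs partitioned placement; assumes at least one node.
final class Placement {
    enum Mode { REPLICATED, PARTITIONED }

    // Returns the nodes that should hold a copy of the key.
    static List<String> ownersOf(Object key, List<String> nodes, Mode mode, int backups) {
        if (mode == Mode.REPLICATED) {
            return new ArrayList<>(nodes); // every node holds every entry
        }
        int n = nodes.size();
        int primary = Math.floorMod(key.hashCode(), n); // floorMod keeps the index non-negative
        List<String> owners = new ArrayList<>();
        // Primary plus up to `backups` following nodes, never more than n distinct nodes.
        for (int i = 0; i <= Math.min(backups, n - 1); i++) {
            owners.add(nodes.get((primary + i) % n));
        }
        return owners;
    }
}
```

Replication makes reads local everywhere at the cost of storing every entry n times; partitioning spreads capacity across nodes and keeps only `backups + 1` copies.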

     

    Nikita Ivanov.

    GridGain Systems.

  15. Compare to NoSQL crowd

    And the NoSQL crowd is small and niche compared with RDBMSs because of a lack of standardisation, and will stay there without it. It may be that one product becomes the de facto standard and everybody has to implement its API to compete, but we aren't there yet. This means every customer has to run a long, costly and complex POC of each technology to choose the best NoSQL store for their use case, as they have to rewrite the code for each POC.

    Steve Millidge
    Director
    C2B2
    www.c2b2.co.uk

  16. Enabling significantly larger heap sizes without significant GC penalties is sweet.
    In the past, I had read that 1 GB per JVM was the initial target for Coherence, in order to avoid such GC penalties.
    How much memory can we get today with the 3.7 improvements?
    Can we start with 4 GB (due to the factor-of-4 improvement mentioned above) and maybe increase further?


    Is flash memory used for overflow purposes *only*, or is flash also used like regular memory for main data storage?


    Thanks for the answers.

  17. Hi Dominique -

    I think you accidentally posted on the wrong thread.

    > How much memory can we get today with the 3.7 improvements?

    We were testing 3.7 with up to 31GB heaps (using G1) with good results. The object count reduction helps, no doubt.

    > Can we start with 4 GB (due to the factor-of-4 improvement mentioned above) and maybe increase further?

    Yes. I think we recommend 4GB today on 3.6, and 3.7 should allow you to safely expand beyond that.


    > Is flash memory used for overflow purposes *only*, or is flash also used like regular memory for main data storage?

    The priority is to use the amount of RAM that's been provided to the Elastic Data feature, but once that's used up, all storage goes to flash.
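A minimal, hypothetical Java sketch of the "RAM first, then flash" behaviour described above. This is not how Coherence's Elastic Data is implemented: the RAM budget is counted in entries rather than bytes, the flash tier is simulated with a second in-memory map, and `TieredStore` is an invented name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of RAM-first, flash-overflow storage; not Coherence's
// Elastic Data. The "flash" tier here is just a second in-memory map.
final class TieredStore<K, V> {
    private final int ramCapacity;                   // RAM budget, in entries (simplifying bytes)
    private final Map<K, V> ram = new HashMap<>();
    private final Map<K, V> flash = new HashMap<>(); // stand-in for a flash-backed store

    TieredStore(int ramCapacity) {
        this.ramCapacity = ramCapacity;
    }

    void put(K key, V value) {
        if (ram.containsKey(key) || ram.size() < ramCapacity) {
            ram.put(key, value);   // RAM is used while budget remains (and for updates)
        } else {
            flash.put(key, value); // budget used up: new data goes to flash
        }
    }

    // Null values are not supported: a null from RAM falls through to flash.
    V get(K key) {
        V v = ram.get(key);
        return v != null ? v : flash.get(key);
    }

    boolean isInRam(K key) {
        return ram.containsKey(key);
    }
}
```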

    Peace,

    Cameron Purdy | Oracle Coherence

    http://coherence.oracle.com/

  18. I'm not sure I understand all of this. The article here seems to suggest that JSR 107 is still not moving and that, out of frustration, Red Hat filed another one.

    But, of all things, after years and years of inactivity, JSR 107 has just started moving and was announced as a strategic part of Java EE 7 in the various Java EE 7 news posts.

    Two links:

    1. JSR107 (Java Caching API) Update – Lots Happening
    2. News on JSR107 (JCACHE) and JSR342 (Java EE 7)

    So this really seems a little strange. JSR 107 was inactive for, like, forever, and just weeks after it's reactivated this other JSR is proposed???

    Is this just really bad timing, e.g. Red Hat working internally on this other proposal for years and then JSR 107 suddenly being reactivated? Or did they quickly invent this new JSR after seeing JSR 107 reactivated?

     

     

     

  19. Thanks for posting this. A JCache update is indeed slated for Java EE 7. The Red Hat move is very puzzling...

  20. My understanding is that JSR-107 only started moving again because this new JSR was raised (or some members found out it was going to be raised). It's hard to know, again because we didn't have access (although I note now that you can join the 107 Google Group).

  21. The Google Group is a very recent addition. Most discussions are still on the private JSR-107 EG list on JCP.org.

    - Manik

  22. @augustientje and @reza, it is a timing issue.  And in part, the activity you see on JSR-107 is due to my proposing a new data grid JSR.  

    For the record, I don't actually care which JSR is adopted in the end, or who leads it. I just care about:

    • A standard in the first place.
    • One that isn't too dumbed down to just a Map plus some extra bits.
    • I disagree with Nikita and Joe in that I think there is definitely a need for a standard, and certain enterprise features need to be in this standard, such as defining JTA interoperability, write-behind, write-through, expiry and eviction, etc., and ensuring that distributed systems are taken into account, as most vendors in this space are, in some shape or form, distributed. This is how most folks use data grids at the end of the day.
    • Additional stuff Greg very recently proposed in JSR-107 - annotations, injection, externalising serialisation - all very good things, which I do support.
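The difference between write-through and write-behind, two of the enterprise features listed above, can be sketched in Java as below. `BackingStore`, `WritingCache` and the explicit `flush()` are hypothetical names, not JSR-107 APIs; a real write-behind implementation would flush asynchronously (on a timer or background thread) rather than on demand.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical persistence hook (e.g. a database writer); not a JSR-107 type.
interface BackingStore<K, V> {
    void store(K key, V value);
}

// Hypothetical cache showing write-through vs write-behind.
final class WritingCache<K, V> {
    enum Policy { WRITE_THROUGH, WRITE_BEHIND }

    private final Map<K, V> cache = new HashMap<>();
    // Writes not yet persisted (write-behind only); a later put to the same key
    // replaces the pending value, so repeated updates coalesce into one store write.
    private final Map<K, V> pending = new LinkedHashMap<>();
    private final BackingStore<K, V> store;
    private final Policy policy;

    WritingCache(BackingStore<K, V> store, Policy policy) {
        this.store = store;
        this.policy = policy;
    }

    void put(K key, V value) {
        cache.put(key, value);
        if (policy == Policy.WRITE_THROUGH) {
            store.store(key, value); // synchronous: put returns after the store write
        } else {
            pending.put(key, value); // deferred until flush()
        }
    }

    V get(K key) {
        return cache.get(key);
    }

    // Persists deferred writes in first-write order; a real product would run this asynchronously.
    void flush() {
        for (Map.Entry<K, V> e : pending.entrySet()) {
            store.store(e.getKey(), e.getValue());
        }
        pending.clear();
    }
}
```

Write-through makes every put wait for the store; write-behind returns immediately and coalesces updates, at the cost of a window in which the store lags behind the cache. Pinning down semantics like that window is the kind of thing a standard would have to specify.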

    As I have mentioned on my blog here and here, my reasons for proposing a new standard have been twofold:

    1. that JSR-107 has been inactive for way too long.  
    2. that it is too simplistic.  

    The way things stand now, Greg and Cameron are committed to solving both of these issues, which is good.  What remains to be seen is how this progresses.

    Hope this clarifies things.

    Cheers

    Manik

    Founder and project lead, Infinispan

    Red Hat Inc.

    JSR-107 EG member

  23. Manik,

    That does clarify matters and what you are saying certainly makes sense to me. We would definitely support it in an updated JCache API (our team at Caucho is not yet sure if we will join the JCache EG officially or not). Glad to hear things are moving in a positive direction.

    Cheers,

    Reza

  24. I support Manik in calling for a new spec, and I think that spec should incorporate JSR-107.

    JSR-107 has been dead for nigh on a decade; just reactivating it and throwing out a draft spec in a couple of months is, to me, a bad idea for a number of reasons:

    There has been no formation period for the expert group and no real call for experts. It looks like the members on there have been co-opted rather than recruited by reaching out to the community. A new JSR will go through the expert group formation phase, and will enable a diverse mixture of vendors and users to get involved.

    JEE has moved on since JSR-107 was drafted, so it will need substantial revision to measure up to the current JEE way of doing things: we will need CDI, annotations, generics, app server integration via JTA or JCA, and pluggability into other JEE subsystems like JPA, web sessions, SFSBs, etc. That is a big ask to retrofit into JSR-107 in a couple of weeks to get a draft spec, and I feel this is what the new JSR should be addressing.

    My preference would be to ditch JSR-107, take the nascent JCache API into a new JSR, and address full Java and JEE 7 compatibility in that new JSR. Once that has been done, we can get on to the interesting bit of adding data grid APIs to JEE.

     

    Steve Millidge
    Founder
    C2B2
    www.c2b2.co.uk

  25. Steve,

    I do tend to agree with you that the JCache update has been too much of a black box in terms of its EG nomination/formation/progress so far. Hopefully that will be rectified as things move along? I don't think the JCache update timetable is particularly hurried, so hopefully it should be possible to properly modernize/complete it?

    Cheers,

    Reza

  26. JSR 347 has been voted in

    FYI, JSR 347 has been voted in.  I blogged about it here:


    http://infinispan.blogspot.com/2011/05/jsr-347-data-grids-for-java-platform.html

     

    Cheers

    Manik