
News: Hadoop Compute Cluster: Summary

  1. Hadoop Compute Cluster: Summary (11 messages)

    Hadoop compute clusters are built on cheap commodity hardware. Hadoop automatically handles node failures and data replication. Hadoop is a good framework for building batch data processing systems. Hadoop provides an API and a framework implementation for working with MapReduce. The MapReduce implementation is provided on top of the Hadoop job infrastructure. You can read more here - http://itsitspace.blogspot.com/2011/03/hadoop-compute-cluster-summary.html
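    For readers new to the paradigm, the three phases of MapReduce (map, shuffle/group by key, reduce) can be sketched in a single process with the classic word-count example. This is plain Python, not Hadoop's API (Hadoop's own API is Java and runs these phases as distributed tasks); the function names below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated token, lowercased.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group every value emitted for the same key; in a cluster the
    # framework does this between the map and reduce phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Sum the partial counts for a single word.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Map every input line, group by key, then reduce each group
    # (keys sorted only to make the result order deterministic).
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["the quick fox", "The lazy dog", "the end"]))
```

    In Hadoop the same phases run as tasks spread across the cluster over data stored in HDFS, with the framework rescheduling tasks when a node fails.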


    Threaded Messages (11)

  2. hmm

    So?

    What's the point of this posting?

  3. It's a good intro article..

    It's a good intro article.. not bad.. Of course the writer wants to market his blog and services. That's the new trend on TSS: all these people are writing blogs/articles etc. on their own websites to get you to learn more about them.

    That's fine.

    I'm not sure why you didn't like the article. It's much better than some random company releasing version 0.9 of their silly product (which ends up being abandoned in about 2 months).

  4. RE: It's a good intro article..

    The main issue (for me) when reading this article: there is no introduction. It introduces Hadoop as if it were a brand-new technology, while it has been around for years and is very well known in the niche of distributed MapReduce and related technologies.

    It would improve a lot if there were a simple introduction stating why it was written (e.g. no good comprehensive introduction is currently available) and for whom it is intended. But even then, I'm not sure it is a good fit for TSS - just let people google for it, why post it here?

    And the title "Hadoop Compute Cluster: Summary" does not exactly describe the contents either. Summary of what? If it were called "Quick introduction to Hadoop and friends" I might even not complain about the missing introduction :-P

    BTW, the author (Tejas Bavishi) seems to specialize in writing (useless) introductions to already well-known technologies. Last time it was about the Spring framework (http://www.theserverside.com/discussions/thread.tss?thread_id=61937) - and that had exactly the same problems. I wonder how moderation is supposed to work on TSS?

  5. Quoting...

    Also, since Hadoop is a batch processing system, there is a lot of latency between request and response, so it is not suitable for interactive systems.

    This is a rather harsh assessment (but true). Hadoop is geared towards offline (overnight) big-data processing (where it is a decent technology). I've heard of a case where Hadoop was used for something closer to OLTP - and unfortunately another solution had to be brought in to help clean up the performance mess.

    Take a look at GridGain (http://www.gridgain.com) for a near real-time MapReduce implementation that doesn't suffer from the same problem.

    Nikita Ivanov.

    GridGain Systems.

  6. Gridgain

    It seems GridGain is mentioned in every post here related to distributed computing.

    So, a question for the GridGain people:

    What is the simplest non-trivial application (other than MapReduce) that has been implemented in GridGain?

  7. Simplest non-trivial?

    John,

    I'm not sure what you are asking. MapReduce is just one paradigm you can employ when using GridGain. We have over 500 active organizations/projects worldwide developing systems with GridGain on a daily basis, from online gaming to finance to pharma to data mining to geo processing etc. It's almost impossible to say what the simplest non-trivial app is.

    Nikita.

    GridGain Systems.

  8. Why Gridgain ?

    My question is: wouldn't GridGain suffer from issues like network latency and worker failure? Going through its website, I did not see these mentioned. If it suffers from these issues too, shouldn't I consider the more widely used and cited Hadoop?

  9. Shiny bauble

    shouldn't I consider the more widely used and cited Hadoop?

    You would not be the first to gravitate towards a low-quality solution just because it had more buzz from people who are not actually using it.

    It's topics like these that make me miss Hani.

    Peace,

    Cameron Purdy | Oracle Coherence

    http://coherence.oracle.com/

  10. Ouch

    shouldn't I consider the more widely used and cited Hadoop?

    You would not be the first to gravitate towards a low-quality solution just because it had more buzz from people who are not actually using it.

    It's topics like these that make me miss Hani.


    I've played with Hadoop and done some prototyping with it. It's got limitations, but I don't feel it's low quality. As long as you take the time to figure out where it breaks and use it properly, it should be fine. "Should" being the key word.

  11. Don't skip an opportunity

    ... to produce buzz, right. Unfortunately, the equation ORCL = Quality is not always true either (the same goes for other big companies like IBM, so do not take it personally).

    Like you said: Peace. I may add: Live and let live.

    And remember: this world looks like s**t partly because of the marketing people.

  12. It won't


    I didn't say that Hadoop suffers from "network latency and worker failure". Of course, neither does GridGain, which has more advanced failover and load balancing capabilities; e.g. GridGain has pluggable failover and collision resolution SPIs and is the only middleware supporting early and late load balancing for distributed jobs.

    What I was saying is that you can't implement nRT (near real-time) processing with Hadoop, as its architecture is not suitable for that. Almost any distributed request/response will take seconds in Hadoop, but just a few milliseconds in something like GridGain. Depending on the application, this difference can be decisive.

    Nikita Ivanov.

    GridGain Systems.