News: Start yourself with Apache Hadoop

  1. Start yourself with Apache Hadoop (3 messages)

    What is Hadoop: Hadoop is a framework written in Java for running applications on large clusters of commodity hardware and incorporates features similar to those of the Google File System and of MapReduce.

    Why Hadoop: MapReduce is Google's secret weapon: a way of breaking complicated problems apart and spreading them across many computers. Hadoop is an open source implementation of MapReduce, together with its own filesystem, HDFS (Hadoop Distributed File System).
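
    To make the "breaking problems apart" idea concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps, using word counting as the example. It is plain Python and does not use Hadoop's API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are made up for illustration. On a real cluster, each input chunk would be mapped on a different machine and the grouped keys would be reduced in parallel.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    """Run map, shuffle and reduce in one process (hypothetical driver)."""
    pairs = []
    for chunk in chunks:
        # On a cluster, each chunk would be handled by a separate mapper.
        pairs.extend(map_phase(chunk))
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


counts = word_count(["hadoop stores data", "hadoop processes data"])
print(counts)
```

    Because each map call only sees its own chunk and each reduce call only sees one key, both steps can be spread across many machines without the pieces needing to talk to each other.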

    Hadoop has won the terabyte sort benchmark: A Hadoop cluster sorted 1 terabyte of data in 209 seconds, beating the previous record of 297 seconds in the annual general-purpose (Daytona) terabyte sort benchmark. The sort benchmark, which was created in 1998 by Jim Gray, specifies the input data (10 billion 100-byte records), which must be completely sorted and written to disk. This was the first time that either a Java or an open source program had won.
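
    As a quick sanity check on those figures, the arithmetic can be worked through as below. This is only a back-of-the-envelope sketch: the variable names are made up, and it assumes decimal units (1 GB = 10^9 bytes) and the record count, record size and timings quoted above.

```python
RECORDS = 10_000_000_000   # 10 billion records, as quoted above
RECORD_BYTES = 100         # each record is 100 bytes

total_bytes = RECORDS * RECORD_BYTES   # 10**12 bytes, i.e. 1 terabyte


def throughput_gb_per_s(seconds):
    """Average sort throughput in decimal gigabytes per second."""
    return total_bytes / seconds / 1e9


new_rate = throughput_gb_per_s(209)   # roughly 4.78 GB/s
old_rate = throughput_gb_per_s(297)   # roughly 3.37 GB/s
speedup = 297 / 209                   # roughly 1.42 times faster

print(f"{new_rate:.2f} GB/s vs {old_rate:.2f} GB/s, speedup {speedup:.2f}x")
```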

    To understand hadoop: http://patodirahul.blogspot.com/2011/03/understanding-what-is-hadoop.html

    To install hadoop:

    In standalone mode: http://patodirahul.blogspot.com/2011/03/hadoop-in-standalone-mode.html

    In pseudo-distributed mode: http://patodirahul.blogspot.com/2011/03/hadoop-in-pseudo-distributed-mode.html

    In distributed mode: http://patodirahul.blogspot.com/2011/03/hadoop-in-distributed-mode.html

  2. Karmasphere enables companies to unlock the competitive advantages within their large datasets by providing an easy-to-use family of desktop-based software. Karmasphere's products, built around the Karmasphere Application Framework, feature independence across any Hadoop environment, easy one-click deployment across any cloud/cluster, and a rich and friendly user-interface to maximize productivity, discovery and insight.

    http://www.karmasphere.com

  3. Meet certified experts at Lucene Revolution to stay ahead of Open Source Technology http://lucenerevolution.org/

  4. hadoop

    When I was at Facebook, we stored everything in Hadoop and Hive and then copied select aggregates into an Oracle RAC instance for charting and OLAP. We used a separate tool to generate the graphs.

    Hive was low-latency enough to handle most of the ad hoc queries.