News: Big data with Hadoop and a NoSQL database

  1. Big data with Hadoop and a NoSQL database (1 messages)

    Most application development teams are used to some amount of data management, and some may have experimented with NoSQL and big data techniques, but when it comes to handling "big data", few teams are truly prepared. Successfully processing the incredible amounts of data available to today's enterprises requires a dedicated distributed data framework such as Apache Hadoop. When moving to a new framework, especially one built for a specific purpose (such as big data), it's very important to understand which new skills the team will have to acquire and which old skills will have to be honed.

    As enterprise organizations continue to increase their ability to gather data on everything from customers to inventory to vendors to competitors to you-name-it, problems arise in handling the incoming data. Relational databases were a huge step forward in the 1970s and '80s, but they seem to have just about reached their limits in the 2010s. Today's big data management must handle terabytes, and in some cases petabytes, of data.

    Handling this much data generally requires distributed cloud computing technology and advanced database engineering and management techniques. One of the prominent methods for handling the large numbers of servers and huge amounts of data is to implement Hadoop alongside a NoSQL database such as MongoDB.

    10gen recommends a model where Hadoop does the heavy lifting while MongoDB provides scalable real-time transaction support. 10gen's director of product marketing, Jared Rosoff, explains that while Hadoop is about processing large volumes of data in big batches, "MongoDB is not about batch processing; it's about real-time transactions." According to Rosoff, the NoSQL database hooks into the batch data output from Hadoop and allows "fine-grain access, sorting and filtering [of the data]." Other open source NoSQL databases to consider include CouchDB, HBase and Cassandra, among others.
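
    The division of labor described above, a batch pass that produces summaries followed by a store that serves fine-grained lookups, filtering and sorting over that output, can be sketched in plain Python. This is only an illustration under stated assumptions: the event fields, the batch_summarize function and the SummaryStore class are hypothetical stand-ins, and no Hadoop or MongoDB API is used.

```python
from collections import defaultdict

# Hypothetical raw events, standing in for the data a batch job would read.
RAW_EVENTS = [
    {"customer": "alice", "amount": 30.0},
    {"customer": "bob", "amount": 12.5},
    {"customer": "alice", "amount": 20.0},
    {"customer": "carol", "amount": 99.0},
    {"customer": "bob", "amount": 7.5},
]


def batch_summarize(events):
    """Batch step (assumed): one pass over every event, one summary per customer."""
    totals = defaultdict(lambda: {"orders": 0, "total": 0.0})
    for event in events:
        summary = totals[event["customer"]]
        summary["orders"] += 1
        summary["total"] += event["amount"]
    return [{"customer": name, **summary} for name, summary in totals.items()]


class SummaryStore:
    """Serving step (assumed): in-memory stand-in for the database holding batch output."""

    def __init__(self, documents):
        # Index the batch output by customer so single lookups avoid a scan.
        self._by_customer = {doc["customer"]: doc for doc in documents}

    def get(self, customer):
        """Fine-grained access: fetch one summary, or None if absent."""
        return self._by_customer.get(customer)

    def find(self, min_total=0.0, sort_key="total", descending=True):
        """Filter summaries by a minimum total, then sort by the given field."""
        matches = [d for d in self._by_customer.values() if d["total"] >= min_total]
        return sorted(matches, key=lambda d: d[sort_key], reverse=descending)


store = SummaryStore(batch_summarize(RAW_EVENTS))
top = store.find(min_total=25.0)
print(store.get("bob"))
print([doc["customer"] for doc in top])
```

    The point of the split is that the expensive pass over all raw events runs once, in bulk, while each later question is answered from the much smaller summarized output.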

    Is your organization working on big data projects? How are you getting the job done? What are the most important skills for a Hadoop implementation? If you're just getting started with big data processing, check out this Hadoop skills story over at SearchSOA.com.

  2. Handling this much data generally requires ... advanced database engineering and management techniques. One of the prominent methods for handling the high numbers of servers and huge amounts of data is to implement Hadoop (...)

    I'd like to comment that while Hadoop has become ubiquitous for many big data scenarios, it is not true that "big data" is synonymous with "Hadoop" or that Hadoop is an essential component of any big data project. It's not that big data requires Hadoop, or Hadoop with NoSQL, or Hadoop with something. Actually it's often possible to do "big data" - as defined by a very large number of bytes, key-value pairs or whatever - without using Hadoop or a similar technology at all. And then there is also no need to hire or acquire all the special skills that these solutions require.

    It's all a question of what kind of processing the data needs. In many cases, big data is quite useful on its own and doesn't need the "heavy lifting" processing that is Hadoop's sweet spot. For example, in web analytics data, depending on the use case, sometimes the raw data itself or a close approximation is enough. You might need to analyze a very large number of transactions by different factors like channel, landing page, and assisting conversions. Because the focal point of the analysis is the transactions themselves, and there is no massive aggregation going on, this sort of thing could be done within MongoDB or a similar solution with no need for "help" from something like Hadoop.
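
    A minimal sketch of that kind of analysis, in plain Python rather than against a real database: the transaction fields (channel, landing_page, amount) and both helper functions are hypothetical, chosen only to show that slicing transactions by a few factors is a matter of filtering and counting, not a heavy batch job.

```python
from collections import Counter

# Hypothetical web analytics transactions; field names are assumptions.
TRANSACTIONS = [
    {"id": 1, "channel": "email", "landing_page": "/sale", "amount": 40.0},
    {"id": 2, "channel": "search", "landing_page": "/home", "amount": 15.0},
    {"id": 3, "channel": "email", "landing_page": "/home", "amount": 25.0},
    {"id": 4, "channel": "search", "landing_page": "/sale", "amount": 60.0},
    {"id": 5, "channel": "email", "landing_page": "/sale", "amount": 10.0},
]


def filter_transactions(transactions, **criteria):
    """Return the transactions whose fields equal every given criterion."""
    return [
        t for t in transactions
        if all(t.get(field) == value for field, value in criteria.items())
    ]


def count_by(transactions, factor):
    """Count transactions per distinct value of one factor."""
    return Counter(t[factor] for t in transactions)


email_sale = filter_transactions(TRANSACTIONS, channel="email", landing_page="/sale")
by_channel = count_by(TRANSACTIONS, "channel")
print([t["id"] for t in email_sale])
print(dict(by_channel))
```

    In a document database, the same questions would typically be expressed as queries over the stored transactions, which is the scenario where no separate batch layer is needed.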

    I realize this is not always clear cut - I just wanted to make the point that in many cases, big data can be handled by a regular NoSQL database alone (albeit one with deeper functionality than just key-value), and in these cases the claim that organizations are "not ready", or that special skills need to be acquired, is not necessarily true.

    --Sarah S.