Most application development teams are accustomed to some amount of data management, and some may have experimented with NoSQL and big data techniques, but when it comes to handling "big data," few teams are truly prepared. Successfully processing the enormous volumes of data available to today's enterprises requires a dedicated distributed data framework such as Apache Hadoop. When moving to a new framework – especially one built for a specific purpose such as big data – it's important to understand which new skills the team will need to acquire and which existing skills will have to be honed.
As enterprise organizations continue to increase their ability to gather data on everything from customers to inventory to vendors to competitors to you-name-it, problems arise in handling the incoming data. Relational databases were a huge step forward in the 1970s and '80s, but they seem to have just about reached their limits in the 2010s. Today's big data management must handle terabytes, and in some cases petabytes, of data.
Handling this much data generally requires distributed cloud computing technology along with advanced database engineering and management techniques. One of the prominent approaches to managing large numbers of servers and huge amounts of data is to implement Hadoop alongside a NoSQL database such as MongoDB.
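To make the batch-processing side of that pairing concrete, here is a minimal word-count sketch written for Hadoop Streaming. The file names (mapper.py, reducer.py) and any paths are illustrative assumptions, not part of any particular product or deployment.

```python
#!/usr/bin/env python
# mapper.py -- reads raw text from stdin and emits one "word<TAB>1" line per word
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t1" % word)
```

```python
#!/usr/bin/env python
# reducer.py -- sums the counts per word; Hadoop delivers mapper output sorted by key
import sys

current_word, total = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word != current_word:
        if current_word is not None:
            print("%s\t%d" % (current_word, total))
        current_word, total = word, 0
    total += int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, total))
```

A pair of scripts like this would be submitted with the Hadoop Streaming JAR (roughly: hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input <hdfs-in> -output <hdfs-out>), with Hadoop handling the input splitting, shuffling, and sorting across the cluster.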
10gen recommends a model in which Hadoop does the heavy lifting while MongoDB provides scalable real-time transaction support. Director of product marketing Jared Rosoff explains that while Hadoop is about processing large volumes of data in big batches, "MongoDB is not about batch processing; it's about real-time transactions." According to Rosoff, the NoSQL database hooks into the batch data output from Hadoop and allows "fine-grain access, sorting and filtering [of the data]." Other open source NoSQL databases to consider include CouchDB, HBase and Cassandra.
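As a rough sketch of what that real-time layer might look like, the snippet below queries a hypothetical collection of Hadoop batch output using the standard pymongo driver; the connection string, database name, collection name, and field names are assumptions for illustration, not anything 10gen prescribes.

```python
from pymongo import MongoClient, DESCENDING

# Connect to a local mongod instance (connection string is an assumption)
client = MongoClient("mongodb://localhost:27017")
counts = client["analytics"]["daily_term_counts"]  # hypothetical collection loaded from Hadoop output

# Fine-grained access: filter, sort, and limit the batch results in real time
top_terms = (counts.find({"date": "2012-01-15", "count": {"$gt": 100}})
                   .sort("count", DESCENDING)
                   .limit(10))

for doc in top_terms:
    print(doc["term"], doc["count"])
```

In the model Rosoff describes, a batch job would periodically refresh a collection like this one while applications query it interactively.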
Is your organization working on big data projects? How are you getting the job done? What are the most important skills for a Hadoop implementation? If you're just getting started with big data processing, check out this Hadoop skills story over at SearchSOA.com.