There is no doubt that big data management has become a hot topic in the enterprise development community. But why has the discussion of big data analytics become such a phenomenon of late? Why hasn't processing big data been part of our enterprise toolkit in the past, and what is it about the current Information Technology ecosystem that makes big data solutions especially prudent?
One of the key reasons why big data management is becoming so prevalent is almost self-explanatory -- many organizations must manage ever-increasing amounts of data. From Internet search engines that need to examine Brobdingnagian amounts of information to research projects working with genomics or atmospheric science, the size of the data sets that human endeavors are now concerned with is becoming enormous. Processing terabytes of data was once an intimidating prospect, but that pales in comparison to the petabytes of data various organizations are now faced with processing.
Processing power is the key. It is one thing to be able to store a huge amount of data, but it is another thing entirely to process it. After all, what use is storing big data if it can't be mined? And when we talk about mining data, we're talking about doing it with a significantly greater speed than mining for coal. Data is useless if we can't get meaningful information from it within a reasonable amount of time.
Right now, managing big data is more feasible due to the affordability of processing power. In the past, a Fortune 500 company would need to dilute their shares and issue more common stock in order to be able to purchase multi-processor machines that could efficiently eat away at terabytes of data. But nowadays, a grade school kid could purchase the equivalent processing power with her allowance money.
What's more, there really isn't the same need as there was in the past to go out and purchase big hardware and impressive workstations from companies such as Oracle and IBM. Instead, a prudent IT department can simply go online and purchase a few hundred motherboards and multi-core processors to be shipped directly from Taiwan at historically low prices. Open source software can then be used to organize the assortment of motherboards and processors into a simple grid, and then that homegrown processing power can be used to eat away at that petabyte of unstructured data.
Along with processing power, the availability of free software has also empowered the big data movement. Tools like HBase can be used to store big data in a single, massive database table that can scale to billions of rows and millions of columns. From there, if you're interested in mining your HBase data, Hadoop can be used to process that massive data set and make sense out of the information it's constantly amassing.
"If you want to get a particular thing, you access the data using the HBase side of the universe, but if you want to do something that involves analyzing everything, if you want to find the average age of the planet and you want to go through a billion records, then you use Hadoop," says James Gosling, the father of Java. "It ends up being remarkably fast and remarkably efficient."
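Gosling's average-age example captures the essence of the map/reduce pattern that Hadoop distributes across a cluster: extract one field from every record (map), then combine the partial results (reduce). The toy sketch below illustrates that shape in plain Java using parallel streams rather than an actual Hadoop job; the `Person` record and its fields are purely illustrative, not part of any HBase or Hadoop API.

```java
import java.util.Arrays;
import java.util.List;

// A toy, in-memory illustration of the map/reduce idea behind Hadoop:
// computing an average over many records. Hadoop would split the "map"
// step across machines; parallelStream() merely mimics that locally.
public class AverageAge {
    // A hypothetical record in our table; field names are illustrative.
    record Person(String name, int age) {}

    static double averageAge(List<Person> records) {
        return records.parallelStream()   // "map" phase: runs per partition
                .mapToInt(Person::age)    // extract the field of interest
                .average()                // "reduce" phase: combine partials
                .orElse(0.0);             // guard against an empty input
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("Ada", 36),
                new Person("Linus", 54),
                new Person("Grace", 85));
        System.out.println(averageAge(people));
    }
}
```

The point-lookup side of Gosling's contrast -- fetching one row by key -- is what HBase's client API is built for, while a full scan over a billion rows is the shape of work you would hand to a Hadoop job instead.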
The accumulation of huge pools of data, the affordability of processing power, and the availability of specialized software make up the trifecta of reasons that have made 'big data' not only a keen topic of interest, but also a feasible approach to managing information. With the combination of cheap processing power and freely downloadable, open source software solutions like Hadoop and HBase, enterprise architects have new and effective tools with which to process big data. As more companies gather more pieces of information from a more disparate set of inputs, the power to process big data couldn't have come at a better time.