The big data that companies successfully transform into usable business intelligence (BI) is just the tip of a massive data iceberg, according to Jonathan Seidman, solutions architect at Cloudera. At Big Data TechCon 2014, Seidman hosted a session called “Extending your data infrastructure with Hadoop,” in which he explained how Hadoop can help the enterprise tap into the potential business intelligence below the waterline. “That data that’s getting thrown away can have a lot of value, but it can be very difficult to fit that data into your data warehouse,” Seidman explained.
The problem with big data is that there’s so much of it; data centers simply don’t have the capacity to store it all. “Would you put a petabyte of data in your warehouse?” Seidman asked the audience. “It’s a good way to get fired,” a member shot back. For this reason, enterprises focus their energy on the data points that give a high return-on-byte, to use Seidman’s term: they capture and analyze the data that provides the most insight for the least amount of storage space. A retailer, for example, would analyze the transactional dataset, focusing attention on actual purchases. But Seidman pointed out that valuable data gets left out: in the retail example, the behavioral, non-transactional data. “What if you don’t just want to know what the customer bought, but what they did on the site?” Seidman asked.
Enter Apache Hadoop, an open-source framework designed to store and process large datasets. Seidman described the technology as “scalable, fault tolerant and distributed.” With this framework, enterprises can load raw data as-is and impose a schema on it afterward, at read time rather than at load time. “This makes it easy for iterative, agile types of development,” Seidman said, adding that it also makes a good sandbox for more exploratory types of analysis.
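That schema-on-read approach is the key contrast with a traditional warehouse, where a schema must be designed before any data is loaded. Here is a minimal sketch of the idea in plain Python; the clickstream event fields and log format are hypothetical illustrations, not examples from the talk:

```python
import json

# Raw, semi-structured clickstream events land in storage as-is;
# no schema is required at write time ("schema on read").
raw_events = [
    '{"user": "u42", "action": "view", "page": "/shoes/trail-runner"}',
    '{"user": "u42", "action": "add_to_cart", "sku": "TR-1001"}',
    '{"user": "u77", "action": "search", "query": "running shoes"}',
]

def behavior_by_user(events):
    """Impose a schema at read time: project only the fields this
    question needs. A different analysis can apply a different
    schema to the exact same raw data."""
    sessions = {}
    for line in events:
        record = json.loads(line)
        user = record.get("user", "unknown")
        sessions.setdefault(user, []).append(record.get("action"))
    return sessions

print(behavior_by_user(raw_events))
# {'u42': ['view', 'add_to_cart'], 'u77': ['search']}
```

Because no fields are discarded at load time, the analyst can return to the same raw events later with new questions, which is what makes the pattern suit the iterative, exploratory work Seidman described.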
How does your company make the most of its big data? Let us know!