Hadoop expands data infrastructure, boosts business intelligence

Discussions


  1. The big data that companies successfully transform into usable business intelligence (BI) is just the tip of a massive data iceberg, according to Jonathan Seidman, solutions architect at Cloudera. At Big Data Techcon 2014, Seidman hosted a session called “Extending your data infrastructure with Hadoop,” in which he explained how Hadoop can help the enterprise tap into the potential business intelligence below the waterline. “That data that’s getting thrown away can have a lot of value, but it can be very difficult to fit that data into your data warehouse,” Seidman explained.

    The problem with big data is that there’s so much of it. Data centers simply don’t have the capacity to store it all. “Would you put a petabyte of data in your warehouse?” Seidman asked the audience. “It’s a good way to get fired,” a member shot back. For this reason, enterprises focus their energy on the data points that give a high return-on-byte, to use Seidman’s term. That is, they capture and analyze the data that provides the most insight for the least amount of storage space. For example, a retailer would analyze the transactional dataset, focusing its attention on actual purchases. But Seidman pointed out that valuable data gets left out – behavioral, non-transactional data, in the retail example. “What if you don’t just want to know what the customer bought, but what they did on the site?” Seidman asked.

    Enter Apache Hadoop, an open source framework designed to store and process large data sets. Seidman described this technology as “scalable, fault tolerant and distributed.” With this framework, enterprises can load raw data as-is and impose a schema on it afterward. “This makes it easy for iterative, agile types of development,” Seidman said. He added that it made a good sandbox for more exploratory types of analysis. (A rough sketch of the schema-on-read pattern he described appears at the end of this post.)

    How does your company make the most of its big data? Let us know! 
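
    To make that schema-on-read idea concrete, here is a minimal, hypothetical sketch. Seidman’s talk was about Hadoop in general; this example happens to use PySpark reading raw JSON files straight out of HDFS, and every path, field name and query in it is an illustrative assumption rather than anything taken from the session.

    ```python
    # Schema-on-read sketch (hypothetical): raw clickstream events were dumped to HDFS
    # as-is, with no upfront modeling; a schema is imposed only when we read them back.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("schema-on-read-demo").getOrCreate()

    # Illustrative path: nothing was transformed or modeled at write time.
    raw_path = "hdfs:///data/raw/clickstream/"

    # The schema lives in the reading code, so it can change from one analysis to the next.
    clicks_schema = StructType([
        StructField("user_id",   StringType(),    True),
        StructField("page",      StringType(),    True),
        StructField("event",     StringType(),    True),
        StructField("timestamp", TimestampType(), True),
    ])

    clicks = spark.read.schema(clicks_schema).json(raw_path)

    # Exploratory, iterative question: which pages do visitors view most often?
    (clicks.filter(clicks.event == "page_view")
           .groupBy("page")
           .count()
           .orderBy("count", ascending=False)
           .show(10))
    ```

    The same raw files could be re-read tomorrow with a different schema or a different tool in the Hadoop ecosystem (Hive, Pig, MapReduce), which is what makes the sandbox-style, exploratory analysis Seidman described possible.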


  2. Big data, not big information

    The idea of just storing everything because one day we might need it is appealing, since writing to Hadoop is cheap. But imposing a schema onto the data afterwards might not be as trivial as it sounds. Making sense of unstructured and unscrubbed data is no small task. If you can pull it off, you might get a big advantage, but if it goes wrong, you're worse off than if you'd stuck with a traditional data warehouse. It's just something to keep in mind when you shift the complexity from writing to your EDW to reading from your HDFS. A rough sketch of what that read-time work can look like follows below.
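
    As a hedged illustration of that read-time burden, here is a sketch, again using PySpark over HDFS; the path, field names and cleanup rules are all assumptions made up for the example. Malformed rows that a warehouse load would have rejected at write time only surface here when somebody queries the raw files.

    ```python
    # Hypothetical read-time scrubbing: the complexity the comment describes shows up
    # as parsing, validating and cleaning work inside every job that reads the raw data.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType

    spark = SparkSession.builder.appName("schema-on-read-cleanup").getOrCreate()

    # Keep malformed rows in a side column instead of dropping them silently.
    schema = StructType([
        StructField("user_id", StringType(), True),
        StructField("event",   StringType(), True),
        StructField("_corrupt_record", StringType(), True),  # filled in by PERMISSIVE mode
    ])

    events = (spark.read
              .option("mode", "PERMISSIVE")
              .option("columnNameOfCorruptRecord", "_corrupt_record")
              .schema(schema)
              .json("hdfs:///data/raw/events/")
              .cache())  # cached so the corrupt-record column can be inspected on its own

    bad = events.filter(F.col("_corrupt_record").isNotNull())
    good = (events.filter(F.col("_corrupt_record").isNull())
                  .withColumn("event", F.lower(F.trim(F.col("event")))))  # read-time cleanup

    print("malformed records:", bad.count())
    good.groupBy("event").count().show()
    ```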