Feature:

Finding a needle in a big data haystack: Cloud-based analytics to the rescue

By Cameron McKenzie

TheServerSide.com

How do you find a needle in a big data haystack? For enterprises that have a gargantuan amount of information to handle, not being able to locate pertinent and meaningful information is the first hurdle to overcome before analytics even comes into the picture. That was the case for HGST, a major manufacturer of computing hardware with a pressing need to track data collected inside its production facilities. According to David Hinz, Director of Cloud & HPC Solutions for HGST, finding the data required to evaluate production processes was a quest that could prove lengthy or impossible.

“We weren’t sure where it was, who had it, or how to get to it,” said Hinz. “For HGST, we wanted to improve the capability to get the data in a timely manner and make it efficient for the team to find it. Then, we could turn around and start using analytics to find the insights to help us to improve manufacturing, customer responsiveness, and the product that goes out the door.” Once the data was centralized and streamlined in the cloud, the data search parties could find the information they needed in minutes instead of weeks. Instituting Hive data warehousing in the cloud proved to be the right choice for this company.

How does big data fit with the cloud?

The vast majority of today’s data is unstructured, and most of this data is user generated. As Ben Butler, Sr. Manager of big data & HPC at AWS, commented, “It’s now dramatically easier and lower in cost to generate that data. It’s putting a bit of pressure on the rest of the lifecycle: collection and storage, analytics and computation, being able to make sense of that data that is growing at an increasing rate.”

Today, it’s not unusual for large businesses to measure their data in petabytes, with more streaming in all the time. This explosion of available information means there’s an increasing gap between the amount of data that can be collected and what can be effectively analyzed. “With big data, you have the volume, variety, and velocity requiring new tools,” said Butler. “In the cloud we have a combination of different compute, network, and storage tools you can use to address these issues.”

Unlocking the secrets of big data

Elasticity and on-demand provisioning are the key forces that let organizations try new approaches to big data problems. They can manipulate both their data and the provisioned infrastructure itself in different, iterative ways, so the infrastructure no longer constrains what can be done with the data. The same flexibility allows businesses to avoid overspending even with a highly variable workload.

Scalability can also reduce latency for data processing: capacity can be added as needed, so there is less contention for resources. Access to unrestricted resources allows businesses to make better decisions on the fly using AWS solutions. For example, with EMR or other solutions plugged into the back end of Kinesis, near-real-time analytics can be performed, with viewable results delivered to a BI system within seconds of the data hitting the processing stream.

Dealing with big data is a challenge, and performing meaningful analytics on that data can be impossible without a significant amount of hardware to process it. Fortunately, the ability to burst into the cloud and use processing power elastically means smaller organizations can now do what was once the realm of only the big Bay Area behemoths. The cloud has made big data more manageable, and it has made analyzing that data much, much faster. As big data keeps getting bigger, bigger clouds will be there to handle it.

How has the cloud helped you manage your big data solutions? Let us know.

19 May 2014
