Learning to use Hadoop big data processing features to mine data

By using Hadoop's big data processing features, people are learning new ways to mine data and discover new relationships in their systems.

Software professionals are continually surprised by the variety of ways Hadoop's big data processing features can be used to mine data, discover relationships between seemingly unrelated information and quickly solve problems that software engineers wouldn't even attempt with traditional techniques.

Are hospital patients suffering from an elevated number of staph infections? When discussing the variety of ways that NoSQL and Hadoop are being used to solve real-world problems, Danielle Tomlinson, senior director of global education at Hortonworks, talked about an interesting use case where data was aggregated about the sanitary habits of health care professionals using the radio frequency identification chips in their badges. "They put a sensor on hand-washing sinks," Tomlinson said. "They were able to see which doctors weren't spending enough time at the hand-washing station and link that back to the spread of diseases."
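The aggregation Tomlinson describes boils down to a classic map/reduce job: emit a key-value pair per sensor reading, then group by key and summarize. The sketch below shows that pattern in the Hadoop Streaming style (tab-separated lines, sorted before reduction). The badge IDs, field layout and the 20-second hand-washing threshold are illustrative assumptions, not details from the article.

```python
import itertools

def mapper(lines):
    # Each input line: "<badge_id>\t<seconds_at_sink>"
    for line in lines:
        badge_id, seconds = line.strip().split("\t")
        yield badge_id, int(seconds)

def reducer(pairs):
    # Streaming delivers pairs sorted by key; average time per badge.
    for badge_id, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        times = [seconds for _, seconds in group]
        yield badge_id, sum(times) / len(times)

if __name__ == "__main__":
    records = ["dr_a\t25", "dr_b\t8", "dr_a\t30", "dr_b\t12"]
    pairs = sorted(mapper(records))  # stand-in for the shuffle/sort phase
    for badge_id, avg in reducer(pairs):
        flag = "OK" if avg >= 20 else "below threshold"
        print(f"{badge_id}: {avg:.1f}s ({flag})")
```

In a real cluster, the mapper and reducer would run as separate Streaming processes and Hadoop would handle the shuffle/sort between them; the join back to infection data would be a second job over the reducer's output.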

Solving big data problems in real time

Of course, a great strength of Hadoop is its ability to process huge volumes of data quickly. How can a software developer sitting in the depths of a data center know what traffic is like on the freeway? Some cities simply track the GPS locations of millions of cell phone users, crunch that data and, according to Tomlinson, build "real-time traffic reports based on how those phones are moving through traffic. The gathering of that big data and the processing of that big data is all done through the Hadoop platform."

Of course, big data is still a mystery to many enterprise Java professionals, but Hortonworks is determined to change that. For those interested in learning more about Hadoop, the Hortonworks Sandbox is a smart place to start, because it provides a personal, portable Hadoop environment that comes with a dozen interactive Hadoop tutorials. And it's all packaged up in a virtual environment that you can get up and running in 15 minutes.

For software architects who are looking for ways to solve big data problems, or for application developers who are looking to brush up on the latest technologies, the Hortonworks Sandbox is a great place to start. Just be warned: Hadoop can be addictive. Pretty soon, every problem looks like a big data problem, and you'll be wanting to solve each of those big data processing problems with Hadoop.

Which tools have you found most effective for learning Hadoop? Let us know.
