The latest trends in the Hadoop project

In this podcast, Tim Hall of Hortonworks discusses the Apache Hadoop project and how it has changed since its inception.

There was a time, not too long ago, when taking the temperature of the Hadoop project and finding out the latest trends and advancements in the world of distributed computing was a relatively easy thing to do. Perhaps it's a drawback to the success of the project, but Hadoop doesn't mean Hadoop anymore.

At its core are MapReduce, YARN and the Hadoop Distributed File System, but the number of peripheral Apache projects that complement Hadoop -- including Ambari, Chukwa, Avro, HBase and Mahout -- can make keeping up with the state of Hadoop a challenge. In this podcast, TheServerSide spoke with Tim Hall, Hortonworks' vice president of product management, to get right to the heart of the matter.

Hadoop now is a catch-all for a phrase that might be better described as Apache Hadoop core, HDFS, MapReduce and YARN, surrounded by an ecosystem of products.

Tim Hall, vice president of product management, Hortonworks

"Hadoop now is a catch-all for a phrase that might be better described as Apache Hadoop core, [Hadoop Distributed File System], MapReduce and YARN, surrounded by an ecosystem of products that ride on top of that. But that's a pretty long phrase to say, so people typically just say Hadoop," said Hall, acknowledging how hard it has become to describe the state of Hadoop. Hall provided insights into some of the pain points his team at Hortonworks is trying to address, and also gave a big-picture view of what the future holds for distributed computing and storage.

One of the big topics on the tip of Hall's tongue was optimizing Hadoop performance. Of course, it's always been possible for a top committer to the Apache Hadoop project to get the best performance possible out of an installation, but it's never been a process that could be described as simple. Hortonworks is trying to change that with the introduction of SmartSense and Smart Configuration for its Hortonworks Data Platform, a product that brings together more than 20 different Apache projects and components under one offering. Other questions you can hear TheServerSide pitch to Hall include:

  • What are the key things that are changing in the world of Hadoop?
  • In what areas of the Hadoop project are we seeing progress and improvement that will excite the community?
  • What are some of the interesting applications you are seeing Hadoop used for today that might not have been envisioned when the project was originally conceived?
  • What are some of the most obvious parameters that organizations misconfigure?
  • What are the easiest things to fix in a misbehaving Hadoop cluster?

It's a great interview, and it's only 15 minutes long, so it won't consume your entire lunch break if you give it a listen. You'll find Hall's take on performance optimization and the evolution of Hadoop to be most interesting.

