About the Performance of Map Reduce Jobs



  1. About the Performance of Map Reduce Jobs (2 messages)

    One of the big topics in the BigData community is Map/Reduce. There are a lot of good blogs that explain what Map/Reduce does and how it works logically, so I won't repeat that (look here, here and here for a few). Very few of them, however, explain the technical flow of things, which I at least need in order to understand the performance implications. You can always throw more hardware at a Map/Reduce job to improve the overall time. I don't like that as a general solution, and many Map/Reduce programs can be optimized quite easily if you know what to look for. Optimizing a large Map/Reduce job can be translated directly into ROI!

    Read the full article
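    To make the discussion below concrete, here is a minimal sketch of the logical map/shuffle/reduce flow using the classic word-count example. This is a toy in-process simulation, not Hadoop's actual API; the function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative only.

    ```python
    from collections import defaultdict

    def map_phase(documents):
        """Map: emit a (word, 1) pair for every word in every input split."""
        for doc in documents:
            for word in doc.split():
                yield (word, 1)

    def shuffle(pairs):
        """Shuffle: group values by key. In a real cluster this is the
        step where data crosses the network between mappers and reducers."""
        grouped = defaultdict(list)
        for key, value in pairs:
            grouped[key].append(value)
        return grouped

    def reduce_phase(grouped):
        """Reduce: aggregate the list of values for each key."""
        return {key: sum(values) for key, values in grouped.items()}

    docs = ["map reduce map", "reduce reduce"]
    print(reduce_phase(shuffle(map_phase(docs))))  # {'map': 2, 'reduce': 3}
    ```

    The point of spelling out the shuffle step is that its volume, not the map or reduce logic itself, is often what dominates a job's runtime.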

  2. Great article. 

    Should we conclude then that:

    1. Dynamic languages are slower than static languages and are not a good fit for processing-intensive applications.

    2. Always make sure to send as little data as possible across the network.

    3. Avoid serialization and hitting the disk.

    All of the above have always been the main performance issues, and processing-intensive apps like MapReduce clearly demonstrate that. 
    The moral of the story is that you can either invest in hardware like fiber optics and SSD storage to get the performance, invest in developers to optimize at the software level, or do both.
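    Point 2 above (minimize data sent across the network) is typically addressed in Map/Reduce with a combiner: pre-aggregating on the mapper side so fewer pairs are shuffled. The sketch below simulates that idea in plain Python; `local_combine` and `final_reduce` are hypothetical names, not a real framework API.

    ```python
    from collections import Counter

    def local_combine(doc):
        """Mapper-side combine: pre-aggregate counts locally, so only one
        (word, count) pair per distinct word is shuffled per input split."""
        return Counter(doc.split())

    def final_reduce(partials):
        """Reducer: merge the pre-aggregated partial counts."""
        total = Counter()
        for partial in partials:
            total += partial
        return total

    docs = ["to be or not to be", "be or be"]
    partials = [local_combine(d) for d in docs]

    naive = sum(len(d.split()) for d in docs)     # 9 pairs without a combiner
    shuffled = sum(len(p) for p in partials)      # 6 pairs actually shuffled
    print(naive, shuffled, dict(final_reduce(partials)))
    ```

    The final counts are identical either way; the combiner only changes how much data crosses the network before the reduce step, which is exactly the kind of easy optimization the article is talking about.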




  3. All of the above, but above anything else: Map/Reduce and Cloud-level horizontal scaling do not absolve you from thinking about your code, what it does, and its performance implications.