Discussions

News: Why averages suck and percentiles are great

  1. Why averages suck and percentiles are great (5 messages)

    Anyone who has ever monitored or analyzed an application uses or has used averages. They are simple to understand and calculate. But we tend to ignore just how wrong a picture of the world averages paint. To emphasize the point, let me give you a real-world example from outside the performance space that I recently read in a newspaper.

    The article explained that the average salary in a certain region in Europe was 1,900 Euros (to be clear, this would be quite good in that region!). However, when they looked closer, they found that the majority, namely 9 out of 10 people, earned only around 1,000 Euros, while one earned 10,000 (I oversimplified this of course, but you get the idea). If you do the math you will see that the average of this is indeed 1,900, but we can all agree that this does not represent the “average” salary as we would use the word in day-to-day life. So now let’s apply this thinking to application performance.
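    The arithmetic is easy to verify; a minimal Java sketch using the simplified numbers above also shows how the median captures the everyday sense of “average”:

        // Simplified salaries from the newspaper example: nine people
        // earning 1,000 Euros and one earning 10,000 Euros.
        public class SalaryExample {
            public static void main(String[] args) {
                int[] salaries = {1000, 1000, 1000, 1000, 1000,
                                  1000, 1000, 1000, 1000, 10000};

                long sum = 0;
                for (int s : salaries) sum += s;
                double mean = (double) sum / salaries.length; // 19000 / 10

                // With the array sorted, the 50th percentile (median) is a
                // far better description of what "most people" earn.
                java.util.Arrays.sort(salaries);
                int median = salaries[salaries.length / 2];

                System.out.println("mean   = " + mean);   // 1900.0
                System.out.println("median = " + median); // 1000
            }
        }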

    Read the full Article

  2. Much of the argument can be addressed using "moving averages"

    http://en.wikipedia.org/wiki/Moving_average

    in combination with "random sampling"

    http://en.wikipedia.org/wiki/Random_sampling

    which we have used in assessing the accuracy and reliability of performance measurement, something I recently blogged about 

    http://www.jinspired.com/site/a-performance-measurement-of-a-canned-java-persistence-jpa-benchmark
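    For illustration, here is a minimal Java sketch of combining the two techniques; the window size, sampling rate, and class name are arbitrary choices of mine, not the measurement code from the post above:

        import java.util.Random;

        // Illustrative sketch only: randomly sample a fraction of timings
        // and feed the sampled values into a simple moving average over a
        // fixed-size window.
        public class SampledMovingAverage {
            private final double[] window;
            private final double sampleRate;
            private final Random random = new Random();
            private int count;   // samples accepted so far
            private double sum;  // running sum of the current window

            public SampledMovingAverage(int windowSize, double sampleRate) {
                this.window = new double[windowSize];
                this.sampleRate = sampleRate;
            }

            // Returns the current moving average, updated only for the
            // randomly sampled subset of measurements.
            public double record(double elapsedMillis) {
                if (random.nextDouble() < sampleRate) {
                    int slot = count % window.length;
                    sum += elapsedMillis - window[slot]; // evict oldest value
                    window[slot] = elapsedMillis;
                    count++;
                }
                int filled = Math.min(count, window.length);
                return filled == 0 ? 0.0 : sum / filled;
            }
        }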

    [I am guessing, but] the reason for this Compuware/dynaTrace article seems to be to put forward an argument against an automatic baselining technique based on averaging that is used by another vendor, which I suspect is AppDynamics. AD has promoted this "feature" to no end, though in practice it still generates more noise than signal...and sometimes even silence.

    The problem with the title and the article is that there is no balance. Averaging is cheap. Cheap can be good when it allows you to collect more data at more collection points (methods) in the execution, and not just at the request/transaction entry points that this article focuses on exclusively. Maintaining an accurate statistical distribution for each and every instrumented method is far too costly, at least for those customers who care about overhead, which might not be the target audience here. One approach that scales much better is quantization.

    http://www.jinspired.com/site/case-study-scala-compiler-part-7
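    By quantization I mean bucketing timings into coarse bins rather than keeping every sample. A rough Java sketch of one common variant (power-of-two latency buckets; the class and bucket scheme are my own, not from the case study above):

        import java.util.concurrent.atomic.AtomicLongArray;

        // Illustrative sketch of quantization: instead of storing every
        // timing, count occurrences in power-of-two latency buckets. The
        // memory cost per instrumented method is one small fixed array,
        // not a full sample set.
        public class QuantizedTimings {
            // Bucket i counts timings in [2^i, 2^(i+1)) microseconds.
            private final AtomicLongArray buckets = new AtomicLongArray(64);

            public void record(long micros) {
                int bucket = 63 - Long.numberOfLeadingZeros(Math.max(micros, 1));
                buckets.incrementAndGet(bucket);
            }

            // An approximate percentile can still be recovered from the
            // buckets; the answer is the lower bound of the matching bucket.
            public long approximatePercentile(double p) {
                long total = 0;
                for (int i = 0; i < buckets.length(); i++) total += buckets.get(i);
                long target = (long) Math.ceil(total * p);
                long seen = 0;
                for (int i = 0; i < buckets.length(); i++) {
                    seen += buckets.get(i);
                    if (seen >= target) return 1L << i;
                }
                return 0;
            }
        }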

    All said...I am curious what one does when a deviation is detected in the percentile (or moving average). What would operations do to change this, and when would they be sure (confident) that it warrants intervention, which is typically a kill-and-restart command sequence ;-). There are many variables and factors, none of which will be answered with percentiles or baselining, which can in some cases automatically smooth out growth. How can you truly distinguish unwanted variance when all you have is timing?

  3. An average used as a measure of central tendency is mostly incorrect for many software measures. The measures can be static or dynamic (as mentioned in the post). Static measures that represent size/complexity (e.g., LOC, v(G)) follow power-law distributions, where the upper tails have a disproportionate influence on the average. For size measures, applying a log normalization helps in understanding the distribution.
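    As a quick illustration of what log normalization does to a heavy-tailed size measure (the LOC numbers below are made up):

        // Illustrative only: log-transforming a heavy-tailed size measure
        // (here, invented LOC counts per file) pulls in the upper tail so
        // the mean of the transformed values is no longer dominated by a
        // single huge file.
        public class LogNormalization {
            public static void main(String[] args) {
                int[] loc = {40, 55, 60, 80, 90, 120, 150, 200, 5000};

                double meanRaw = 0, meanLog = 0;
                for (int v : loc) {
                    meanRaw += v;
                    meanLog += Math.log10(v);
                }
                meanRaw /= loc.length;
                meanLog /= loc.length;

                // Raw mean (~644 LOC) exceeds 8 of the 9 files; the
                // back-transformed log mean (the geometric mean) does not.
                System.out.printf("raw mean = %.1f LOC%n", meanRaw);
                System.out.printf("geometric mean = %.1f LOC%n",
                        Math.pow(10, meanLog));
            }
        }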

  4. We are talking about software performance, which is meant to be under (adaptive) control by a monitoring and management agent. Seriously, if outliers are having such an impact, you need to reconsider what it means to manage performance and how it is measured...maybe you are lumping too many execution paths under one category that is averaged. From experience I find it far more effective to have fine-grained averages, done much more cheaply at many more "interesting" execution points, and then to make such decisions across that data set rather than from a single average or single distribution under one heading (a rough sketch follows at the end of this message).

    http://en.wikipedia.org/wiki/Statistical_process_control#Emphasis_on_early_detection

    ..such averaging (discarding outliers...lessening their impact via weighting based on prediction error rates) is useful in driving prediction and automatic root-cause detection routines, which can then look at the signals that are most likely driving the variation

    http://www.jinspired.com/site/introducing-signals-the-next-big-thing-in-application-management

    In performance monitoring, collection has a cost...so you must always balance it against accuracy/precision and (real-)timeliness.
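    The sketch promised above: a cheap per-execution-point check in the spirit of SPC's early detection. The 3-sigma rule, baseline threshold, and class name are illustrative choices of mine, not any vendor's implementation:

        // Illustrative sketch: keep a cheap running mean and variance per
        // "interesting" execution point (Welford's online algorithm) and
        // flag a timing that falls outside mean +/- 3 standard deviations.
        public class ExecutionPointStats {
            private long n;
            private double mean;
            private double m2; // sum of squared deviations from the mean

            // Returns true if this measurement looks out of control.
            public boolean recordAndCheck(double elapsedMillis) {
                boolean outOfControl = false;
                if (n > 30) { // need some baseline before judging
                    double stdDev = Math.sqrt(m2 / (n - 1));
                    outOfControl = Math.abs(elapsedMillis - mean) > 3 * stdDev;
                }
                // Welford's update: numerically stable, O(1) per sample.
                n++;
                double delta = elapsedMillis - mean;
                mean += delta / n;
                m2 += delta * (elapsedMillis - mean);
                return outOfControl;
            }
        }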

  5. but you are not exactly discounting the use of the mean (average); instead you are saying that we should first look for a transformation that makes the distribution normal, such as log10.

  6. Hi William,

    I of course agree that you should measure performance at a more fine-grained level, which in fact, as you know, we do; we measure each transaction. I also agree that in a perfect world I would categorize transactions at a level where we don't have high variability. In reality that can never be achieved.

    In the real world most transactions, especially user-driven transactions, are heavily data-dependent, which ultimately leads to a certain volatility. There will always be those that are much slower than the norm. As Stephane put it, they often follow a distribution pattern where the average is heavily influenced by the very fast or very slow ones.

    What I advocate in my post is not to use averages, because percentiles show what really happens. The fact that averages are cheap is no real excuse, because that cost doesn't matter to the monitored application itself, only to the monitoring solution!

    Averages are easy for a monitoring solution to compute, while calculating percentiles, or any form of statistic that understands the distribution, is more complex and requires some thought.
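    To make that cost difference concrete, here is a minimal Java sketch (mine, not dynaTrace's implementation): a running average needs two variables, while an exact percentile needs the full sample set kept and sorted, or a clever approximation:

        import java.util.Arrays;

        public class PercentileVsAverage {
            public static double mean(double[] times) {
                double sum = 0;
                for (double t : times) sum += t;
                return sum / times.length;
            }

            // Nearest-rank percentile, p in (0, 100].
            public static double percentile(double[] times, double p) {
                double[] sorted = times.clone();
                Arrays.sort(sorted);
                int rank = (int) Math.ceil(p / 100.0 * sorted.length);
                return sorted[Math.max(rank - 1, 0)];
            }

            public static void main(String[] args) {
                // Nine fast transactions and one slow outlier, echoing
                // the salary example from the article.
                double[] times = {10, 11, 9, 12, 10, 11, 10, 9, 10, 250};
                System.out.println("mean = " + mean(times));           // 34.2
                System.out.println("p50  = " + percentile(times, 50)); // 10
                System.out.println("p90  = " + percentile(times, 90)); // 12
            }
        }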

    As for operations: the important thing for them is not to get overwhelmed with lots of useless alerts and false positives, while at the same time not missing real problems. I will remember to post about what operations should do when they identify a real violation.

    BTW, I read your signals article; interesting stuff.