Whenever we have had to describe Plumbr internals to an innocent bystander, we have referred to the "machine learning magic" taking place somewhere in our back-office labs. We decided it is time to reduce the magic surrounding the process, so here we describe how the leak detection algorithms were created and how we continue to improve them.

First, we pick the variables to monitor in order to determine whether instances of a particular class leak. Based on our previous experience with leak detection, we might start by monitoring every class X using the following metrics:

  • A - the number of Full GC runs the instances of X have survived
  • B - the number of classes from which X is referenced
  • C - the percentage of instances of X relative to all currently live instances
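
To make the three metrics concrete, here is a minimal sketch of how one observation of a class X could be represented as a feature vector. The names `ClassMetrics`, `live_instance_share` and `feature_vector` are hypothetical and chosen for illustration only; they are not Plumbr's actual data model, and the sketch assumes the raw counts have already been collected from the JVM.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClassMetrics:
    """One observation of a monitored class X (hypothetical structure)."""
    class_name: str
    full_gcs_survived: int      # metric A
    referencing_classes: int    # metric B
    live_instance_share: float  # metric C, in percent


def live_instance_share(instances_of_x: int, all_live_instances: int) -> float:
    """Metric C: instances of X as a percentage of all live instances."""
    if all_live_instances <= 0:
        return 0.0
    return 100.0 * instances_of_x / all_live_instances


def feature_vector(metrics: ClassMetrics) -> List[float]:
    """Flatten the observation into [A, B, C] for a learning algorithm."""
    return [
        float(metrics.full_gcs_survived),
        float(metrics.referencing_classes),
        metrics.live_instance_share,
    ]


# Illustrative numbers only, not measured data.
observation = ClassMetrics(
    class_name="com.example.CacheEntry",
    full_gcs_survived=12,
    referencing_classes=3,
    live_instance_share=live_instance_share(250, 1000),
)
print(feature_vector(observation))
```

Keeping each observation as a flat numeric vector means the same data can be handed to almost any learning method without further conversion.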

Now, if we are clever enough to gather this data from the JVM internals, we can move to the next logical steps in building a leak detection algorithm with the help of machine learning:

  • Gather enough sample data sets (in our case we already have more than a million of those)
  • Annotate the data with an expert opinion on whether a particular case represents a leak or not
  • Apply machine learning methods to create the best function for determining, from past knowledge, whether a particular behaviour is symptomatic of a leak or not
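
The three steps above can be sketched on a toy scale. The example below uses a plain perceptron, chosen only because it fits in a few lines; this text does not say which learning method Plumbr actually uses, and the six annotated samples are invented for illustration. Each sample is an (A, B, C) tuple and each label is the "expert opinion" on whether it represents a leak.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (A, B, C) as defined above


def min_max_scaler(rows: Sequence[Sample]) -> Callable[[Sample], Tuple[float, ...]]:
    """Return a function that rescales each metric to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]

    def scale(row: Sample) -> Tuple[float, ...]:
        return tuple(
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        )

    return scale


def train_perceptron(
    samples: Sequence[Sample], labels: Sequence[bool], max_epochs: int = 1000
) -> Callable[[Sample], bool]:
    """Learn a linear leak/no-leak function from annotated samples."""
    scale = min_max_scaler(samples)
    weights = [0.0, 0.0, 0.0]
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for row, is_leak in zip(samples, labels):
            x = scale(row)
            target = 1 if is_leak else -1
            score = sum(w * v for w, v in zip(weights, x)) + bias
            if target * score <= 0:  # misclassified: nudge the boundary
                weights = [w + target * v for w, v in zip(weights, x)]
                bias += target
                mistakes += 1
        if mistakes == 0:  # every training sample is classified correctly
            break

    def predict(row: Sample) -> bool:
        x = scale(row)
        return sum(w * v for w, v in zip(weights, x)) + bias > 0

    return predict


# Step 1: gather samples (invented values). Step 2: annotate them.
samples: List[Sample] = [
    (18, 1, 42.0), (15, 2, 35.0), (20, 1, 50.0),  # annotated as leaks
    (1, 4, 0.5), (0, 6, 2.0), (2, 3, 10.0),       # annotated as healthy
]
labels = [True, True, True, False, False, False]

# Step 3: learn a function from past knowledge, then apply it.
predict = train_perceptron(samples, labels)
print([predict(s) for s in samples])
```

With a million real samples one would hold out part of the annotated data to check how well the learned function generalises, rather than evaluating it on the data it was trained on as this toy example does.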

If you are interested in a more in-depth account of how we accomplished it, check out the original blog post.