Performance monitoring, at least in theory, is about detecting and diagnosing an IT problem so that IT teams have the information at hand to resolve it more quickly. That's a noble goal, but it's not nearly as valuable as being able to predict when an issue is going to arise -- long before it happens.
To achieve that goal, many IT organizations have been investing in predictive monitoring tools that analyze various elements of the IT environment so an IT team can intervene before a potential issue turns into a full-scale service disruption.
The challenge, of course, is figuring out first what to measure and then how to do it. To make predictive monitoring really work, IT organizations need to have a keen understanding of the relationship between all the elements of the IT environment. This is especially important when it comes to understanding the cascading impact an issue with any given component may have on the rest of the IT environment.
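One way to reason about that cascading impact is to model the environment as a dependency graph and walk everything downstream of a degraded component. The sketch below is illustrative only: the component names and the `dependents` map are hypothetical, not drawn from any particular monitoring product.

```python
# Hypothetical sketch: model service dependencies as a graph and compute
# which components are affected when one of them degrades.
from collections import deque

# Map each component to the components that depend on it (illustrative names).
dependents = {
    "database": ["auth-service", "order-service"],
    "auth-service": ["web-frontend"],
    "order-service": ["web-frontend"],
    "web-frontend": [],
}

def cascading_impact(failed: str) -> set:
    """Breadth-first walk of everything downstream of a failed component."""
    affected, queue = set(), deque([failed])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, []):
            if dep not in affected:
                affected.add(dep)
                queue.append(dep)
    return affected

# A database issue touches everything downstream of it.
print(sorted(cascading_impact("database")))
```

Even a rough map like this makes clear why a single low-level fault can surface as several seemingly unrelated application symptoms.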
Then, IT organizations need to figure out when to actually collect the data that informs the analytics. Collecting the same data at the same time every day will only expose issues that occur regularly at that moment. To be effective, predictive monitoring tools need to collect data at randomized times to create a baseline against which the analytics can be accurately applied. Knowing what to measure, and when, requires a deep understanding of the IT environment. But once that's achieved, the return on that investment in terms of reduced IT operational costs is profound.
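The two ideas above -- randomized collection times and a baseline to compare against -- can be sketched in a few lines. This is a minimal illustration, not a real tool's logic: the jitter factor, the z-score threshold, and the sample values are all assumptions.

```python
# Hypothetical sketch: sample a metric at randomized intervals (rather than
# on a fixed daily schedule) and flag readings that drift far from baseline.
import random
import statistics

def jittered_intervals(base_seconds, jitter, n):
    """Yield n collection intervals, each randomly offset around the base,
    so samples don't always land at the same moment each day."""
    for _ in range(n):
        yield base_seconds * random.uniform(1 - jitter, 1 + jitter)

def is_anomalous(reading, baseline, z_threshold=3.0):
    """Flag a reading more than z_threshold standard deviations
    from the baseline mean."""
    mean = statistics.mean(baseline)
    stdev = statistics.pstdev(baseline) or 1e-9  # avoid division by zero
    return abs(reading - mean) / stdev > z_threshold

baseline = [48.0, 52.0, 50.0, 49.0, 51.0]  # e.g. CPU utilization samples
print(is_anomalous(50.5, baseline))  # within the normal range -> False
print(is_anomalous(90.0, baseline))  # far outside the baseline -> True
```

Real predictive monitoring tools apply far richer statistics, but the principle is the same: a trustworthy baseline first, then deviation detection against it.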
In the not-too-distant future, machine learning algorithms will make it much simpler to employ predictive monitoring tools effectively. Armed with a steady stream of data provided by both the IT infrastructure and the applications running on it, predictive monitoring tools will soon make major strides in accurately predicting where and when an issue is about to disrupt an IT service.
In the meantime, the typical pattern -- where an IT organization spends most of its time chasing down alarms generated by performance monitoring tools -- is coming to an end. The trouble with those tools, of course, is that the signal-to-noise ratio is too low. The end result is an IT environment full of sound and fury that signifies nothing. Before anyone realizes it, there is a major IT disruption -- not because IT wasn't informed, but because "alarm fatigue" from juggling multiple performance monitoring tools slowly blinded the team to the building problem.
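One common first step toward cutting that noise is to collapse repeated alarms into a single deduplicated record. The sketch below is a hypothetical illustration; the `source` and `symptom` field names are assumptions, not the schema of any actual monitoring product.

```python
# Hypothetical sketch: collapse repeated alarms from multiple monitoring
# tools into one deduplicated list, a simple way to reduce alarm fatigue.
from collections import defaultdict

def deduplicate(alarms):
    """Group alarms by (source, symptom); keep one record with a count."""
    grouped = defaultdict(int)
    for alarm in alarms:
        grouped[(alarm["source"], alarm["symptom"])] += 1
    return [
        {"source": src, "symptom": sym, "occurrences": count}
        for (src, sym), count in grouped.items()
    ]

alarms = [
    {"source": "db-01", "symptom": "high latency"},
    {"source": "db-01", "symptom": "high latency"},
    {"source": "web-02", "symptom": "5xx errors"},
]
print(deduplicate(alarms))  # three raw alarms become two records
```

Deduplication alone doesn't predict anything, but it shrinks the alert stream enough that the alarms that remain actually get read.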
There will never be a perfect IT environment. Hardware will inevitably fail, and application code generated by humans will always be prone to errors. But the standard practice -- where IT teams responsible for various infrastructure and application silos inside the organization regularly come together to prove they are not responsible for one problem or another -- will soon be a thing of the past. In its place will be a weekly report, automatically produced, that lists all of the issues that, thanks to timely intervention, never became a major service issue in the first place.
What performance monitoring tools does your organization use? Let us know.