At the Moogsoft AIOps Symposium in San Francisco, experts discussed the challenges and benefits of extracting management information from complex infrastructure data to improve application performance. A key element of moving to DevOps and more Agile development practices involves taking advantage of performance data from applications running on live infrastructure. As organizations move more applications to the cloud, it becomes easier to aggregate this data, but making sense of it is another matter.
"There was an idea [that] we need to use algorithms to expand our alerting," said Thomas Duran, site reliability engineer at GoDaddy. Engineers had to go through every ticket and check more than 20 application monitoring systems to determine what they needed to work on. He said, "It is important to scale with automation and not brute force."
Legacy IT infrastructure predominantly used Simple Network Management Protocol (SNMP), which provided a consistent, albeit limited, view of the performance of computational hardware. Modern cloud infrastructure has replaced this simplicity with REST- and JSON-based infrastructure management APIs with subtle nuances between them.
Making sense of cloud alerts
To address this gap, a new generation of big data and machine-assisted analytics technologies, which Gartner calls algorithmic IT operations (AIOps), translates these disparate APIs into a lingua franca accessible to a wide variety of operations management and programming tools. AIOps vendors include Elastic, Evolven Software, Hewlett Packard Enterprise, IBM, Moogsoft Inc., Nyansa, Rocana, Splunk and Sumo Logic.
Richard Whitehead, chief evangelist at Moogsoft, said, "With the rise of the cloud, there has been a shift away from SNMP toward unstructured messages. Natural Language Processing techniques make sense of the messages in the way a human would. AIOps is the notion of using machine learning algorithms to reduce dependencies on specific management tools."
Algorithms can use fuzzy matching to be even more robust. Furthermore, they can group multiple messages related to one fault or programming bug. For example, a programming error that creates database performance issues could result in hundreds of separate messages, which humans would otherwise have to parse. Machine learning makes it easier to aggregate these collections of messages so that operations teams and developers can focus on the root cause of a problem.
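As a rough illustration of the grouping idea, the sketch below clusters near-duplicate alert messages with fuzzy string matching from Python's standard library. It is not how any AIOps product works internally: the sample alerts, the `group_alerts` helper and the 0.8 similarity threshold are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two messages, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_alerts(messages, threshold=0.8):
    """Greedily group messages whose text is similar to a group's first message.

    The threshold is an illustrative assumption, not a recommended value.
    """
    groups = []  # each group is a list; its first element is the representative
    for msg in messages:
        for group in groups:
            if similarity(msg, group[0]) >= threshold:
                group.append(msg)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([msg])
    return groups


# Hypothetical alerts: three stem from the same database problem.
alerts = [
    "db-01: query timeout after 30s on orders table",
    "db-01: query timeout after 31s on orders table",
    "db-02: query timeout after 30s on orders table",
    "web-03: disk usage at 91% on /var",
]

groups = group_alerts(alerts)
for group in groups:
    print(f"{len(group)} message(s), e.g. {group[0]!r}")
```

Here the three database timeout messages collapse into one group, so an operator would see two items to investigate instead of four.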
Treating performance issues as bugs
The flexibility of using unstructured messages for infrastructure reporting provides new power to understand the root cause of problems. The payloads are almost always encoded in a format like JSON, which makes them machine-readable, but there is no standard schema for their contents. In some respects, these AIOps tools provide an integration platform for management and log information akin to API gateways for traditional app integration.
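To make the integration idea concrete, here is a minimal sketch of mapping JSON payloads from two differently shaped sources onto one common event format. The source names (`cloud_a`, `cloud_b`), their field names and the common schema are hypothetical; real management APIs will differ, and a production tool would more likely learn or configure these mappings than hard-code them.

```python
import json

# Hypothetical mappings: source name -> {common field: field name in that source}
FIELD_MAPS = {
    "cloud_a": {"host": "instanceId", "severity": "level", "message": "detail"},
    "cloud_b": {"host": "resource", "severity": "sev", "message": "msg"},
}


def normalize(source: str, raw: str) -> dict:
    """Parse a raw JSON payload and rename its fields to the common schema."""
    payload = json.loads(raw)
    mapping = FIELD_MAPS[source]
    # Missing fields become None rather than raising, since payloads vary.
    event = {common: payload.get(field) for common, field in mapping.items()}
    event["source"] = source
    return event


a = normalize(
    "cloud_a",
    '{"instanceId": "i-123", "level": "critical", "detail": "CPU high"}',
)
b = normalize(
    "cloud_b",
    '{"resource": "vm-9", "sev": "warning", "msg": "Disk nearly full"}',
)

print(a)
print(b)
```

Once both events share the same keys, downstream tools can filter, correlate or route them without knowing which API produced them.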
Whitehead said, "In the old world, enterprises had data which was fixed and went into rule systems. Now, the data can change. When the data changes, algorithms can adapt the management data to provide a consistent result." Improvements in machine learning, together with integration into bug- and issue-tracking services like JIRA and ServiceNow, can help streamline the resolution of performance-related bugs.
Developers are already working hard enough building new features. Better tools for automatically associating performance issues with code promise to reduce the burden on developers. Whitehead said, "If I am an app developer and I am going to be woken up at three, it better be something I can fix. Likewise, if it is something that impacts me and might be due to an Amazon instance, then I want an Amazon expert in on that call, too."