Website and application downtime can cost Beachbody LLC a minimum of $150,000 an hour, and the company experienced several performance-related website outages in 2015. So Michael Lee, vice president of technology at Beachbody, set out to replace the online fitness program provider's legacy, ad hoc application performance monitoring tools.
Beachbody's legacy "hodgepodge ... of manual correlation and triage" application performance monitoring (APM) tools didn't enable fast fixes, according to Lee. "We were seeing performance-related outages due to poor-performing SQL that we couldn't trap." Developers could not see an end-to-end view of performance. "That was unavailable in our existing Nagios install," he said.
The legacy application performance monitoring tools' infrastructure focus was not a good fit for Beachbody's complex application portfolio, Lee said. A new system had to monitor and manage application and user activity closely. At the project's onset, the company ran 31 distinct applications in 17 different environments -- though not all apps ran in all 17; 95% were Java apps, 4% PHP and 1% Node.js. So a focus on applications and users topped Lee's requirements for a new APM system.
Excellent website and app performance is crucial for Beachbody, a Santa Monica, Calif., company that markets fitness gear, nutritional supplements and exercise program packages. Streaming video fitness classes are delivered on the Beachbody Live! website. If Beachbody Live! goes down, users don't get to exercise at their desired times.
Looking for new APM tools, Beachbody's software engineering team first evaluated internally maintained, open source APM options. Generally, Lee said, the open source options required more internal overhead than desired. The team next explored AppDynamics' Application Intelligence Platform (AIP), largely because Lee and his team were familiar with other AppDynamics APM tools.
Lee's team chose Application Intelligence Platform because of the APM suite's whole-environment reach and wide support for databases and data types. They also liked the enhanced features in the AIP Winter '16 Release, which included strong mobile application user support and advanced analytics that boost troubleshooting effectiveness.
This was a large rip-and-replace project for application performance monitoring tools, and it brought many challenges, according to Aram Gasparyan, a software developer at San Francisco-based AppDynamics who worked on it with Beachbody's DevOps team. For one thing, the project had to touch over 30 applications, as mentioned earlier, running in a data center environment of 1,100 nodes split between physical and virtual machines. Most nodes ran Red Hat Enterprise Linux, with some CentOS mixed in. More than 20 people on Linux and network operations center teams manage the data center. About 150 developers work in Beachbody's mix of Scrum and Kanban teams.
"There were more than a dozen teams in need of coordination, planning, scheduling and implementation," Gasparyan said.
A requirement for manual installation also increased complexity: The project called for manually installing everything on all agents, Gasparyan recalled. At the time, Beachbody had 865 app agents, 543 machine agents and three database agents serving 270 database collectors. "We were unable to deploy all environments for a particular application in one working session," Gasparyan said.
That deployment-scheduling challenge fell to AppDynamics project manager Miguel Rodriguez, who managed the planning, scheduling and coordination needed to deploy the various environments through all software development lifecycle phases. Those phases ran from development to test and quality assurance to user acceptance testing and production, to name just a few. Rodriguez's skills in planning and scheduling played a big role in the project's on-time completion, Gasparyan said.
In the few months since the APM tool replacement project was completed, Beachbody has seen a large decrease in outages on its Beachbody Live! site. The project has also forged a bond between Lee's development and operations teams. "We're building a metrics-driven culture, where operations and engineering are speaking the same language," Lee said.
Building collaboration is crucial, because Beachbody's DevOps team has plenty to do. "We'll be investing in PHP and Java-based microservices," said Lee, who's leading the company's digital technology makeover. Also, Gasparyan said Beachbody's work on the APM system continues, as business transactions are evaluated in order to create appropriate alerts for failures.