Does your Ops team care about the number of Exceptions thrown in the application, and do they even monitor that number? Does your Test team report the list of Exceptions thrown during a load test to engineering, or only those that end up in a logfile? Is development interested in the Exceptions thrown inside frameworks while their unit tests run? Why should they care? Can a couple of Exceptions really have a measurable impact on performance?
Two years ago Alois Reitbauer wrote a nice article about The Cost of an Exception, a cost that is typically hard to evaluate. After a recent deployment of a new version we saw that 30% of the CPU on our Application Server was consumed by creating Exception objects. These Exceptions never made it to a logfile, so nobody really cared until we identified them as a performance problem for the infrastructure and for the end user. The root cause is simple, but it is not easy to find if you look only at the Exceptions that bubble up to the end user or show up as SEVERE messages in log files instead of looking at all Exceptions thrown.
The big lesson learned was that Exceptions can have a severe impact on resource utilization as well as end user performance. Since this discovery, Ops, Test and Dev watch for high Exception creation rates so that code changes, configuration changes or deployment mistakes are detected before they impact the end user.
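To make the idea of "Exceptions that never make it to a logfile" concrete, here is a minimal sketch. It is not the code from our application; the class `SwallowedExceptionSketch`, the helper `parseOrDefault` and the `swallowed` counter are hypothetical names chosen for illustration. The helper catches a `NumberFormatException` and silently falls back to a default, so nothing is logged, yet the JVM still has to create the Exception object and capture its stack trace on every bad input.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SwallowedExceptionSketch {

    // Hypothetical counter: makes the otherwise invisible Exceptions visible.
    static final AtomicLong swallowed = new AtomicLong();

    // Hypothetical helper: parses a number and silently falls back on bad input.
    static int parseOrDefault(String raw, int fallback) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            // Nothing is logged here, but the Exception object and its
            // stack trace were already created by the time we get here.
            swallowed.incrementAndGet();
            return fallback;
        }
    }

    // Runs the helper in a loop and returns the elapsed time in nanoseconds.
    static long timeNanos(String input, int iterations) {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += parseOrDefault(input, 0);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // uses the result so the loop is not optimized away
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 200_000;
        // Warm-up runs so the JIT compiler has seen both code paths.
        timeNanos("42", n);
        timeNanos("n/a", n);
        swallowed.set(0);

        long valid = timeNanos("42", n);
        long invalid = timeNanos("n/a", n);
        System.out.printf("valid input: %d ms, invalid input: %d ms, swallowed Exceptions: %d%n",
                valid / 1_000_000, invalid / 1_000_000, swallowed.get());
    }
}
```

The absolute timings depend on the JVM, the hardware and the depth of the call stack, so treat this as a rough illustration rather than a benchmark. The point is that the second loop creates 200,000 Exceptions without writing a single log line, which is exactly the kind of activity that stays hidden unless you count Exceptions at creation time.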
Symptom: High CPU Utilization on an Application Server
During a recent production load test against an updated version of our community site we noticed that the CPU on our Application Server behaved differently than in previous tests. We ran the test outside regular business hours so as not to impact regular users of the production system. We expected CPU utilization to increase with load, but compared to a previous production load test it was much higher than expected. The following screenshot shows the Process Health Dashboard of our Java Application Server (Tomcat), where the CPU displayed the unexpected behavior: