Troubleshooting response time problems – why you cannot trust your system metrics


  1. Production Monitoring is about ensuring the stability and health of our system, and that includes the application. We often encounter production systems that concentrate on System Monitoring, under the assumption that a stable system leads to stable and healthy applications. So let’s see what System Monitoring can tell us about our Application.


    Key points:

    Know your CPU metrics. But don't just know the numbers, know what those numbers mean. CPU metrics are one of the most commonly monitored stats, but they're also one of the most commonly misunderstood.
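    As a rough illustration of what sits behind a single "CPU utilization" number, the sketch below splits the time between two samples of the aggregate `cpu` line into its components (user, system, iowait, steal and so on). It assumes the Linux `/proc/stat` format; the function names are hypothetical and not taken from the article.

```python
# Sketch only: assumes the Linux /proc/stat layout for the aggregate "cpu" line.
# All function names here are illustrative, not from the article.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels report fewer columns; treat the missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_breakdown(before, after):
    """Return each field's share (in percent) of the time between two samples."""
    deltas = {f: after[f] - before[f] for f in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return {f: 100.0 * d / total for f, d in deltas.items()}


def read_cpu_sample(path="/proc/stat"):
    """Read one sample from a live Linux system (the first line is the aggregate)."""
    with open(path) as fh:
        return parse_cpu_line(fh.readline())
```

    Two samples taken a few seconds apart with `read_cpu_sample()` can be passed to `cpu_breakdown()`. A host that looks "30% busy" may turn out to be mostly waiting on I/O (`iowait`) or losing time to the hypervisor (`steal`), which are very different problems from application code burning `user` time.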

    Monitor your memory. Memory is like gas in your car: it doesn't matter how much extra you have, all that matters is that you have enough. But many Linux systems will show close to 100% utilization even under light load. Make sure you know why.
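    One common reason for the near-100% reading is that Linux uses otherwise idle memory for buffers and the page cache, which it can give back when applications need it. The sketch below contrasts the naive figure with one that discounts reclaimable memory. It assumes the `/proc/meminfo` format, and the function names are hypothetical.

```python
# Sketch only: assumes the Linux /proc/meminfo "Key:   value kB" layout.
# Function names are illustrative, not from the article.
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_usage(info):
    """Compare naive utilization with utilization excluding reclaimable memory."""
    total = info["MemTotal"]
    naive_used = total - info["MemFree"]
    if "MemAvailable" in info:
        # Newer kernels publish their own estimate of what is still usable.
        effective_used = total - info["MemAvailable"]
    else:
        # Fallback approximation: buffers and page cache can mostly be reclaimed.
        effective_used = naive_used - info.get("Buffers", 0) - info.get("Cached", 0)
    return {
        "naive_pct": 100.0 * naive_used / total,
        "effective_pct": 100.0 * effective_used / total,
    }
```

    On a live system the input would be the content of `/proc/meminfo`. A large gap between the two percentages usually means the kernel is caching file data, not that applications are running out of memory.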

    And of course, there's the application and the database. There are so many pain points to monitor. Know what your system is telling you, but more importantly, know what that information means.

    Read the rest of the article where I explain how to identify the impact on garbage collection and how to best avoid it.


  2. Metrics vs Meters

    This is best explained by comparing metrics (system execution model) with meters/metering (software execution model).

    http://williamlouth.wordpress.com/2010/03/23/meters-versus-metrics/

    http://williamlouth.wordpress.com/2010/09/21/metrics-meters-and-metering/