Metrics

Metrics are numbers that describe system behavior: requests processed, errors encountered, memory used, latency experienced. Unlike logs, which describe individual events, metrics aggregate and sample. A system processing 1 million requests can write millions of log entries but emit only thousands of metric points, because each point summarizes many requests.
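To make the contrast between logs and metrics concrete, here is a minimal sketch that collapses per-request events into a few aggregate numbers per endpoint. The `summarize` function, the `(endpoint, status, latency_ms)` event shape, and the sample data are all hypothetical, chosen only for illustration. A real system would normally use a metrics library rather than hand-rolled counters.

```python
from collections import Counter


def summarize(events):
    """Collapse per-request events into a few metric points per endpoint.

    Each event is a hypothetical (endpoint, status, latency_ms) tuple.
    """
    requests_total = Counter()
    errors_total = Counter()
    latency_sum_ms = Counter()

    for endpoint, status, latency_ms in events:
        requests_total[endpoint] += 1
        latency_sum_ms[endpoint] += latency_ms
        if status >= 500:  # count server-side failures as errors
            errors_total[endpoint] += 1

    return {
        endpoint: {
            "requests": count,
            "errors": errors_total[endpoint],
            "mean_latency_ms": latency_sum_ms[endpoint] / count,
        }
        for endpoint, count in requests_total.items()
    }


# Four log-like events (illustrative data) become two metric summaries.
events = [
    ("/login", 200, 40),
    ("/login", 500, 120),
    ("/search", 200, 80),
    ("/login", 200, 50),
]
metrics = summarize(events)
```

The number of summaries grows with the number of endpoints, not the number of requests, which is why metric volume stays small as traffic increases.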

Good metrics enable:

  • Capacity Planning: How much load can your system handle? When will you run out of resources?
  • Performance Analysis: Is latency degrading? Which endpoints are slow?
  • Cost Visibility: How does resource consumption map to business value?
  • Alerting: When a metric breaches its threshold, page the on-call engineer.
  • Dashboards: A visual overview of system health.
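As a rough illustration of threshold-based alerting, the sketch below checks whether an error rate exceeds a limit. The function name `breaches_threshold` and the 5% default are assumptions made for this example, not recommended values. Production alert rules usually also require the breach to persist for some time before paging anyone.

```python
def breaches_threshold(error_count, request_count, max_error_rate=0.05):
    """Return True when the observed error rate exceeds max_error_rate.

    The 5% default is an illustrative assumption, not a recommendation.
    """
    if request_count == 0:
        # No traffic means there is no error rate to evaluate.
        return False
    return error_count / request_count > max_error_rate


# Hypothetical readings taken from one evaluation window.
should_page = breaches_threshold(error_count=10, request_count=100)
```

Alerting on a rate rather than a raw error count keeps the rule meaningful as traffic grows or shrinks.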

This section covers metrics as a discipline: the golden signals that matter, methodologies for choosing what to measure, designing dashboards, and using metrics to drive decisions.