Logs and Structured Logging

Collect and analyze logs with structure and correlation IDs to understand system behavior.

TL;DR

Logs are your first window into what happened. Unstructured logs are hard to parse and analyze at scale. Use structured logging: emit logs as JSON or key-value pairs so log aggregation systems can parse and query them. Include correlation IDs—unique identifiers that follow requests across services—so you can reconstruct the full request flow across service boundaries. Log at appropriate levels: DEBUG for development, INFO for important events, WARN for unusual but non-fatal conditions, ERROR for failures. Ship logs to a centralized system (ELK, Datadog, etc.). This enables searching, aggregating, and alerting across all services simultaneously.

Learning Objectives

  • Design structured log formats for programmatic parsing
  • Implement correlation IDs for request tracing
  • Choose appropriate log levels
  • Ship logs to centralized aggregation systems
  • Query and analyze logs across services
  • Set up alerts based on log patterns

Motivating Scenario

A user reports that their order didn't ship. The order service logs "Order created", the inventory service logs "Reserved", and the shipping service logs nothing. Where did the request fail? With correlation IDs, you can search all logs for a single trace ID and see the complete request flow: which services handled it, which succeeded, which failed, and why.

Core Concepts

Structured Logging

Emit logs as JSON or key-value pairs instead of free-form text. Structured logs enable log aggregation systems to parse, index, and query logs programmatically. A query like "show all orders from region=US with status=failed" becomes possible.
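For instance, here is the same failed-order event as free-form text versus a structured record (field names are illustrative, not a required schema):

```python
import json
import time

# Unstructured: a human can read it, but a log system can only grep it.
unstructured = "Order failed for user 42 in US"

# Structured: every field is individually indexable and queryable.
structured = json.dumps({
    "timestamp": time.time(),
    "level": "ERROR",
    "event": "order_failed",
    "user_id": 42,
    "region": "US",
    "status": "failed",
})
print(structured)
```

Against the structured form, a query such as region=US AND status=failed is a simple field match rather than a fragile regex.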

Correlation IDs

Assign a unique ID to each user request. Pass this ID through all service-to-service calls. Log the correlation ID in every log entry. When debugging, search for the correlation ID and get the complete request journey across all services.
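A minimal sketch of the idea, using an in-memory list as a stand-in for a real log store: the ID is minted once at the edge, passed to every downstream log call, and filtering by that ID reconstructs the journey.

```python
import uuid

LOGS = []  # stand-in for a centralized log store

def log(service, message, correlation_id):
    LOGS.append({"service": service,
                 "correlation_id": correlation_id,
                 "message": message})

def handle_request(correlation_id=None):
    # Honor an incoming ID from upstream, or mint one at the edge.
    cid = correlation_id or str(uuid.uuid4())
    log("order", "Order created", cid)
    log("inventory", "Reserved", cid)  # same ID passed downstream
    return cid

cid = handle_request()
# Debugging: one filter recovers the complete request flow.
journey = [entry for entry in LOGS if entry["correlation_id"] == cid]
```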

Log Levels

DEBUG: verbose information for development. INFO: important events (user logged in, order created). WARN: unusual but non-fatal (high latency, missing optional data). ERROR: failures (invalid input, service unavailable). Use appropriate levels so you can filter logs by severity.
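A quick illustration with Python's standard logging module: setting the logger's threshold to INFO filters out DEBUG messages while keeping everything more severe.

```python
import logging

logger = logging.getLogger("levels-demo")
logger.setLevel(logging.INFO)  # DEBUG is filtered out at this threshold

logger.debug("cache probe details")            # suppressed in production
logger.info("user logged in")                  # routine but important event
logger.warning("payment API latency is high")  # unusual, not fatal
logger.error("inventory service unavailable")  # a failure worth alerting on
```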

Centralized Log Aggregation

Collect logs from all services into a central system. This enables cross-service analysis, retention policies, and alerting. Tools like ELK (Elasticsearch, Logstash, Kibana) or Datadog make this accessible.
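One common pattern is to write JSON lines to a local file and let a shipper tail and forward them to the central store. A minimal sketch of the application side (the file path and field names are illustrative, and the choice of shipper, such as Filebeat or a Datadog agent, is an assumption configured separately):

```python
import json
import logging
import os
import tempfile

class JsonLineFormatter(logging.Formatter):
    """Render each record as one JSON object per line (JSON Lines)."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

log_path = os.path.join(tempfile.gettempdir(), "orders.jsonl")  # illustrative path
handler = logging.FileHandler(log_path)
handler.setFormatter(JsonLineFormatter())

ship_logger = logging.getLogger("orders")
ship_logger.addHandler(handler)
ship_logger.setLevel(logging.INFO)

ship_logger.info("order created")
handler.flush()
```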

Practical Example

# ❌ POOR - Unstructured logging
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def create_order(user_id, items):
    logger.info(f"Creating order for user {user_id}")
    # ... later ...
    logger.info(f"Order {order_id} created successfully")
    # Can't easily search or filter these logs

# ✅ EXCELLENT - Structured logging with correlation IDs
import json
import logging
import uuid
from contextvars import ContextVar

# Correlation ID in context
correlation_id_var: ContextVar[str] = ContextVar('correlation_id')

class StructuredFormatter(logging.Formatter):
    def format(self, record):
        log_entry = {
            'timestamp': self.formatTime(record),
            'level': record.levelname,
            'logger': record.name,
            'message': record.getMessage(),
            'correlation_id': correlation_id_var.get(None),
        }
        # Add extra fields from record
        if hasattr(record, 'extra_fields'):
            log_entry.update(record.extra_fields)

        return json.dumps(log_entry)

# Configure logger
handler = logging.StreamHandler()
handler.setFormatter(StructuredFormatter())
logger = logging.getLogger(__name__)
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def set_correlation_id(request_id=None):
    """Set correlation ID for this request context"""
    correlation_id = request_id or str(uuid.uuid4())
    correlation_id_var.set(correlation_id)
    return correlation_id

def create_order(user_id, items):
    # Reuse the correlation ID set by the request handler; generating a
    # new one here would break the cross-service trace.
    logger.info("Order creation started", extra={
        'extra_fields': {
            'user_id': user_id,
            'item_count': len(items),
            'event': 'order_creation_started'
        }
    })

    order = Order(user_id=user_id, items=items)
    db.insert(order)

    logger.info("Order creation completed", extra={
        'extra_fields': {
            'user_id': user_id,
            'order_id': order.id,
            'total': order.total,
            'event': 'order_creation_completed'
        }
    })

    return order

# Usage in request handler
def handle_order_request(request):
    # Honor an incoming correlation ID from upstream, or mint a new one
    set_correlation_id(request.headers.get('X-Correlation-ID'))

    try:
        order = create_order(request.user_id, request.items)
        return {'order_id': order.id}
    except Exception as e:
        logger.error("Order creation failed", extra={
            'extra_fields': {
                'error': str(e),
                'error_type': type(e).__name__,
                'user_id': request.user_id,
                'event': 'order_creation_failed'
            }
        })
        raise

When to Use / When Not to Use

When to Use Structured Logging
  1. Distributed systems with multiple services
  2. Systems requiring debugging across services
  3. High-volume logging where analysis is essential
  4. Regulatory/compliance systems requiring detailed audit trails
  5. Production systems where operational insight is critical
When Simple Logging is Fine
  1. Development and local testing
  2. Small applications with single service
  3. Scripts and one-off tools
  4. Systems where logs aren't queried or aggregated programmatically

Patterns and Pitfalls

Design Review Checklist

  • All logs are structured (JSON or key-value pairs, not free-form text)
  • Correlation IDs are included in every log entry
  • Correlation IDs are propagated across service boundaries
  • Appropriate log levels are used (DEBUG, INFO, WARN, ERROR)
  • Sensitive data (passwords, tokens) is never logged
  • Logs are shipped to centralized aggregation system
  • Log retention policies are defined and enforced

Self-Check

  • How would you trace a request across multiple services using logs?
  • What information should be included in a structured log entry?
  • How do you prevent log spam while maintaining visibility?
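On the last self-check question: one way to curb log spam without losing visibility is a rate-limiting filter that suppresses repeats of the same message within a cooldown window. A minimal sketch using Python's standard logging.Filter hook (the cooldown value and dedup key are illustrative choices):

```python
import logging
import time

class RateLimitFilter(logging.Filter):
    """Drop repeats of the same (level, message) within a cooldown window."""
    def __init__(self, cooldown_seconds=60):
        super().__init__()
        self.cooldown = cooldown_seconds
        self.last_seen = {}

    def filter(self, record):
        now = time.monotonic()
        key = (record.levelno, record.getMessage())
        if now - self.last_seen.get(key, float("-inf")) < self.cooldown:
            return False  # suppress the repeat
        self.last_seen[key] = now
        return True  # first occurrence in the window passes through
```

Attaching this with logger.addFilter(RateLimitFilter()) keeps the first occurrence of a burst and drops the rest; a production version might instead sample or emit a periodic "suppressed N duplicates" summary.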

One Takeaway

Structured logging with correlation IDs transforms logs from write-only archives into queryable assets. Invest in this infrastructure early—debugging distributed systems without it is nearly impossible.

Next Steps

  • Implement structured logging in all services
  • Add correlation ID middleware to all entry points
  • Set up centralized log aggregation (ELK, Datadog, etc.)
  • Create dashboards and alerts based on log patterns

References

  • Google Cloud Logging: Structured Logging
  • ELK Stack Documentation
  • Datadog Logging Documentation