Event-Driven Architecture (EDA)

Components react to and emit events, enabling asynchronous, decoupled communication

TL;DR

Event-driven architecture uses asynchronous events as the primary communication mechanism. Components emit domain events (OrderCreated, PaymentProcessed) to an event broker (Kafka, RabbitMQ), and other components react by subscribing. This enables loose coupling, independent scaling, and a natural representation of business workflows. The trade-off is added complexity: eventual consistency, distributed tracing, and event ordering all have to be handled explicitly.

Learning Objectives

  • Understand event-driven patterns: mediator vs broker topology
  • Design domain events and event streams
  • Handle eventual consistency and event ordering challenges
  • Implement event processors and subscribers
  • Recognize when to use asynchronous vs synchronous communication

Motivating Scenario

Your e-commerce platform processes orders. When an order is created, several things must happen: charge the payment, reserve inventory, send a confirmation email, and update analytics. In a synchronous architecture, OrderService calls PaymentService, InventoryService, and NotificationService one after another, and if any call fails, the whole request fails. In an event-driven architecture, OrderService publishes an "OrderCreated" event, and the Payment, Inventory, and Notification services each subscribe and react independently. If Notification fails, Order and Payment still succeed.

Core Concepts

Event-driven architecture revolves around events and event handling:

Event: Immutable record of something that happened. Example: OrderCreated(id=123, user=456, total=99.99, timestamp=...).

Event Broker: Central pub-sub system (Kafka, RabbitMQ, AWS SNS) that routes events to subscribers.

Event Processor: Component that reacts to events and potentially emits new events.
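To make the three roles concrete, here is a minimal sketch. It assumes a toy, in-process broker: `InMemoryBroker` and its `subscribe`/`publish` methods are hypothetical names for illustration, not the API of Kafka, RabbitMQ, or SNS. It also delivers events synchronously and keeps nothing on disk, whereas a real broker delivers asynchronously and durably.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)  # frozen=True makes the event immutable after creation
class OrderCreated:
    order_id: int
    user_id: int
    total: float


class InMemoryBroker:
    """Hypothetical stand-in for a real broker; synchronous and not durable."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[type, List[Callable]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: object) -> None:
        # Route the event to every handler registered for its type.
        for handler in self._subscribers[type(event)]:
            handler(event)


broker = InMemoryBroker()
received = []

# Two independent event processors react to the same event.
broker.subscribe(OrderCreated, lambda e: received.append(("payment", e.order_id)))
broker.subscribe(OrderCreated, lambda e: received.append(("email", e.order_id)))

broker.publish(OrderCreated(order_id=123, user_id=456, total=99.99))
print(received)  # [('payment', 123), ('email', 123)]
```

The publisher never references the payment or email handlers directly, so a third subscriber can be added without touching the publishing code.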

Event-driven architecture with broker topology

Two Main Topologies

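Broker Topology: Components publish events to a message broker (Kafka, RabbitMQ), and the broker routes them to subscribers. No component controls the overall flow, which keeps coupling loose, but the broker becomes critical infrastructure and a single point of failure unless it is replicated.

Mediator Topology: A central mediator orchestrates the event flow, deciding which step runs next. This suits workflows that need coordinated steps and centralized error handling, but the mediator can become a bottleneck and concentrates workflow knowledge in one place. Less common than broker.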
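The difference is easiest to see from the mediator side. The sketch below is an illustration under simple assumptions: `OrderMediator` and the three step functions are hypothetical names, and each step is a plain function call rather than a message to a remote service.

```python
from typing import Callable, Dict, List

log: List[str] = []


def charge_payment(order: Dict) -> None:
    log.append(f"charged order {order['id']}")


def reserve_inventory(order: Dict) -> None:
    log.append(f"reserved items for order {order['id']}")


def send_confirmation(order: Dict) -> None:
    log.append(f"emailed confirmation for order {order['id']}")


class OrderMediator:
    """Hypothetical mediator: it owns the workflow and the order of its steps."""

    def __init__(self, steps: List[Callable[[Dict], None]]) -> None:
        self._steps = steps

    def handle_order_created(self, order: Dict) -> None:
        # The mediator, not the individual services, decides what runs next.
        for step in self._steps:
            step(order)


mediator = OrderMediator([charge_payment, reserve_inventory, send_confirmation])
mediator.handle_order_created({"id": 123})
print(log)
```

In a broker topology there would be no `OrderMediator`: each step would subscribe to the event itself, and the ordering between steps would not be defined in any single place.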

Mediator vs Broker Comparison

Practical Example

# events.py - Define domain events
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class OrderCreated:
    """Event: Order has been created."""
    order_id: int
    user_id: int
    total: float
    items: List[dict]
    timestamp: datetime

@dataclass
class PaymentProcessed:
    """Event: Payment has been processed."""
    order_id: int
    amount: float
    status: str  # 'success' or 'failed'
    timestamp: datetime

@dataclass
class InventoryReserved:
    """Event: Inventory has been reserved."""
    order_id: int
    items: List[dict]
    timestamp: datetime

@dataclass
class OrderConfirmed:
    """Event: Order confirmed (all steps done)."""
    order_id: int
    user_id: int
    timestamp: datetime
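An event processor for these events might look like the sketch below. It redefines trimmed versions of OrderCreated and PaymentProcessed so it runs on its own. `PaymentProcessor` and its `charge_limit` parameter are hypothetical: the limit stands in for a real payment gateway call, which the original example does not show.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class OrderCreated:
    order_id: int
    user_id: int
    total: float
    timestamp: datetime


@dataclass(frozen=True)
class PaymentProcessed:
    order_id: int
    amount: float
    status: str  # 'success' or 'failed'
    timestamp: datetime


class PaymentProcessor:
    """Hypothetical processor: consumes OrderCreated, emits PaymentProcessed."""

    def __init__(self, charge_limit: float) -> None:
        # charge_limit is an illustrative stand-in for a real payment gateway.
        self._charge_limit = charge_limit

    def handle(self, event: OrderCreated) -> PaymentProcessed:
        status = "success" if event.total <= self._charge_limit else "failed"
        # The processor reacts by emitting a new event rather than calling
        # the next service directly.
        return PaymentProcessed(
            order_id=event.order_id,
            amount=event.total,
            status=status,
            timestamp=datetime.now(timezone.utc),
        )


processor = PaymentProcessor(charge_limit=500.0)
order = OrderCreated(order_id=123, user_id=456, total=99.99,
                     timestamp=datetime.now(timezone.utc))
result = processor.handle(order)
print(result.order_id, result.status)  # 123 success
```

In a full system the returned PaymentProcessed event would be published back to the broker, where other processors could react to it.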

When to Use / When Not to Use

Use Event-Driven Architecture When:
  1. Need loose coupling between services/components
  2. Business workflows involve multiple systems reacting to state changes
  3. High-throughput, real-time data processing (financial transactions, analytics)
  4. Need to maintain audit trail of what happened and when
  5. Can tolerate eventual consistency (payment takes a few seconds to appear)
  6. Scaling specific processors without scaling entire system
Avoid Event-Driven Architecture When:
  1. Need immediate, synchronous responses (user-facing forms)
  2. Events have complex ordering dependencies
  3. Transactions must be ACID across multiple services
  4. Team is unfamiliar with asynchronous programming and event semantics
  5. System is simple enough that synchronous calls work fine

Patterns and Pitfalls

Lost events: An event is published but never persisted, so a crash loses it. Use a durable broker (Kafka rather than an in-memory queue), and route events that fail processing to a dead letter queue.
Out-of-order events: PaymentProcessed arrives before the OrderCreated it refers to, causing logic errors. Design handlers to tolerate reordering where possible, include sequence numbers or timestamps so consumers can detect it, and partition by a key such as user ID so related events stay in order within one partition.
Misleading UI under eventual consistency: A user creates an order and the page shows it as paid before the payment has actually been processed. Be explicit about consistency guarantees. Update the UI optimistically, then correct it if the payment fails.
Event sourcing: Store the entire event stream as the source of truth, not just the current state. Every state change is an event, and replaying events reconstructs state, which enables auditing and recovery.
Dead letter queue (DLQ): Events that fail processing are routed to a separate queue for investigation. Retry with exponential backoff first, then route persistent failures to the DLQ.
Correlation IDs: Track a request across multiple event hops for debugging and tracing. Include a correlation_id in every event and log it at each processing step.
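The retry and dead letter queue behaviour can be sketched as follows. This is a minimal illustration, not a broker feature: `process_with_retry`, its parameters, and the list used as a DLQ are all hypothetical, and the default delays are kept tiny so the example runs quickly.

```python
import time
from typing import Callable, List


def process_with_retry(
    event: dict,
    handler: Callable[[dict], None],
    dead_letters: List[dict],
    max_attempts: int = 3,
    base_delay: float = 0.01,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the handler succeeded, False if the event was dead-lettered."""
    for attempt in range(max_attempts):
        try:
            handler(event)
            return True
        except Exception:
            if attempt < max_attempts - 1:
                # Exponential backoff: base_delay, then 2x, then 4x, ...
                sleep(base_delay * (2 ** attempt))
    # Every attempt failed: park the event for later investigation.
    dead_letters.append(event)
    return False


attempts = {"count": 0}


def flaky_handler(event: dict) -> None:
    # Fails twice, then succeeds, to simulate a transient outage.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")


def broken_handler(event: dict) -> None:
    raise ValueError("cannot process this event")


dlq: List[dict] = []
event = {"type": "OrderCreated", "order_id": 123, "correlation_id": "req-42"}

ok_flaky = process_with_retry(event, flaky_handler, dlq)
ok_broken = process_with_retry(event, broken_handler, dlq)
print(ok_flaky, ok_broken, len(dlq))  # True False 1
```

The event carries a correlation_id so that whoever inspects the DLQ can tie the failed event back to the original request.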

Design Review Checklist

  • Are events immutable and capture what happened, not what to do?
  • Is the event broker production-ready (durable, replicated, monitored)?
  • Do event processors handle duplicate events gracefully (idempotent)?
  • Can you replay events from scratch to verify system behavior?
  • Is event schema versioning strategy defined (breaking changes)?
  • Do you have a dead letter queue for failed event processing?
  • Is the order of event processing well-defined (or irrelevant)?
  • Can you trace a request through the event stream (correlation IDs)?
  • Is eventual consistency acceptable for this use case?
  • Are event processors tested with realistic event scenarios?
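For the idempotency item, one common approach is to remember which event IDs have already been applied. The sketch below assumes every event carries a unique `event_id` field, which the events earlier in this article do not define. `InventoryProcessor` is a hypothetical name, and the in-memory set would need to be durable storage, updated together with the state change, in a real system.

```python
from typing import Dict, Set


class InventoryProcessor:
    """Hypothetical processor that ignores events it has already applied."""

    def __init__(self, stock: Dict[str, int]) -> None:
        self.stock = stock
        self._seen_event_ids: Set[str] = set()

    def handle(self, event: dict) -> bool:
        """Apply the reservation once; return False for a duplicate delivery."""
        if event["event_id"] in self._seen_event_ids:
            return False
        self.stock[event["sku"]] -= event["quantity"]
        self._seen_event_ids.add(event["event_id"])
        return True


processor = InventoryProcessor(stock={"widget": 10})
event = {"event_id": "evt-1", "sku": "widget", "quantity": 2}

first = processor.handle(event)
second = processor.handle(event)  # the same event, redelivered by the broker
print(first, second, processor.stock["widget"])  # True False 8
```

Without the duplicate check, the redelivered event would reserve the stock twice and leave 6 widgets instead of 8.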

Self-Check

  1. What's the main benefit of event-driven architecture? Loose coupling: you can add new processors without modifying existing code, because event producers don't need to know which processors consume their events.
  2. Why is eventual consistency a challenge? Users expect immediate feedback. Payment takes a few seconds to process; UI might show paid before payment actually succeeded.
  3. When would you choose synchronous communication over async events? User-facing operations that need immediate feedback (form submission, login).

One Takeaway: Event-driven architecture excels at decoupling and scalability, but introduces complexity around ordering, consistency, and debugging. Use it when you have naturally asynchronous workflows (order processing, data pipelines). Don't force it on simple, synchronous operations.

Next Steps

  • Event Sourcing: Store events as the system of record, not derived state
  • CQRS: Separate read and write models with events as the bridge
  • Message Broker Patterns: Kafka, RabbitMQ, AWS SNS/SQS trade-offs
  • Distributed Tracing: Track requests through event processors (OpenTelemetry)
  • Complex Event Processing: Correlate and aggregate events (Flink, Kafka Streams)
