
Schema Evolution and Versioning

Evolve data schemas safely without breaking clients as systems change over time.

TL;DR

Systems evolve: fields are added, removed, and renamed. In a monolith, code and schema deploy together. In microservices, services update at different times, so old and new versions coexist. Design schemas for both directions of compatibility: readers ignore fields they don't recognize (forward compatibility) and supply defaults for missing optional fields (backward compatibility). Version your APIs and events explicitly, and use feature flags to roll out schema changes gradually. Treat schema evolution as a deployment process: add new fields or columns first (existing services ignore unknowns), then deploy code that uses them, and remove deprecated fields only after all clients have upgraded. This takes coordination, but it makes zero-downtime deployments possible.

Learning Objectives

  • Design schemas that evolve without breaking clients
  • Implement backward and forward compatibility
  • Version APIs and events to manage compatibility
  • Deploy schema changes safely with feature flags
  • Handle field removal and renaming
  • Plan gradual deprecation of schema elements

Motivating Scenario

A service adds a required field "region" to orders. Existing producers don't supply a region, so the orders they create are rejected, and new code that reads stored orders without a region fails to parse them. You need a zero-downtime deployment in which new code handles orders both with and without a region. How do you manage this across distributed services?

Core Concepts

Backward Compatibility

New code must understand old data. When you add a field, make it optional with a default. When you read old events without the field, use the default. This lets new code handle data created by old code.
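
A minimal sketch of this rule, assuming orders are stored as plain dicts. The `ORDER_DEFAULTS` table and the `read_order` helper are hypothetical names used only for illustration:

```python
# Hypothetical defaults for fields added after the first schema version.
ORDER_DEFAULTS = {"region": None}

def read_order(record: dict) -> dict:
    """Return a copy of the record with defaults filled in for missing fields."""
    # Values present in the record take precedence over the defaults.
    return {**ORDER_DEFAULTS, **record}

old_record = {"order_id": "o-1", "user_id": "u-1", "items": ["book"]}
new_record = {**old_record, "region": "eu-west"}

print(read_order(old_record)["region"])  # None
print(read_order(new_record)["region"])  # eu-west
```

Because the helper returns a copy, the stored record is left untouched and the default can change later without rewriting old data.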

Forward Compatibility

Old code must handle data from new code. When old code reads a message from new code with unknown fields, it should ignore them rather than crash. This lets old code tolerate upgrades.
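
One way old code can tolerate unknown fields is to keep only the keys it declares. The sketch below assumes messages arrive as dicts; `OrderV1` and `parse_tolerant` are illustrative names, not part of any library:

```python
from dataclasses import dataclass, fields

@dataclass
class OrderV1:
    order_id: str
    user_id: str
    items: list

def parse_tolerant(cls, data: dict):
    """Build cls from data, silently dropping keys the class does not declare."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in known})

# A message produced by newer code, carrying a field OrderV1 has never seen.
message = {"order_id": "o-1", "user_id": "u-1", "items": ["book"], "region": "eu-west"}
order = parse_tolerant(OrderV1, message)
print(order)
```

One caveat: if old code reads a record, drops the unknown fields, and writes the record back, the newer data is lost. Read-modify-write paths may need to carry unknown keys through unchanged.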

Schema Versioning

Explicitly version schemas: message v1, v2, etc. Receivers check the version and handle accordingly. This is clearer than implicit compatibility assumptions.
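
A receiver's version check might look like the following sketch. The `HANDLERS` registry and the `decode` function are hypothetical, and treating a missing `schema_version` as version 1 is an assumption about how unversioned legacy messages are handled:

```python
def _upcast_v1(payload: dict) -> dict:
    # v1 had no region; fill the default so later code sees the v2 shape.
    return {**payload, "region": None, "schema_version": 2}

def _accept_v2(payload: dict) -> dict:
    return dict(payload)

# Hypothetical registry: schema version -> function that returns the v2 shape.
HANDLERS = {1: _upcast_v1, 2: _accept_v2}

def decode(payload: dict) -> dict:
    # Messages written before versioning was introduced are treated as v1.
    version = payload.get("schema_version", 1)
    handler = HANDLERS.get(version)
    if handler is None:
        raise ValueError(f"unsupported schema_version: {version}")
    return handler(payload)

print(decode({"order_id": "o-1"}))
```

Rejecting unknown versions is one design choice. A receiver that prefers availability over strictness could instead attempt a best-effort parse and ignore what it does not understand.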

Gradual Rollout

Don't deploy schema changes all at once. Add fields first (backward compatible), deploy new code that uses them, mark old fields deprecated, wait for clients to upgrade, then remove deprecated fields. This takes time but ensures safety.
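
The phases above can be gated with feature flags. This is a sketch under stated assumptions: the flag names `write_region` and `require_region` and the in-memory `FLAGS` dict are hypothetical stand-ins for whatever flag system is in use:

```python
# Hypothetical in-memory flag store; a real system would typically read flags
# from a configuration service so they can change without a redeploy.
FLAGS = {"write_region": False, "require_region": False}

def build_order_row(order: dict, flags: dict = None) -> dict:
    """Build the row to persist, depending on how far the rollout has progressed."""
    flags = FLAGS if flags is None else flags
    row = {k: order[k] for k in ("order_id", "user_id", "items")}
    if flags["require_region"] and order.get("region") is None:
        # Final phase: enable only once every producer sends a region.
        raise ValueError("region is required")
    if flags["write_region"]:
        # Middle phase: the column exists everywhere, so start populating it.
        row["region"] = order.get("region")
    return row

order = {"order_id": "o-1", "user_id": "u-1", "items": ["book"]}
print(build_order_row(order))
print(build_order_row(order, {"write_region": True, "require_region": False}))
```

Each flag maps to one phase, so a problem found after enabling a phase can be rolled back by flipping the flag rather than redeploying.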

Practical Example

# ❌ POOR - Breaking schema change
# Old schema
class Order:
    def __init__(self, order_id, user_id, items):
        self.order_id = order_id
        self.user_id = user_id
        self.items = items

# New schema adds required field
class Order:
    def __init__(self, order_id, user_id, items, region):  # region is required
        self.order_id = order_id
        self.user_id = user_id
        self.items = items
        self.region = region

# Old code creating orders without region breaks
# Old data in database doesn't have region field

# ✅ EXCELLENT - Backward compatible schema evolution
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class OrderV2:
    order_id: str
    user_id: str
    items: list
    region: Optional[str] = None  # Optional with default
    schema_version: int = 2

    @classmethod
    def from_dict(cls, data):
        """Parse from dict, handling both old and new formats"""
        return cls(
            order_id=data['order_id'],
            user_id=data['user_id'],
            items=data['items'],
            region=data.get('region'),  # Default to None if missing
            schema_version=data.get('schema_version', 1)
        )

    def to_dict(self):
        """Serialize with version"""
        return {
            'order_id': self.order_id,
            'user_id': self.user_id,
            'items': self.items,
            'region': self.region,
            'schema_version': self.schema_version
        }

# Gradual rollout process
# get_default_region and db are assumed to be provided by the application
def handle_order_creation(order_data):
    """Accept both old (v1, no region) and new (v2) formats"""
    # from_dict fills region with None when a v1 payload omits it,
    # so the caller's dict does not need to be modified
    order = OrderV2.from_dict(order_data)

    # Use region if provided, otherwise use user's default region
    region = order.region or get_default_region(order.user_id)

    db.insert('orders', {
        **order.to_dict(),
        'region': region
    })

# Event versioning
@dataclass
class OrderCreatedEvent:
    event_type: str = "OrderCreated"
    schema_version: int = 2
    order_id: str = ""
    user_id: str = ""
    items: list = field(default_factory=list)
    region: Optional[str] = None

    @classmethod
    def from_dict(cls, data):
        """Handle both v1 and v2 events"""
        version = data.get('schema_version', 1)

        if version == 1:
            # Upcasting: v1 to v2
            return cls(
                order_id=data['order_id'],
                user_id=data['user_id'],
                items=data['items'],
                region=None,  # v1 doesn't have region
                schema_version=2
            )
        else:
            return cls(
                order_id=data['order_id'],
                user_id=data['user_id'],
                items=data['items'],
                region=data.get('region'),
                schema_version=2
            )
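
Renames can follow the same add-then-remove pattern. The sketch below assumes a hypothetical rename of `user_id` to `customer_id`. Readers accept either name, and writers emit both until every reader has been upgraded. The function names and the `dual_write` switch are illustrative only:

```python
def read_customer_id(data: dict) -> str:
    """Prefer the new field name and fall back to the deprecated one."""
    if "customer_id" in data:
        return data["customer_id"]
    return data["user_id"]  # raises KeyError if neither name is present

def write_order(order_id: str, customer_id: str, dual_write: bool = True) -> dict:
    """Serialize an order, optionally emitting the deprecated name as well."""
    out = {"order_id": order_id, "customer_id": customer_id}
    if dual_write:
        # Keep the deprecated name until all readers have been upgraded.
        out["user_id"] = customer_id
    return out

print(read_customer_id({"order_id": "o-1", "user_id": "u-1"}))      # u-1
print(read_customer_id({"order_id": "o-1", "customer_id": "u-1"}))  # u-1
print(write_order("o-1", "u-1"))
```

Once no reader depends on `user_id`, `dual_write` can be turned off and the fallback in `read_customer_id` removed.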

When to Use / When Not to Use

When to Prioritize Compatibility
  1. Large distributed systems with many independent services
  2. APIs consumed by external clients

When Strict Evolution Can Be Relaxed
  1. Monolithic applications (single deployment)
  2. Internal systems where all services upgrade together
  3. Green-field projects with full control over clients
  4. Systems with scheduled maintenance windows
  5. When backward compatibility has prohibitive costs

Patterns and Pitfalls

Design Review Checklist

  • New optional fields have sensible defaults
  • Code gracefully ignores unknown fields in messages
  • All messages include explicit schema version
  • Upcasting logic exists for handling older message versions
  • Deprecated fields are marked with timeline for removal
  • Feature flags control rollout of schema changes
  • Compatibility testing is part of CI/CD pipeline
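
The last checklist item can start small. The sketch below assumes schemas are modeled as dataclasses and flags any field that the new schema requires but the old schema never wrote. `OrderOld`, `OrderNew`, and `breaking_fields` are hypothetical names, and a real pipeline might instead rely on a schema registry's compatibility checks:

```python
from dataclasses import MISSING, dataclass, fields
from typing import Optional

@dataclass
class OrderOld:
    order_id: str
    user_id: str
    items: list

@dataclass
class OrderNew:
    order_id: str
    user_id: str
    items: list
    region: Optional[str] = None

def breaking_fields(old_cls, new_cls) -> list:
    """Names of fields that new_cls requires but old_cls never wrote."""
    old_names = {f.name for f in fields(old_cls)}
    return [
        f.name
        for f in fields(new_cls)
        if f.name not in old_names
        and f.default is MISSING
        and f.default_factory is MISSING
    ]

def test_new_schema_reads_old_data():
    # Fails the build if someone adds a required field without a default.
    assert breaking_fields(OrderOld, OrderNew) == []

test_new_schema_reads_old_data()
```

This only covers added fields. Removed or retyped fields would need their own checks.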

Self-Check

  • How would you add a required field to an existing message in a distributed system?
  • What does forward compatibility mean and why is it important?
  • How do you handle field renaming without breaking clients?

One Takeaway

Schema evolution is an operational challenge in distributed systems. Design for compatibility first: make fields optional, version explicitly, and roll out changes gradually. This is slower but safer.

Next Steps

  • Add schema versioning to all APIs and events
  • Implement upcasting for evolving message formats
  • Set up feature flags for gradual rollout of schema changes
  • Build compatibility testing into CI/CD
