
Strong Consistency

TL;DR

Strong consistency (linearizability or strict serializability) ensures every read reflects the latest committed write globally. This simplifies application logic and prevents data anomalies but requires coordination overhead, reducing availability during network partitions and increasing latency. Use strong consistency only for critical invariants where stale reads are unacceptable.

Learning Objectives

After reading this article, you will be able to:

  • Define linearizability and strict serializability in plain language.
  • Identify which application invariants require strong consistency.
  • Explain the availability and latency trade-offs of strong consistency.
  • Design systems that apply strong consistency selectively (only where needed).
  • Distinguish between strong consistency and weaker models to choose the right fit.

Motivating Scenario

Your company operates a payment platform. A customer makes a purchase of $100, and the transaction succeeds. Immediately after, they check their account balance from a different device. If weak consistency were used, the balance might still show the pre-purchase amount on the second device, causing confusion and trust issues. With strong consistency, every read globally reflects the latest write: the balance updates atomically, and the second device immediately sees the deducted amount. This eliminates anomalies and keeps financial state predictable—critical for compliance and user confidence.

The cost: every write must coordinate with distributed replicas, introducing latency (often 50–500ms in wide-area systems). If your system is partitioned and cannot reach the primary, writes must be blocked to maintain the guarantee.

Core Concepts

Definition: Linearizability

Linearizability is a safety property for concurrent systems. It guarantees that all operations appear to execute in a single global total order, and each operation appears to take effect at some instant between its invocation and its response. In plain English:

  1. Every read returns the result of the most recent completed write.
  2. Operations don't see out-of-order effects (no time travel).
  3. If one operation completes before another begins, every client observes them in that order (real-time ordering is respected).

Example:

Timeline:
Client A writes X=1 → succeeds at t=100
Client B reads X → returns 1 at t=150 (sees A's write)
Client C writes X=2 → succeeds at t=200
Client B reads X → returns 2 at t=250 (sees C's write, not A's)
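
A history like this can be checked mechanically. Below is a deliberately minimal checker, a sketch that assumes operations do not overlap in time (real checkers such as the ones Jepsen uses handle concurrent operations): it walks the history in completion order and flags any read that does not return the most recent completed write.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Op:
    kind: str                 # "write" or "read"
    value: Optional[int]      # value written, or value the read returned
    finished_at: int          # completion timestamp

def check_sequential_history(history: List[Op]) -> bool:
    """Return True if every read sees the latest completed write."""
    latest: Optional[int] = None
    for op in sorted(history, key=lambda o: o.finished_at):
        if op.kind == "write":
            latest = op.value
        elif op.value != latest:
            print(f"Violation at t={op.finished_at}: read {op.value}, expected {latest}")
            return False
    return True

# The timeline above, encoded as a history:
history = [
    Op("write", 1, 100),   # Client A writes X=1
    Op("read", 1, 150),    # Client B sees A's write
    Op("write", 2, 200),   # Client C writes X=2
    Op("read", 2, 250),    # Client B sees C's write
]
assert check_sequential_history(history)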

Definition: Strict Serializability

Strict serializability combines linearizability with the serializability of transactions. It ensures:

  1. Multi-operation transactions execute atomically (all-or-nothing).
  2. Transactions appear to execute in a single serial order consistent with real time: a transaction that commits before another begins is ordered before it.
  3. Every read inside a transaction observes the effects of all transactions ordered before it, and never the partial effects of a concurrent one.

This is the "holy grail" of consistency: the behavior users intuitively expect from a database.

Implementation Mechanisms

Primary-Backup Replication

One primary replica accepts all writes and replicates them to backups. For strong guarantees the replication must be synchronous; otherwise a failover can lose acknowledged writes. Reads are served by the primary, which always holds the latest acknowledged state. On primary failure, a backup is promoted. This is simple but introduces a bottleneck: all traffic flows through one node.
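
A minimal sketch of the synchronous flavor, with illustrative class and method names rather than any real library's API:

from typing import Dict, List

class Backup:
    def __init__(self, name: str):
        self.name = name
        self.data: Dict[str, int] = {}

    def apply(self, key: str, value: int) -> bool:
        # In practice this is a network RPC that can fail or time out.
        self.data[key] = value
        return True

class Primary:
    def __init__(self, backups: List[Backup]):
        self.backups = backups
        self.data: Dict[str, int] = {}

    def write(self, key: str, value: int) -> bool:
        self.data[key] = value
        # Synchronous replication: acknowledge only after every backup applies.
        if not all(backup.apply(key, value) for backup in self.backups):
            return False   # a real system would retry, fence, or fail the write
        return True

    def read(self, key: str) -> int:
        # All reads go to the primary, so they reflect the latest acknowledged write.
        return self.data[key]

primary = Primary([Backup("backup-1"), Backup("backup-2")])
primary.write("balance", 100)
print(primary.read("balance"))   # 100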

Quorum-Based Systems

Writes must be acknowledged by a majority (quorum) of replicas before they succeed, and reads also consult a quorum so that they are guaranteed to overlap the latest write. Examples: Paxos, Raft, ZooKeeper. This is more resilient than primary-backup but slower: each operation requires one or more network round-trips to a majority.
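
The sketch below shows only the quorum-overlap arithmetic (W + R > N) with versioned values; it is illustrative and omits the failure handling, read repair, and leader-based ordering that real systems add on top:

from typing import Dict, List, Optional, Tuple

N, W, R = 3, 2, 2
assert W + R > N   # read and write quorums must overlap in at least one replica

# Each replica maps key -> (version, value); in practice these live on separate nodes.
replicas: List[Dict[str, Tuple[int, int]]] = [{} for _ in range(N)]

def quorum_write(key: str, value: int, version: int) -> bool:
    acks = 0
    for replica in replicas:               # in practice, parallel RPCs that may fail
        replica[key] = (version, value)
        acks += 1
        if acks >= W:
            return True                    # enough acknowledgements: the write succeeds
    return False

def quorum_read(key: str) -> Optional[int]:
    responses = [replica[key] for replica in replicas[:R] if key in replica]
    if not responses:
        return None
    version, value = max(responses)        # the highest version in the overlap wins
    return value

quorum_write("balance", 100, version=1)
quorum_write("balance", 70, version=2)
print(quorum_read("balance"))              # 70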

Consensus Protocols

Replicas coordinate via algorithms like Raft or Paxos to agree on a total order of operations. Every replica applies operations in the same order, guaranteeing all copies are identical. Latency depends on the number of replicas and network quality.
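
The "apply in the same order" half is the easy part, as the small sketch below shows; the hard part, which consensus solves, is getting every replica to agree on that ordered log in the first place:

from typing import Dict, List, Tuple

def apply_log(log: List[Tuple[str, int]]) -> Dict[str, int]:
    """Deterministically apply an ordered log of commands to a fresh state machine."""
    state = {"counter": 0}
    for command, value in log:            # the same order on every replica
        if command == "increment":
            state["counter"] += value
        elif command == "set":
            state["counter"] = value
    return state

# The log every replica agreed on (via consensus, not shown here):
agreed_log = [("increment", 5), ("set", 10), ("increment", 3)]
replica_states = [apply_log(agreed_log) for _ in range(3)]
assert all(state == replica_states[0] for state in replica_states)   # identical copies
print(replica_states[0])   # {'counter': 13}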

Atomic Clocks and TrueTime

Google's Spanner uses TrueTime, a clock API backed by GPS receivers and atomic clocks that exposes a bounded uncertainty interval rather than a single instant. Transactions take a commit timestamp from this interval and wait out the uncertainty ("commit wait") before releasing locks. Combined with Paxos replication across regions, this provides strict serializability at global scale.
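
Conceptually (the sketch below is illustrative and not Spanner's actual API), a transaction takes its commit timestamp from the top of the clock's uncertainty interval and then performs a "commit wait" until that timestamp is guaranteed to be in the past everywhere, before releasing locks and replying to the client:

import time
from dataclasses import dataclass

CLOCK_ERROR_BOUND = 0.007   # assumed ~7 ms uncertainty bound, illustrative only

@dataclass
class TTInterval:
    earliest: float
    latest: float

def tt_now() -> TTInterval:
    """Bounded-uncertainty clock: the interval is guaranteed to contain real time."""
    t = time.time()
    return TTInterval(t - CLOCK_ERROR_BOUND, t + CLOCK_ERROR_BOUND)

def commit(apply_writes) -> float:
    commit_ts = tt_now().latest        # pick a timestamp at the top of the interval
    apply_writes(commit_ts)
    while tt_now().earliest <= commit_ts:
        time.sleep(0.001)              # "commit wait": sleep out the uncertainty
    return commit_ts                   # now safe to release locks / ack the client

ts = commit(lambda ts: print(f"writes applied at timestamp {ts:.3f}"))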

CAP and Trade-offs

The CAP theorem states that a distributed system cannot simultaneously provide all three of:

  • Consistency (strong: every read sees the latest write)
  • Availability (every request receives a non-error response)
  • Partition tolerance (the system survives network splits)

Since network partitions cannot be ruled out in practice, the real choice is between consistency and availability while a partition is in progress.

Strong consistency sacrifices availability: if the system is partitioned and cannot reach a quorum, writes are rejected to preserve the guarantee. This is the right choice for financial systems but a poor fit for latency-sensitive feeds or analytics, where stale reads are acceptable.

Decision flow: strong consistency is necessary only when invariants break under staleness.

Practical Example

Here's a simplified, single-process demonstration of a strongly consistent counter built on a replicated state machine (Raft-like consensus). Leader election, real RPCs, and failure handling are omitted:

from enum import Enum
from typing import Dict, List, Optional

class ServerState(Enum):
    FOLLOWER = 1
    CANDIDATE = 2
    LEADER = 3

class LogEntry:
    def __init__(self, term: int, command: str, value: int):
        self.term = term
        self.command = command
        self.value = value

class RaftServer:
    """
    Simplified Raft implementation to demonstrate strong consistency.
    All servers maintain a log; the leader appends commands and replicates them.
    Reads must be served from the leader to ensure linearizability.
    """

    def __init__(self, server_id: str, peers: List["RaftServer"]):
        self.server_id = server_id
        self.peers = peers

        # Persistent Raft state
        self.current_term = 0
        self.voted_for: Optional[str] = None
        self.log: List[Optional[LogEntry]] = []
        self.state = ServerState.FOLLOWER

        # Volatile state
        self.commit_index = -1
        self.last_applied = -1
        self.state_machine = {"counter": 0}

        # Leader state
        self.next_index: Dict[str, int] = {}
        self.match_index: Dict[str, int] = {}

    def become_leader(self):
        """Become the leader and initialize per-peer replication state."""
        self.state = ServerState.LEADER
        for peer in self.peers:
            self.next_index[peer.server_id] = len(self.log)
            self.match_index[peer.server_id] = -1
        print(f"[{self.server_id}] Became leader in term {self.current_term}")

    def append_entry(self, command: str, value: int) -> bool:
        """
        Client write request. Only the leader can accept writes.
        In real Raft, the entry is committed only after a majority of
        followers acknowledge it.
        """
        if self.state != ServerState.LEADER:
            print(f"[{self.server_id}] Not leader, rejecting write")
            return False

        # Append to the leader's log
        new_entry = LogEntry(self.current_term, command, value)
        self.log.append(new_entry)
        print(f"[{self.server_id}] Appended log entry: {command}={value} "
              f"at index {len(self.log) - 1}")

        # In real Raft, we'd replicate to followers and await a majority ACK.
        # For this demo, assume replication succeeds and commit immediately.
        old_commit = self.commit_index
        self.commit_index = len(self.log) - 1

        if old_commit != self.commit_index:
            self.apply_committed_entries()

        return True

    def apply_committed_entries(self):
        """Apply all committed but not-yet-applied entries to the state machine."""
        while self.last_applied < self.commit_index:
            self.last_applied += 1
            entry = self.log[self.last_applied]
            if entry.command == "increment":
                self.state_machine["counter"] += entry.value
            elif entry.command == "set":
                self.state_machine["counter"] = entry.value
            print(f"[{self.server_id}] Applied entry {self.last_applied}: "
                  f"counter={self.state_machine['counter']}")

    def read_counter(self) -> Optional[int]:
        """
        Read the counter value. Only the leader serves reads, which (together
        with a leader lease or a confirming no-op round in real Raft)
        guarantees linearizability.
        """
        if self.state != ServerState.LEADER:
            print(f"[{self.server_id}] Not leader, cannot serve linearizable read")
            return None
        return self.state_machine["counter"]

    def replicate_to_peer(self, peer: "RaftServer"):
        """
        Replicate log entries and the commit index to a peer (simplified).
        In real Raft, this respects next_index/match_index and handles rejections.
        """
        if self.state != ServerState.LEADER:
            return

        # Send every entry this peer hasn't seen yet.
        for idx in range(self.next_index.get(peer.server_id, 0), len(self.log)):
            peer.replicate_entry(self.log[idx], idx, self.current_term)
        self.next_index[peer.server_id] = len(self.log)
        self.match_index[peer.server_id] = len(self.log) - 1

        # Tell the follower how far the leader has committed so it can apply.
        peer.advance_commit_index(self.commit_index)

    def replicate_entry(self, entry: LogEntry, index: int, leader_term: int):
        """Follower: receive a replicated entry from the leader."""
        if leader_term >= self.current_term:
            self.current_term = leader_term
            self.state = ServerState.FOLLOWER

        # Append to the log if not already present.
        while len(self.log) <= index:
            self.log.append(None)
        if self.log[index] is None:
            self.log[index] = entry
            print(f"[{self.server_id}] Replicated entry at index {index}")

    def advance_commit_index(self, leader_commit: int):
        """Follower: advance the commit index and apply newly committed entries."""
        if leader_commit > self.commit_index:
            self.commit_index = min(leader_commit, len(self.log) - 1)
            self.apply_committed_entries()

    def get_state(self) -> str:
        """Return a human-readable summary of this server's state."""
        return (f"State: {self.state.name}, Term: {self.current_term}, "
                f"LogLen: {len(self.log)}, Counter: {self.state_machine['counter']}")

# Example usage: a cluster of 3 servers
def main():
    print("=== Strong Consistency with Raft ===\n")

    # Create servers and link peers
    servers = [RaftServer(f"Server{i}", []) for i in range(1, 4)]
    for server in servers:
        server.peers = [s for s in servers if s.server_id != server.server_id]

    # Elect Server1 as leader in term 1
    print("--- Election: Server1 becomes leader ---")
    for server in servers:
        server.current_term = 1
    servers[0].become_leader()

    # Client writes increment counter
    print("\n--- Client writes increment=5 ---")
    servers[0].append_entry("increment", 5)

    # Replicate to followers (followers apply committed entries on receipt)
    print("\n--- Replication to followers ---")
    servers[0].replicate_to_peer(servers[1])
    servers[0].replicate_to_peer(servers[2])

    # Leader reads
    print("\n--- Leader read ---")
    val = servers[0].read_counter()
    print(f"[Server1] Counter value: {val}")

    # Another client write
    print("\n--- Client writes increment=3 ---")
    servers[0].append_entry("increment", 3)

    # Replicate
    print("\n--- Replication to followers ---")
    servers[0].replicate_to_peer(servers[1])
    servers[0].replicate_to_peer(servers[2])

    # Final state
    print("\n--- Final state (all servers synchronized) ---")
    for server in servers:
        print(f"[{server.server_id}] {server.get_state()}")

if __name__ == "__main__":
    main()

When to Use / When NOT to Use

Use Strong Consistency
  1. Financial transactions and money transfers (invariants on account balance)
  2. Global unique constraints (user IDs, order numbers, email addresses)
  3. Consensus-critical data (configuration, permissions, role assignments)
  4. Regulatory or audit requirements (banking, healthcare, legal holds)
  5. Small, read-heavy datasets where the coordination cost is acceptable
Avoid Strong Consistency
  1. Analytics, metrics, and caches, where stale reads are harmless
  2. High-volume, latency-sensitive feeds where coordination overhead breaks SLOs
  3. Data that can be reconciled later, where eventual consistency is simpler and faster

Patterns & Pitfalls

  • Apply consistency selectively. Not all data requires strong consistency. Partition your system: critical data (user identity, financial state) uses strong consistency and quorum writes; non-critical data (analytics, caches) uses eventual consistency. This reduces coordination overhead while protecting invariants.
  • Use read leases. Strong consistency requires the leader to be authoritative. After the leader confirms its leadership with a majority, it holds a time-bounded lease during which no other leader can be elected, so it can serve reads locally instead of running a quorum round for every read (see the sketch after this list).
  • Batch writes. Coordinate multiple writes in a single consensus round. Instead of synchronizing every write individually, batch writes and commit them together. This amortizes coordination cost and improves throughput.
  • Watch the single-leader bottleneck. If all writes funnel through a single leader, you have a performance bottleneck. Mitigation: partition data by region or sharding key so each shard has its own leader, then use eventual consistency between shards.
  • Expect writes to block during partitions. If a quorum is not reachable, writes block (or fail) until it is; this is the necessary trade-off for strong consistency. Monitor partition detection and have failover procedures ready.
  • Verify, don't assume. Strong consistency claims are easy to make and hard to verify. Use Jepsen, Chaos Mesh, or similar tools to inject faults (network delays, partitions, crashes) and verify invariants under real workloads and fault scenarios.
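
As promised in the read-lease item above, here is a minimal sketch of the lease check. The lease duration, the renewal hook, and the fallback behavior are all illustrative assumptions; real implementations must also bound clock drift between nodes for the lease to be safe.

import time

class LeaseHoldingLeader:
    """A leader that serves linearizable reads locally while its lease is valid."""

    def __init__(self, lease_duration: float = 2.0):
        self.lease_duration = lease_duration   # must be shorter than the election timeout
        self.lease_expiry = 0.0
        self.value = 0

    def on_majority_heartbeat_ack(self) -> None:
        """Called after a heartbeat round is acknowledged by a majority of followers."""
        self.lease_expiry = time.monotonic() + self.lease_duration

    def linearizable_read(self) -> int:
        if time.monotonic() < self.lease_expiry:
            return self.value                  # safe local read, no quorum round-trip
        raise RuntimeError("lease expired: fall back to a quorum read (or a no-op commit)")

leader = LeaseHoldingLeader()
leader.on_majority_heartbeat_ack()
print(leader.linearizable_read())              # 0, served without contacting a quorum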

Design Review Checklist

  • Have you identified which data truly requires strong consistency vs. eventual?
  • Is the coordination cost (latency, reduced availability) acceptable for your SLOs?
  • Are you using a battle-tested consensus protocol (Raft, Paxos) or building your own?
  • Do you have read-lease or leader-election mechanisms to reduce quorum reads?
  • Is data partitioned/sharded to avoid a single coordination bottleneck?
  • Have you tested availability under network partitions and leader failures?
  • Is there a clear failover plan if quorum is lost (e.g., manual intervention)?
  • Are you monitoring replication lag, quorum round-trip time, and write latency?
  • Have you tested invariants with Jepsen or similar fault-injection tools?
  • Is the cost of strong consistency clearly documented for the team?

Self-Check

Before adopting strong consistency, ask yourself:

  1. Invariant criticality: Will stale reads break my application's correctness or user trust? If yes, strong consistency may be necessary. If no, eventual consistency is simpler and faster.

  2. Scale and latency: Can my users tolerate 50–500ms coordination overhead per write? If your SLO is under 100ms, strong consistency is risky unless data volume is small and co-located.

  3. Partition tolerance: Does my deployment require partition tolerance (global regions, WAN), or is it single-region? Wide-area strong consistency is expensive; local systems can afford it more easily.

  4. Operational readiness: Do I have the operational expertise to debug and recover from consensus failures? Strong consistency requires sophisticated tooling and deep understanding.

Next Steps

  • Classify your data: Audit what data truly needs strong consistency (usually 10–20% of total data).
  • Choose a consensus library: Use a proven implementation (Raft, Paxos) rather than rolling your own.
  • Design read leases: Implement mechanisms to reduce quorum reads and improve read latency.
  • Partition data: Shard by region or sharding key so coordination is local, not global.
  • Monitor and test: Track replication lag, latency, availability; use Jepsen to verify invariants.
  • Document trade-offs: Make it explicit which data is strongly consistent and why.

References

  1. Jepsen: Consistency Models
  2. Martin Kleppmann: Consistency and Consensus
  3. Raft Consensus Protocol
  4. Google Spanner: TrueTime