
Test Isolation and Determinism

Write repeatable tests that pass consistently, independent of timing and state.

TL;DR

Flaky tests (sometimes pass, sometimes fail) erode trust and slow development. Isolation: each test runs independently; test A's failure doesn't affect test B. Determinism: same input always produces same output. Avoid: global state, time-dependent code, external services without mocking, non-deterministic ordering. Use fixtures for setup, mocks for dependencies, time travel libraries for deterministic time. If a test is flaky, fix it immediately—don't ignore or quarantine. Flakiness is a design problem, not a test problem.

Learning Objectives

  • Understand causes of flakiness and how to eliminate them
  • Design tests with clear setup/teardown (fixtures)
  • Mock external dependencies consistently
  • Control time in tests (don't rely on system clock)
  • Detect and fix flaky tests
  • Measure test reliability (re-run percentage, pass rate)
  • Apply test isolation patterns across multiple languages
  • Design fixtures for complex integration scenarios
  • Implement deterministic testing in distributed systems

Motivating Scenario

Tests pass locally but fail in CI. A test depends on execution order (test A must run before test B). Another test fails randomly because it checks the system clock (time.now()). A third test creates shared database state that interferes with other tests. Flakiness causes developers to ignore test failures ("it failed again, rerun it"), defeating the purpose of tests. In a microservices environment, flaky tests become worse: they block deployments, erode confidence, and mask real bugs. Your CI/CD pipeline becomes unreliable, and teams start to distrust test results—the worst outcome for code quality.

Core Concepts

Sources of Flakiness

| Source | Example | Fix |
| --- | --- | --- |
| Shared state | Tests modify a global variable | Use setup/teardown; reset state |
| Execution order | Test B depends on test A running first | Use independent fixtures |
| Time-dependent | Test checks if time.now() < deadline | Inject a clock; mock time |
| External services | Test calls a real API (which sometimes flakes) | Mock/stub API responses |
| Randomness | Test uses random IDs without seeding | Seed the RNG; use deterministic data |
| Threading | Race condition in async code | Use deterministic scheduling; avoid timing-based waits |
| Network latency | Test assumes a fast network | Use synchronous test doubles |
| Floating point | == comparison on floats | Use approximate equality or decimal arithmetic |
| File system | Tests write to a shared temp directory | Use an isolated temp directory per test |
| Database state | Tests share a connection pool | Use transactions rolled back per test |

Isolation Levels

Unit Test Isolation: Single function/method, no external dependencies. Each test is independent; can run in any order.

Integration Test Isolation: Multiple components, minimal external dependencies. Use in-memory databases and caches where possible. Clean up after each test.
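A minimal sketch of that idea, using sqlite3's in-memory mode as a stand-in for whatever store the integration test actually exercises (the helper name is illustrative):

```python
import sqlite3

def make_test_db() -> sqlite3.Connection:
    """Each call returns a fresh, private in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (email TEXT PRIMARY KEY)")
    return db

# "Test A" writes a row into its own database...
db_a = make_test_db()
db_a.execute("INSERT INTO users VALUES ('a@example.com')")

# ..."Test B" gets its own database and sees none of it.
db_b = make_test_db()
count = db_b.execute("SELECT COUNT(*) FROM users").fetchone()[0]
assert count == 0
```

Because each test builds its own store, cleanup is just letting the connection go out of scope.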

End-to-End Test Isolation: Full system, external services. Harder to isolate; use test doubles (mocks, stubs) for external APIs.

Determinism Guarantees

Deterministic Input: Same input always produces same output. No randomness, no system clock.

Deterministic Execution: No race conditions, no non-deterministic ordering, no timeouts.

Deterministic Assertions: Assertions always pass or fail the same way. No floating-point comparisons, no time-dependent checks.

Practical Examples

Python: Fixture-Based Isolation

import pytest
import freezegun
from unittest.mock import Mock
from datetime import datetime, timedelta

# ❌ FLAKY: Depends on time, shared state
class BadOrderService:
    def __init__(self):
        self.orders = {}  # Shared state across tests!

    def create_order(self, order_id, deadline):
        self.orders[order_id] = {
            "deadline": deadline
        }
        return self.orders[order_id]

    def is_expired(self, order_id):
        # Relies on system time - will fail at the wrong times!
        return datetime.now() > self.orders[order_id]["deadline"]

# FLAKY TEST: Will fail randomly depending on timing
def test_bad_order_creation():
    service = BadOrderService()
    deadline = datetime.now() + timedelta(seconds=2)
    order = service.create_order("order-1", deadline)

    # This sleep won't reliably work - the system might lag
    import time
    time.sleep(2)

    # Flaky: time.sleep might not be exactly 2 seconds
    assert service.is_expired("order-1")

# ✅ RELIABLE: Deterministic, isolated with fixtures
class GoodOrderService:
    def __init__(self, clock):
        self.orders = {}
        self.clock = clock  # Injected clock for testing

    def create_order(self, order_id, deadline):
        self.orders[order_id] = {
            "deadline": deadline,
            "created_at": self.clock.now()
        }
        return self.orders[order_id]

    def is_expired(self, order_id):
        # Uses the injected clock, not system time
        return self.clock.now() > self.orders[order_id]["deadline"]

class TestClock:
    def __init__(self, current_time):
        self._current_time = current_time

    def now(self):
        return self._current_time

    def advance(self, delta):
        self._current_time += delta
        return self._current_time

@pytest.fixture
def order_service():
    """Fixture: clean state before each test"""
    clock = TestClock(datetime(2024, 1, 1, 12, 0, 0))
    service = GoodOrderService(clock)
    yield service
    # Cleanup happens automatically

@pytest.fixture
def frozen_time():
    """Fixture: deterministic time"""
    with freezegun.freeze_time("2024-01-01 12:00:00") as frozen:
        yield frozen

def test_order_not_expired_before_deadline(order_service):
    """Deterministic test: no flakiness"""
    deadline = datetime(2024, 1, 1, 12, 30, 0)
    order = order_service.create_order("order-1", deadline)

    # No sleep needed - the test clock is deterministic
    assert not order_service.is_expired("order-1")
    assert order["created_at"] == datetime(2024, 1, 1, 12, 0, 0)

def test_order_expired_after_deadline(order_service):
    """Completely isolated from the previous test"""
    deadline = datetime(2024, 1, 1, 12, 10, 0)
    order_service.create_order("order-2", deadline)

    # Advance the clock deterministically
    order_service.clock.advance(timedelta(minutes=11))

    assert order_service.is_expired("order-2")

def test_multiple_orders_independent(order_service):
    """Tests with multiple orders - all isolated"""
    order_service.create_order("order-A", datetime(2024, 1, 1, 12, 5, 0))
    order_service.create_order("order-B", datetime(2024, 1, 1, 12, 15, 0))
    order_service.create_order("order-C", datetime(2024, 1, 1, 12, 25, 0))

    # Advance past the first deadline
    order_service.clock.advance(timedelta(minutes=10))

    assert order_service.is_expired("order-A")
    assert not order_service.is_expired("order-B")
    assert not order_service.is_expired("order-C")

# Database isolation pattern (get_test_database and User are
# assumed helpers from the application under test)
class TestDatabaseIsolation:
    @pytest.fixture(autouse=True)
    def db_transaction(self):
        """Auto-rollback database changes after each test"""
        from contextlib import contextmanager

        @contextmanager
        def transaction():
            db = get_test_database()
            db.begin_transaction()
            yield db
            db.rollback_transaction()  # Clean up!

        yield transaction

    def test_user_creation(self, db_transaction):
        """Users created in this test are rolled back"""
        with db_transaction() as db:
            user = User(email="test@example.com")
            db.session.add(user)
            db.session.commit()

            result = db.session.query(User).filter_by(
                email="test@example.com"
            ).first()
            assert result is not None

        # After the test, the transaction is rolled back
        # and the database returns to a clean state

    def test_different_user_creation(self, db_transaction):
        """This test doesn't see users from the previous test"""
        with db_transaction() as db:
            users = db.session.query(User).all()
            assert len(users) == 0  # Previous test's users are gone!

# Mock external services (PaymentError is defined here so the
# example is self-contained)
class PaymentError(Exception):
    pass

class PaymentService:
    def __init__(self, gateway):
        self.gateway = gateway

    def charge_order(self, order_id, amount):
        # Calls the external payment gateway
        try:
            return self.gateway.charge(amount, order_id)
        except Exception as e:
            raise PaymentError(f"Failed to charge: {e}") from e

def test_payment_success_with_mock():
    """Mock the external service - no real API calls!"""
    mock_gateway = Mock()
    mock_gateway.charge.return_value = {
        "status": "success",
        "transaction_id": "txn-123"
    }

    service = PaymentService(mock_gateway)
    result = service.charge_order("order-1", 99.99)

    assert result["status"] == "success"
    # Verify the mock was called correctly
    mock_gateway.charge.assert_called_once_with(99.99, "order-1")

def test_payment_failure_with_mock():
    """Test error handling without the real API"""
    mock_gateway = Mock()
    mock_gateway.charge.side_effect = ConnectionError("Gateway unreachable")

    service = PaymentService(mock_gateway)

    with pytest.raises(PaymentError) as exc_info:
        service.charge_order("order-1", 99.99)

    assert "Failed to charge" in str(exc_info.value)

# Randomness isolation
class OrderIDGenerator:
    def __init__(self, rng):
        self.rng = rng  # Injected RNG

    def generate_order_id(self):
        random_part = self.rng.randint(1000000, 9999999)
        return f"ORD-{random_part}"

def test_order_id_generation_deterministic():
    """Deterministic RNG - same sequence every time"""
    import random
    rng = random.Random(42)  # Seed!

    generator = OrderIDGenerator(rng)

    # The same seed always produces the same IDs
    id1 = generator.generate_order_id()
    id2 = generator.generate_order_id()

    # Reset and verify
    rng = random.Random(42)
    generator = OrderIDGenerator(rng)

    assert id1 == generator.generate_order_id()
    assert id2 == generator.generate_order_id()

def test_order_id_uniqueness():
    """Test the uniqueness property"""
    import random
    rng = random.Random(123)

    generator = OrderIDGenerator(rng)
    ids = set()

    for _ in range(1000):
        ids.add(generator.generate_order_id())

    # All IDs should be unique
    assert len(ids) == 1000

Real-World Examples

E-Commerce Platform: Order Processing Tests

In a high-traffic e-commerce system, order tests need strict isolation and determinism:

  • Payment Processing: Mock payment gateways to avoid real charges during testing. Use seeded random order IDs to make transactions reproducible.
  • Inventory Management: Each test needs its own inventory state. Use fixtures to reset stock levels.
  • Time-Sensitive Discounts: Flash sales expire at specific times. Use test clocks to advance time without waiting.

Problem: Tests sometimes fail because two tests create orders with the same ID, causing key collisions. Solution: Use deterministic ID generation with seeded randomness. Each test uses a different seed.
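One sketch of that solution: derive each test's seed from its own name, so ID sequences never collide across tests yet stay reproducible run to run (the function and test names here are illustrative):

```python
import random
import zlib

def rng_for_test(test_name: str) -> random.Random:
    # crc32 is stable across processes, unlike the built-in hash(),
    # which is salted per interpreter run.
    return random.Random(zlib.crc32(test_name.encode()))

rng_a = rng_for_test("test_checkout")
rng_b = rng_for_test("test_refund")
ids_a = [rng_a.randint(1000000, 9999999) for _ in range(3)]
ids_b = [rng_b.randint(1000000, 9999999) for _ in range(3)]

# Different tests draw from different sequences...
assert ids_a != ids_b
# ...but the same test gets the same sequence on every run.
assert rng_for_test("test_checkout").randint(1000000, 9999999) == ids_a[0]
```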

Problem: Payment tests sometimes time out waiting for the real payment gateway. Solution: Always mock external services in the fast suite; keep real-gateway tests in a separately marked integration suite.
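A sketch of that separation using a pytest marker (the marker name is illustrative, and in a real project it would be registered in pytest.ini):

```python
import pytest

@pytest.mark.integration
def test_real_gateway_charge():
    """Hits the sandbox gateway; excluded from the default fast run."""
    pass

# The pre-merge suite then runs: pytest -m "not integration"
marks = [m.name for m in test_real_gateway_charge.pytestmark]
assert "integration" in marks
```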

Microservices: Service-to-Service Tests

When testing microservices communication:

  • Stub dependent services: Use test doubles (mocks, stubs) for other services. Don't call real services.
  • Use contract testing: Define expected request/response format. Both services test against the contract.
  • Isolated databases: Each service test uses its own test database. No shared data.

Problem: Service A test fails because Service B is down. Solution: Mock Service B responses. Use contract testing to ensure compatibility.
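The contract idea can be sketched as a shared schema check that both sides run (field names follow the payment example above; real tools such as Pact formalize this):

```python
# The contract: fields the consumer relies on, with expected types.
CONTRACT = {"status": str, "transaction_id": str}

def satisfies_contract(response: dict) -> bool:
    return all(
        field in response and isinstance(response[field], expected_type)
        for field, expected_type in CONTRACT.items()
    )

# The consumer verifies its mock against the contract...
mock_response = {"status": "success", "transaction_id": "txn-123"}
assert satisfies_contract(mock_response)

# ...and the provider runs the same check on its real handler's
# output, so a drifting mock is caught on either side.
assert not satisfies_contract({"status": "success"})  # missing field
```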

Common Mistakes and Pitfalls

Mistake 1: Relying on Test Execution Order

# ❌ WRONG: Test B depends on Test A
def test_a_create_user():
    global current_user
    current_user = User(email="test@example.com")
    assert current_user is not None

def test_b_update_user():
    # BUG: current_user is not defined if the tests run in reverse order!
    current_user.name = "Updated"
    assert current_user.name == "Updated"

# ✅ CORRECT: Each test is independent
def test_create_user():
    user = User(email="test@example.com")
    assert user is not None

def test_update_user():
    # Fresh user, no dependency on other tests
    user = User(email="test@example.com")
    user.name = "Updated"
    assert user.name == "Updated"

Mistake 2: Assertions on Floating-Point Numbers

# ❌ WRONG: Floating-point precision issues
def test_discount_calculation():
    price = 99.99
    discount = 0.1
    result = price * (1 - discount)
    assert result == 89.991  # Might fail due to precision!

# ✅ CORRECT: Use approximate equality
def test_discount_calculation():
    price = 99.99
    discount = 0.1
    result = price * (1 - discount)
    assert abs(result - 89.991) < 0.001  # Allow a small epsilon

Mistake 3: Using time.sleep() in Tests

# ❌ WRONG: Fragile sleep-based tests
def test_cache_expiration():
    cache.set("key", "value", ttl=1)
    time.sleep(1.1)  # Unreliable!
    assert cache.get("key") is None

# ✅ CORRECT: Mock time (frozen_time is the freezegun fixture
# defined earlier; move_to jumps the frozen clock forward)
def test_cache_expiration(frozen_time):
    cache.set("key", "value", ttl=1)
    frozen_time.move_to("2024-01-01 12:00:01.1")
    assert cache.get("key") is None

Mistake 4: Shared Database Connections

# ❌ WRONG: Tests share a database connection
@pytest.fixture(scope="module")
def db():
    return Database.connect()  # Shared across all tests!

def test_user_a(db):
    db.create_user("user-a@example.com")

def test_user_b(db):
    # Sees the user from test_user_a!
    users = db.query("SELECT * FROM users")
    assert len(users) == 2  # Flaky!

# ✅ CORRECT: Transaction rollback per test
@pytest.fixture
def db():
    connection = Database.connect()
    connection.begin_transaction()
    yield connection
    connection.rollback_transaction()  # Clean up

Mistake 5: Non-Deterministic Randomness

# ❌ WRONG: Random behavior is unrepeatable
def test_shuffle_algorithm():
    items = list(range(100))
    random.shuffle(items)
    # Different order every time!
    assert items[0] == 42  # Flaky!

# ✅ CORRECT: Seed the randomness
def test_shuffle_algorithm():
    items = list(range(100))
    random.Random(42).shuffle(items)
    # The same seed produces the same order every time
    expected = list(range(100))
    random.Random(42).shuffle(expected)
    assert items == expected  # Reliable

Production Considerations

Testing in Multi-Threaded/Async Code

Async and concurrent code make flakiness worse. Deterministic testing is even more critical:

  • Use deterministic scheduling: Coordinate goroutines with channels rather than sleeps; in JavaScript async code, use fake timers (e.g., Jest's or Sinon's).
  • Avoid real sleep/timers: Use mock clocks.
  • Test race conditions explicitly: Don't rely on timing; structure code to avoid races.
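In Python's asyncio, the same principle means waiting on explicit signals instead of sleeping (a minimal sketch; the names are illustrative):

```python
import asyncio

async def worker(done: asyncio.Event, results: list):
    results.append("processed")
    done.set()  # Signal completion explicitly - no timing involved

async def scenario():
    done = asyncio.Event()
    results = []
    asyncio.create_task(worker(done, results))
    # Bounded wait on a signal, not a sleep that races the worker.
    await asyncio.wait_for(done.wait(), timeout=1)
    return results

results = asyncio.run(scenario())
assert results == ["processed"]
```

The timeout here is a safety bound, not a synchronization mechanism: the test passes as soon as the event fires, never by luck.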

Testing Distributed Systems

Tests of distributed systems are inherently flaky (network delays, partial failures):

  • Use test containers: Spin up real services in Docker for integration tests.
  • Mock failure scenarios: Test network partitions, timeouts, service crashes.
  • Use chaos testing: Deliberately inject failures to test resilience.
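Failure injection doesn't require a real network: a mock's side_effect can deterministically script a timeout followed by a recovery (the retry helper below is an illustrative sketch, not a library API):

```python
from unittest.mock import Mock

def fetch_with_retry(service, attempts=2):
    """Retry on timeout; re-raise if all attempts fail."""
    for attempt in range(attempts):
        try:
            return service.fetch()
        except TimeoutError:
            if attempt == attempts - 1:
                raise

flaky_service = Mock()
# First call raises, second call succeeds - the same script every run.
flaky_service.fetch.side_effect = [TimeoutError("slow network"), {"ok": True}]

result = fetch_with_retry(flaky_service)
assert result == {"ok": True}
assert flaky_service.fetch.call_count == 2
```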

Measuring Test Reliability

Track test reliability metrics:

  • Flakiness Rate: Percentage of tests that fail intermittently.
  • Re-run Success Rate: Percentage of initially failing tests that pass when re-run. A high rate means the failures were flaky, not real.
  • Test Stability Index: 1.0 = passes every run; 0.9 = fails 1 run in 10.
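Computed from recorded CI runs, the metrics above might look like this (the data shapes are illustrative):

```python
# Pass/fail history for identical re-runs of each test.
runs = {
    "test_checkout": [True, True, False, True, True],      # intermittent
    "test_login":    [True, True, True, True, True],       # stable
    "test_export":   [False, False, False, False, False],  # broken, not flaky
}

def is_flaky(results):
    # Flaky = both outcomes observed for the same code and inputs.
    return len(set(results)) > 1

flaky = [name for name, r in runs.items() if is_flaky(r)]
flakiness_rate = len(flaky) / len(runs)
stability = {name: sum(r) / len(r) for name, r in runs.items()}

assert flaky == ["test_checkout"]
assert abs(flakiness_rate - 1 / 3) < 1e-9
assert stability["test_export"] == 0.0  # consistently failing, not flaky
```

Note that a consistently failing test scores 0.0 stability but is not flaky: it fails the same way every run, which is easier to debug.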

Continuous Integration Pipeline

  • Fail on flaky tests: Don't allow flaky tests to merge.
  • Quarantine as a last resort: Temporarily disable a flaky test only with an owner and a deadline to fix it; quarantine is triage, not a solution.
  • Re-run before merge: Run tests multiple times to catch flakiness.
  • Monitor test metrics: Track flakiness over time.

Self-Check

  • Why are flaky tests worse than failing tests?
  • How do fixtures improve test isolation?
  • Why can't you rely on system time in tests?
  • What's a deterministic test vs. a flaky test?
  • How would you fix a test that depends on execution order?
  • How do you mock external services without making tests brittle?
  • What's the difference between a stub and a mock?
  • How do you test time-sensitive code?

Design Review Checklist

  • Tests independent (no setup/teardown dependencies)?
  • Shared state eliminated (databases, files, globals)?
  • Time mocked consistently (not system clock)?
  • External services mocked (not real APIs)?
  • Randomness seeded deterministically?
  • Test execution order doesn't matter?
  • Timeouts generous enough that slow CI machines can't cause races?
  • Fixtures clear setup/teardown?
  • Tests re-runnable with a 100% pass rate?
  • Coverage metrics tracked?
  • Flakiness metrics monitored?
  • CI pipeline fails on flaky tests?
  • Mock objects verified for correct calls?
  • Database transactions rolled back per test?
  • Clock/time mocking tested in isolation?

Next Steps

  1. Run tests multiple times — Identify flaky tests
  2. Root cause analysis — Global state? Time? Randomness? Ordering?
  3. Fix flakiness — Remove global state, mock time, use fixtures
  4. Measure reliability — Track pass rate, flakiness over time
  5. Enforce isolation — Code review, linting, standards
  6. Monitor CI — Alert on test failures, quarantine flaky tests
