Chatty Services Anti-Pattern
Services making excessive network calls for operations that could be done locally or batched.
TL;DR
Chatty services make excessive individual API calls instead of batching requests. A typical pattern: loop through 100 items, making one request per item (100 network hops). Each hop adds 10-50ms of latency, so the total climbs to seconds instead of milliseconds. Fix: design batch APIs, add caching layers, and use prefetching strategies to reduce network chattiness.
Learning Objectives
- Identify chatty service interactions in your architecture
- Understand the performance cost of excessive network calls
- Design batch APIs and caching strategies
- Implement prefetching and CQRS patterns
- Measure and monitor network efficiency
Motivating Scenario
Imagine an order service that needs to enrich 50 orders with customer data and payment history from two separate services. A naive implementation loops through each order, fetching customer details (1 API call per order) and payment history (1 more API call per order). Result: 100 sequential network calls at 20ms each, 2 seconds minimum. A user clicks "view my orders" and waits 2 seconds for data that could load in 100ms with proper batching. The service separation that was meant to improve scalability instead creates a performance bottleneck.
Core Concepts
The Cost of Network Calls
Every network call incurs overhead: serialization, network transmission, routing, and deserialization. Even on a fast network (1ms of raw latency), that per-call overhead pushes a realistic round trip into the tens of milliseconds, and the calls add up. Consider:
- 1 batched request for 100 items: ~20ms
- 100 individual requests: 100 × 20ms = 2000ms (100x slower)
This overhead is why microservices require different API design than monoliths. In a monolith, function calls are microseconds; in distributed systems, calls are milliseconds—a 1000x difference.
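To make that arithmetic concrete, here is a small, self-contained simulation; asyncio.sleep stands in for an assumed 20ms round trip, so the numbers are illustrative rather than measured on a real network.
import asyncio
import time

ROUND_TRIP_S = 0.020  # assumed ~20ms per network round trip

async def fetch_one(item_id):
    await asyncio.sleep(ROUND_TRIP_S)           # one round trip per item
    return {"id": item_id}

async def fetch_batch(item_ids):
    await asyncio.sleep(ROUND_TRIP_S)           # one round trip for the whole batch
    return [{"id": i} for i in item_ids]

async def main():
    ids = list(range(100))

    start = time.perf_counter()
    for i in ids:                               # chatty: 100 sequential round trips
        await fetch_one(i)
    print(f"chatty:  {time.perf_counter() - start:.2f}s")   # ~2.0s

    start = time.perf_counter()
    await fetch_batch(ids)                      # batched: 1 round trip
    print(f"batched: {time.perf_counter() - start:.2f}s")   # ~0.02s

asyncio.run(main())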
Common Chatty Patterns
- N+1 enrichment: fetch a list, then make one call per item for related data
- Sequential round trips, where each call waits for the previous one to finish
- Re-fetching data that rarely changes (user profiles, product catalogs, configuration) on every request
- Fine-grained APIs that force a client to make several calls to render a single screen
Practical Example
The example below shows the same read path implemented three ways, from worst to best: chatty (the anti-pattern), batched (better), and cached (best).
# Order Service - BAD: Chatty calls
import requests

def get_orders_with_details(customer_id):
    orders = db.query("SELECT * FROM orders WHERE customer_id = ?", customer_id)
    result = []
    for order in orders:
        # One call to the User Service per order (50 calls for 50 orders)
        user = requests.get(f"http://user-service/users/{order.user_id}")
        # One call to the Payment Service per order (50 more calls)
        payment = requests.get(f"http://payment-service/payments/{order.payment_id}")
        result.append({
            "order": order,
            "user": user.json(),
            "payment": payment.json(),
        })
    return result  # 100 HTTP calls for 50 orders
# Order Service - GOOD: Single batch calls
import requests

def get_orders_with_details(customer_id):
    orders = db.query("SELECT * FROM orders WHERE customer_id = ?", customer_id)

    # Extract IDs for the batch requests
    user_ids = [o.user_id for o in orders]
    payment_ids = [o.payment_id for o in orders]

    # Single batch request to the User Service
    users = requests.post(
        "http://user-service/batch",
        json={"ids": user_ids},
    ).json()

    # Single batch request to the Payment Service
    payments = requests.post(
        "http://payment-service/batch",
        json={"ids": payment_ids},
    ).json()

    # Index the results for O(1) lookups
    user_map = {u["id"]: u for u in users}
    payment_map = {p["id"]: p for p in payments}

    return [{
        "order": order,
        "user": user_map[order.user_id],
        "payment": payment_map[order.payment_id],
    } for order in orders]  # Only 2 HTTP calls
# Order Service - BEST: Cached data, no synchronous external calls
def get_orders_with_details(customer_id):
    orders = db.query("SELECT * FROM orders WHERE customer_id = ?", customer_id)
    result = []
    for order in orders:
        # Read from the local cache (Redis, in-memory)
        user = cache.get(f"user:{order.user_id}")
        payment = cache.get(f"payment:{order.payment_id}")

        # Cache miss? Trigger a background fetch to warm the cache for next time
        if not user:
            async_fetch_user(order.user_id)
        if not payment:
            async_fetch_payment(order.payment_id)

        result.append({
            "order": order,
            "user": user or {"id": order.user_id},          # graceful degradation
            "payment": payment or {"id": order.payment_id},
        })
    return result  # Zero or very few external calls
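The cached version leans on helpers like async_fetch_user to repopulate the cache without blocking the request. A minimal sketch of one way to do that with a thread pool; the pool size, timeout, TTL, and the cache.set signature are assumptions layered on the same cache handle used above (async_fetch_payment would follow the same pattern).
import requests
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=4)   # small pool for background refreshes
USER_TTL_SECONDS = 300                          # illustrative TTL; tune to freshness needs

def _fetch_and_cache_user(user_id):
    # Fetch once in the background; later requests become cache hits
    response = requests.get(f"http://user-service/users/{user_id}", timeout=2)
    if response.ok:
        cache.set(f"user:{user_id}", response.json(), ttl=USER_TTL_SECONDS)  # assumed cache API

def async_fetch_user(user_id):
    # Fire-and-forget: the current request keeps its degraded response,
    # the next request finds the value in the cache
    _executor.submit(_fetch_and_cache_user, user_id)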
When to Use / When to Avoid
Avoid the chatty style, which shows these traits:
- Loops through items with one API call per item
- No batching support in downstream services
- Each call waits for the previous one to complete
- Latency grows as O(n) in the number of items
- Fragile: if a downstream service is slow, the entire flow blocks
Prefer the batched style, which has the opposite profile:
- A single API call covers many items
- The downstream service processes all items at once
- Parallel execution is possible
- Latency stays close to O(1) (or O(log n))
- Resilient: one batch operation instead of many
Patterns & Pitfalls
Design Review Checklist
- Identified hot paths with excessive API calls?
- Downstream services support batch endpoints?
- Batch sizes reasonable (100-1000 typical)?
- Caching strategy defined for read-heavy operations?
- Cache invalidation events published on data changes?
- Prefetching implemented for predictable patterns?
- Fallback behavior defined if batch call fails?
- Monitoring in place to detect chatty patterns?
- Load tests show acceptable latency with real data volumes?
- Circuit breakers protect against cascading failures?
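On the last checklist item, a circuit breaker is just a wrapper that stops calling a downstream service after repeated failures and fails fast until a cool-off period passes. A minimal sketch with illustrative thresholds; the usage example assumes the requests client from earlier.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout_s=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.failures = 0
        self.opened_at = None          # None means closed (calls allowed)

    def call(self, fn, *args, **kwargs):
        # While open, fail fast instead of stacking up slow downstream calls
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout_s:
                raise RuntimeError("circuit open: downstream marked unhealthy")
            self.opened_at = None      # half-open: let one trial call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()   # trip the breaker
            raise
        self.failures = 0              # a success closes the circuit again
        return result

# Usage (illustrative): wrap each downstream client so one slow service cannot cascade
# user_breaker = CircuitBreaker()
# user = user_breaker.call(requests.get, "http://user-service/users/42", timeout=2)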
Self-Check
- What happens if you make 100 sequential API calls at 10ms each? 1000ms = 1 second minimum. With batching, could be 20-30ms total.
- When is caching appropriate? When data is read-heavy and changes are infrequent. Most user profiles, product catalogs, configuration—perfect candidates.
- How do you detect chatty interactions? Monitor API call counts per logical operation. If count > 10 for simple operation, likely chatty. Use APM tools.
- What's the difference between batching and caching? Batching groups many calls into one. Caching eliminates calls by storing data locally.
- How do you handle cache invalidation? Publish events when data changes. Consumers subscribe and invalidate. Or use TTLs and accept staleness.
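For the invalidation answer above, a minimal sketch of the event-driven option using Redis pub/sub; the channel name and payload shape are illustrative assumptions, and the cache key format matches the user:{id} keys used earlier.
import json
import redis

r = redis.Redis()

# Producer side: the User Service publishes an event whenever a user changes
def publish_user_updated(user_id):
    r.publish("user.updated", json.dumps({"user_id": user_id}))

# Consumer side: the Order Service deletes its cached copy on each event
def run_invalidation_listener():
    pubsub = r.pubsub()
    pubsub.subscribe("user.updated")
    for message in pubsub.listen():
        if message["type"] != "message":
            continue                            # skip subscribe confirmations
        event = json.loads(message["data"])
        r.delete(f"user:{event['user_id']}")    # next read is a miss + refresh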
Next Steps
- Audit current services — Measure API call counts for typical operations using APM tools
- Design batch endpoints — Add POST /batch endpoints that accept arrays of IDs (see the sketch after this list)
- Implement caching — Add Redis or in-memory cache for frequently accessed data
- Set cache TTLs — Start with 5 minutes; adjust based on data freshness requirements
- Monitor improvements — Measure latency reduction after optimizations
- Document patterns — Create team guidelines on when to batch vs. cache vs. fetch
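For the batch-endpoint step above, the provider side can be a single route that accepts an array of IDs and answers them with one database round trip. A minimal sketch assuming Flask on the User Service; the route, the 1000-ID cap, and the users_db.find_by_ids helper are illustrative, not an existing API.
from flask import Flask, jsonify, request

app = Flask(__name__)
MAX_BATCH_SIZE = 1000                  # matches the 100-1000 guideline in the checklist

@app.post("/batch")
def get_users_batch():
    ids = request.get_json().get("ids", [])
    if len(ids) > MAX_BATCH_SIZE:
        return jsonify({"error": f"batch too large (max {MAX_BATCH_SIZE})"}), 400
    # One database round trip for the whole batch instead of one query per ID
    users = users_db.find_by_ids(ids)  # assumed data-access helper
    return jsonify([{"id": u.id, "name": u.name, "email": u.email} for u in users])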
Advanced Optimization Techniques
Technique 1: Request Collapsing
Merge multiple concurrent requests into one.
class RequestCollapser {
  constructor(fetchFn, delayMs = 10) {
    this.fetchFn = fetchFn;
    this.delayMs = delayMs;
    this.pending = null;   // timer handle for the next flush
    this.queue = [];       // requests waiting to be collapsed
  }

  async fetch(...args) {
    const promise = new Promise((resolve, reject) => {
      this.queue.push({ args, resolve, reject });
    });
    if (!this.pending) {
      // The first request in this window schedules a single flush
      this.pending = setTimeout(() => this._flush(), this.delayMs);
    }
    return promise;
  }

  async _flush() {
    const queue = this.queue;
    this.queue = [];
    this.pending = null;
    if (queue.length === 0) return;

    // De-duplicate IDs before making the one batch call
    const uniqueIds = [...new Set(queue.map((q) => q.args[0]))];
    try {
      const results = await this.fetchFn(uniqueIds);
      const resultMap = new Map(results.map((r) => [r.id, r]));
      for (const item of queue) {
        const [id] = item.args;
        item.resolve(resultMap.get(id));
      }
    } catch (error) {
      for (const item of queue) {
        item.reject(error);
      }
    }
  }
}

// Usage
const userFetcher = new RequestCollapser(async (ids) => {
  return userService.batch(ids); // Single batch call
});

// These 5 calls become 1 request
userFetcher.fetch(1);
userFetcher.fetch(2);
userFetcher.fetch(1); // Duplicate ID
userFetcher.fetch(3);
userFetcher.fetch(2);
Technique 2: GraphQL (Single Request, Multiple Resources)
Instead of N+1 API calls, one GraphQL query gets everything needed.
query GetOrderDetails($orderId: ID!) {
  order(id: $orderId) {
    id
    total
    customer {
      id
      name
      email
    }
    items {
      id
      productId
      quantity
    }
    payment {
      status
      method
    }
  }
}
Without GraphQL: 4 API calls (order, customer, items, payment). With GraphQL: 1 API call, backend resolves dependencies.
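One way the backend can resolve those dependencies, sketched with graphene; the fetch_order and fetch_customer helpers are assumptions, and the items and payment fields would follow the same resolver shape, ideally behind a batching layer such as a DataLoader.
import graphene

class Customer(graphene.ObjectType):
    id = graphene.ID()
    name = graphene.String()
    email = graphene.String()

class Order(graphene.ObjectType):
    id = graphene.ID()
    total = graphene.Float()
    customer = graphene.Field(Customer)

    def resolve_customer(parent, info):
        # The client sent one query; the server chooses how to fetch the pieces
        return fetch_customer(parent.customer_id)   # assumed helper

class Query(graphene.ObjectType):
    order = graphene.Field(Order, id=graphene.ID(required=True))

    def resolve_order(root, info, id):
        return fetch_order(id)   # assumed helper returning an object with customer_id

schema = graphene.Schema(query=Query)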
Technique 3: Async/Await with Parallel Execution
import asyncio

async def get_order_details(order_id):
    order = await orders_db.get(order_id)

    # Fetch the three independent resources in parallel
    customer, payment, inventory = await asyncio.gather(
        customers_service.get(order.customer_id),
        payments_service.get(order.payment_id),
        inventory_service.check(order.item_ids),
    )
    return {
        'order': order,
        'customer': customer,
        'payment': payment,
        'inventory': inventory,
    }

# Sequential (bad):  20ms + 20ms + 20ms = 60ms of waiting
# Parallel (good):   max(20ms, 20ms, 20ms) = 20ms
Measuring Chatty Patterns
Metric 1: API Call Count Per Operation
def track_api_calls(operation_name, operation):
    # Snapshot the counter, run the operation, and report the delta
    initial_count = metrics.api_call_count     # assumed app-wide call counter
    operation()
    calls = metrics.api_call_count - initial_count

    print(f"{operation_name}: {calls} API calls")
    if calls > 5:
        print(f"  WARNING: high call count for {operation_name}")

# Example: track_api_calls("get_orders_with_details", lambda: get_orders_with_details(42))
Metric 2: Latency Breakdown
Operation: GetOrderWithDetails
Total latency: 500ms
Breakdown:
- Database query: 50ms
- User service call: 100ms
- Payment service call: 150ms
- Inventory service call: 100ms
- Serialization: 100ms
Insight: 350ms (70%) spent in external calls
Optimization: Batch user/payment/inventory into 1 call
Expected latency: 50ms + 200ms + 100ms = 350ms
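One way to collect a breakdown like this, assuming nothing beyond the standard library; the span names and the orders_db/fetch_* helpers are illustrative.
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def span(name):
    # Record how long the wrapped block took, in milliseconds
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = (time.perf_counter() - start) * 1000

def get_order_with_details(order_id):
    with span("database_query"):
        order = orders_db.get(order_id)            # assumed data-access handle
    with span("user_service_call"):
        user = fetch_user(order.user_id)           # assumed client helpers
    with span("payment_service_call"):
        payment = fetch_payment(order.payment_id)
    return {"order": order, "user": user, "payment": payment}

# After a request, print the spans largest-first to spot call-heavy dependencies
# for name, ms in sorted(timings.items(), key=lambda kv: -kv[1]):
#     print(f"{name}: {ms:.0f}ms")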
One Takeaway: Chatty services are a hidden performance killer. Each network call adds 10-100ms of latency. One endpoint calling ten others means 100-1000ms of extra latency. Fix with batching (reduce number of calls), caching (eliminate calls), or GraphQL (combine requests). Monitor API call counts religiously.