API Gateway

Single entry point for microservices with routing, auth, and rate limiting.

TL;DR

API Gateway: a single entry point for all client requests that routes each one to the appropriate microservice. It handles authentication (JWT, OAuth), rate limiting, request/response transformation, logging, and compression. Benefits: clients don't need to know individual service URLs, internals can change freely, auth is centralized, and policies are enforced in one place. Tradeoffs: it is a single point of failure (so run it highly available with failover) and it adds a latency hop (usually negligible). Popular implementations: NGINX, Kong, AWS API Gateway, Envoy. Keep business logic out of the gateway; it belongs in the services.

Learning Objectives

  • Understand API gateway role in microservices
  • Compare gateway implementations
  • Implement routing rules
  • Add authentication (JWT, OAuth)
  • Implement rate limiting
  • Request/response transformation
  • Monitor gateway health
  • Avoid anti-patterns (business logic in gateway)
  • Design for high availability

Motivating Scenario

Without a gateway, clients call services directly: /auth-service/login, /order-service/orders, /payment-service/charge. Each service implements its own authentication and rate limiting, and every service must handle cross-origin requests (CORS headers). The result is fragmented.

With an API Gateway, there is a single endpoint: /api/login, /api/orders, /api/charge. All requests pass through the gateway, which handles auth, rate limiting, and CORS centrally. Services are simplified (no auth logic), and so are clients (one base URL).

Core Concepts

API Gateway Architecture

Client 1 ──┐
Client 2 ──┤
Client 3 ──┼──→ API Gateway ─┬→ Auth Service
Client 4 ──┘                 ├→ Order Service
                             ├→ Payment Service
                             └→ Shipping Service

Gateway Responsibilities

Responsibility    Purpose                     Example
Routing           Route to correct service    /api/orders → order-service
Authentication    Validate JWT, OAuth tokens  Verify user identity
Authorization     Check permissions           User can access /orders
Rate Limiting     Prevent overload            100 req/min per user
Transformation    Modify requests/responses   Add headers, compress
Logging           Track all requests          Audit trail
Caching           Cache responses             Cache GET requests
Circuit Breaking  Fail gracefully             Stop calling dead service

Gateway Patterns

Pattern                     Purpose                              Tradeoff
Backend for Frontend (BFF)  Separate gateway per client          More flexibility, more code
Single Gateway              One gateway for all clients          Simpler, single point of failure
Service Mesh                No gateway; sidecars handle routing  No single point of failure, complex

Implementation

# Main load balancer / router (http fragment; include from nginx.conf)
http {
    upstream auth_service {
        server auth:3000;
    }

    upstream order_service {
        server order:3001;
    }

    upstream payment_service {
        server payment:3002;
    }

    # Rate limiting
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
    limit_req_zone $http_x_api_key zone=key_limit:10m rate=100r/s;

    # Extract the raw bearer token. Note: plain NGINX cannot decode JWT
    # claims; use njs or auth_request if you need the decoded `sub` here.
    map $http_authorization $jwt_token {
        default "";
        "~^Bearer (?<jwt>[\w-]+\.[\w-]+\.[\w-]+)$" $jwt;
    }

    server {
        listen 8080;
        server_name _;

        # Logging
        access_log /var/log/nginx/access.log;
        error_log /var/log/nginx/error.log;

        # CORS headers
        add_header 'Access-Control-Allow-Origin' '*' always;
        add_header 'Access-Control-Allow-Methods' 'GET, POST, PUT, DELETE' always;
        add_header 'Access-Control-Allow-Headers' 'Content-Type, Authorization' always;

        # Health check endpoint
        location /health {
            access_log off;
            return 200 "healthy\n";
        }

        # Auth endpoints
        location ~ ^/api/auth/ {
            limit_req zone=api_limit burst=20 nodelay;

            proxy_pass http://auth_service;
            proxy_http_version 1.1;
            proxy_connect_timeout 2s;
            proxy_read_timeout 10s;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }

        # Order endpoints (require a bearer token)
        location ~ ^/api/orders/ {
            if ($http_authorization = "") {
                return 401 "Unauthorized";
            }

            limit_req zone=api_limit burst=10 nodelay;

            proxy_pass http://order_service;
            proxy_http_version 1.1;
            proxy_connect_timeout 2s;
            proxy_read_timeout 10s;
            proxy_set_header Host $host;
            proxy_set_header Authorization $http_authorization;
            # Forward the raw token; the service verifies it and reads `sub`
            proxy_set_header X-Auth-Token $jwt_token;
        }

        # Payment endpoints (require an API key)
        location ~ ^/api/payment/ {
            if ($http_x_api_key = "") {
                return 403 "Forbidden";
            }

            limit_req zone=key_limit burst=5 nodelay;

            proxy_pass http://payment_service;
            proxy_connect_timeout 2s;
            proxy_read_timeout 10s;
            proxy_set_header X-API-Key $http_x_api_key;
        }

        # Catch-all 404
        location / {
            return 404 "Not Found";
        }
    }
}

Real-World Examples

Scenario 1: Mobile + Web Clients

Mobile Client:
/api/mobile/orders → optimized response (less data)
/api/mobile/user → only essential fields

Web Client:
/api/web/orders → full response
/api/web/user → all fields

Backend services are identical
Gateway serves different responses per client
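The per-client shaping above can be sketched as a small response filter at the gateway. The field names, client labels, and sample order below are illustrative, not taken from a real API:

```python
# Trim a full backend payload down to what each client type needs.
# MOBILE_FIELDS and the sample order are hypothetical.
MOBILE_FIELDS = {"id", "status", "total"}

def shape_response(order: dict, client: str) -> dict:
    if client == "mobile":
        # Bandwidth-optimized: keep only essential fields
        return {k: v for k, v in order.items() if k in MOBILE_FIELDS}
    return order  # web clients receive the full response

order = {"id": "o-1", "status": "shipped", "total": 42.0,
         "internal_notes": "picked by warehouse B"}
```

The same backend response feeds both paths; only the gateway's view of it differs.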

Scenario 2: Rate Limiting Hierarchy

Free tier:     100 req/min
Paid tier:    1000 req/min
Premium:     10000 req/min
Enterprise:  Unlimited

Gateway enforces limits before routing
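A minimal sketch of tier-aware limiting, using the tier names and limits from the table above (the lookup logic itself is an assumption, not a specific product's API):

```python
# Requests allowed per minute for each tier; None means unlimited.
TIER_LIMITS = {"free": 100, "paid": 1000, "premium": 10000, "enterprise": None}

def allow(tier: str, used_this_minute: int) -> bool:
    # Unknown tiers get a limit of 0, i.e. denied by default
    limit = TIER_LIMITS.get(tier, 0)
    return limit is None or used_this_minute < limit
```

In practice the per-minute counter would live in a shared store (e.g. Redis) so all gateway instances see the same usage.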

Scenario 3: Graceful Degradation

If order-service down:
Gateway returns 503 (Service Unavailable)
Client shown: "Checkout temporarily unavailable, try again"

If auth-service down:
Gateway can't authenticate new users
Existing sessions (cached) still work
Gateway returns 503 for new users
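This fail-fast behavior can be sketched with only the standard library; a production gateway would combine it with a circuit breaker so a dead backend isn't probed on every request:

```python
import urllib.error
import urllib.request

def proxy_get(url: str, timeout: float = 2.0):
    """Forward a GET to a backend; map connection failures to a 503."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except (urllib.error.URLError, TimeoutError):
        # Backend down or unreachable: fail fast, let the client retry
        return 503, b"Checkout temporarily unavailable, try again"
```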

Common Mistakes

Mistake 1: Business Logic in Gateway

# ❌ WRONG: Business logic in gateway
@app.post("/api/orders")
async def create_order(item):
    if item.quantity > 100:
        item.discount = 0.2  # Business rule!
    # ...

# ✓ CORRECT: Only routing/auth/rate-limiting
@app.post("/api/orders")
async def create_order(item, user=Depends(verify_jwt)):
    return await proxy("/orders", "POST", "order", body=item)

Mistake 2: No Timeout/Retry

# ❌ WRONG: No timeout
response = await client.get(url)

# ✓ CORRECT: Timeout + retry
response = await client.get(url, timeout=10)
# Implement exponential backoff retry
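The backoff comment above can be filled in with a small generic helper; this is a sketch not tied to any particular HTTP client:

```python
import random
import time

def retry_with_backoff(fn, attempts=3, base=0.1, cap=2.0):
    """Call fn(); on exception, wait base * 2**n (with jitter), then retry."""
    for n in range(attempts):
        try:
            return fn()
        except Exception:
            if n == attempts - 1:
                raise  # out of attempts: surface the error
            delay = min(cap, base * 2 ** n)
            time.sleep(delay * (0.5 + random.random() / 2))  # jitter
```

Usage would look like `retry_with_backoff(lambda: client.get(url, timeout=10))`. Only retry idempotent requests (GET, PUT with the same body); blindly retrying POSTs can duplicate work.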

Mistake 3: Single Point of Failure

# ❌ WRONG: Single gateway
Client → Gateway → Services

# ✓ CORRECT: HA gateway (load-balanced instances)
Client → [Gateway-1] ─┐
         [Gateway-2] ─┼→ Services
         [Gateway-3] ─┘

Production Considerations

High Availability

  • Load balance gateway across 3+ instances
  • Health checks every 5 seconds
  • Automatic failover
  • No shared state (stateless gateways)

Monitoring

  • Request rate per client
  • Error rate by service
  • Latency p99 through gateway
  • Backend service health

Security

  • TLS/HTTPS for all traffic
  • Rate limit by IP + API key
  • JWT signature verification
  • CORS headers validation
  • Request size limits

Design Checklist

  • Single gateway or multiple (BFF)?
  • HA setup (multiple instances)?
  • Authentication mechanism (JWT, OAuth)?
  • Rate limiting per client?
  • Request/response transformation?
  • Timeout values configured?
  • Circuit breaker for dead services?
  • Comprehensive logging?
  • CORS headers correct?
  • No business logic in gateway?
  • Monitoring and alerting?
  • Runbook for gateway failure?

Next Steps

  1. Choose gateway (NGINX, Kong, AWS)
  2. Design routing rules
  3. Implement authentication
  4. Add rate limiting
  5. Configure monitoring
  6. Test failover scenarios
  7. Document client usage

API Gateway Patterns

Backend for Frontend (BFF)

Separate API gateways per client type:

Web Clients:
Client → API Gateway (Web) → Services
- Desktop-optimized responses
- WebSocket support
- Cache static assets

Mobile Clients:
Client → API Gateway (Mobile) → Services
- Bandwidth-optimized (smaller payloads)
- Offline-first responses
- Push notifications

Admin Clients:
Client → API Gateway (Admin) → Services
- Full data (no filtering)
- Extended rate limits for bulk operations
- Audit logging

Benefits:

  • Client-specific optimization
  • Independent scaling per gateway
  • Easier feature rollout (change mobile gateway without affecting web)

Rate Limiting Strategies

Token Bucket Algorithm:

import time

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # Max tokens
        self.tokens = capacity
        self.refill_rate = refill_rate  # Tokens/sec
        self.last_refill = time.time()

    def allow_request(self):
        # Refill tokens based on elapsed time
        now = time.time()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Usage: 100 req/min = 100/60 ≈ 1.67 tokens/sec
bucket = TokenBucket(capacity=100, refill_rate=100 / 60)

def handle(request):
    if bucket.allow_request():
        return process(request)
    return 429  # Too Many Requests

Leaky Bucket Algorithm:

import time

class LeakyBucket:
    def __init__(self, capacity, leak_rate):
        self.capacity = capacity
        self.current = 0
        self.leak_rate = leak_rate   # Requests drained per second
        self.last_leak = time.time()

    def add_request(self):
        # Drain the bucket based on elapsed time
        now = time.time()
        elapsed = now - self.last_leak
        self.current = max(0, self.current - elapsed * self.leak_rate)
        self.last_leak = now

        if self.current < self.capacity:
            self.current += 1
            return True
        return False

Request Transformation

Modify requests before routing to backend:

# Kong request transformer
plugins:
  - name: request-transformer
    config:
      # Add headers
      add:
        headers:
          - "X-Service:checkout"
          - "X-Forwarded-By:Kong"
      # Remove sensitive data
      remove:
        headers:
          - "Authorization"  # Don't forward to backend
      # Rename query params
      rename:
        querystring:
          - "old_param:new_param"
      # Replace values
      replace:
        headers:
          - "Authorization:Bearer ${jwt_token}"

Versioning Strategy

API versions:

/api/v1/orders  → Old clients (legacy)
/api/v2/orders → New clients (recommended)
/api/v3/orders → Future version

Gateway routes:
- /orders → /api/v2/orders (default)
- /api/v1/orders → /api/v1/orders (legacy)
- /api/v3/orders → /api/v3/orders (preview)

Upgrade path:

  1. Deploy v2 alongside v1 (6 months)
  2. Inform clients about deprecation (3 months)
  3. Remove v1 (3 months after deprecation)
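The routing table above amounts to a simple prefix check at the gateway; a sketch (the helper name is hypothetical):

```python
DEFAULT_VERSION = "v2"  # unversioned paths are routed here

def resolve_version(path: str) -> str:
    """Pick the API version from a request path, e.g. /api/v1/orders -> v1."""
    parts = path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "api" and parts[1].startswith("v"):
        return parts[1]
    return DEFAULT_VERSION
```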

Monitoring Gateway Health

Key metrics:

  • Request latency (p50, p95, p99)
  • Error rate by status code
  • Upstream service latency
  • Rate limit violations
  • Cache hit rate (if caching)

Alerts:

  • Error rate > 1%
  • Latency p99 > 1 second
  • Downstream service down
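The p99 in the alert above is just a percentile over a window of recent latency samples; a nearest-rank sketch (real systems usually use histogram buckets instead of sorting raw samples):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of latency samples (e.g. ms)."""
    s = sorted(samples)
    rank = math.ceil(p / 100 * len(s))  # 1-based rank
    return s[max(0, rank - 1)]

latencies_ms = list(range(1, 101))  # synthetic example data
p99 = percentile(latencies_ms, 99)  # → 99
```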

Conclusion

API Gateway is essential for microservices:

  • Centralized routing
  • Unified authentication
  • Rate limiting
  • Request transformation
  • Observability

Choose based on:

  • Scale: NGINX for raw throughput, Kong for richer built-in features
  • Ecosystem: AWS API Gateway for AWS-native stacks, Envoy for Kubernetes
  • Features: Istio if you need a full service mesh, Kong for a standalone gateway

Common pitfalls:

  • Business logic in gateway (don't!)
  • Lack of timeout/retry (add them)
  • Single point of failure (run HA)

In production: Monitor latency (gateway adds ~1-5ms), error rates, cache hit rate.

Gateway Implementation Details

OAuth2 in API Gateway

# Validate JWT token (Flask-style; PUBLIC_KEY is loaded from config)
import jwt
from functools import wraps
from flask import g, request

def require_auth(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        auth = request.headers.get('Authorization', '')
        token = auth.split(' ', 1)[1] if ' ' in auth else ''
        try:
            g.user = jwt.decode(token, PUBLIC_KEY, algorithms=['RS256'])
        except jwt.InvalidTokenError:
            return {'error': 'Unauthorized'}, 401
        return f(*args, **kwargs)
    return decorated

@app.post('/api/orders')
@require_auth
def create_order():
    user_id = g.user['sub']
    # Process order for user
    return {'order_id': '123'}

Request/Response Caching

Cache at gateway to reduce backend load:

# Note: functools.lru_cache would be per-process only; `cache` below is
# assumed to be a shared cache client (e.g. Redis) with get/set(ttl).

@app.get('/api/products')
def get_products(category: str):
    # Cache GET requests (safe to cache)
    cache_key = f'products:{category}'
    cached = cache.get(cache_key)
    if cached:
        return cached

    result = proxy('/products', 'GET', 'product-service',
                   params={'category': category})

    cache.set(cache_key, result, ttl=3600)  # Cache for 1 hour
    return result

@app.post('/api/orders')
def create_order():
    # Don't cache POST (non-idempotent)
    return proxy('/orders', 'POST', 'order-service', body=request.json())