
Serverless & Functions-as-a-Service

TL;DR

Serverless platforms (AWS Lambda, Google Cloud Functions, Azure Functions) execute event-driven workloads with zero infrastructure provisioning. You pay only for execution time, metered in millisecond increments, and idle time costs nothing. Scaling from zero to thousands of concurrent invocations is near-instantaneous. In exchange you accept cold-start latency (100ms-1s on first invocation), statelessness (state lives in external stores), and execution time limits (15 minutes on AWS Lambda) for operational simplicity and cost efficiency on bursty, event-driven workloads. Not suitable for continuous, high-throughput workloads, where containers are cheaper.

Learning Objectives

  • Understand event-driven execution model and when to use FaaS
  • Design functions with minimal cold-start impact (provisioned concurrency, warm pools, code optimization)
  • Manage function state across invocations using external stores
  • Design idempotent, atomic operations (handle duplicate invocations)
  • Architect serverless workflows with orchestration, error handling, timeouts
  • Compare serverless vs containers vs managed services

Motivation: Scaling to Zero

Your e-commerce platform: quiet nights (minimal traffic), sales spikes during Black Friday. Options:

Traditional VMs: Reserve for peak capacity and pay roughly 10x for hardware that sits 95% idle, or undersize and fall over during spikes.

Containers: A minimum footprint still bills 24/7, even when idle, and HPA typically scales slower than a sudden traffic spike.

Serverless: Pay $0 when idle. Scales to thousands of concurrent invocations within seconds. Quiet night: $0. Spike: pay for actual execution time. For this traffic profile, savings of 90% or more are realistic.

Core Concepts

Event-Driven: Function triggered by event (API request, S3 upload, database change, scheduled task). No polling, no idle containers.

Cold Start: The first invocation in a new container incurs initialization latency (100ms-1s). Subsequent invocations in the same container reuse warm state (~5-50ms). Provisioned concurrency keeps containers warm.

Stateless Execution: Each invocation starts with clean environment. State persisted in external services (DynamoDB, S3, RDS, cache). Enables horizontal scaling.

Managed Isolation: Each function runs in isolated container. Platform handles resource limits, security, monitoring.

Pricing Model: Invocations (per 1 million) + compute (GB-seconds). Example: 1M invocations × 512MB × 1s ≈ $8.53/month on AWS Lambda (the full calculation appears in the cost section below). If you've heard "pay for what you use", this is it.

Serverless execution flow: events → platform → functions → results

Practical Examples

import json
import logging
import os
from datetime import datetime
from uuid import uuid4

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Cold start: module-level code runs once per container
dynamodb = boto3.resource('dynamodb')
s3 = boto3.client('s3')
orders_table = dynamodb.Table(os.environ['ORDERS_TABLE'])
logger.info("Lambda function loaded")  # Warm containers skip this


def lambda_handler(event, context):
    """
    Process an order from API Gateway or SQS.
    Returns a JSON response.
    """
    try:
        # Extract order data (API Gateway wraps the payload in 'body')
        body = json.loads(event.get('body', '{}')) if 'body' in event else event
        order_id = str(uuid4())
        customer_id = body['customer_id']
        items = body['items']
        total = body['total']

        # Conditional write prevents overwriting an existing order.
        # Note: order_id is generated per invocation, so a retry gets a
        # fresh ID; for true idempotency, derive the key from the request
        # (see the idempotency section below).
        orders_table.put_item(
            Item={
                'order_id': order_id,
                'customer_id': customer_id,
                'items': items,
                'total': total,
                'timestamp': datetime.utcnow().isoformat(),
                'status': 'PENDING_PAYMENT'
            },
            ConditionExpression='attribute_not_exists(order_id)'
        )

        # Log to S3 for the audit trail
        s3.put_object(
            Bucket=os.environ['AUDIT_BUCKET'],
            Key=f'orders/{order_id}.json',
            Body=json.dumps(body)
        )

        return {
            'statusCode': 201,
            'headers': {'Content-Type': 'application/json'},
            'body': json.dumps({
                'order_id': order_id,
                'status': 'CREATED',
                'message': 'Order created successfully'
            })
        }

    except Exception as e:
        logger.exception("Order processing failed")
        return {
            'statusCode': 500,
            'body': json.dumps({'error': str(e)})
        }


# Timeout awareness
def lambda_handler_with_timeout(event, context):
    """Example: handle an approaching timeout gracefully."""
    remaining_ms = context.get_remaining_time_in_millis()

    if remaining_ms < 10000:
        # Only 10 seconds left; don't start a long operation
        return {'statusCode': 202, 'message': 'Processing in background'}

    # Safe to do the long operation (process_large_dataset is a placeholder)
    process_large_dataset()

Cold Start Mitigation Strategies

Problem: Cold start latency (100ms-1s) unacceptable for user-facing APIs

Solution 1: Provisioned Concurrency
- AWS Lambda: Provisioned Concurrency keeps N execution environments initialized
- Cost: billed per GB-second kept warm; on the order of $5-11/month per 512MB-1GB instance (check current pricing)
- Example: 10 provisioned instances ≈ $50-110/month
- Benefit: Eliminates cold starts for up to 10 concurrent requests
- Downside: Additional cost, even when idle
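
A minimal boto3 sketch for enabling it; provisioned concurrency attaches to a published version or alias (never $LATEST), and the function name and alias here are placeholders:

import boto3

lambda_client = boto3.client('lambda')

# Keep 10 execution environments warm for the 'live' alias
lambda_client.put_provisioned_concurrency_config(
    FunctionName='order-processor',  # placeholder
    Qualifier='live',                # placeholder alias
    ProvisionedConcurrentExecutions=10,
)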

Solution 2: Scheduled Warmup
- Invoke the function every ~5 minutes with a synthetic request (see the sketch below)
- Cost: Low (you pay only for the warmup invocations)
- Benefit: Keeps a warm container active
- Downside: Not guaranteed (the platform may still evict), and only one instance stays warm
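
A minimal warmup guard, assuming the scheduled rule sends a payload like {"warmup": true}; the field is our own convention, not a platform feature:

def lambda_handler(event, context):
    # Short-circuit synthetic warmup pings so they stay cheap
    # (the 'warmup' key is set by our scheduled rule, not by AWS)
    if isinstance(event, dict) and event.get('warmup'):
        return {'statusCode': 200, 'body': 'warm'}

    # Normal request handling continues here
    return handle_request(event)  # handle_request is a placeholder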

Solution 3: Optimize Code
- Move imports outside the handler (they run once, at cold start)
- Lazy-initialize heavy libraries on first use (see the sketch below)
- Reduce package size (fewer MB to load)
- Use Lambda layers for shared dependencies
- Example: 3s cold start → 200ms after optimizing
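
A lazy-initialization sketch; big_ml_lib and the model path are hypothetical stand-ins for any heavy dependency:

_model = None  # module-level cache survives across warm invocations

def get_model():
    """Load the heavy dependency on first use, not at cold start."""
    global _model
    if _model is None:
        import big_ml_lib  # hypothetical heavy import, deferred
        _model = big_ml_lib.load('/opt/model.bin')  # placeholder path
    return _model

def lambda_handler(event, context):
    # Fast paths that never touch the model pay no loading cost
    if event.get('action') == 'ping':
        return {'statusCode': 200}
    return {'statusCode': 200, 'body': str(get_model().predict(event))}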

Solution 4: Use Containers for Critical Paths
- Hybrid approach: containers for the synchronous API, serverless for async work
- Running containers serve requests with consistent latency (no per-request cold starts)
- Serverless handles event processing, where cold starts are acceptable

Example: E-commerce
✓ API Gateway → Container (fast, consistent)
✓ Order received → Lambda → Process async (cold start OK)
✓ Scheduled reports → Lambda → Run on schedule (cold start irrelevant)

Idempotency and Duplicate Handling

Serverless functions may be invoked more than once (retries, timeouts, at-least-once delivery). Design for idempotency:

# Non-idempotent (WRONG):
def lambda_handler(event, context):
    order_id = event['order_id']
    balance = db.get_balance(event['customer_id'])
    db.update_balance(balance - 100)  # Charged twice if retried!
    return {'charged': True}

# Idempotent (RIGHT):
def lambda_handler(event, context):
    order_id = event['order_id']
    request_id = event['request_id']  # Unique per request

    # Check if already processed
    existing = db.query('SELECT * FROM charges WHERE request_id = ?', request_id)
    if existing:
        return {'already_charged': True}

    # Atomic operation: insert succeeds only if request_id doesn't exist
    # (db and DuplicateKeyError are pseudocode stand-ins for your data layer)
    try:
        db.execute(
            'INSERT INTO charges (request_id, order_id, amount) VALUES (?, ?, ?)',
            request_id, order_id, 100
        )
        return {'charged': True}
    except DuplicateKeyError:
        # Already charged; return the same response
        return {'already_charged': True}

Serverless vs Containers vs VMs

Serverless (Lambda)
  1. Pay-per-execution (no idle cost)
  2. Scales from 0 to thousands of concurrent executions in seconds
  3. Cold start latency (100ms-1s)
  4. Stateless (external state required)
  5. Max 15-minute execution time
  6. Event-driven (async workloads)
  7. Best for: Bursty, event-driven, unpredictable load
Containers (Kubernetes)
  1. Pay for reserved capacity (even idle)
  2. Scales slowly (seconds to minutes)
  3. Consistent latency (no cold starts)
  4. Stateful (local volumes possible)
  5. Unlimited execution time
  6. Long-lived processes
  7. Best for: High-throughput, steady load, complex apps
VMs (EC2)
  1. Hourly billing (expensive idle)
  2. Manual scaling (operator-driven)
  3. Boot time (30s-2m)
  4. Full OS (legacy app support)
  5. Unlimited execution time
  6. Persistent state (local disk)
  7. Best for: Legacy apps, long jobs, full control needed

Common Patterns and Pitfalls

Serverless Checklist

  • Is the workload event-driven (not continuous)?
  • Does execution time stay under 5 minutes (most cases)?
  • Is bursty scaling acceptable (not predictable load)?
  • Can state be stored externally (DynamoDB, S3)?
  • Is cold-start latency acceptable (or use provisioned concurrency)?
  • Are functions designed for idempotency?
  • Is error handling implemented (retries, DLQ)?
  • Are timeouts configured appropriately?
  • Is monitoring/logging in place (CloudWatch)?
  • Have you estimated costs (pay-per-execution)?

Self-Check

  1. What's a cold start, and when does it happen? The first invocation in a new container pays initialization latency (~100ms-1s); warm containers reuse state (~5-50ms).
  2. Why design for idempotency? Functions may be invoked more than once (retries, at-least-once delivery); idempotency guarantees the same result.
  3. How do you keep containers warm? Provisioned concurrency or scheduled warmup invocations.
  4. What's the 15-minute limit? Max execution time per invocation. Design for shorter operations.
  5. When to use serverless vs containers? Serverless: bursty, event-driven. Containers: predictable, high-throughput.

One Takeaway: Serverless excels at event-driven workloads where scaling to zero saves costs. For user-facing APIs, use provisioned concurrency to hide cold starts. For background jobs, accept cold-start latency.

Next Steps

  • Set up monitoring/alarms (CloudWatch, X-Ray)
  • Implement structured logging (JSON logs for easier debugging)
  • Design error handling (DLQ, retry logic)
  • Learn Step Functions for complex workflows
  • Estimate costs using AWS calculator

Advanced Patterns and Production Considerations

Cost Optimization

Pricing model review:

  • Invocations: $0.20 per 1 million requests
  • Compute: $0.0000166667 per GB-second (about $43 per GB-month if running continuously)
  • Storage and data transfer: vary by service

Example cost calculation:

Scenario: Process 1M orders/month
- Function: 512MB memory, 1 second execution
- Invocations: 1,000,000 × $0.20/1M = $0.20
- Compute: 1,000,000 × 1s × (512/1024) GB = 500K GB-s
500K × $0.0000166667 = $8.33
- Total: ~$8.53/month (plus taxes, data transfer, etc.)
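
The same arithmetic as a quick sanity-check script (rates hard-coded from the list above; check current pricing):

# Quick cost estimate for AWS Lambda (rates as listed above)
INVOCATION_RATE = 0.20 / 1_000_000     # $ per request
COMPUTE_RATE = 0.0000166667            # $ per GB-second

def lambda_monthly_cost(invocations, memory_mb, duration_s):
    gb_seconds = invocations * duration_s * (memory_mb / 1024)
    return invocations * INVOCATION_RATE + gb_seconds * COMPUTE_RATE

# 1M orders/month, 512MB, 1s each -> about $8.53
print(f"${lambda_monthly_cost(1_000_000, 512, 1.0):.2f}")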

Same workload on containers:
- Single t3.medium (2 vCPU, 4GB): ~$30/month on-demand
- Reserved instance: $12-15/month (better deal if load is steady)

Verdict: At 512MB and 1s per invocation, serverless runs ~$8.53 per million
invocations, so it stays cheaper up to roughly 1.5M invocations/month against
a reserved container, and ~3.5M against on-demand. Past that, containers win.
The break-even shifts with memory size, execution duration, and utilization.

Cost optimization strategies:

  1. Right-size memory: CPU scales with memory, so more memory can shorten duration enough to lower total cost; measure rather than guess
  2. Reduce invocations: batch small requests together (see the sketch after this list)
  3. Use provisioned concurrency sparingly (only critical paths)
  4. Cache results: don't recompute
  5. Optimize cold starts: less initialization = lower duration and cost
  6. Monitor CloudWatch costs: logs and metrics also add up
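
A batching sketch: with an SQS event source mapping whose batch size is greater than 1, a single invocation receives multiple records, amortizing per-invocation overhead (process_order is a placeholder):

import json

def lambda_handler(event, context):
    # One invocation handles up to BatchSize records from the queue
    for record in event['Records']:
        order = json.loads(record['body'])
        process_order(order)  # placeholder for real business logic
    return {'statusCode': 200}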

Error Handling and Resilience

Dead-Letter Queues (DLQ):

Normal Flow:
Event → SQS → Lambda → DynamoDB ✓

Error Case:
Event → SQS → Lambda → Error! → DLQ (for manual review)

Alerts: Ops team notified of failures
Recovery: Replay from DLQ once fixed
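
A rough replay script, sketched with boto3 and SQS; both queue URLs are placeholders:

import boto3

sqs = boto3.client('sqs')
DLQ_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/orders-dlq'  # placeholder
MAIN_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/orders'     # placeholder

# Drain the DLQ and re-enqueue messages once the bug is fixed
while True:
    resp = sqs.receive_message(QueueUrl=DLQ_URL, MaxNumberOfMessages=10)
    messages = resp.get('Messages', [])
    if not messages:
        break
    for msg in messages:
        sqs.send_message(QueueUrl=MAIN_URL, MessageBody=msg['Body'])
        sqs.delete_message(QueueUrl=DLQ_URL, ReceiptHandle=msg['ReceiptHandle'])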

Retry policies:

MaximumEventAgeInSeconds: 3600  # Don't retry events older than 1 hour
MaximumRetryAttempts: 2         # Retry up to 2 times
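
The same policy applied with boto3 for asynchronous invocations, with an on-failure destination so exhausted events land in a queue for review (function name and ARN are placeholders):

import boto3

lambda_client = boto3.client('lambda')

lambda_client.put_function_event_invoke_config(
    FunctionName='order-processor',  # placeholder
    MaximumRetryAttempts=2,
    MaximumEventAgeInSeconds=3600,
    DestinationConfig={
        'OnFailure': {
            # Exhausted async events land here for manual review
            'Destination': 'arn:aws:sqs:us-east-1:123456789012:orders-dlq'  # placeholder
        }
    },
)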

Timeout handling:

def lambda_handler(event, context):
    remaining_ms = context.get_remaining_time_in_millis()

    if remaining_ms < 5000:
        # Less than 5 seconds left, don't start work
        # Return gracefully, let the system retry
        return {'statusCode': 202, 'message': 'In progress'}

    # Safe to do work
    process_order()
    return {'statusCode': 200}

Monitoring and Observability

Key metrics:

  • Invocation count: How many times called?
  • Duration: How long per invocation?
  • Error rate: What % fail?
  • Throttling: Hitting concurrency limits?
  • Cold starts: How often?

CloudWatch alarms:

Alert if:
- Error rate > 1%
- Duration p99 > 10 seconds (expected 1s)
- Throttling events occur
- DLQ has messages
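
As one concrete example, a boto3 sketch that alarms on any Lambda errors in a 5-minute window; an error-rate alarm would use metric math instead, and all names and ARNs here are placeholders:

import boto3

cloudwatch = boto3.client('cloudwatch')

# Alarm when the function reports any errors over 5 minutes
cloudwatch.put_metric_alarm(
    AlarmName='order-processor-errors',  # placeholder
    Namespace='AWS/Lambda',
    MetricName='Errors',
    Dimensions=[{'Name': 'FunctionName', 'Value': 'order-processor'}],
    Statistic='Sum',
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator='GreaterThanThreshold',
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:ops-alerts'],  # placeholder
)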

X-Ray tracing:

from aws_xray_sdk.core import xray_recorder

@xray_recorder.capture('database_query')
def query_order(order_id):
    # X-Ray records timing and errors for this subsegment
    return dynamodb.get_item(...)

# Visualize: the X-Ray service map shows which services were called,
# with a performance timeline for each request.

Hybrid Architectures

When to use which compute:

Workload         | Best Option            | Why
-----------------|------------------------|-----------------------------------
Real-time API    | Containers             | Low latency, no cold start
Event processor  | Serverless             | Cost-effective, scales with events
Scheduled job    | Serverless             | Runs on schedule, pay for runtime
Long-running     | Containers             | Unlimited execution time
Batch processing | Containers or Spot VMs | Cost-optimized for large volume
Spiky traffic    | Serverless             | Auto-scales instantly
Predictable load | Containers             | Reserved instances cheaper

Example e-commerce:

API Gateway → Containers (consistent latency)
├─ Orders Service (steady traffic, need low latency)
└─ User Service (steady traffic)

SNS → Lambda (event-driven)
├─ Process Payment (async, bursty)
├─ Send Notifications (bursty)
└─ Update Analytics (bursty)

Scheduled → Lambda
├─ Cleanup old sessions (daily)
└─ Generate reports (nightly)

Cost: Containers handle steady load efficiently
Serverless handles bursty events cheaply
Hybrid = best of both worlds