ML Feature Stores and Model Serving Architecture

Build scalable feature stores and model serving pipelines supporting online/offline feature computation, real-time inference, and batch processing at enterprise scale.

TL;DR

Feature stores decouple feature engineering from model training and serving by managing a centralized repository of computed features. Implement a dual-layer architecture: an offline store (batch-computed features, millions of rows, lower freshness) and an online store (sub-50ms lookups at high QPS, with freshness ranging from under a minute for streaming features to daily for batch-synced ones). Support batch serving (daily reports), real-time serving (P99 <100ms), and streaming features (event-driven). Use feature versioning, lineage tracking, and point-in-time correctness to prevent training-serving skew and enable reproducibility.

Learning Objectives

By the end of this article, you'll understand:

  • Online vs offline feature store trade-offs and architecture
  • Batch, real-time, and streaming model serving patterns
  • Feature computation pipelines and orchestration
  • Training-serving skew prevention techniques
  • Model registry design and versioning strategies
  • Feature freshness and consistency guarantees

Motivating Scenario

Your recommendation engine shows a 5% week-over-week CTR drop. Investigation reveals that training and serving compute features in separate pipelines: production serving reads features that are 48 hours stale, while batch training never sees the newer features, whose distributions have shifted. The duplicated feature logic causes the two paths to diverge. A customer with less than 24 hours of history gets stale default features on day 1, then abruptly different ones on day 2 when fresh data arrives. You need a unified feature platform that guarantees training and serving use identical feature definitions with known staleness bounds.

Core Concepts

Feature Store Architecture

Offline Store:

  • Batch-computed features (daily/hourly cadence)
  • Data warehouse (Snowflake, BigQuery, Redshift)
  • Large feature matrices (millions of rows)
  • Used for training, batch serving, analytics
  • Freshness: 1-24 hours typical
  • Lower QPS, higher latency acceptable

Online Store:

  • Real-time feature computation or cached features
  • Low-latency backends (Redis, DynamoDB, Cassandra)
  • Subset of features needed for inference
  • Serves with <50ms latency at high QPS
  • Freshness: < 1 minute (streaming) to daily (batch sync)
  • Point-in-time joins prevent training-serving skew (the two layers are contrasted in the sketch below)
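
The split above might be captured in configuration. Here is a minimal, framework-agnostic sketch contrasting the two layers; the class and field names (StoreConfig, target_p99_ms) are illustrative, not any particular feature store's API.

from dataclasses import dataclass

@dataclass
class StoreConfig:
    """Illustrative config contrasting the two feature-store layers."""
    backend: str             # e.g. "bigquery" (offline) vs "redis" (online)
    freshness_seconds: int   # staleness bound the layer is expected to meet
    target_p99_ms: float     # read-latency budget for consumers

# Hypothetical values mirroring the bullets above.
offline = StoreConfig(backend="bigquery", freshness_seconds=24 * 3600, target_p99_ms=5000)
online = StoreConfig(backend="redis", freshness_seconds=60, target_p99_ms=50)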

Model Serving Patterns

Batch Serving:

  • Compute predictions for the entire user base on a daily/hourly cadence
  • Write outputs to a data warehouse or message queue
  • High throughput; latency is not critical
  • Use for recommendations, daily emails, dashboards
  • Efficient for large-scale scoring (see the sketch below)
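
A minimal sketch of the batch pattern: score a whole feature matrix in one vectorized pass and hand the results off for storage. MeanModel is a hypothetical stand-in for a trained model artifact, and a real job would write the scored frame to a warehouse table instead of printing it.

import pandas as pd

class MeanModel:
    # Stand-in for a trained model loaded from a registry.
    def predict(self, X: pd.DataFrame) -> pd.Series:
        return X.mean(axis=1)

def batch_score(features_df: pd.DataFrame, model) -> pd.DataFrame:
    # Throughput-oriented: one pass over the entire user base.
    out = features_df[["user_id"]].copy()
    out["score"] = model.predict(features_df.drop(columns=["user_id"]))
    return out

df = pd.DataFrame({
    "user_id": [1, 2],
    "total_spend": [110.0, 120.0],
    "days_active": [31.0, 32.0],
})
print(batch_score(df, MeanModel()))  # in production: write to the warehouse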

Real-Time Serving:

  • Single-request predictions computed on demand
  • Latency-critical (P99 <100ms, P95 <50ms)
  • Requires fast feature lookup from the online store
  • Scales horizontally behind a load balancer
  • Returns predictions synchronously (see the sketch below)
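
A toy sketch of the synchronous path. ONLINE_STORE stands in for a Redis/DynamoDB client and the linear score stands in for a real model call; the point is the lookup-then-predict shape and measuring latency against the SLA.

import time

# Stand-in for a low-latency online store keyed by (feature, user_id).
ONLINE_STORE = {("total_spend", 1): 110.0, ("days_active", 1): 31.0}

def serve(user_id: int) -> dict:
    start = time.perf_counter()
    spend = ONLINE_STORE[("total_spend", user_id)]   # fast feature lookup
    days = ONLINE_STORE[("days_active", user_id)]
    score = 0.01 * spend + 0.1 * days                # placeholder model call
    latency_ms = (time.perf_counter() - start) * 1000
    return {"score": score, "latency_ms": latency_ms}  # compare to P99 < 100ms

print(serve(1))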

Streaming Serving:

  • Predictions triggered by streaming events (Kafka, Pub/Sub)
  • Process events at the source, joining them with pre-computed features
  • Update predictions as features change
  • Used for fraud detection and real-time personalization (see the consumer sketch below)
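
A sketch of the streaming pattern, assuming the kafka-python package and a broker at localhost:9092; the topic name user-events and the placeholder feature read are hypothetical.

import json
from kafka import KafkaConsumer  # kafka-python; requires a reachable broker

consumer = KafkaConsumer(
    "user-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # Join the event with pre-computed features from the online store,
    # then score and emit the result downstream.
    features = {"total_spend": 110.0}  # placeholder online-store read
    score = 0.01 * features["total_spend"]
    print(f"user={event.get('user_id')} fraud_score={score:.3f}")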

Feature Freshness and Consistency

Staleness Windows:

  • User features: can tolerate 1-24 hours of staleness
  • Item/catalog features: can tolerate minutes of staleness
  • Real-time features: < 1 second of staleness
  • Select freshness per use case, not uniformly (see the sketch below)
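
One way to encode this is a per-feature staleness budget checked at read time; the feature names and SLA values below are hypothetical, mirroring the tiers above.

from datetime import datetime, timedelta, timezone

# Hypothetical staleness budgets, one per feature tier.
FRESHNESS_SLA = {
    "user_ltv": timedelta(hours=24),              # user features
    "item_price": timedelta(minutes=5),           # item/catalog features
    "session_click_count": timedelta(seconds=1),  # real-time features
}

def is_fresh(feature: str, computed_at: datetime) -> bool:
    # computed_at must be timezone-aware (UTC) for this comparison.
    return datetime.now(timezone.utc) - computed_at <= FRESHNESS_SLA[feature]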

Point-in-Time Correctness:

  • Training uses features as they existed at label time
  • Serving uses latest features (or specify as-of time)
  • Join historical feature snapshots at exact timestamps
  • Prevents training-serving divergence as features evolve (see the merge_asof sketch below)
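
With pandas, a point-in-time join can be expressed with merge_asof, which picks, for each training label, the latest feature snapshot at or before the label's timestamp; the column names below are illustrative.

import pandas as pd

# Labels: when each training example's outcome was observed.
labels = pd.DataFrame({
    "user_id": [1, 1],
    "event_time": pd.to_datetime(["2024-01-02", "2024-01-10"]),
    "label": [0, 1],
})

# Historical feature snapshots with their computation timestamps.
snapshots = pd.DataFrame({
    "user_id": [1, 1],
    "event_time": pd.to_datetime(["2024-01-01", "2024-01-08"]),
    "total_spend": [100.0, 250.0],
})

# Backward as-of join: no snapshot from the future can leak into training.
train = pd.merge_asof(
    labels.sort_values("event_time"),
    snapshots.sort_values("event_time"),
    on="event_time", by="user_id", direction="backward",
)
print(train)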

Feature Versioning:

  • Track feature computation logic changes
  • Maintain multiple versions in online store
  • Experiment with new features before rollout
  • Document lineage and data sources

Practical Example

import pandas as pd
from datetime import datetime, timedelta
from typing import Dict, List
import json


class FeatureStore:
    """Feature store with online/offline layers"""

    def __init__(self):
        self.offline_features = {}  # Simulates data warehouse
        self.online_features = {}   # Simulates Redis/DynamoDB
        self.feature_metadata = {}
        self.feature_versions = {}

    def register_feature(self, feature_name: str, computation_func,
                         freshness_seconds: int = 3600,
                         is_streaming: bool = False):
        """Register a feature with metadata"""
        self.feature_metadata[feature_name] = {
            'created_at': datetime.utcnow(),
            'freshness_seconds': freshness_seconds,
            'is_streaming': is_streaming,
            'version': 1
        }
        self.feature_versions[feature_name] = computation_func
        print(f"Registered feature: {feature_name} (v1)")

    def compute_offline_features(self, feature_name: str,
                                 user_ids: List[int]) -> pd.DataFrame:
        """Compute features for batch training/serving"""
        print(f"Computing offline features: {feature_name}")

        # Simulate batch feature computation
        features = []
        for user_id in user_ids:
            value = self.feature_versions[feature_name](user_id)
            features.append({
                'user_id': user_id,
                'feature_value': value,
                'timestamp': datetime.utcnow()
            })

        return pd.DataFrame(features)

    def compute_online_feature(self, feature_name: str,
                               user_id: int,
                               cached: bool = False) -> Dict:
        """Compute single feature for online serving"""
        key = f"{feature_name}:{user_id}"

        # Check online cache
        if cached and key in self.online_features:
            cached_data = self.online_features[key]
            age_seconds = (datetime.utcnow() -
                           cached_data['cached_at']).total_seconds()

            if age_seconds < self.feature_metadata[feature_name][
                    'freshness_seconds']:
                print(f"Cache HIT: {key}")
                return cached_data

        # Cache miss - compute online
        print(f"Cache MISS: {key} - computing")
        value = self.feature_versions[feature_name](user_id)

        result = {
            'user_id': user_id,
            'feature_value': value,
            'timestamp': datetime.utcnow(),
            'cached_at': datetime.utcnow()
        }

        # Update online store
        self.online_features[key] = result
        return result

    def point_in_time_join(self, user_id: int,
                           target_timestamp: datetime,
                           features: List[str]) -> Dict:
        """Get features as they existed at target time (training)"""
        print(f"Point-in-time join for user {user_id} at {target_timestamp}")

        # In production: query offline store for historical snapshots
        result = {}
        for feature_name in features:
            # Simulate lookup of feature value at specific time
            value = self.feature_versions[feature_name](user_id)
            result[feature_name] = {
                'value': value,
                'as_of_timestamp': target_timestamp
            }

        return result

    def get_feature_for_inference(self, user_id: int,
                                  features: List[str],
                                  use_fresh: bool = True) -> Dict:
        """Get features for real-time inference"""
        print(f"Fetching features for inference: {features}")

        result = {}
        for feature_name in features:
            # Use online store for real-time features
            feature_data = self.compute_online_feature(
                feature_name,
                user_id,
                cached=not use_fresh
            )
            result[feature_name] = feature_data['feature_value']

        return result

    def sync_offline_to_online(self, feature_name: str,
                               batch_df: pd.DataFrame):
        """Sync batch features from offline to online store"""
        print(f"Syncing {feature_name} to online store")

        for _, row in batch_df.iterrows():
            key = f"{feature_name}:{row['user_id']}"
            self.online_features[key] = {
                'user_id': row['user_id'],
                'feature_value': row['feature_value'],
                'timestamp': row['timestamp'],
                'cached_at': datetime.utcnow()
            }

    def update_feature_version(self, feature_name: str,
                               new_computation_func):
        """Update feature computation logic"""
        old_version = self.feature_metadata[feature_name]['version']
        new_version = old_version + 1

        self.feature_versions[feature_name] = new_computation_func
        self.feature_metadata[feature_name]['version'] = new_version

        print(f"Updated {feature_name}: v{old_version} -> v{new_version}")

    def get_feature_stats(self, feature_name: str) -> Dict:
        """Get feature statistics"""
        # Match "name:user_id" keys exactly to avoid prefix collisions
        online_keys = [k for k in self.online_features
                       if k.startswith(f"{feature_name}:")]

        return {
            'feature_name': feature_name,
            'version': self.feature_metadata[feature_name]['version'],
            'freshness_seconds': self.feature_metadata[feature_name][
                'freshness_seconds'],
            'online_entries': len(online_keys),
            'last_computed': self.feature_metadata[feature_name]['created_at']
        }


# Example usage
def main():
    store = FeatureStore()

    # Register features
    def user_total_spend(user_id):
        return 100.0 + user_id * 10  # Simulated

    def user_days_active(user_id):
        return 30 + user_id % 7  # Simulated

    store.register_feature('total_spend', user_total_spend,
                           freshness_seconds=3600)
    store.register_feature('days_active', user_days_active,
                           freshness_seconds=86400)

    print("=== Batch Feature Computation ===")
    user_ids = [1, 2, 3, 4, 5]
    batch_df = store.compute_offline_features('total_spend', user_ids)
    print(batch_df)
    print()

    print("=== Online Feature Serving ===")
    features = store.get_feature_for_inference(
        user_id=1,
        features=['total_spend', 'days_active']
    )
    print(f"Features for inference: {features}\n")

    print("=== Sync to Online Store ===")
    store.sync_offline_to_online('total_spend', batch_df)
    print()

    print("=== Feature Version Update ===")
    def user_total_spend_v2(user_id):
        return 150.0 + user_id * 15  # Updated calculation

    store.update_feature_version('total_spend', user_total_spend_v2)
    print()

    print("=== Point-in-Time Join (Training) ===")
    training_features = store.point_in_time_join(
        user_id=1,
        target_timestamp=datetime.utcnow() - timedelta(days=1),
        features=['total_spend', 'days_active']
    )
    print(f"Training features: {training_features}\n")

    print("=== Feature Stats ===")
    stats = store.get_feature_stats('total_spend')
    print(json.dumps(stats, indent=2, default=str))


if __name__ == "__main__":
    main()

When to Use / When Not to Use

Implement Feature Store When:

  1. Multiple models share feature definitions (code reuse)
  2. Feature engineering is expensive (compute-intensive)
  3. Training-serving skew is causing production issues
  4. Real-time inference requires sub-100ms latency
  5. Need strict feature versioning and lineage tracking
  6. Many data scientists collaborating on features

Keep Features In-Pipeline When:

  1. Single model with unique features (no reuse)
  2. Simple feature engineering (linear transformations)
  3. Batch processing only (no real-time serving)
  4. Small team, low complexity (early stage)
  5. Features change frequently (low stability)
  6. Managed ML platform handles feature management

Patterns & Pitfalls

Design Review Checklist

  • Feature definitions consistent between training and serving
  • Point-in-time joins implemented for training data
  • Online store latency SLA met (P99 <100ms for serving)
  • Offline-to-online sync tested and monitored
  • Feature freshness SLAs defined and tracked for each feature
  • Model registry stores features, data version, and hyperparameters
  • Feature versioning enables A/B testing new feature logic
  • Monitoring alerts on training-serving data distribution divergence (see the sketch after this checklist)
  • Feature computation orchestration automated (Airflow, Kubeflow)
  • Documentation explains feature definitions and expected staleness
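
One way to implement the distribution-divergence alert from the checklist is a two-sample Kolmogorov-Smirnov test; this sketch assumes scipy is available, and the alpha threshold and simulated data are illustrative.

import numpy as np
from scipy.stats import ks_2samp

def skew_detected(train_values, serving_values, alpha: float = 0.01) -> bool:
    # Small p-value -> the samples likely come from different distributions.
    stat, p_value = ks_2samp(train_values, serving_values)
    return p_value < alpha

rng = np.random.default_rng(0)
train = rng.normal(100, 10, 5000)    # feature distribution at training time
serving = rng.normal(110, 10, 5000)  # shifted distribution seen in serving
print("skew detected:", skew_detected(train, serving))  # would alert on True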

Self-Check

Ask yourself:

  • Are my training and serving features identical in definition and freshness?
  • Can I pinpoint which features were used in which model version?
  • What's the acceptable staleness for each feature?
  • Do I test inference serving with the same feature versions as training?
  • Is feature computation deterministic and reproducible?

One Key Takeaway

Feature stores solve training-serving skew by centralizing feature definitions and management. The key is point-in-time correctness during training and strict synchronization between offline computation and online serving. Without this foundation, model performance inevitably diverges between training and production.

Next Steps

  1. Audit current features - Map all features used in training and serving
  2. Identify skew sources - Find where training/serving features diverge
  3. Implement offline store - Batch compute and warehouse features
  4. Build online store - Deploy low-latency feature lookup (Redis/DynamoDB)
  5. Synchronize layers - Automate offline-to-online sync
  6. Add point-in-time joins - Retrain with historical feature snapshots
  7. Establish monitoring - Track freshness, distribution drift, latency SLAs
