Storage Models
Understanding the landscape of modern databases and persistence technologies
The Storage Model Landscape
Storage models define how data is organized, stored, and retrieved. Each model makes different trade-offs between consistency, availability, partition tolerance, query flexibility, and performance.
Storage Models Overview
- Relational (RDBMS) - SQL databases with ACID guarantees, normalized schemas, foreign keys. PostgreSQL, MySQL, Oracle
- Key-Value Stores - Ultra-fast lookups by key, simple get/put/delete. Redis, Memcached, DynamoDB
- Document Stores - Flexible JSON/BSON documents, nested data structures. MongoDB, CouchDB, Firebase
- Wide-Column Stores - Massive-scale column-family storage with sparse tables. HBase, Cassandra, Bigtable
- Graph Databases - Optimized for relationships and traversals. Neo4j, ArangoDB, Neptune
- Time-Series Databases - Optimized for timestamped metrics and events. InfluxDB, Prometheus, TimescaleDB
- Search Engines - Full-text search, relevance ranking, analytics. Elasticsearch, Solr, Meilisearch
- In-Memory Systems - Sub-millisecond access, distributed caches, data grids. Redis, Memcached, Hazelcast
- Object Storage - Unstructured data, blobs, files at petabyte scale. S3, GCS, Azure Blob Storage
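To make the differences between models concrete, the sketch below represents the same small piece of data (a user and one order) in the shapes several of these models favor. It uses plain Python structures only; the field names, keys, and values are hypothetical and are not taken from any particular database.

```python
# Relational: normalized rows in separate tables, linked by a foreign key.
users_table = [{"id": 1, "name": "Ada"}]
orders_table = [{"id": 100, "user_id": 1, "total": 42.50}]

# Key-value: an opaque value looked up by a single key.
kv_store = {"user:1": '{"name": "Ada"}'}

# Document: one nested record holding the user and their orders together.
user_document = {
    "_id": 1,
    "name": "Ada",
    "orders": [{"id": 100, "total": 42.50}],
}

# Wide-column: row key -> sparse map of "family:qualifier" -> value.
wide_column_row = {"user#1": {"profile:name": "Ada", "orders:100": "42.50"}}

# Graph: nodes plus explicit, typed edges between them.
nodes = {
    "u1": {"label": "User", "name": "Ada"},
    "o100": {"label": "Order", "total": 42.50},
}
edges = [("u1", "PLACED", "o100")]

# The lookup a relational engine would express as a JOIN on user_id:
ada_orders = [o for o in orders_table if o["user_id"] == users_table[0]["id"]]
```

The document form answers "show Ada's orders" with one read, while the relational form needs the join but avoids duplicating user data across records.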
Choosing Your Storage Model
The right choice depends on your access patterns:
Relational (RDBMS)
- Structured data with clear schema
- Complex queries with JOINs
- Strong consistency requirements
- ACID transactions needed
- Financial, healthcare, business data
Key-Value Store
- Simple get/set access pattern
- Sub-millisecond latency required
- Session storage, caching
- Distributed high-throughput needs
- Real-time gaming, ads, recommendations
Document Store
- Flexible, evolving schemas
- Nested/hierarchical data
- Horizontal scaling important
- JSON/document-oriented
- User profiles, content management
Graph Database
- Highly connected relationships
- Path finding, recommendation
- Complex queries on relationships
- Social networks, knowledge graphs
- Fraud detection, access control
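The checklists above can be read as a rough decision procedure. The following is a minimal sketch of that idea, not a definitive selection algorithm: the `Workload` fields, the `suggest_storage_model` function, and the order in which the rules are checked are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of an application's access pattern."""
    needs_acid_transactions: bool = False
    needs_joins: bool = False
    relationship_traversals: bool = False
    nested_or_evolving_schema: bool = False
    simple_key_lookups: bool = False


def suggest_storage_model(w: Workload) -> str:
    """Return a first-guess storage model for the workload.

    Rule order is an assumption of this sketch: transactional needs win,
    then relationship traversals, then schema flexibility, then plain
    key lookups.
    """
    if w.needs_acid_transactions or w.needs_joins:
        return "relational"
    if w.relationship_traversals:
        return "graph"
    if w.nested_or_evolving_schema:
        return "document"
    if w.simple_key_lookups:
        return "key-value"
    return "relational"  # conservative default when nothing else applies


session_cache = Workload(simple_key_lookups=True)
print(suggest_storage_model(session_cache))  # key-value
```

Real workloads usually match more than one checklist, which is one reason applications end up combining several stores.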
CAP Theorem Trade-Offs
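When a network partition occurs, a distributed system must choose between Consistency and Availability; Partition Tolerance itself is not optional once data spans multiple nodes. The table lists the trade-off each system typically makes by default: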
| System | Preference | Best For |
|---|---|---|
| RDBMS | CA (Consistency + Availability) | Transactional systems, single datacenter |
| DynamoDB | AP (Availability + Partition Tolerance) | Distributed, eventual consistency acceptable |
| Cassandra | AP (Availability + Partition Tolerance) | High-scale, distributed, fault-tolerant |
| Elasticsearch | CP (Consistency + Partition Tolerance) | Search, analytics, single cluster |
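The practical meaning of these labels shows up only while nodes cannot reach each other. The toy model below illustrates the difference under simplifying assumptions: two in-memory replicas of a single value, synchronous replication when healthy, and a `partitioned` flag that is flipped by hand. The `TwoNodeStore` class is hypothetical and does not reflect how any of the listed systems is implemented.

```python
class Replica:
    """One copy of a single value; hypothetical and in-memory only."""

    def __init__(self):
        self.value = None


class TwoNodeStore:
    """Two replicas that behave as either a CP or an AP system."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # set to True to simulate a network split

    def write(self, value, via: str = "a") -> bool:
        """Write through replica `via`; return True if the write is accepted."""
        local = self.a if via == "a" else self.b
        remote = self.b if via == "a" else self.a
        if not self.partitioned:
            local.value = remote.value = value  # replicate synchronously
            return True
        if self.mode == "CP":
            return False  # stay consistent by refusing the write
        local.value = value  # AP: stay available, replicas now diverge
        return True

    def read(self, via: str = "a"):
        return (self.a if via == "a" else self.b).value


cp, ap = TwoNodeStore("CP"), TwoNodeStore("AP")
for store in (cp, ap):
    store.write("v1")
    store.partitioned = True

print(cp.write("v2"), cp.read("a"), cp.read("b"))  # False v1 v1
print(ap.write("v2"), ap.read("a"), ap.read("b"))  # True v2 v1
```

The CP store gives up availability (the write fails) to keep both replicas identical; the AP store accepts the write and serves a stale value from the other replica until the partition heals and the copies are reconciled.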
Polyglot Persistence Pattern
Modern applications often combine several storage models, each serving a different purpose:
┌─────────────┬──────────────┬─────────────────┐
│    Write    │    Cache     │      Read       │
│   (RDBMS)   │   (Redis)    │ (Elasticsearch) │
└─────────────┴──────────────┴─────────────────┘
               ↓ Async Pipeline ↓
┌──────────────────────────────────────────────┐
│      Data Lake / Warehouse / Analytics       │
│            (S3 + Athena/BigQuery)            │
└──────────────────────────────────────────────┘
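One way to wire the pieces of this diagram together is sketched below, with in-memory stand-ins for every component. It assumes a cache-aside read path and a single change queue that feeds both the search index and the data lake; the diagram does not specify either detail, so treat both as illustrative choices. The `PolyglotStore` class and its method names are hypothetical.

```python
from collections import deque


class PolyglotStore:
    """In-memory stand-ins for the stores in the diagram above."""

    def __init__(self):
        self.primary = {}        # stand-in for the RDBMS (source of truth)
        self.cache = {}          # stand-in for Redis
        self.search_index = {}   # stand-in for Elasticsearch: word -> doc ids
        self.lake = []           # stand-in for the data lake (append-only)
        self.pipeline = deque()  # stand-in for the async pipeline

    def write(self, doc_id: str, text: str) -> None:
        self.primary[doc_id] = text
        self.cache.pop(doc_id, None)          # invalidate any stale cache entry
        self.pipeline.append((doc_id, text))  # propagate later, off the write path

    def read(self, doc_id: str):
        """Cache-aside read: try the cache, fall back to the primary store."""
        if doc_id in self.cache:
            return self.cache[doc_id]
        text = self.primary.get(doc_id)
        if text is not None:
            self.cache[doc_id] = text
        return text

    def drain_pipeline(self) -> None:
        """Apply queued changes downstream (a background worker in practice).

        Simplification: words from older versions of a document are not
        removed from the index.
        """
        while self.pipeline:
            doc_id, text = self.pipeline.popleft()
            for word in text.lower().split():
                self.search_index.setdefault(word, set()).add(doc_id)
            self.lake.append((doc_id, text))

    def search(self, word: str) -> set:
        return set(self.search_index.get(word.lower(), set()))


store = PolyglotStore()
store.write("1", "Hello World")
print(store.search("hello"))  # set(): the index has not caught up yet
store.drain_pipeline()
print(store.search("hello"))  # {'1'}
```

The gap between the two `search` calls is the price of this pattern: the primary store is immediately consistent, while the search index and the lake are only eventually consistent with it.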
Performance Characteristics
The figures below are rough orders of magnitude for typical deployments; actual numbers vary with hardware, data size, query shape, and configuration.
| Model | Latency | Throughput | Consistency | Complexity |
|---|---|---|---|---|
| RDBMS | 1-10ms | 1K-10K ops/sec | Strong | High |
| Key-Value | <1ms | 100K-1M ops/sec | Eventual | Low |
| Document | 5-50ms | 10K-100K ops/sec | Eventual | Medium |
| Graph | 10-100ms | 1K-10K ops/sec | Strong | High |
| Time-Series | 1-5ms | 100K-1M ops/sec | Eventual | Medium |
| Search | 10-100ms | 10K-100K ops/sec | Eventual | High |
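The throughput column can be used for a back-of-envelope capacity estimate. The sketch below assumes the table's ranges are per-node figures and that load spreads evenly and scales linearly across nodes; neither assumption is stated in the table, and real clusters rarely scale this cleanly. The `nodes_needed` helper is hypothetical.

```python
import math

# Throughput ranges (ops/sec) copied from the table above,
# interpreted here as per-node figures (an assumption).
THROUGHPUT_OPS_PER_SEC = {
    "RDBMS": (1_000, 10_000),
    "Key-Value": (100_000, 1_000_000),
    "Document": (10_000, 100_000),
    "Graph": (1_000, 10_000),
    "Time-Series": (100_000, 1_000_000),
    "Search": (10_000, 100_000),
}


def nodes_needed(model: str, target_ops_per_sec: int) -> tuple[int, int]:
    """Return (best_case, worst_case) node counts for the target load."""
    low, high = THROUGHPUT_OPS_PER_SEC[model]
    best = math.ceil(target_ops_per_sec / high)
    worst = math.ceil(target_ops_per_sec / low)
    return best, worst


print(nodes_needed("RDBMS", 50_000))      # (5, 50)
print(nodes_needed("Key-Value", 50_000))  # (1, 1)
```

The tenfold spread in the RDBMS result shows why these ranges are a starting point for load testing rather than a substitute for it.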
Next Steps
Explore each storage model in detail:
- Start with Relational (RDBMS) for foundational understanding
- Explore Key-Value Stores for caching and high-performance patterns
- Dive into Document Stores for flexible schema approaches
- Study Wide-Column Stores for massive-scale analytics
- Learn Graph Databases for relationship-heavy data
- Understand Time-Series Databases for metrics and monitoring
- Master Search Engines for full-text search and analytics
- Discover In-Memory Systems for low-latency requirements
- Implement Object Storage for unstructured data
Each model has detailed guides on use cases, trade-offs, implementation patterns, and operational considerations.