Relational Databases (RDBMS)

ACID-compliant data management with proven consistency and queryability

TL;DR

Relational databases (RDBMS) like PostgreSQL and MySQL provide strong ACID guarantees, flexible querying via SQL, and normalized schema design. Use them for transactional systems, financial data, and whenever consistency matters more than raw throughput. The trade-off: you give up the horizontal scalability that NoSQL buys with eventual consistency, and get guaranteed data integrity in return.

Learning Objectives

  • Understand ACID properties and when they matter
  • Design normalized schemas that prevent anomalies
  • Recognize scalability patterns and limitations
  • Choose between RDBMS and NoSQL for your use case

Motivating Scenario

You're building a banking application. A customer transfers $100 from account A to B. What happens if:

  • Database crashes after debit, before credit?
  • Two transfers happen simultaneously?
  • Network partition occurs mid-transaction?

An RDBMS handles all of these through ACID guarantees; most NoSQL stores leave you to handle them yourself.

Core Concepts

ACID Explained

Atomicity: All-or-nothing transactions. Transfer either completes fully or rolls back entirely.

Consistency: Database constraints maintained. Foreign keys, unique constraints, check constraints all enforced before commit.

Isolation: Concurrent transactions don't see each other's partial results. Depending on the configured level, the guarantee ranges from read committed through repeatable read and snapshot isolation up to full serializability.

Durability: Once committed, data survives server crashes and power loss, typically via write-ahead logging to persistent storage.
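
In application code, atomicity shows up as a commit/rollback pair around the whole unit of work. A minimal sketch using the psycopg2 driver (the connection string is a placeholder, and the accounts table matches the transfer example later on this page):

import psycopg2

def transfer(conn, from_id, to_id, amount):
    """Move funds atomically: both updates commit together or neither does."""
    try:
        with conn.cursor() as cur:
            cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s",
                        (amount, from_id))
            cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s",
                        (amount, to_id))
        conn.commit()    # durability: both changes persist together
    except Exception:
        conn.rollback()  # atomicity: a failure undoes the partial debit
        raise

# Usage (placeholder connection string):
# conn = psycopg2.connect("dbname=bank user=app")
# transfer(conn, from_id=1, to_id=2, amount=100)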

Normalization vs Denormalization

Normalization splits data into related tables to eliminate redundancy and update anomalies; denormalization duplicates data to speed up reads. Choose based on workload:

Prefer Normalized (3NF+)     | Prefer Denormalized
Write-heavy, transactional   | Read-heavy, analytical
Complex relationships        | Simple relationships
Data consistency critical    | Consistency can be eventual
Frequent schema changes      | Stable schema
Moderate query volume        | High query volume

Practical Example

-- Normalized schema for e-commerce
CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    name VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total_amount DECIMAL(10, 2) NOT NULL,
    status VARCHAR(50) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE order_items (
    id SERIAL PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
    product_id INTEGER NOT NULL,
    quantity INTEGER NOT NULL,
    price_per_unit DECIMAL(10, 2) NOT NULL
);

-- ACID transaction: transfer funds
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
INSERT INTO transactions (from_id, to_id, amount) VALUES (1, 2, 100);
COMMIT;
-- All three operations succeed together, or all of them roll back

When to Use RDBMS / When Not to Use

Use RDBMS When
  1. Transactional consistency is critical
  2. Data has clear relational structure
  3. Complex multi-table queries needed
  4. ACID guarantees required (financial, healthcare)
  5. Schema is relatively stable
Use NoSQL When
  1. Eventual consistency acceptable
  2. Massive horizontal scale needed (>10K writes/sec)
  3. Schema frequently changes
  4. Simple key-based access patterns
  5. Unstructured or nested data

Patterns and Pitfalls

Design Review Checklist

  • Schema is normalized to at least 3NF
  • Primary keys on all tables
  • Foreign keys enforce referential integrity
  • Indexes on frequently queried columns
  • Connection pooling configured
  • Prepared statements used throughout (both are sketched in the code after this checklist)
  • Transaction isolation level defined for use case
  • Backup and recovery strategy documented
  • Monitoring and alerting for slow queries
  • Read replicas planned for scaling
  • Data archival strategy for old data
  • PII encryption at rest and in transit
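
Two of the items above, connection pooling and parameterized statements, are worth seeing in code. A minimal sketch with psycopg2's built-in pool (pool sizes and the DSN are illustrative; note that psycopg2 sends bound parameters client-side rather than issuing a server-side PREPARE, which gives the same injection safety):

import psycopg2.pool

# Reuse a bounded set of connections instead of opening one per request
pool = psycopg2.pool.SimpleConnectionPool(
    minconn=2, maxconn=10, dsn="dbname=shop user=app"  # placeholder DSN
)

def get_customer(email):
    conn = pool.getconn()
    try:
        with conn.cursor() as cur:
            # Parameterized query: the driver escapes input, preventing SQL injection
            cur.execute("SELECT id, name FROM customers WHERE email = %s", (email,))
            return cur.fetchone()
    finally:
        pool.putconn(conn)  # return the connection for reuse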

Deep Dive: Scaling RDBMS Beyond Single Node

While relational databases weren't designed for massive horizontal scaling, several patterns enable them to handle large volumes:

Read Replicas and Replication

Read replicas distribute read traffic across secondary nodes, scaling read capacity while leaving the write path unchanged:

-- Primary handles all writes
INSERT INTO users VALUES (...);
UPDATE orders SET status = 'shipped' WHERE id = 123;

-- Replicas handle reads
SELECT * FROM users WHERE email = 'alice@example.com';  -- served by replica-1
SELECT COUNT(*) FROM orders WHERE status = 'pending';   -- served by replica-2
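
Application code usually hides this split behind a small routing layer. A minimal sketch, assuming psycopg2 and placeholder DSNs (the class and its methods are illustrative, not a standard API):

import itertools
import psycopg2

class RoutedDB:
    """Send writes to the primary; spread reads round-robin over replicas."""
    def __init__(self, primary_dsn, replica_dsns):
        self.primary = psycopg2.connect(primary_dsn)
        self.replicas = itertools.cycle(
            [psycopg2.connect(dsn) for dsn in replica_dsns]
        )

    def execute(self, sql, params=()):  # writes go to the primary
        with self.primary.cursor() as cur:
            cur.execute(sql, params)
        self.primary.commit()

    def query(self, sql, params=()):    # reads go to the next replica
        conn = next(self.replicas)
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()

Because of replication lag, a read issued immediately after a write may see stale data; read-your-own-writes flows should query the primary instead.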

Trade-offs:

  • Replication lag: Replicas lag primary by milliseconds to seconds; reads may see stale data
  • Write bottleneck: All writes still go to primary; doesn't scale writes
  • Failover complexity: Promoting replica to primary requires careful orchestration
  • Network overhead: Replicating every change to multiple nodes consumes bandwidth

Sharding (Horizontal Partitioning)

Distribute data across independent databases based on a shard key:

def get_user_by_id(user_id):
    shard_id = user_id % NUM_SHARDS  # determine shard from the key
    db = get_database(shard_id)      # route to the correct database
    return db.query("SELECT * FROM users WHERE id = ?", user_id)

def create_order(user_id, items):
    shard_id = user_id % NUM_SHARDS  # same key -> same shard as the user
    db = get_database(shard_id)
    db.execute("INSERT INTO orders VALUES (...)", user_id, items)

Trade-offs:

  • Scales writes: Each shard handles fraction of load
  • Cross-shard complexity: Queries joining multiple shards are expensive or impossible
  • Rebalancing: Adding shards requires redistributing data, an expensive operation (see the sketch after this list)
  • Hotspots: If shard key is poor (e.g., shard by country), some shards overloaded
  • Distributed transactions: ACID guarantees don't span shards; coordinating updates is complex
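
The rebalancing cost above is easy to quantify: with modulo sharding, changing the shard count remaps almost every key. A self-contained sketch:

# Fraction of keys that move when growing from 4 to 5 modulo shards
NUM_KEYS = 100_000
moved = sum(1 for key in range(NUM_KEYS) if key % 4 != key % 5)
print(f"{moved / NUM_KEYS:.0%} of keys change shards")  # ~80% move

This is why production systems often use consistent hashing or directory-based sharding instead, which move only roughly 1/N of the keys when a shard is added.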

Vertical Partitioning

Split large tables into narrower schemas to reduce I/O:

-- Before: one wide users table with 50 columns
SELECT id, name, email FROM users;  -- every row read drags all 50 columns through I/O

-- After: split into users (core) and user_profiles (rarely accessed fields)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name TEXT,
    email TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE user_profiles (
    user_id INTEGER PRIMARY KEY REFERENCES users(id),
    bio TEXT,
    preferences TEXT,
    settings TEXT  -- rarely accessed together with the core fields
);

-- Core queries now read narrow rows; faster
SELECT id, name, email FROM users;

Consistency Levels and Guarantees

RDBMS traditionally offered strong consistency: all readers see the same committed data. Different isolation levels trade consistency for concurrency:

Isolation Levels (Trade-offs)

Level            | Speed   | Dirty Reads | Non-repeatable Reads | Phantom Reads | Best For
READ UNCOMMITTED | Fastest | Yes         | Yes                  | Yes           | Analytics on stale data
READ COMMITTED   | Fast    | No          | Yes                  | Yes           | Default; most apps
REPEATABLE READ  | Slow    | No          | No                   | Yes           | Financial systems
SERIALIZABLE     | Slowest | No          | No                   | No            | Critical transactions

-- Transaction isolation in practice
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- Other transactions cannot interleave in any way this transaction could observe
SELECT balance FROM accounts WHERE id = 1; -- sees 1000
-- If a concurrent transaction conflicts on account 1, this one aborts with a
-- serialization failure and must be retried
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
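
Because SERIALIZABLE transactions can abort under contention, application code must be prepared to retry them. A minimal sketch with psycopg2 (the accounts table follows the transfer example; the error class is PostgreSQL-specific):

import psycopg2
from psycopg2 import errors

def transfer_serializable(conn, from_id, to_id, amount, max_retries=3):
    for _ in range(max_retries):
        try:
            with conn.cursor() as cur:
                # psycopg2 opens the transaction implicitly; set its isolation level first
                cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
                cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s",
                            (amount, from_id))
                cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s",
                            (amount, to_id))
            conn.commit()
            return
        except errors.SerializationFailure:  # SQLSTATE 40001: conflicting concurrent update
            conn.rollback()                  # undo and try again
    raise RuntimeError("transfer failed after retries")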

Practical Tuning: Query Optimization

Most RDBMS performance issues stem from inefficient queries, not database design:

Using EXPLAIN Plans

EXPLAIN ANALYZE
SELECT o.id, o.created_at, COUNT(i.id) as item_count
FROM orders o
LEFT JOIN order_items i ON o.id = i.order_id
WHERE o.created_at > '2025-01-01'
GROUP BY o.id
ORDER BY o.created_at DESC;

-- Output shows one of:
-- Seq Scan on orders (cost=0.00..1000.00)   -- BAD: full table scan
-- Index Scan using orders_created_at_idx    -- GOOD: uses index

-- Adding the missing index turns the seq scan into an index scan:
CREATE INDEX orders_created_at_idx ON orders (created_at);

Common Query Anti-patterns

# Anti-pattern 1: N+1 queries
orders = db.query("SELECT * FROM orders")
for order in orders:
    items = db.query("SELECT * FROM order_items WHERE order_id = ?", order.id)
# 1 query for orders + N queries for items = slow

# Fixed: join or eager-load in a single query
orders = db.query("""
    SELECT o.*, i.* FROM orders o
    JOIN order_items i ON o.id = i.order_id
""")

-- Anti-pattern 2: SELECT * instead of needed columns
SELECT * FROM users;        -- fetches all 50 columns; slow
-- Fixed: name only the needed columns
SELECT id, name FROM users;

-- Anti-pattern 3: functions in WHERE disable plain indexes
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';  -- index on email ignored
-- Fixed: compare the stored value directly...
SELECT * FROM users WHERE email = 'alice@example.com';         -- uses index
-- ...or create an expression index if a case-insensitive match is required:
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

Modern RDBMS Features

Recent versions added features historically reserved for NoSQL:

JSON/Document Support

-- PostgreSQL: JSONB columns
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name TEXT,
    metadata JSONB  -- nested data without normalization; JSONB supports indexing
);

INSERT INTO users (name, metadata)
VALUES ('Alice', '{"preferences": {"theme": "dark", "language": "en"}}');

-- Query nested fields
SELECT metadata -> 'preferences' ->> 'theme' FROM users WHERE id = 1;

-- Index JSON paths (expression indexes need the extra parentheses)
CREATE INDEX idx_user_theme ON users USING gin((metadata -> 'preferences'));

Full-Text Search

-- PostgreSQL: full-text search
CREATE INDEX idx_posts_search ON posts USING gin(to_tsvector('english', body));

SELECT id, title FROM posts
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'database & performance')
ORDER BY ts_rank(...) DESC;
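
From application code, JSON columns round-trip through the driver. A short sketch with psycopg2 against the users table defined above (the DSN is a placeholder; Json is psycopg2's adapter for JSON/JSONB parameters):

import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=shop user=app")  # placeholder DSN
with conn.cursor() as cur:
    # Write nested data: the Json adapter serializes the dict for the JSONB column
    cur.execute("INSERT INTO users (name, metadata) VALUES (%s, %s)",
                ("Alice", Json({"preferences": {"theme": "dark"}})))
    # Read a nested field back with the -> / ->> operators
    cur.execute("SELECT metadata -> 'preferences' ->> 'theme' FROM users WHERE name = %s",
                ("Alice",))
    print(cur.fetchone()[0])  # dark
conn.commit()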

When RDBMS Hits Limits

RDBMS excels at transactional consistency but struggles with:

  1. Massive write volume (>10K writes/sec): All writes still funnel through a single primary; replication can't keep up
  2. Geographic distribution: Replicating across continents adds latency; cross-region transactions are slow
  3. Unstructured data: JSON/blob columns lose much of SQL's query power; schema-less NoSQL stores fit better
  4. Real-time analytics on live data: OLTP databases are built for many small queries; heavy BI queries slow down production
  5. Time-series data: Metrics, logs, and traces are append-only; RDBMS storage isn't optimized for time-bucketed queries

Self-Check

  • What are the four ACID properties? When would breaking each one cause problems?
  • Why do most applications use RDBMS for transactional data despite NoSQL popularity?
  • What's the difference between strong and eventual consistency?
  • How do indexes improve query performance? What's the trade-off?
  • How would you scale RDBMS writes beyond a single primary?
  • What's the difference between read replicas and sharding?
  • When would you denormalize an RDBMS schema?

One Takeaway

Relational databases provide strong ACID guarantees and flexible querying at the cost of distributed scalability. Use them for transactional systems where consistency matters; use NoSQL when you need horizontal scale and can accept eventual consistency. Master replication, sharding, and query optimization to extend an RDBMS far beyond single-node limits.

Next Steps

  • Explore Data Modeling & Access for schema design patterns
  • Learn Indexing Strategies for query optimization
  • Study Read Replicas and Replication for scaling RDBMS
  • Dive into Caching Patterns for high-performance RDBMS systems
