Denormalization Strategies: Mistakes
Module: Schema Design & Advanced DDL
Denormalizing without measuring performance first
Use EXPLAIN ANALYZE to measure query performance, check frequency in logs, calculate total cost before denormalizing
Developers assume joins are slow and denormalize preemptively. But a query joining 3 tables with proper indexes might run in 20ms. Denormalizing adds complexity (triggers, consistency checks) for no benefit. Always measure: Run EXPLAIN ANALYZE, check query frequency, calculate total cost (execution time × frequency). Only denormalize if savings justify maintenance cost. Example: Query runs 10,000 times/day at 500ms = 5,000 seconds/day. If denormalization reduces to 50ms, you save 4,500 seconds/day.
High
graph TD
A[Slow Query?] --> B{Measured?}
B -->|No| C[DON'T denormalize yet]
C --> D[Run EXPLAIN ANALYZE]
D --> E[Check query frequency]
E --> F[Calculate total cost]
F --> G{Cost > threshold?}
G -->|Yes| H[Denormalize]
G -->|No| I[Try indexes first]
B -->|Yes| G
style C fill:#FFB6C6
style H fill:#90EE90
style I fill:#90EE90
Forgetting to update denormalized copies when source data changes
Use database triggers or application-level transactions to update all copies atomically
You add customer_name to orders table for performance. Later, a customer changes their name. You update customers table but forget orders. Data is inconsistent. Solution: Use triggers to automatically update denormalized copies, or wrap updates in transactions. Document all denormalized relationships. Create automated consistency checks that alert on mismatches.
Critical
sequenceDiagram
participant App
participant Customers
participant Orders
participant Trigger
App->>Customers: UPDATE name = 'New Name'
Customers->>Trigger: AFTER UPDATE fires
Trigger->>Orders: UPDATE customer_name = 'New Name'
Orders-->>App: Both tables consistent
Note over App,Orders: Without trigger: Orders has old name
Over-denormalizing by adding redundancy everywhere
Denormalize selectively based on query patterns - only the top 10 slowest frequent queries
After seeing denormalization improve one query, developers denormalize everything. They add 20 redundant columns to every table. Write performance drops 80%. Storage doubles. Maintenance nightmare. Solution: Use query logs to identify top 10 slowest queries that run frequently. Denormalize only those. Follow 80/20 rule: 20% of queries cause 80% of load. Optimize those 20%.
High
graph LR
A[Query Logs] --> B[Identify top 10 slow queries]
B --> C[Check frequency]
C --> D{Runs > 1000/day?}
D -->|Yes| E[Denormalize this one]