SQL Practice Logo

SQLPractice Online

Denormalization Strategies: Mistakes

Module: Schema Design & Advanced DDL

Denormalizing without measuring performance first

Use EXPLAIN ANALYZE to measure query performance, check frequency in logs, calculate total cost before denormalizing

Developers assume joins are slow and denormalize preemptively. But a query joining 3 tables with proper indexes might run in 20ms. Denormalizing adds complexity (triggers, consistency checks) for no benefit. Always measure: Run EXPLAIN ANALYZE, check query frequency, calculate total cost (execution time × frequency). Only denormalize if savings justify maintenance cost. Example: Query runs 10,000 times/day at 500ms = 5,000 seconds/day. If denormalization reduces to 50ms, you save 4,500 seconds/day.

High

graph TD

A[Slow Query?] --> B{Measured?}

B -->|No| C[DON'T denormalize yet]

C --> D[Run EXPLAIN ANALYZE]

D --> E[Check query frequency]

E --> F[Calculate total cost]

F --> G{Cost > threshold?}

G -->|Yes| H[Denormalize]

G -->|No| I[Try indexes first]

B -->|Yes| G

style C fill:#FFB6C6

style H fill:#90EE90

style I fill:#90EE90

Forgetting to update denormalized copies when source data changes

Use database triggers or application-level transactions to update all copies atomically

You add customer_name to orders table for performance. Later, a customer changes their name. You update customers table but forget orders. Data is inconsistent. Solution: Use triggers to automatically update denormalized copies, or wrap updates in transactions. Document all denormalized relationships. Create automated consistency checks that alert on mismatches.

Critical

sequenceDiagram

participant App

participant Customers

participant Orders

participant Trigger

App->>Customers: UPDATE name = 'New Name'

Customers->>Trigger: AFTER UPDATE fires

Trigger->>Orders: UPDATE customer_name = 'New Name'

Orders-->>App: Both tables consistent

Note over App,Orders: Without trigger: Orders has old name

Over-denormalizing by adding redundancy everywhere

Denormalize selectively based on query patterns - only the top 10 slowest frequent queries

After seeing denormalization improve one query, developers denormalize everything. They add 20 redundant columns to every table. Write performance drops 80%. Storage doubles. Maintenance nightmare. Solution: Use query logs to identify top 10 slowest queries that run frequently. Denormalize only those. Follow 80/20 rule: 20% of queries cause 80% of load. Optimize those 20%.

High

graph LR

A[Query Logs] --> B[Identify top 10 slow queries]

B --> C[Check frequency]

C --> D{Runs > 1000/day?}

D -->|Yes| E[Denormalize this one]