Statistics & Query Planning: Real-World

Module: Query Optimization & Performance

Query suddenly slow after bulk data load - statistics outdated, optimizer chooses nested loop instead of hash join (100x slower). Dashboard query estimates 100 rows but returns 1M rows - histogram missing, causing memory spill. Multi-column filter on correlated columns - optimizer assumes independence, estimates 1000 rows but gets 1M rows.

Shopify: Black Friday ETL Statistics Disaster

Shopify loads 500M orders during Black Friday weekend (10x normal volume). ETL job completes but forgets to run ANALYZE. Monday morning, merchant dashboard queries take 5-10 minutes instead of 5-10 seconds. Customer support flooded with complaints. EXPLAIN shows nested loop joins on orders table. Statistics show 50M rows (old), actual 550M rows (11x underestimate).

Immediately run ANALYZE on orders table and related tables (order_items, payments). Query plans switch from nested loop to hash join. Dashboard queries return to 5-10 seconds. Add ANALYZE to ETL job as mandatory step. Implement statistics age monitoring - alert if statistics >24 hours old on critical tables. Set up pre-query statistics check: if table modified >20% since last ANALYZE, run ANALYZE before critical queries.

-- Emergency fix (run immediately)

ANALYZE orders;

ANALYZE order_items;

ANALYZE payments;

-- Add to ETL job (prevent recurrence)

BEGIN;

-- Load data

INSERT INTO orders SELECT * FROM staging_orders;

INSERT INTO order_items SELECT * FROM staging_order_items;

-- Update statistics (mandatory)

ANALYZE orders;

ANALYZE order_items;

-- Verify statistics updated

SELECT tablename, last_analyze, n_live_tup

FROM pg_stat_user_tables

WHERE tablename IN ('orders', 'order_items');

COMMIT;

-- Monitoring query (run every hour)

SELECT tablename,

last_analyze,

n_live_tup as current_rows,

n_mod_since_analyze as rows_changed,

ROUND(100.0 * n_mod_since_analyze / NULLIF(n_live_tup, 0), 2) as pct_changed

FROM pg_stat_user_tables

WHERE schemaname = 'public'

AND (last_analyze < NOW() - INTERVAL '24 hours'

OR n_mod_since_analyze > n_live_tup * 0.2)

ORDER BY pct_changed DESC;

Dashboard queries: 5-10 minutes → 5-10 seconds (60-120x faster)

Customer complaints: 1000+ → 0

Support ticket volume: -80%

Prevented revenue loss from merchants unable to access dashboards

Added statistics monitoring to prevent future incidents

PostgreSQL

Netflix: Viewing History Histogram Missing - Memory Spill

Netflix viewing_history table has 100B rows. user_status column: 95% active, 4% inactive, 1% suspended. Query for active users (daily recommendation job) estimates 33B rows (assumes uniform distribution), actual 95B rows. Allocates 10GB memory, needs 30GB. Memory spill to disk. Job takes 6 hours instead of 20 minutes. Blocks other jobs, delays recommendation updates.