Statistics & Query Planning: Examples
Module: Query Optimization & Performance
Stale Statistics Causing Nested Loop Disaster
advanced
After bulk data load, query suddenly takes 10 minutes instead of 5 seconds. EXPLAIN shows nested loop join instead of hash join. Statistics are outdated - optimizer thinks table has 100K rows but it actually has 10M rows.
-- Initial state: 100K orders, statistics current
SELECT o.order_id, c.name, o.total
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'pending';
-- EXPLAIN shows hash join (good for 100K rows)
-- Execution time: 5 seconds
-- Bulk load: Insert 9.9M more orders (now 10M total)
INSERT INTO orders SELECT ... FROM staging_orders;
-- 9.9M rows inserted
-- Run same query again (statistics NOT updated)
SELECT o.order_id, c.name, o.total
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'pending';
-- EXPLAIN shows nested loop join (disaster!)
-- Optimizer thinks: 100K orders (old stats)
-- Actual: 10M orders
-- Execution time: 10 minutes (120x slower)
-- Fix: Update statistics
ANALYZE orders;
-- Run query again
SELECT o.order_id, c.name, o.total
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'pending';
-- EXPLAIN shows hash join again (correct plan)
-- Execution time: 8 seconds (75x faster than broken plan)
Before ANALYZE (stale statistics):
------------------------------------------------------------
Hash Join (cost=2500..5000 rows=10000)
-> Seq Scan on customers (cost=0..1000 rows=50000)
-> Hash (cost=1000..1000 rows=10000) -- WRONG estimate
-> Seq Scan on orders (cost=0..1000 rows=10000)
Filter: (status = 'pending')
-- Optimizer thinks 10K rows, actually 8M rows
Actual execution: