Caching & Memory Management: Examples
Module: Query Optimization & Performance
Hash Join Memory Spill - 60x Slowdown
advanced
Query joins orders (10M rows) with customers (1M rows). Hash join builds hash table on customers (100MB). work_mem = 4MB (default). Hash table doesn't fit, spills to disk in 25 batches. Query takes 10 minutes instead of 10 seconds (60x slower). Increase work_mem to 150MB, hash table fits in memory, query takes 10 seconds.
-- Query: Join orders with customers
EXPLAIN ANALYZE
SELECT o.order_id, c.name, o.total
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.created_at > '2024-01-01';
-- With default work_mem = 4MB:
-- Hash table size: 100MB (1M customers × 100 bytes)
-- work_mem: 4MB
-- Result: Doesn't fit, spills to disk
-- EXPLAIN ANALYZE output:
Hash Join (cost=50000..500000 rows=8000000)
Hash Cond: (o.customer_id = c.id)
-> Seq Scan on orders o (cost=0..200000 rows=8000000)
Filter: (created_at > '2024-01-01')
-> Hash (cost=20000..20000 rows=1000000)
Buckets: 131072 Batches: 25 Memory Usage: 4MB Disk Usage: 96MB
-> Seq Scan on customers c (cost=0..20000 rows=1000000)
-- Key indicators of spill:
-- Batches: 25 (should be 1)
-- Disk Usage: 96MB (should be 0)
-- Execution:
-- Build phase: Write 96MB to temp files (2 seconds)
-- Probe phase: Read temp files 25 times (8 seconds per pass = 200 seconds)
-- Total time: 600 seconds (10 minutes)
-- Fix: Increase work_mem
SET work_mem = '150MB';
-- Run query again
EXPLAIN ANALYZE
SELECT o.order_id, c.name, o.total
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.created_at > '2024-01-01';
-- With work_mem = 150MB:
Hash Join (cost=50000..500000 rows=8000000)
Hash Cond: (o.customer_id = c.id)
-> Seq Scan on orders o (cost=0..200000 rows=8000000)