SQL Practice Logo

SQLPractice Online

Caching & Memory Management: Examples

Module: Query Optimization & Performance

Hash Join Memory Spill - 60x Slowdown

advanced

Query joins orders (10M rows) with customers (1M rows). Hash join builds hash table on customers (100MB). work_mem = 4MB (default). Hash table doesn't fit, spills to disk in 25 batches. Query takes 10 minutes instead of 10 seconds (60x slower). Increase work_mem to 150MB, hash table fits in memory, query takes 10 seconds.

-- Query: Join orders with customers

EXPLAIN ANALYZE

SELECT o.order_id, c.name, o.total

FROM orders o

JOIN customers c ON o.customer_id = c.id

WHERE o.created_at > '2024-01-01';

-- With default work_mem = 4MB:

-- Hash table size: 100MB (1M customers × 100 bytes)

-- work_mem: 4MB

-- Result: Doesn't fit, spills to disk

-- EXPLAIN ANALYZE output:

Hash Join (cost=50000..500000 rows=8000000)

Hash Cond: (o.customer_id = c.id)

-> Seq Scan on orders o (cost=0..200000 rows=8000000)

Filter: (created_at > '2024-01-01')

-> Hash (cost=20000..20000 rows=1000000)

Buckets: 131072 Batches: 25 Memory Usage: 4MB Disk Usage: 96MB

-> Seq Scan on customers c (cost=0..20000 rows=1000000)

-- Key indicators of spill:

-- Batches: 25 (should be 1)

-- Disk Usage: 96MB (should be 0)

-- Execution:

-- Build phase: Write 96MB to temp files (2 seconds)

-- Probe phase: Read temp files 25 times (8 seconds per pass = 200 seconds)

-- Total time: 600 seconds (10 minutes)

-- Fix: Increase work_mem

SET work_mem = '150MB';

-- Run query again

EXPLAIN ANALYZE

SELECT o.order_id, c.name, o.total

FROM orders o

JOIN customers c ON o.customer_id = c.id

WHERE o.created_at > '2024-01-01';

-- With work_mem = 150MB:

Hash Join (cost=50000..500000 rows=8000000)

Hash Cond: (o.customer_id = c.id)

-> Seq Scan on orders o (cost=0..200000 rows=8000000)