Caching & Memory Management: Real-World
Module: Query Optimization & Performance
Query doing hash join spills to disk - takes 10 minutes instead of 10 seconds (60x slower). Increase work_mem from 4MB to 100MB, query runs in memory, 60x faster. Buffer pool too small (1GB for 100GB database) - cache hit ratio 60%, disk I/O bottleneck. Increase to 25GB, hit ratio 95%, queries 10x faster. Sort operation creates 50GB temp files, fills disk, query fails.
Reddit: Hash Join Memory Spill - 100x Slowdown
Reddit comment aggregation query joins comments (1B rows) with users (100M rows). Hash join builds hash table on users (10GB). work_mem = 4MB (default). Hash table spills to disk in 2500 batches. Query takes 2 hours instead of 1 minute (120x slower). Temp files fill disk (500GB), causing disk full errors. Other queries fail.
Increase work_mem to 12GB for this specific query using SET work_mem. Hash table fits in memory, no spill. Query takes 1 minute. Temp file usage drops to zero. Add query-specific work_mem setting to application code. Monitor temp file usage to catch future spills. For global increase, use connection pooling to limit total memory usage.
-- Problem query
EXPLAIN ANALYZE
SELECT u.username, COUNT(c.comment_id) as comment_count,
AVG(c.score) as avg_score
FROM comments c
JOIN users u ON c.user_id = u.user_id
WHERE c.created_at > NOW() - INTERVAL '30 days'
GROUP BY u.username;
-- With default work_mem = 4MB:
-- Hash Join (cost=5000000..50000000 rows=100000000)
-- Hash Cond: (c.user_id = u.user_id)
-- -> Seq Scan on comments c
-- -> Hash (cost=2000000..2000000 rows=100000000)
-- Buckets: 131072 Batches: 2500 Memory Usage: 4MB Disk Usage: 10GB
-- -> Seq Scan on users u
-- Time: 7200 seconds (2 hours)
-- Temp files: 500GB
-- Check temp file usage
SELECT temp_files, pg_size_pretty(temp_bytes)
FROM pg_stat_database
WHERE datname = 'reddit';
-- Result: temp_files = 15000, temp_bytes = 500GB
-- Fix: Increase work_mem for this query
BEGIN;
SET LOCAL work_mem = '12GB';
EXPLAIN ANALYZE
SELECT u.username, COUNT(c.comment_id) as comment_count,
AVG(c.score) as avg_score
FROM comments c
JOIN users u ON c.user_id = u.user_id
WHERE c.created_at > NOW() - INTERVAL '30 days'
GROUP BY u.username;
COMMIT;
-- With work_mem = 12GB:
-- Hash Join (cost=5000000..50000000 rows=100000000)
-- Hash Cond: (c.user_id = u.user_id)