Join Optimization Strategies: Examples
Module: Query Optimization & Performance
Join Algorithm Comparison - Orders & Customers
advanced
Compare three join algorithms for the same query. Shows when each algorithm is optimal and performance differences.
-- Scenario: 1M orders, 500K customers
-- Query: Get customer name for each order
-- 1. NESTED LOOP JOIN (Best: Small outer, indexed inner)
-- Without index: O(n * m) = 1M * 500K = 500 billion comparisons
SELECT o.order_id, o.order_date, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.order_date > '2024-01-01'; -- Filters to 100K orders
-- Time without index: 45 seconds (full table scan on customers for each order)
-- Add index on join column
CREATE INDEX idx_customers_id ON customers(id); -- Usually PRIMARY KEY
-- Time with index: 0.8 seconds (56x faster)
-- Algorithm: Nested Loop with index lookup
-- Why: Small outer (100K), indexed inner (500K)
-- 2. HASH JOIN (Best: Large tables, no index, equality join)
-- Remove index to force hash join
DROP INDEX idx_customers_id;
SELECT o.order_id, o.order_date, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.order_date > '2024-01-01';
-- Time: 3.2 seconds
-- Algorithm: Hash Join
-- Process:
-- 1. Build hash table from customers (smaller table) - 1.5s
-- 2. Probe with orders - 1.7s
-- Memory: 250MB for hash table
-- Why: No index, large datasets, equality join
-- 3. MERGE JOIN (Best: Both tables sorted/indexed)
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_customers_id ON customers(id);
SELECT o.order_id, o.order_date, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.order_date > '2024-01-01'
ORDER BY o.customer_id; -- Hint for merge join
-- Time: 1.2 seconds