Join Optimization Strategies: Examples

Module: Query Optimization & Performance

Join Algorithm Comparison - Orders & Customers

advanced

Compare three join algorithms for the same query. Shows when each algorithm is optimal and performance differences.

-- Scenario: 1M orders, 500K customers

-- Query: Get customer name for each order

-- 1. NESTED LOOP JOIN (Best: Small outer, indexed inner)

-- Without index: O(n * m) = 1M * 500K = 500 billion comparisons

SELECT o.order_id, o.order_date, c.customer_name

FROM orders o

JOIN customers c ON o.customer_id = c.id

WHERE o.order_date > '2024-01-01'; -- Filters to 100K orders

-- Time without index: 45 seconds (full table scan on customers for each order)

-- Add index on join column

CREATE INDEX idx_customers_id ON customers(id); -- Usually PRIMARY KEY

-- Time with index: 0.8 seconds (56x faster)

-- Algorithm: Nested Loop with index lookup

-- Why: Small outer (100K), indexed inner (500K)

-- 2. HASH JOIN (Best: Large tables, no index, equality join)

-- Remove index to force hash join

DROP INDEX idx_customers_id;

SELECT o.order_id, o.order_date, c.customer_name

FROM orders o

JOIN customers c ON o.customer_id = c.id

WHERE o.order_date > '2024-01-01';

-- Time: 3.2 seconds

-- Algorithm: Hash Join

-- Process:

-- 1. Build hash table from customers (smaller table) - 1.5s

-- 2. Probe with orders - 1.7s

-- Memory: 250MB for hash table

-- Why: No index, large datasets, equality join

-- 3. MERGE JOIN (Best: Both tables sorted/indexed)

CREATE INDEX idx_orders_customer_id ON orders(customer_id);

CREATE INDEX idx_customers_id ON customers(id);

SELECT o.order_id, o.order_date, c.customer_name

FROM orders o

JOIN customers c ON o.customer_id = c.id

WHERE o.order_date > '2024-01-01'

ORDER BY o.customer_id; -- Hint for merge join

-- Time: 1.2 seconds