Join Order & Optimization Strategies: Interview
Module: Joins & Relationships
How does the database optimizer choose join order?
Optimizer uses table statistics (row counts, data distribution, indexes) to estimate costs for different join orders. It considers table sizes, selectivity of WHERE conditions, available indexes, and join algorithms. Generally processes smaller/more selective tables first to minimize intermediate result sizes.
When should you override the optimizer's join order choice?
Override only with evidence from EXPLAIN showing suboptimal plans. Common cases: outdated statistics, complex queries where optimizer lacks information, specific knowledge about data distribution. Always test performance before and after. Most of the time, trust the optimizer.
What factors affect join order performance?
Table sizes, available indexes, WHERE clause selectivity, join algorithms (nested loop, hash, merge), memory available for joins, data distribution, and statistics accuracy. Smaller intermediate results generally perform better.
Analyze why this query might be slow and suggest optimization.
-- Original slow query
EXPLAIN SELECT c.name, o.total, p.name
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.order_date >= '2024-01-01';
-- Optimized with early filtering
SELECT c.name, filtered_orders.total, p.name
FROM (
SELECT customer_id, order_id, total
FROM orders
WHERE order_date >= '2024-01-01'
) filtered_orders
JOIN customers c ON filtered_orders.customer_id = c.id
JOIN order_items oi ON filtered_orders.order_id = oi.order_id
JOIN products p ON oi.product_id = p.id;
Subquery filters orders first, reducing join complexity. Compare EXPLAIN outputs to verify improvement.
How would you optimize a query joining 6 tables with various sizes?
-- Use CTEs to pre-filter and control join order
WITH recent_orders AS (
SELECT order_id, customer_id, total_amount
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL 90 DAY
),
active_customers AS (
SELECT customer_id, customer_name
FROM customers
WHERE status = 'active'
)
SELECT ac.customer_name, ro.total_amount, p.product_name
FROM recent_orders ro
JOIN active_customers ac ON ro.customer_id = ac.customer_id