SQL Practice Logo

SQLPractice Online

Join Order & Optimization Strategies: Interview

Module: Joins & Relationships

How does the database optimizer choose join order?

Optimizer uses table statistics (row counts, data distribution, indexes) to estimate costs for different join orders. It considers table sizes, selectivity of WHERE conditions, available indexes, and join algorithms. Generally processes smaller/more selective tables first to minimize intermediate result sizes.

When should you override the optimizer's join order choice?

Override only with evidence from EXPLAIN showing suboptimal plans. Common cases: outdated statistics, complex queries where optimizer lacks information, specific knowledge about data distribution. Always test performance before and after. Most of the time, trust the optimizer.

What factors affect join order performance?

Table sizes, available indexes, WHERE clause selectivity, join algorithms (nested loop, hash, merge), memory available for joins, data distribution, and statistics accuracy. Smaller intermediate results generally perform better.

Analyze why this query might be slow and suggest optimization.

-- Original slow query

EXPLAIN SELECT c.name, o.total, p.name

FROM customers c

JOIN orders o ON c.id = o.customer_id

JOIN order_items oi ON o.id = oi.order_id

JOIN products p ON oi.product_id = p.id

WHERE o.order_date >= '2024-01-01';

-- Optimized with early filtering

SELECT c.name, filtered_orders.total, p.name

FROM (

SELECT customer_id, order_id, total

FROM orders

WHERE order_date >= '2024-01-01'

) filtered_orders

JOIN customers c ON filtered_orders.customer_id = c.id

JOIN order_items oi ON filtered_orders.order_id = oi.order_id

JOIN products p ON oi.product_id = p.id;

Subquery filters orders first, reducing join complexity. Compare EXPLAIN outputs to verify improvement.

How would you optimize a query joining 6 tables with various sizes?

-- Use CTEs to pre-filter and control join order

WITH recent_orders AS (

SELECT order_id, customer_id, total_amount

FROM orders

WHERE order_date >= CURRENT_DATE - INTERVAL 90 DAY

),

active_customers AS (

SELECT customer_id, customer_name

FROM customers

WHERE status = 'active'

)

SELECT ac.customer_name, ro.total_amount, p.product_name

FROM recent_orders ro

JOIN active_customers ac ON ro.customer_id = ac.customer_id