Subquery Optimization: Concept
Module: Query Optimization & Performance
Subquery optimization is about making queries-within-queries fast. Subqueries come in two types: non-correlated (runs once) and correlated (runs for every row - expensive!). Most slow subqueries can be rewritten as JOINs, CTEs, or window functions for 10-1000x speedup.
Think of it like looking up information. Bad approach: for each customer, search through all orders to find theirs (correlated subquery). Good approach: join customers with orders once, then filter (JOIN). The difference: 1 million lookups vs 1 join.
**Two Types of Subqueries:**
1. **Non-Correlated Subquery** - Independent, executes once
- Example: WHERE customer_id IN (SELECT id FROM customers WHERE country = 'US')
- Performance: Executes once, result cached
- Cost: O(n) for subquery + O(m) for outer query
- When OK: Small result sets (<1000 rows)
2. **Correlated Subquery** - References outer query, executes per row
- Example: WHERE EXISTS (SELECT 1 FROM orders WHERE customer_id = c.id)
- Performance: Executes for EVERY outer row
- Cost: O(n * m) - quadratic time
- Problem: 1000 customers = 1000 subquery executions
**Performance Impact:**
Correlated subquery in SELECT (worst case):
SELECT c.name, (SELECT COUNT(*) FROM orders WHERE customer_id = c.id) FROM customers c;
- 1M customers = 1M subquery executions
- Each subquery scans orders table
- Total: 1M table scans
- Time: 45+ seconds
JOIN alternative:
SELECT c.name, COUNT(o.id) FROM customers c LEFT JOIN orders o ON c.id = o.customer_id GROUP BY c.id;
- 1 join operation
- Time: 0.5 seconds (90x faster)
**Subquery Locations:**
1. **SELECT clause** (scalar subquery) - WORST for performance
- Executes per row
- Use window functions or JOINs instead
2. **WHERE clause** - Common, can be optimized
- IN: OK for small lists, slow for large
- EXISTS: Better than IN, short-circuits
- JOIN: Usually fastest
3. **FROM clause** (derived table) - Usually OK
- Executes once
- Modern optimizers handle well
- CTEs more readable
**Optimization Strategies:**
1. **Convert to JOIN**
- Correlated subquery → JOIN with GROUP BY
- IN subquery → INNER JOIN