Subquery Optimization: Performance
Module: Query Optimization & Performance
**Correlated Subquery Cost:**
Example: 1M customers, 10M orders
Correlated in SELECT:
SELECT c.*, (SELECT COUNT(*) FROM orders WHERE customer_id = c.id) FROM customers c;
- Executions: 1M subquery runs
- Scans: 1M full table scans of orders (10M rows each)
- Comparisons: 10 trillion
- Time: 45-60 seconds
JOIN alternative:
SELECT c.*, COUNT(o.id) FROM customers c LEFT JOIN orders o ON c.id = o.customer_id GROUP BY c.id;
- Executions: 1 join
- Scans: 1 scan of each table
- Comparisons: 11M
- Time: 0.5 seconds (90-120x faster)
**IN vs EXISTS vs JOIN:**
| Approach | Time | Rows Processed | Short-Circuit | NULL Safe |
|----------|------|----------------|---------------|------------|
| IN (small) | 0.5s | All subquery rows | No | No |
| IN (large) | 8s | All subquery rows | No | No |
| EXISTS | 1.2s | Until first match | Yes | Yes |
| JOIN | 0.8s | All matching rows | No | Yes |
Rule: EXISTS for existence, JOIN for data retrieval
**Subquery Materialization:**
Modern optimizers can materialize subqueries:
- Execute subquery once
- Store results in temp table
- Use temp table in outer query
But: Not all subqueries can be materialized
- Correlated subqueries: NO
- Large result sets: May spill to disk
- Complex subqueries: May not optimize
**Index Impact:**
Correlated subquery without index:
WHERE EXISTS (SELECT 1 FROM orders WHERE customer_id = c.id)
- Full table scan per customer
- 1M customers × 10M orders = 10 trillion comparisons
- Time: 60+ seconds
With index on orders(customer_id):
- Index lookup per customer
- 1M customers × log(10M) ≈ 23M comparisons