Subqueries in WHERE Clause: Performance
Module: Subqueries & CTEs
**Performance Characteristics:**
1. IN Operator:
- Execution: Subquery runs once
- Memory: Creates hash table of results
- Best for: Small to medium result sets (<1000 values)
- Worst case: Large result sets (>10K values) - slow hash table creation
2. EXISTS Operator:
- Execution: Runs per outer row (correlated)
- Memory: Minimal (stops at first match)
- Best for: Large result sets, existence checks
- Optimization: Stops at first match (doesn't need to find all)
3. Scalar Subquery (=, >, <):
- Execution: Once (if non-correlated)
- Memory: Single value
- Best for: Comparing to aggregates
- Fast and efficient
4. ANY/ALL Operators:
- Execution: Subquery runs once
- Memory: Stores all values for comparison
- Best for: Small to medium result sets
- Similar performance to IN
**Optimization Strategies:**
1. Index foreign key columns:
CREATE INDEX idx_orders_customer ON orders(customer_id);
- Critical for EXISTS and correlated subqueries
- Enables index seek instead of table scan
2. Use EXISTS instead of IN for large datasets:
- IN with 10,000 values: 850ms
- EXISTS with same data: 180ms (5x faster)
3. Always use NOT EXISTS instead of NOT IN:
- Avoids NULL trap
- Usually faster
- More predictable behavior
4. Consider JOINs as alternative:
-- Subquery approach:
SELECT c.name FROM customers c
WHERE EXISTS (SELECT 1 FROM orders WHERE customer_id = c.id);
-- JOIN approach (often faster):
SELECT DISTINCT c.name FROM customers c
INNER JOIN orders o ON o.customer_id = c.id;