Subquery Optimization: Functions
Module: Query Optimization & Performance
**Problem: Correlated Subquery in SELECT**
Slow (executes subquery per row):
SELECT
c.customer_name,
(SELECT COUNT(*) FROM orders WHERE customer_id = c.id) as order_count,
(SELECT SUM(total) FROM orders WHERE customer_id = c.id) as total_spent
FROM customers c;
-- 1M customers = 2M subquery executions
-- Time: 45 seconds
Fast (single JOIN with aggregation):
SELECT
c.customer_name,
COUNT(o.id) as order_count,
SUM(o.total) as total_spent
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.customer_name;
-- Time: 0.5 seconds (90x faster)
**Problem: IN with Large Subquery**
Slow (checks all values):
SELECT * FROM products
WHERE category_id IN (
SELECT id FROM categories WHERE parent_id = 5
);
-- Subquery returns 10K categories
-- Time: 8 seconds
Fast (JOIN):
SELECT p.* FROM products p
INNER JOIN categories c ON p.category_id = c.id
WHERE c.parent_id = 5;
-- Time: 0.3 seconds (27x faster)
**Problem: NOT IN with NULLs**
Dangerous (returns no rows if subquery has NULL):
SELECT * FROM customers
WHERE id NOT IN (SELECT customer_id FROM orders);
-- If any order has NULL customer_id, returns 0 rows!
Safe (handles NULLs correctly):
SELECT c.* FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.customer_id IS NULL;