Correlated Subqueries Deep Dive: Mistakes
Module: Subqueries & CTEs
-- Correlated subquery without index - DISASTER!
SELECT
e1.name,
e1.salary,
(SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department = e1.department) AS dept_avg
FROM employees e1;
-- No index on (department, salary)
-- For 10,000 employees:
-- 10,000 full table scans
-- 100 million row reads
-- Query time: 8000ms (8 seconds!)
-- Add composite index first
CREATE INDEX idx_employees_dept_salary
ON employees(department, salary);
-- Now query is much faster
SELECT
e1.name,
e1.salary,
(SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department = e1.department) AS dept_avg
FROM employees e1;
-- Query time: 450ms (18x faster!)
-- Or use window function (best)
SELECT
name,
salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;
-- Query time: 95ms (84x faster than no index!)
Correlated subquery executes once per outer row. Without index, each execution does full table scan. For 10,000 employees, that is 10,000 full scans = 100 million row reads! Composite index on (department, salary) enables index seek - each execution reads only ~50 rows instead of 10,000. Even better: window function scans table once (10,000 rows total) instead of 10,000 times.
ALWAYS create composite index on (foreign_key, aggregate_column) for correlated subqueries. Check with EXPLAIN to verify index usage. Consider window functions as faster alternative.
Critical
Extremely slow query due to missing index on correlated columns
graph TB
A["10,000 Employees"] --> B["No Index"]
B --> C["10,000 × Full Scan<br/>100M row reads<br/>8000ms ❌"]
A --> D["With Index"]