Correlated Subqueries Deep Dive: Concept
Module: Subqueries & CTEs
Correlated subqueries are the "row-by-row calculators" of SQL. Unlike regular subqueries that execute once and return a result, correlated subqueries execute once for EVERY row in the outer query, using values from that specific row.
Think of it like this: You have a list of employees. For each employee, you want to know if their salary is above their department's average. A regular subquery can't do this because it would calculate ONE company-wide average. A correlated subquery can because it calculates a DIFFERENT department average for each employee based on THEIR department.
The key characteristic: The subquery references columns from the outer query. This creates a dependency - the subquery result depends on which outer row is being processed. That's why it must execute per row.
Example:
SELECT e1.name, e1.salary
FROM employees e1
WHERE e1.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department = e1.department -- References outer query!
);
For Alice in Engineering: subquery calculates Engineering average
For Bob in Sales: subquery calculates Sales average
For Carol in Marketing: subquery calculates Marketing average
Each employee gets their own custom calculation. Powerful, but expensive.
**How Correlated Subqueries Execute:**
Step-by-step execution for query:
SELECT e1.name, e1.salary
FROM employees e1
WHERE e1.salary > (SELECT AVG(e2.salary) FROM employees e2 WHERE e2.department = e1.department);
1. Outer query starts: Process first employee (Alice, Engineering, $95K)
2. Subquery executes with Alice's values: AVG(salary) WHERE department = 'Engineering' → $92K
3. Compare: $95K > $92K? Yes → Include Alice
4. Outer query continues: Process second employee (Bob, Engineering, $88K)
5. Subquery executes AGAIN with Bob's values: AVG(salary) WHERE department = 'Engineering' → $92K
6. Compare: $88K > $92K? No → Exclude Bob
7. Repeat for ALL 10,000 employees
Result: 10,000 subquery executions. If each takes 0.5ms, total = 5 seconds just for subqueries!
**Why Correlated Subqueries Are Slow:**
1. Repeated Execution:
- Non-correlated: Executes once
- Correlated: Executes N times (once per outer row)
- For 10,000 rows: 10,000 executions
2. No Result Caching:
- Even if multiple rows have same department
- Subquery still executes for each row
- Database doesn't cache intermediate results
3. Potential Full Table Scans:
- Without proper indexes
- Each execution scans entire table