Correlated Subqueries Deep Dive: Concept

Module: Subqueries & CTEs

Correlated subqueries are the "row-by-row calculators" of SQL. Unlike regular subqueries that execute once and return a result, correlated subqueries execute once for EVERY row in the outer query, using values from that specific row.

Think of it like this: You have a list of employees. For each employee, you want to know if their salary is above their department's average. A regular subquery can't do this because it would calculate ONE company-wide average. A correlated subquery can because it calculates a DIFFERENT department average for each employee based on THEIR department.

The key characteristic: The subquery references columns from the outer query. This creates a dependency - the subquery result depends on which outer row is being processed. That's why it must execute per row.

Example:

SELECT e1.name, e1.salary

FROM employees e1

WHERE e1.salary > (

SELECT AVG(e2.salary)

FROM employees e2

WHERE e2.department = e1.department -- References outer query!

);

For Alice in Engineering: subquery calculates Engineering average

For Bob in Sales: subquery calculates Sales average

For Carol in Marketing: subquery calculates Marketing average

Each employee gets their own custom calculation. Powerful, but expensive.

**How Correlated Subqueries Execute:**

Step-by-step execution for query:

SELECT e1.name, e1.salary

FROM employees e1

WHERE e1.salary > (SELECT AVG(e2.salary) FROM employees e2 WHERE e2.department = e1.department);

1. Outer query starts: Process first employee (Alice, Engineering, $95K)

2. Subquery executes with Alice's values: AVG(salary) WHERE department = 'Engineering' → $92K

3. Compare: $95K > $92K? Yes → Include Alice

4. Outer query continues: Process second employee (Bob, Engineering, $88K)

5. Subquery executes AGAIN with Bob's values: AVG(salary) WHERE department = 'Engineering' → $92K

6. Compare: $88K > $92K? No → Exclude Bob

7. Repeat for ALL 10,000 employees

Result: 10,000 subquery executions. If each takes 0.5ms, total = 5 seconds just for subqueries!

**Why Correlated Subqueries Are Slow:**

1. Repeated Execution:

- Non-correlated: Executes once

- Correlated: Executes N times (once per outer row)

- For 10,000 rows: 10,000 executions

2. No Result Caching:

- Even if multiple rows have same department

- Subquery still executes for each row

- Database doesn't cache intermediate results

3. Potential Full Table Scans:

- Without proper indexes

- Each execution scans entire table