Subquery Types & Classification: Interview
Module: Subqueries & CTEs
What is a subquery and where can it appear in a SQL statement?
A subquery is a query nested inside another query, always enclosed in parentheses. It can appear in four locations: (1) SELECT clause - adds calculated column, must return single value per row. (2) WHERE clause - filters rows, can return single value or list. (3) FROM clause - creates derived table, must have alias. (4) HAVING clause - filters groups after aggregation. Subqueries let you break complex problems into steps: "first get this, then use it to get that." They're fundamental to advanced SQL patterns.
Explain the difference between correlated and non-correlated subqueries. Which is faster and why?
Non-correlated subquery is independent of outer query. It executes once, result is cached and reused for all outer rows. Example: WHERE salary > (SELECT AVG(salary) FROM employees) - AVG calculated once. Correlated subquery references outer query columns. It executes once per outer row. Example: WHERE salary > (SELECT AVG(salary) FROM employees e2 WHERE e2.dept = e1.dept) - AVG calculated per employee's department. Non-correlated is much faster because it runs once. Correlated can be very slow on large datasets (10K rows = 10K executions). Use correlated only when necessary for row-by-row logic. Consider JOINs or window functions as alternatives.
What are the different types of subqueries by result type? Give examples of each.
Three main types: (1) Scalar subquery - returns exactly one value (1 row, 1 column). Example: (SELECT AVG(salary) FROM employees) returns 75000. Used with comparison operators. Error if returns multiple rows. (2) Multi-row subquery - returns list (N rows, 1 column). Example: (SELECT id FROM departments WHERE location = "NYC") returns (1,3,5). Used with IN, ANY, ALL, EXISTS. (3) Table subquery - returns result set (N rows, M columns). Example: (SELECT * FROM employees WHERE hire_date > "2023-01-01"). Used in FROM clause as derived table. Must have alias. Choose type based on what you need: single value for comparison, list for membership check, or table for further processing.
When should you use a subquery vs a JOIN? What are the trade-offs?
Use subquery when: (1) You only need to filter, don't need columns from both tables. (2) Logic is clearer with subquery ("find customers WHERE they have orders"). (3) Checking existence with EXISTS. Use JOIN when: (1) You need columns from multiple tables. (2) Performance is critical (JOINs usually faster). (3) Combining data, not just filtering. Trade-offs: Subqueries are often clearer for filtering logic but can be slower (especially correlated). JOINs are faster for combining data but can be less intuitive for existence checks. Modern optimizers often convert subqueries to JOINs automatically. Test both approaches with EXPLAIN to see actual execution plan.
Write a query to find employees earning more than the average salary in their department. Use a correlated subquery.
-- Correlated subquery approach
SELECT
e1.name,
e1.department,
e1.salary,
(SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department = e1.department) AS dept_avg
FROM employees e1
WHERE e1.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department = e1.department -- Correlated!
)
ORDER BY e1.department, e1.salary DESC;
-- Alternative with CTE (often faster):
WITH dept_averages AS (
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
)
SELECT
e.name,
e.department,
e.salary,
da.avg_salary AS dept_avg
FROM employees e
JOIN dept_averages da ON e.department = da.department
WHERE e.salary > da.avg_salary
ORDER BY e.department, e.salary DESC;
Correlated subquery references outer query (e1.department = e2.department). For each employee, calculates their department's average and compares. Executes once per employee. CTE alternative calculates each department average once, then joins - usually faster for large datasets.