Subqueries in SELECT Clause: Interview
Module: Subqueries & CTEs
Explain the difference between scalar and correlated subqueries in SELECT clause. Which is faster and why?
Scalar subquery in SELECT returns the same value for all rows. It doesn't reference the outer query, so it executes ONCE and the result is cached. Example: (SELECT AVG(salary) FROM employees) - calculates company average once, uses for all rows. Very fast. Correlated subquery in SELECT returns different values per row. It references outer query columns, so it executes ONCE PER ROW. Example: (SELECT AVG(salary) FROM employees e2 WHERE e2.dept = e1.dept) - calculates department average for each employee. For 10,000 employees, this runs 10,000 times. Much slower. Scalar is faster because it runs once vs per row. For correlated subqueries, you must index the columns in the WHERE clause or consider alternatives like JOINs or window functions.
When should you use a correlated subquery in SELECT vs a JOIN with GROUP BY? Give examples.
Use correlated subquery when: (1) Logic is clearer and more readable, (2) You need just one calculated value, (3) Outer table is small (<100 rows), (4) You're doing existence checks or simple counts. Example: SELECT c.name, (SELECT COUNT(*) FROM orders WHERE customer_id = c.id) FROM customers - clear intent. Use JOIN with GROUP BY when: (1) Performance is critical, (2) You need multiple aggregates from same table, (3) Outer table is large (>1000 rows), (4) You're combining data from multiple tables. Example: SELECT c.name, COUNT(o.id), SUM(o.total) FROM customers c LEFT JOIN orders o ON o.customer_id = c.id GROUP BY c.id, c.name - faster, gets multiple aggregates in one pass. Rule of thumb: For large tables or multiple aggregates, use JOIN. For small tables or single simple calculation, correlated subquery is fine.
What happens when a subquery in SELECT returns NULL? How do you handle it?
When a subquery in SELECT finds no rows, it returns NULL. This is not an error - it's expected behavior. Example: (SELECT SUM(total) FROM orders WHERE customer_id = c.id) returns NULL for customers with no orders. Problem: NULL in calculations returns NULL. If you do NULL * 0.1, result is NULL (not 0). Solution: Use COALESCE to provide default value. COALESCE((SELECT SUM(total) ...), 0) returns 0 if subquery is NULL. COALESCE returns first non-NULL value. You can also use CASE: CASE WHEN (SELECT ...) IS NULL THEN 0 ELSE (SELECT ...) END. Or use IFNULL in MySQL: IFNULL((SELECT ...), 0). Always handle NULL when subquery might return no rows and you need a specific default value.
Why are window functions often better than correlated subqueries in SELECT? When would you still use correlated subqueries?
Window functions are better because they calculate per-group aggregates in a single table scan. Correlated subquery executes once per row. Example: For department averages with 10,000 employees, correlated subquery runs 10,000 times (450ms), window function scans once (95ms) - 5x faster. Window functions also have cleaner syntax: AVG(salary) OVER (PARTITION BY dept) vs (SELECT AVG(salary) FROM employees e2 WHERE e2.dept = e1.dept). Still use correlated subqueries when: (1) Window functions not available (MySQL <8.0, old databases), (2) Logic doesn't fit window function pattern (complex conditions, multiple tables), (3) You need existence checks or counts with complex WHERE clauses, (4) Readability is more important than performance for small datasets. Window functions are the modern best practice, but correlated subqueries still have their place.
Write a query showing each customer with their order count, total spent, and average order value. Use subqueries in SELECT.
-- Using correlated subqueries
SELECT
c.id,
c.name,
c.email,
(SELECT COUNT(*)
FROM orders
WHERE customer_id = c.id) AS order_count,
COALESCE(
(SELECT SUM(total)
FROM orders
WHERE customer_id = c.id),
0
) AS total_spent,
COALESCE(
(SELECT AVG(total)
FROM orders
WHERE customer_id = c.id),
0
) AS avg_order_value
FROM customers c
ORDER BY total_spent DESC;
-- Required index for performance:
CREATE INDEX idx_orders_customer_total
ON orders(customer_id, total);
Three correlated subqueries calculate per-customer statistics. Each references c.id from outer query. COALESCE handles NULL for customers with no orders. Index on (customer_id, total) is covering index - provides all needed columns without table lookup. For better performance, use JOIN with GROUP BY instead.
Rewrite the previous query using JOIN instead of correlated subqueries. Compare performance.
-- Using JOIN with GROUP BY (faster)
SELECT
c.id,
c.name,