Scalar Subqueries: Interview

Module: Subqueries & CTEs

What is a scalar subquery and what makes it different from other subquery types?

A scalar subquery returns exactly one value (1 row, 1 column). This is different from multi-row subqueries that return a list of values, or table subqueries that return multiple rows and columns. The key characteristic is "single value" - like 75000, or "2024-01-15", or NULL. Because it returns a single value, you can use it anywhere a regular value is expected: in SELECT to add columns, in WHERE for comparisons, in calculations, in CASE expressions. If a scalar subquery returns multiple rows, you get an error: "Subquery returns more than one row". To ensure single value, use aggregate functions (AVG, MAX, MIN, COUNT) which always return one value, or add LIMIT 1 to take just the first row.

Explain the difference between correlated and non-correlated scalar subqueries. Which is faster and why?

Non-correlated scalar subquery is independent - it doesn't reference the outer query. Example: (SELECT AVG(salary) FROM employees). It executes ONCE, the result is cached, and that cached value is reused throughout the query. Very fast. Correlated scalar subquery references columns from the outer query. Example: (SELECT AVG(salary) FROM employees e2 WHERE e2.dept = e1.dept). It executes ONCE PER ROW of the outer query. For 10,000 rows, it runs 10,000 times. Much slower without proper indexes. Non-correlated is faster because it runs once vs per row. For correlated subqueries, you must index the columns in the WHERE clause, or consider using JOINs or window functions instead.

What happens when a scalar subquery returns no rows? Is it an error?

When a scalar subquery finds no rows, it returns NULL - this is NOT an error. For example: (SELECT AVG(price) FROM products WHERE category = 'NonExistent') returns NULL if no products in that category. This can cause unexpected results in calculations because any operation with NULL returns NULL. If you do 100 - NULL, the result is NULL. To handle this, use COALESCE to provide a default value: COALESCE((SELECT AVG(price) ...), 0) returns 0 if the subquery is NULL. Or use CASE to check for NULL explicitly. Empty result = NULL is valid behavior, not an error. Only multiple rows cause an error.

When should you use a scalar subquery vs a JOIN? Give examples of when each is better.

Use scalar subquery when you only need to filter or compare, not when you need columns from both tables. Example: Finding employees earning above average - scalar subquery is perfect: WHERE salary > (SELECT AVG(salary)). Clean and clear. Use JOIN when you need columns from multiple tables. Example: Showing employee with their department name - JOIN is better: FROM employees e JOIN departments d ON e.dept_id = d.id. Scalar subquery would be awkward here. Also use JOIN for correlated aggregates - faster than correlated scalar subquery. For per-department average, JOIN with pre-aggregated data is faster than correlated subquery. Rule of thumb: Scalar subquery for filtering/comparing to aggregates. JOIN for combining data from multiple tables.

Write a query to find all products priced above the average price. Show product name, price, average price, and percentage above average.

SELECT

product_name,

price,

(SELECT AVG(price) FROM products) AS avg_price,

ROUND((price - (SELECT AVG(price) FROM products)) * 100.0 /

(SELECT AVG(price) FROM products), 1) AS pct_above_avg

FROM products

WHERE price > (SELECT AVG(price) FROM products)

ORDER BY price DESC;

Scalar subquery (SELECT AVG(price)) executes once and returns average. Used 4 times in query but only executes once - result is cached. WHERE filters to above-average products. Calculation shows percentage above average.

Find departments where the average salary is higher than the company average. Show department name, employee count, department average, and company average.

SELECT

department,

COUNT(*) AS employee_count,

ROUND(AVG(salary), 0) AS dept_avg,

(SELECT ROUND(AVG(salary), 0) FROM employees) AS company_avg

FROM employees

GROUP BY department

HAVING AVG(salary) > (SELECT AVG(salary) FROM employees)

ORDER BY dept_avg DESC;

Scalar subquery in HAVING clause filters groups. After GROUP BY creates department groups, HAVING compares each department's average to company average. Only departments above company average are returned. Scalar subquery executes once, result cached.

Write a query showing each employee with their salary and their department's average salary. Use a correlated scalar subquery.

-- First, create index for performance

CREATE INDEX idx_employees_dept_salary

ON employees(department_id, salary);

-- Query with correlated scalar subquery

SELECT

e1.name,

e1.department_id,

e1.salary,

(SELECT ROUND(AVG(e2.salary), 0)