Scalar Subqueries: Concept
Module: Subqueries & CTEs
Think of a scalar subquery as asking a quick question that has a single answer. "What's the average salary?" → 75000. "What's the highest price?" → 299.99. "When was the last order?" → 2024-01-15. That single answer can then be used anywhere you'd use a regular value - in comparisons, calculations, or as a column.
The key rule: scalar subquery MUST return exactly one value. Not two values, not a list, not a table - just one single value. If your subquery might return multiple rows, you'll get an error. That's why we often use aggregate functions (AVG, MAX, MIN, COUNT) in scalar subqueries - they always return a single value.
Scalar subqueries are incredibly useful for comparisons. Want to find employees earning above average? Compare each salary to (SELECT AVG(salary)). Want products priced higher than category median? Compare to (SELECT MEDIAN(price)). The subquery calculates the benchmark, then each row is compared to it.
**How Scalar Subqueries Work:**
When you write a scalar subquery, the database executes it and replaces it with the result value. Think of it like a calculator:
1. You write: WHERE salary > (SELECT AVG(salary) FROM employees)
2. Database calculates: (SELECT AVG(salary) FROM employees) → 75000
3. Query becomes: WHERE salary > 75000
4. Database filters rows using that value
For non-correlated scalar subqueries (most common), this happens ONCE. The result is cached and reused. If you use the same subquery 10 times in your query, it only executes once. This makes them very efficient.
**The Single Value Rule:**
Scalar subquery must return exactly one value. Three scenarios:
1. Returns one row, one column → Perfect! Value is used
2. Returns zero rows → Result is NULL (not an error)
3. Returns multiple rows → ERROR: "Subquery returns more than one row"
How to ensure single value:
- Use aggregate functions: AVG(), MAX(), MIN(), COUNT(), SUM()
- Use LIMIT 1 if you just need any one value
- Add WHERE conditions to filter to one row
- Use TOP 1 in SQL Server
**Non-Correlated vs Correlated:**
Non-correlated (independent, fast):
- Doesn't reference outer query
- Executes ONCE
- Result cached and reused
- Example: (SELECT AVG(salary) FROM employees)
Correlated (references outer, slower):
- References columns from outer query
- Executes ONCE PER ROW
- Can't be cached
- Example: (SELECT AVG(salary) FROM employees e2 WHERE e2.dept = e1.dept)
For 10,000 rows, non-correlated runs 1 time, correlated runs 10,000 times. Big difference!
**Where You Can Use Scalar Subqueries:**
1. SELECT clause - Add calculated columns
2. WHERE clause - Filter based on aggregates
3. HAVING clause - Filter groups
4. SET clause - Dynamic UPDATE values
5. VALUES clause - Dynamic INSERT values
6. CASE expressions - Conditional logic
7. ORDER BY clause - Sort by calculated values