Scalar Subqueries: Performance
Module: Subqueries & CTEs
**Performance Characteristics:**
1. Non-Correlated Scalar Subquery (Fast):
- Executes: Once
- Caching: Yes, result reused
- Cost: Very low
- Example: (SELECT AVG(salary) FROM employees)
- Best for: Comparisons to aggregates
2. Correlated Scalar Subquery (Can be slow):
- Executes: Once per outer row
- Caching: No
- Cost: Can be high for large tables
- Example: (SELECT AVG(salary) WHERE dept = e1.dept)
- Best for: Row-specific calculations
- Optimization: Index columns in subquery WHERE clause
3. Repeated Scalar Subqueries:
- If you use (SELECT AVG(salary)) 5 times, it executes once
- Database optimizer caches result
- No performance penalty for reuse
- But consider CTE for readability
**Optimization Strategies:**
1. Index columns used in subquery:
- If subquery has WHERE clause, index those columns
- For correlated subqueries, index is critical
- Example: Index on (department_id) for dept-based subqueries
2. Use variables for repeated subqueries:
-- Instead of repeating subquery:
SELECT name, salary,
salary - (SELECT AVG(salary) FROM employees) AS diff,
(salary - (SELECT AVG(salary) FROM employees)) * 100.0 /
(SELECT AVG(salary) FROM employees) AS pct
FROM employees;
-- Use CTE:
WITH avg_sal AS (
SELECT AVG(salary) AS avg_salary FROM employees
)
SELECT name, salary,
salary - avg_salary AS diff,
(salary - avg_salary) * 100.0 / avg_salary AS pct
FROM employees, avg_sal;
3. Consider JOINs for correlated subqueries: