Materialized CTEs: Concept
Module: Subqueries & CTEs
CTEs can behave in two ways: materialized (computed once, stored temporarily) or inlined (merged into main query, potentially executed multiple times). This matters when a CTE is referenced multiple times. Imagine a dashboard with 3 charts, all using the same expensive aggregation. If the CTE is inlined, that aggregation runs 3 times. If materialized, it runs once and results are reused. The difference can be 15 seconds vs 2 seconds. PostgreSQL defaults to inlining (fast for simple CTEs, slow for reused ones). SQL Server usually materializes. Understanding this behavior is critical for performance.
**
**How CTE Materialization Works:**
1. **Inlined (Not Materialized)**: Optimizer merges CTE into main query. CTE logic executes wherever referenced. If referenced 3 times, executes 3 times.
2. **Materialized**: Database executes CTE once, stores results in temporary memory (like a temp table), then reuses those results wherever CTE is referenced.
**PostgreSQL Behavior (12+):**
PostgreSQL 12+ defaults to inlining CTEs for optimization. This is great for simple CTEs but terrible for expensive CTEs referenced multiple times.
```sql
-- Without MATERIALIZED: Executes CTE twice
WITH expensive AS (
SELECT department, AVG(salary) as avg_sal
FROM employees -- 1M rows
GROUP BY department
)
SELECT * FROM expensive WHERE avg_sal > 70000
UNION ALL
SELECT * FROM expensive WHERE avg_sal <= 70000;
-- Scans employees table twice, aggregates twice
-- With MATERIALIZED: Executes CTE once
WITH expensive AS MATERIALIZED (
SELECT department, AVG(salary) as avg_sal
FROM employees -- 1M rows
GROUP BY department
)
SELECT * FROM expensive WHERE avg_sal > 70000
UNION ALL
SELECT * FROM expensive WHERE avg_sal <= 70000;
-- Scans employees table once, aggregates once, reuses results
```
**SQL Server Behavior:**
SQL Server usually materializes CTEs automatically (creates worktable spool). No explicit hint needed. Check execution plan to confirm.
**MySQL Behavior:**
MySQL 8.0+ treats CTEs as derived tables. Limited control over materialization. Optimizer decides based on cost.
**When to Materialize:**
1. **CTE referenced multiple times**: Most common reason
2. **Expensive computation**: Aggregations, joins, window functions
3. **Large result sets**: Scanning millions of rows
4. **Execution plan shows duplicate work**: Use EXPLAIN ANALYZE
**When NOT to Materialize:**
1. **CTE used only once**: Materialization adds overhead