SQL Practice Logo

SQLPractice Online

CTE Performance Optimization: Interview

Module: Subqueries & CTEs

What are the main strategies for optimizing CTE performance? Explain each.

Five main strategies:

1. **Check Execution Plan**: Use EXPLAIN ANALYZE to see actual performance. Shows execution time, rows scanned, index usage, materialization status. Never optimize without this data.

2. **Materialize Reused CTEs**: If CTE referenced 2+ times, add MATERIALIZED hint (PostgreSQL). Prevents duplicate execution. Check execution plan for "CTE Scan" vs inlined subquery.

3. **Filter Early**: Push WHERE conditions into CTE definition, not after. Reduces data volume early. Filtering 50M rows to 500K in CTE is 100x less work than filtering after.

4. **Index Join Columns**: CREATE INDEX on columns used in JOIN, WHERE, ORDER BY. Enables index scans instead of full table scans. Can be 60x faster.

5. **Consider Alternatives**: Use window functions for running totals/rankings (100x faster than correlated subqueries). Use JOINs for simple combinations. Use temp tables for very large intermediate results.

Priority: Check execution plan first, then apply optimizations based on what you find.

How do you determine if a CTE should be materialized? Walk through the decision process.

Decision process:

1. **Check how many times CTE is referenced**:

- Used once: Don't materialize (adds overhead)

- Used 2+ times: Consider materializing

2. **Check if CTE is expensive**:

- Simple filter (SELECT * WHERE): Don't materialize

- Aggregation (GROUP BY, SUM): Consider materializing

- Complex join: Consider materializing

- Correlated subquery: Definitely materialize

3. **Check execution plan**:

- Run EXPLAIN ANALYZE

- Look for "CTE Scan" (materialized) vs inlined subquery

- If CTE referenced 3 times but only executed once: Already materialized

- If CTE referenced 3 times and executed 3 times: Need MATERIALIZED

4. **Test both ways**:

- Run with MATERIALIZED: Check execution time

- Run with NOT MATERIALIZED: Check execution time

- Use whichever is faster

5. **Consider memory**:

- Large materialized CTEs use work_mem

- If CTE results are huge (>1GB), may spill to disk

- Consider temp table instead

Rule of thumb: Materialize if CTE is referenced 2+ times AND is expensive (aggregation, complex join).

Explain the difference between filtering inside a CTE vs after a CTE. Why does it matter for performance?

Key difference: WHERE the filtering happens in the execution pipeline.

Filtering AFTER CTE:

WITH all_data AS (

SELECT * FROM orders -- Scans 50M rows

)

SELECT * FROM all_data WHERE date >= '2024-01-01'; -- Filters to 500K

Execution: