Recursive CTEs: Interview

Q: Why must recursive CTEs use UNION ALL instead of UNION?

UNION removes duplicates by sorting/hashing results, which breaks recursion logic. UNION ALL preserves all rows including duplicates, allowing proper iteration. Exception: graph traversal where you want to eliminate duplicate paths, but then you need explicit cycle detection with path tracking.

Q: How do you prevent infinite recursion in a recursive CTE?

Three methods: 1) Add depth limit: WHERE level < 10. 2) Natural termination: JOIN returns no rows. 3) Cycle detection: Track path in array, check WHERE NOT new_id = ANY(path). 4) MAXRECURSION hint (SQL Server): OPTION (MAXRECURSION 100). Always use at least one method.

Q: What are the performance implications of recursive CTEs?

Can be expensive without optimization. Each iteration scans data. Without index on join column: full table scan per iteration (7 levels = 7 scans). With index: efficient seeks (45s → 0.3s). Memory: each iteration held in memory. Deep hierarchies (>10 levels) use GBs. Optimize: index join columns, add depth limits, use covering indexes.

Module: Subqueries & CTEs

What is a recursive CTE and how does it differ from a regular CTE?

A recursive CTE references itself in its definition, enabling iteration. Regular CTE is one-time query. Recursive CTE has anchor member (base case) and recursive member (iteration). Executes iteratively: anchor once, then recursive member repeatedly using previous iteration's results until no new rows. Used for hierarchies, graphs, sequences.

Explain the execution flow of a recursive CTE with an example.

Example: Generate 1-5. Iteration 0: Anchor returns 1. Iteration 1: Recursive takes 1, returns 2. Iteration 2: Takes 2, returns 3. Iteration 3: Takes 3, returns 4. Iteration 4: Takes 4, returns 5. Iteration 5: Takes 5, WHERE n<5 fails, returns nothing. Stops. Final result: 1,2,3,4,5. Each iteration uses previous iteration's output.

Why must recursive CTEs use UNION ALL instead of UNION?

UNION removes duplicates by sorting/hashing results, which breaks recursion logic. UNION ALL preserves all rows including duplicates, allowing proper iteration. Exception: graph traversal where you want to eliminate duplicate paths, but then you need explicit cycle detection with path tracking.

How do you prevent infinite recursion in a recursive CTE?

Three methods: 1) Add depth limit: WHERE level < 10. 2) Natural termination: JOIN returns no rows. 3) Cycle detection: Track path in array, check WHERE NOT new_id = ANY(path). 4) MAXRECURSION hint (SQL Server): OPTION (MAXRECURSION 100). Always use at least one method.

What are the performance implications of recursive CTEs?

Can be expensive without optimization. Each iteration scans data. Without index on join column: full table scan per iteration (7 levels = 7 scans). With index: efficient seeks (45s → 0.3s). Memory: each iteration held in memory. Deep hierarchies (>10 levels) use GBs. Optimize: index join columns, add depth limits, use covering indexes.

When would you use recursive CTE vs materialized path vs nested sets?

Recursive CTE: flexible, good for writes, slower reads. Materialized path: store full path (/1/5/23/), fastest reads, slower writes. Nested sets: store left/right values, fast subtree queries, very slow writes. Choose based on read/write ratio: read-heavy = materialized path, write-heavy = recursive CTE, mixed = recursive CTE with caching.

Write a recursive CTE to find all employees reporting to manager ID 5, including indirect reports.

WITH RECURSIVE reports AS (

SELECT id, name, manager_id, 0 AS level

FROM employees WHERE manager_id = 5

UNION ALL

SELECT e.id, e.name, e.manager_id, r.level + 1

FROM employees e JOIN reports r ON e.manager_id = r.id

WHERE r.level < 10

)

SELECT * FROM reports ORDER BY level, name;

Anchor finds direct reports (manager_id=5). Recursive joins employees to previous level's IDs. Level tracks depth. Terminates at 10 levels to prevent infinite loops.

Write a recursive CTE to generate all dates in January 2024.

WITH RECURSIVE dates AS (

SELECT DATE '2024-01-01' AS date

UNION ALL

SELECT date + INTERVAL '1 day'

FROM dates

WHERE date < DATE '2024-01-31'

)

SELECT date, TO_CHAR(date, 'Day') AS day_name FROM dates;

Anchor starts at Jan 1. Recursive adds 1 day. Terminates at Jan 31. Useful for filling gaps, generating calendars, time series analysis.

Write a recursive CTE to find the shortest path between two nodes in a graph.

WITH RECURSIVE paths AS (

SELECT node_id, target_id, 1 AS distance, ARRAY[node_id, target_id] AS path

FROM edges WHERE node_id = 1

UNION ALL

SELECT p.node_id, e.target_id, p.distance + 1, p.path || e.target_id

FROM edges e JOIN paths p ON e.node_id = p.target_id