Recursive CTEs for Hierarchical Data: Interview
Module: Subqueries & CTEs
How do recursive CTEs work for hierarchical data? Explain the execution flow.
Recursive CTEs have two parts: base case and recursive case, separated by UNION ALL.
Base case: Selects root nodes (WHERE parent_id IS NULL). For org charts, that's the CEO. For category trees, it's top-level categories.
Recursive case: Joins the table to the CTE itself to find children. Each iteration finds the next level down.
Execution flow:
1. Run base case, get root nodes (Level 1)
2. Join table to Level 1 results, get Level 2
3. Join table to Level 2 results, get Level 3
4. Continue until no more children found or depth limit reached
5. UNION ALL combines all levels into final result
Example: Org chart with CEO → 2 VPs → 4 Managers → 8 Engineers
- Iteration 0: Find CEO (1 row)
- Iteration 1: Find VPs reporting to CEO (2 rows)
- Iteration 2: Find Managers reporting to VPs (4 rows)
- Iteration 3: Find Engineers reporting to Managers (8 rows)
- Iteration 4: No more employees, stop
- Result: 15 total rows (1+2+4+8)
Key: Each iteration uses results from previous iteration. Stops when no new rows found.
How do you prevent infinite recursion in recursive CTEs? What causes it?
Infinite recursion happens when the recursive case never stops finding new rows. Two main causes:
1. Circular references: A reports to B, B reports to C, C reports to A. Query loops forever.
2. Missing termination condition: No depth limit, query keeps recursing even after finding all rows.
Prevention methods:
1. Limit depth with level counter (works on all databases):
WHERE level < 10
2. Track visited nodes (PostgreSQL with arrays):
SELECT ..., ARRAY[id] as visited
WHERE NOT id = ANY(visited)
3. Use MAXRECURSION hint (SQL Server):
OPTION (MAXRECURSION 100)
4. Validate data: Ensure no circular references exist
SELECT * FROM employees e1
JOIN employees e2 ON e1.manager_id = e2.id
WHERE e2.manager_id = e1.id; -- Finds cycles
Best practice: Always include WHERE level < max_depth in recursive part, even if you think data is clean. Better safe than sorry.
What's the difference between top-down and bottom-up traversal? When would you use each?
Top-down (most common): Start at root, find all descendants
- Base case: WHERE parent_id IS NULL (root nodes)
- Recursive: JOIN child.parent_id = parent.id
- Use cases: Org charts, category trees, file systems, BOM explosion