Recursive CTEs for Hierarchical Data: Concept
Module: Subqueries & CTEs
Hierarchical data has parent-child relationships where each record points to its parent. Think of an organization chart: each employee reports to a manager, who reports to their manager, and so on up to the CEO. Traditional SQL struggles with this because you don't know how many levels deep the hierarchy goes. You'd need to write separate JOINs for each level, which is impossible when depth varies. Recursive CTEs solve this by repeatedly joining the table to itself until no more children are found. They're perfect for org charts, category trees, file systems, bill-of-materials, and any nested structure.
**
**How Recursive CTEs Work for Hierarchies:**
1. **Base Case (Anchor)**: Start with root nodes (records with no parent). For org charts, that's the CEO (manager_id IS NULL). For category trees, it's top-level categories (parent_id IS NULL).
2. **Recursive Case**: Join the table to the CTE to find children. Each iteration finds the next level down. The query keeps running until no more children are found.
3. **Level Tracking**: Add a level counter that increments with each iteration. Level 1 is the root, level 2 is direct children, level 3 is grandchildren, etc.
4. **Path Building**: Concatenate names as you traverse to show the full path (CEO → VP Engineering → Engineering Manager).
**Execution Flow Example:**
```sql
-- Iteration 0 (Base Case): Find root nodes
SELECT id, name, manager_id, 1 as level, name as path
FROM employees
WHERE manager_id IS NULL;
-- Result: CEO (level 1)
-- Iteration 1: Find children of CEO
SELECT e.id, e.name, e.manager_id, 2 as level, 'CEO → ' || e.name
FROM employees e
JOIN previous_result ON e.manager_id = previous_result.id;
-- Result: VP Engineering, VP Sales (level 2)
-- Iteration 2: Find children of VPs
SELECT e.id, e.name, e.manager_id, 3 as level, path || ' → ' || e.name
FROM employees e
JOIN previous_result ON e.manager_id = previous_result.id;
-- Result: Engineering Manager, QA Manager, Sales Managers (level 3)
-- Continues until no more children found
```
**Key Patterns:**
1. **Top-Down Traversal** (most common): Start at root, find all descendants
2. **Bottom-Up Traversal**: Start at leaf node, find all ancestors up to root
3. **Subtree Queries**: Start at any node, find all descendants of that subtree
4. **Sibling Queries**: Find all nodes at the same level
**Circular Reference Detection:**
Hierarchies can have cycles (A reports to B, B reports to C, C reports to A). This causes infinite recursion. Solutions:
```sql
-- Method 1: Limit depth
WHERE level < 20
-- Method 2: Track visited nodes (PostgreSQL)
WITH RECURSIVE tree AS (
SELECT id, parent_id, ARRAY[id] as path
FROM categories WHERE parent_id IS NULL