How do you calculate the size of each subtree in a hierarchy?

Use recursive CTE to traverse hierarchy, then aggregate to count descendants. Method 1: Count in outer query WITH RECURSIVE tree AS ( SELECT id, parent_id, name FROM categories WHERE parent_id IS NULL UNION ALL SELECT c.id, c.parent_id, c.name FROM categories c JOIN tree t ON c.parent_id = t.id ) SELECT t1.id, t1.name, COUNT(t2.id) - 1 as descendant_count -- Subtract 1 for self FROM tree t1 LEFT JOIN tree t2 ON t2.path LIKE t1.path || '%' GROUP BY t1.id, t1.name; Method 2: Use

What are the performance implications of recursive CTEs on large hierarchies? How do you optimize?

Performance challenges: 1. Each iteration does a JOIN - can be expensive 2. Deep hierarchies (>10 levels) multiply the cost 3. Wide hierarchies (many children per node) increase row count 4. No index on parent column = full table scan per iteration Optimization strategies: 1. Index parent column (critical!): CREATE INDEX idx_parent ON table(parent_id); Benchmark: 1M rows, 10 levels - Without index: 45 seconds - With index: 0.8 seconds (50x faster!) 2. Limit recursion depth: WHERE level < 10

How would you handle a hierarchy where nodes can have multiple parents (DAG - Directed Acyclic Graph)?

Standard hierarchies are trees (one parent per node). DAGs allow multiple parents (e.g., employee reports to two managers, product in multiple categories). Challenges: 1. Node appears multiple times in result (once per path) 2. Counting descendants is tricky (avoid double-counting) 3. Cycle detection is critical Solution: Track full path to detect duplicates WITH RECURSIVE dag AS ( SELECT id, name, ARRAY[id] as path, 1 as depth FROM nodes WHERE is_root = true UNION

Recursive CTEs for Hierarchical Data: Interview

Q: How do recursive CTEs work for hierarchical data? Explain the execution flow.

Recursive CTEs have two parts: base case and recursive case, separated by UNION ALL. Base case: Selects root nodes (WHERE parent_id IS NULL). For org charts, that's the CEO. For category trees, it's top-level categories. Recursive case: Joins the table to the CTE itself to find children. Each iteration finds the next level down. Execution flow: 1. Run base case, get root nodes (Level 1) 2. Join table to Level 1 results, get Level 2 3. Join table to Level 2 results, get Level 3 4. Continue un

Q: How do you prevent infinite recursion in recursive CTEs? What causes it?

Infinite recursion happens when the recursive case never stops finding new rows. Two main causes: 1. Circular references: A reports to B, B reports to C, C reports to A. Query loops forever. 2. Missing termination condition: No depth limit, query keeps recursing even after finding all rows. Prevention methods: 1. Limit depth with level counter (works on all databases): WHERE level < 10 2. Track visited nodes (PostgreSQL with arrays): SELECT ..., ARRAY[id] as visited WHERE NOT id = ANY(visite

Q: What's the difference between top-down and bottom-up traversal? When would you use each?

Top-down (most common): Start at root, find all descendants - Base case: WHERE parent_id IS NULL (root nodes) - Recursive: JOIN child.parent_id = parent.id - Use cases: Org charts, category trees, file systems, BOM explosion - Example: Find all employees under VP Engineering Bottom-up: Start at leaf node, find all ancestors - Base case: WHERE id = specific_employee (starting point) - Recursive: JOIN parent.id = child.parent_id (reversed!) - Use cases: Approval chains, escalation paths, breadcru

Q: How would you handle a hierarchy where nodes can have multiple parents (DAG - Directed Acyclic Graph)?

Standard hierarchies are trees (one parent per node). DAGs allow multiple parents (e.g., employee reports to two managers, product in multiple categories). Challenges: 1. Node appears multiple times in result (once per path) 2. Counting descendants is tricky (avoid double-counting) 3. Cycle detection is critical Solution: Track full path to detect duplicates WITH RECURSIVE dag AS ( SELECT id, name, ARRAY[id] as path, 1 as depth FROM nodes WHERE is_root = true UNION

Module: Subqueries & CTEs

How do recursive CTEs work for hierarchical data? Explain the execution flow.

Recursive CTEs have two parts: base case and recursive case, separated by UNION ALL.

Base case: Selects root nodes (WHERE parent_id IS NULL). For org charts, that's the CEO. For category trees, it's top-level categories.

Recursive case: Joins the table to the CTE itself to find children. Each iteration finds the next level down.

Execution flow:

1. Run base case, get root nodes (Level 1)

2. Join table to Level 1 results, get Level 2

3. Join table to Level 2 results, get Level 3

4. Continue until no more children found or depth limit reached

5. UNION ALL combines all levels into final result

Example: Org chart with CEO → 2 VPs → 4 Managers → 8 Engineers

- Iteration 0: Find CEO (1 row)

- Iteration 1: Find VPs reporting to CEO (2 rows)

- Iteration 2: Find Managers reporting to VPs (4 rows)

- Iteration 3: Find Engineers reporting to Managers (8 rows)

- Iteration 4: No more employees, stop

- Result: 15 total rows (1+2+4+8)

Key: Each iteration uses results from previous iteration. Stops when no new rows found.

How do you prevent infinite recursion in recursive CTEs? What causes it?

Infinite recursion happens when the recursive case never stops finding new rows. Two main causes:

1. Circular references: A reports to B, B reports to C, C reports to A. Query loops forever.

2. Missing termination condition: No depth limit, query keeps recursing even after finding all rows.

Prevention methods:

1. Limit depth with level counter (works on all databases):

WHERE level < 10

2. Track visited nodes (PostgreSQL with arrays):

SELECT ..., ARRAY[id] as visited

WHERE NOT id = ANY(visited)

3. Use MAXRECURSION hint (SQL Server):

OPTION (MAXRECURSION 100)

4. Validate data: Ensure no circular references exist

SELECT * FROM employees e1

JOIN employees e2 ON e1.manager_id = e2.id

WHERE e2.manager_id = e1.id; -- Finds cycles

Best practice: Always include WHERE level < max_depth in recursive part, even if you think data is clean. Better safe than sorry.

What's the difference between top-down and bottom-up traversal? When would you use each?

Top-down (most common): Start at root, find all descendants

- Base case: WHERE parent_id IS NULL (root nodes)

- Recursive: JOIN child.parent_id = parent.id

- Use cases: Org charts, category trees, file systems, BOM explosion