SQL Practice Logo

SQLPractice Online

Common Table Expressions: Interview

Module: Subqueries & CTEs

What is a CTE and how is it different from a subquery or derived table?

A CTE (Common Table Expression) is a named temporary result set defined using the WITH clause. It's functionally similar to a derived table (subquery in FROM clause) but with key differences: (1) CTEs are defined at the top of the query with clear names, making them more readable. (2) CTEs can be referenced multiple times in the same query without rewriting. (3) CTEs can reference other CTEs defined earlier in the same query, enabling chaining. (4) CTEs support recursion (recursive CTEs). Performance is usually identical to derived tables - the main benefit is readability and maintainability. Example: WITH active_users AS (SELECT * FROM users WHERE active = true) SELECT * FROM active_users; vs SELECT * FROM (SELECT * FROM users WHERE active = true) AS active_users;

Are CTEs materialized? Do they execute once or multiple times?

It depends on the database and how the CTE is used. CTEs are NOT always materialized (stored as temp table). Behavior varies: PostgreSQL by default may inline CTEs (substitute into main query), but you can use MATERIALIZED hint to force temp table creation. MySQL typically inlines CTEs. SQL Server usually materializes them. If a CTE is referenced multiple times and not materialized, it might execute multiple times. Always check the execution plan with EXPLAIN to see actual behavior. Use MATERIALIZED hint (PostgreSQL) if you need guaranteed single execution. For very large datasets used multiple times, consider temp tables with indexes instead.

When should you use a CTE vs a derived table vs a temp table?

Use CTE when: (1) You want readable, self-documenting code with named steps. (2) You need to reference the same subquery multiple times. (3) You're chaining multiple logical steps. (4) You need recursion. Use derived table when: (1) Simple one-time subquery that's already clear. (2) Working with older databases without CTE support. Use temp table when: (1) Result set is very large and used multiple times. (2) You need indexes on intermediate results. (3) You need to reuse data across multiple queries. (4) Performance testing shows temp table is faster. Rule of thumb: Start with CTE for readability. If performance issues arise, consider temp table with indexes.

Can you chain CTEs? How does it work?

Yes, CTEs can reference other CTEs defined earlier in the same WITH clause. This creates a pipeline where each CTE builds on previous ones. Example: WITH step1 AS (SELECT * FROM employees WHERE active = true), step2 AS (SELECT * FROM step1 WHERE salary > 50000), step3 AS (SELECT department, AVG(salary) FROM step2 GROUP BY department) SELECT * FROM step3; Here step2 references step1, and step3 references step2. This makes complex logic very readable - each step has a clear purpose. Order matters: you can only reference CTEs defined earlier, not later. This pattern is excellent for breaking down complex business logic into understandable steps.

Write a query using multiple CTEs to find customers who spent more than the average customer in their region.

WITH

-- Step 1: Calculate total spending per customer

customer_spending AS (

SELECT

c.customer_id,

c.name,

c.region,

COALESCE(SUM(o.total), 0) AS total_spent

FROM customers c

LEFT JOIN orders o ON o.customer_id = c.customer_id

GROUP BY c.customer_id, c.name, c.region

),

-- Step 2: Calculate average spending per region

regional_averages AS (

SELECT

region,

AVG(total_spent) AS avg_regional_spending

FROM customer_spending

GROUP BY region

)

-- Step 3: Find customers above their regional average

SELECT

cs.customer_id,

cs.name,

cs.region,

cs.total_spent,

ra.avg_regional_spending,

cs.total_spent - ra.avg_regional_spending AS diff_from_avg

FROM customer_spending cs

JOIN regional_averages ra ON ra.region = cs.region

WHERE cs.total_spent > ra.avg_regional_spending