SQL Practice Logo

SQLPractice Online

SQL Subqueries: The Complete Guide to Scalar, Correlated, EXISTS, IN and Derived-Table Subqueries

A subquery is a SELECT inside another SELECT — but that one sentence hides four very different patterns with very different performance characteristics. A scalar subquery returns a single value. A row subquery returns one row. A table subquery (derived table) returns a relation. A correlated subquery references the outer row and reruns once per outer row. This guide covers all four forms, the EXISTS / IN / ANY / ALL operators, the NULL-trap that breaks NOT IN, and exactly when to rewrite a subquery as a JOIN, a window function, or a CTE.

Last updated · SQL Practice Online team

What is a SQL subquery?

A subquery is a SELECT statement nested inside another statement. The outer statement (which can be SELECT, INSERT, UPDATE, DELETE, or even another subquery) uses the result of the inner SELECT as a value, a row, or a table. Subqueries are how SQL composes — they let you express "the rows where X depends on the result of another query" without resorting to procedural code.

There are four ways to classify a subquery, and you need both classifications to reason about it: by what it returns (scalar, row, or table) and by whether it depends on the outer query (correlated or non-correlated). Almost every subquery question — performance, NULL behaviour, whether to rewrite as a JOIN — is answered by figuring out which combination you have.

Classification: scalar vs row vs table, correlated vs non-correlated

  • Scalar subquery — returns exactly one column and at most one row. Usable anywhere a single value is allowed: SELECT list, WHERE = (...) , ORDER BY, expressions.
  • Row subquery — returns one row with multiple columns. Used with row constructors: WHERE (a, b) = (SELECT a, b FROM ...). Supported in PostgreSQL and MySQL; SQL Server requires rewriting.
  • Table subquery — returns a relation (multiple rows, multiple columns). Used in FROM (derived table), in EXISTS / NOT EXISTS, and as the right side of IN / NOT IN, ANY, ALL.
  • Non-correlated — the inner query can be evaluated once, independent of the outer query. The planner runs it once and reuses the result.
  • Correlated — the inner query references columns from the outer query. Logically, it is re-evaluated for each outer row. Modern planners often rewrite correlated subqueries into semi-joins or hash joins, but the mental model is "loop".

Subqueries in the SELECT list (scalar subqueries)

A scalar subquery in the SELECT list adds one computed column whose value is the result of an inner SELECT. The inner query MUST return at most one row and one column — more than one row is a runtime error in every dialect.

-- Per-customer order count + total spend, computed via scalar subqueries
SELECT
  c.customer_id,
  c.name,
  (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id) AS order_count,
  (SELECT COALESCE(SUM(total), 0) FROM orders o WHERE o.customer_id = c.customer_id) AS lifetime_value
FROM customers c
WHERE c.status = 'active';
  • Each scalar subquery in the SELECT list runs (logically) once per outer row. For N customers and M scalar subqueries you get O(N × M) inner evaluations — fine for tens of rows, terrible for millions.
  • For the same workload, a single LEFT JOIN to a pre-aggregated derived table or CTE is almost always faster: the planner can hash-join once instead of looping.
  • A scalar subquery returning zero rows yields NULL — not a runtime error. Use COALESCE if you need a default like 0.

Subqueries in FROM (derived tables)

A subquery in the FROM clause is called a derived table or inline view. It is logically identical to a CTE referenced exactly once: the inner query produces a relation, you alias it, and the outer query treats the alias as a table.

-- Pre-aggregate orders, then join to customers
SELECT c.customer_id, c.name, o.order_count, o.lifetime_value
FROM customers c
LEFT JOIN (
  SELECT customer_id,
         COUNT(*)         AS order_count,
         COALESCE(SUM(total), 0) AS lifetime_value
  FROM orders
  GROUP BY customer_id
) o ON o.customer_id = c.customer_id
WHERE c.status = 'active';
  • Every derived table must have an alias. PostgreSQL, MySQL and SQL Server all enforce this — `SELECT * FROM (SELECT ...)` without an alias is a syntax error.
  • Modern engines inline the derived table into the surrounding plan, just like a non-recursive CTE. There is no performance difference between a one-shot CTE and the equivalent derived table.
  • For multi-step logic referenced by name across the query, CTEs read better. For a single-shot inline aggregation, a derived table keeps the logic adjacent to where it is used.

Subqueries in WHERE: IN, NOT IN, EXISTS, NOT EXISTS, ANY, ALL

The WHERE clause is where most subqueries live, and where the most subtle bugs hide. The four operator families that take a subquery on the right are IN, EXISTS, ANY, and ALL — plus their NOT variants. They look interchangeable but have different NULL semantics and different planner behaviour.

-- IN: customer must appear in the orders table
SELECT * FROM customers
WHERE customer_id IN (SELECT customer_id FROM orders WHERE order_date >= '2026-01-01');

-- EXISTS: same intent, semi-join semantics
SELECT * FROM customers c
WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.customer_id = c.customer_id AND o.order_date >= '2026-01-01'
);

-- ANY / ALL: comparison against every row in the subquery
SELECT * FROM products p
WHERE p.price > ALL (SELECT price FROM products WHERE category = 'budget');

SELECT * FROM products p
WHERE p.price < ANY (SELECT price FROM products WHERE category = 'premium');
  • IN evaluates the subquery, builds a set, and checks membership. Equivalent to = ANY (...). The subquery returns one column.
  • EXISTS returns true as soon as the subquery produces at least one row. The SELECT list is irrelevant — `SELECT 1` is a convention, not a requirement. EXISTS is a true semi-join.
  • ANY (= SOME) and ALL compare the outer expression to every row from the subquery. `> ALL` is "greater than the maximum"; `< ANY` is "less than the maximum"; the gotchas are when the subquery is empty (`> ALL (empty)` is TRUE, `> ANY (empty)` is FALSE).
  • On modern planners, IN and EXISTS produce identical plans for the same logical query. Pick the form that reads clearest. For NULLable columns, prefer EXISTS — see next section.

The NOT IN NULL trap (and why NOT EXISTS is always safer)

NOT IN with a subquery that can return NULL is the single most common silent bug in production SQL. SQL uses three-valued logic: comparing anything to NULL yields UNKNOWN, and `x NOT IN (a, b, NULL)` evaluates to `x != a AND x != b AND x != NULL` — and that last term is always UNKNOWN. The whole AND chain becomes UNKNOWN, which is treated as not-true, so the row is excluded. The query returns ZERO rows even when you expected matches.

-- BUG: returns zero rows whenever any order has a NULL customer_id
SELECT * FROM customers
WHERE customer_id NOT IN (SELECT customer_id FROM orders);

-- CORRECT: NOT EXISTS handles NULL cleanly
SELECT * FROM customers c
WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);

-- ALSO CORRECT: filter NULLs out of the inner query
SELECT * FROM customers
WHERE customer_id NOT IN (
  SELECT customer_id FROM orders WHERE customer_id IS NOT NULL
);

Correlated subqueries: when the inner query depends on the outer row

A correlated subquery references columns from the outer query. The inner query cannot be evaluated once-and-cached — logically it runs again for each outer row. This makes correlated subqueries enormously powerful for "find the X for each Y" problems and a frequent performance trap when used carelessly.

-- For each customer, the date of their most recent order (correlated scalar)
SELECT
  c.customer_id,
  c.name,
  (SELECT MAX(o.order_date)
   FROM orders o
   WHERE o.customer_id = c.customer_id) AS last_order_date
FROM customers c;

-- For each product, the number of categories it outsells (correlated count)
SELECT p.product_id, p.name,
       (SELECT COUNT(*)
        FROM products p2
        WHERE p2.category = p.category AND p2.sales < p.sales) AS rank_in_category
FROM products p;

-- Top-1-per-group via correlated subquery (a window function is usually clearer)
SELECT * FROM orders o
WHERE o.total = (
  SELECT MAX(o2.total) FROM orders o2 WHERE o2.customer_id = o.customer_id
);
  • PostgreSQL, MySQL 8 and SQL Server will often "decorrelate" a correlated subquery, transforming it into a join with a derived table. EXPLAIN will show whether the loop survived.
  • For "top-N per group", a window function (ROW_NUMBER OVER PARTITION BY) is almost always faster and clearer than a correlated subquery — see the cross-link to the window functions pillar.
  • A correlated subquery in WHERE returning multiple values is a row/table subquery and must be wrapped in IN, EXISTS, ANY or ALL. A correlated scalar subquery (single value) can sit on either side of `=`, `<`, `>`, etc.

Subquery vs JOIN vs window function vs CTE: how to choose

Most subqueries can be rewritten as a JOIN, a window function, or a CTE — and most should be. The choice is about clarity for the reader and predictability for the planner.

  • Existence check (does at least one matching row exist?) — EXISTS / NOT EXISTS. A semi-join is the right primitive; do not rewrite as JOIN + DISTINCT, which is slower and obscures intent.
  • Membership check (is this value in a small fixed set?) — IN with a literal list, or IN with a subquery returning a NOT NULL column. Use NOT EXISTS instead of NOT IN when NULLs are possible.
  • Pre-aggregation (one row per group, joined back to detail) — derived table or CTE. This is faster than a correlated scalar subquery in SELECT once the outer set is more than a few thousand rows.
  • Top-N per group, ranking, running totals, period-over-period — window functions, every time. They cannot be expressed cleanly in correlated subqueries past simple cases.
  • Anti-join (rows on the left with no match on the right) — LEFT JOIN ... WHERE right.key IS NULL OR NOT EXISTS. Pick whichever the planner handles better on your workload; both are usually optimised to a hash anti-join.
  • Recursive traversal (hierarchies, graphs) — recursive CTE. A subquery cannot recurse; if you find yourself nesting EXISTS inside EXISTS for hierarchy traversal, switch to WITH RECURSIVE.

Performance: when subqueries are fast, when they are slow

  • A non-correlated scalar subquery is evaluated once and cached. Almost free even when used in many rows.
  • A non-correlated table subquery in FROM is inlined like a one-shot CTE. Performance is identical to writing the same logic as a JOIN.
  • A correlated scalar subquery in SELECT is logically O(N) where N is the size of the outer set. Modern planners often rewrite it into a hash aggregate + LEFT JOIN — check EXPLAIN to confirm.
  • IN and EXISTS produce identical plans on PostgreSQL 12+, MySQL 8, SQL Server 2017+ and Oracle 12c+. Pick by readability, not folklore about speed.
  • NOT IN with a NULLable subquery is a correctness bug, not a performance issue — see the NOT IN trap section. NOT EXISTS is the universally safe form.
  • A correlated subquery in WHERE with no supporting index turns into a nested-loops join with a full table scan inside the loop. Always have an index on the column the subquery filters by.

Dialect notes: PostgreSQL, MySQL, SQL Server, Oracle, SQLite

  • PostgreSQL: full subquery support including row subqueries, decorrelation of EXISTS/IN since 8.4, semi-join optimisation since 9.0. LATERAL subqueries (a join-in-FROM that can reference earlier FROM items) since 9.3 — invaluable for top-N-per-group when window functions are not enough.
  • MySQL: subquery optimisation was historically MySQL's weak spot. Versions 5.6 and earlier materialised IN/EXISTS subqueries naively. MySQL 8.0 has full semi-join optimisation, hash joins (8.0.18+) and is roughly on par with PostgreSQL. LATERAL is supported since 8.0.14.
  • SQL Server: full support including row subqueries (in some forms), CROSS APPLY / OUTER APPLY (the SQL Server name for LATERAL) since 2005. Decorrelation is mature.
  • Oracle: full standard support, LATERAL since 12c. Oracle's optimiser has been decorrelating subqueries for decades.
  • SQLite: no row subqueries, no LATERAL. Scalar, derived-table and EXISTS subqueries all work; correlated subqueries are not always decorrelated, so write JOINs explicitly when performance matters.

Practice: 7 hands-on lessons in the Subqueries module

Each topic above maps to a hands-on lesson in the Learning Mode curriculum. Lessons include runnable PostgreSQL examples, exercises against the HR, E-commerce and Banking schemas, and progressive difficulty from intermediate to expert.

Practice this in the editor

Frequently asked questions

What is a subquery in SQL?

A subquery is a SELECT statement nested inside another SQL statement. The outer statement uses the inner query's result as a value (scalar subquery), a row, or a table. Subqueries appear in the SELECT list, the FROM clause (as a derived table), the WHERE clause (with IN, EXISTS, ANY, ALL), and even in INSERT, UPDATE and DELETE.

What is the difference between IN and EXISTS?

Logically they express the same thing: "is there a matching row in the inner query?". On modern planners (PostgreSQL 12+, MySQL 8, SQL Server, Oracle) they produce identical execution plans. The practical difference is NULL handling: NOT IN with a NULLable inner column silently returns zero rows, while NOT EXISTS handles NULLs correctly. Always prefer EXISTS / NOT EXISTS unless you have a specific reason to use IN.

What is a correlated subquery?

A correlated subquery references columns from the outer query, so it cannot be evaluated once and cached — logically it reruns for each outer row. They are powerful for "for each X, find the Y" patterns. Modern planners often rewrite correlated subqueries as joins or semi-joins, but for top-N-per-group and similar problems, a window function is usually clearer and faster.

Is a subquery faster than a JOIN?

Usually identical, sometimes the JOIN wins. On PostgreSQL 12+, MySQL 8, SQL Server 2017+ and Oracle 12c+ the planner rewrites IN, EXISTS and most correlated subqueries into semi-joins or hash joins internally, so the execution plan is the same regardless of how you wrote it. Choose the form that reads clearer for the reader. Exceptions: NOT IN with NULLable values is broken (use NOT EXISTS), and correlated subqueries on engines without decorrelation (older MySQL, SQLite) can run as nested loops and be much slower than the equivalent JOIN.

When should I use a subquery instead of a CTE?

A non-recursive CTE referenced once is logically identical to a derived table (subquery in FROM). Use a CTE when the intermediate result has a meaningful name or is reused; use a derived table for one-shot inline aggregations where the logic should sit next to where it is used. Use a scalar subquery in WHERE / SELECT for single-value lookups. Always use a recursive CTE — never a subquery — for hierarchy traversal.

Why does my NOT IN query return zero rows?

Almost certainly because the subquery returns at least one NULL. SQL's three-valued logic makes `x NOT IN (a, b, NULL)` evaluate to UNKNOWN for every x, and UNKNOWN is treated as not-true. The fix is either to filter NULLs out of the inner query (`WHERE col IS NOT NULL`) or, better, to rewrite as NOT EXISTS, which has the correct semantics for free.

Can a subquery return multiple rows?

Yes — that is a table subquery. Use it in FROM (as a derived table), with EXISTS / NOT EXISTS, or with IN / NOT IN / ANY / ALL on the right side. A scalar subquery (used as a single value, e.g. in SELECT or in `WHERE x = (...)`) MUST return at most one row, or you get a runtime error in every dialect.

What is a LATERAL subquery?

A LATERAL subquery in the FROM clause is allowed to reference columns from earlier FROM items. Without LATERAL, a derived table is computed in isolation; with LATERAL, the subquery is re-evaluated for each row from the preceding tables — like a correlated subquery but in FROM. It is the SQL-standard way to express top-N-per-group and other "for each row, run this query" patterns. Supported as LATERAL in PostgreSQL/MySQL/Oracle and as CROSS APPLY / OUTER APPLY in SQL Server.