SQL Practice Logo

SQLPractice Online

SQL Aggregate Functions and GROUP BY: The Complete Guide to COUNT, SUM, AVG, MIN, MAX, HAVING and Conditional Aggregation

Aggregate functions are SQL's answer to "summarise this set of rows into one value". COUNT, SUM, AVG, MIN, MAX are the universal five; GROUP BY decides what "this set" means; HAVING filters the resulting groups; and a handful of patterns — conditional aggregation with CASE or FILTER, ROLLUP/CUBE for subtotals, COUNT(DISTINCT), STRING_AGG — turn them into a complete reporting toolkit. This guide covers every aggregate function, the precise rules for GROUP BY across PostgreSQL/MySQL/SQL Server/Oracle, the NULL behaviour that catches everyone once, and when to reach for a window function instead.

Last updated · SQL Practice Online team

What is a SQL aggregate function?

An aggregate function takes a set of rows and returns a single summary value. SUM(amount) takes N input rows and returns one number. COUNT(*) takes N rows and returns N. AVG(price) takes N rows and returns a single average. The defining contract is "many rows in, one row out" — the opposite of a window function (many in, many out) and very different from a scalar function (one in, one out).

Without GROUP BY, an aggregate collapses every row in the result set into a single row. With GROUP BY, the row set is partitioned into groups first, and the aggregate runs once per group, producing one output row per group. Every aggregate query is one of those two shapes.

COUNT, SUM, AVG, MIN, MAX: the universal five

-- One row of summary stats for the whole orders table
SELECT
  COUNT(*)         AS total_orders,
  COUNT(customer_id) AS orders_with_customer,
  COUNT(DISTINCT customer_id) AS unique_customers,
  SUM(total)       AS total_revenue,
  AVG(total)       AS average_order_value,
  MIN(total)       AS smallest_order,
  MAX(total)       AS largest_order
FROM orders
WHERE order_date >= '2026-01-01';
  • COUNT(*) counts every row, including rows with all-NULL columns. COUNT(column) counts only rows where that column is NOT NULL — this is the most common source of "off-by-N" bugs in aggregate queries.
  • COUNT(DISTINCT column) counts unique non-NULL values. It is fundamentally more expensive than COUNT(column) because the engine must build a hash set or sort the values; on huge tables, an approximate-distinct function (APPROX_COUNT_DISTINCT in SQL Server / BigQuery, HLL in Postgres) is often the right tradeoff.
  • SUM and AVG ignore NULL inputs. AVG of {10, 20, NULL} is 15, not 10 — the NULL is dropped, not treated as zero. If you want NULL = zero semantics, use AVG(COALESCE(x, 0)).
  • MIN and MAX work on every comparable type, not just numbers. MIN(date_column), MAX(string_column), MIN(boolean_column) are all valid and useful.
  • AVG of an integer column returns a numeric result on PostgreSQL/Oracle but truncates to integer on older MySQL configurations. Cast to a floating type when in doubt: AVG(price::numeric) or AVG(CAST(price AS DECIMAL)).

GROUP BY: partition the rows, then aggregate

GROUP BY says: before applying the aggregates, split the rows into groups based on the listed expressions; one output row will come back for each distinct combination of group values. The columns in the SELECT list must either appear in GROUP BY or be inside an aggregate function — every other column is ambiguous and either rejected by the engine or, on permissive engines, returns an arbitrary value.

-- Revenue per country per month
SELECT
  c.country,
  DATE_TRUNC('month', o.order_date) AS month,
  COUNT(*) AS order_count,
  SUM(o.total) AS revenue
FROM orders o
JOIN customers c USING (customer_id)
WHERE o.order_date >= '2026-01-01'
GROUP BY c.country, DATE_TRUNC('month', o.order_date)
ORDER BY c.country, month;
  • You can GROUP BY any expression — a column, a CASE, a date truncation, a string concatenation. Postgres/SQL Server/Oracle also let you GROUP BY 1, 2 (positional references to SELECT columns) for brevity.
  • PostgreSQL, SQL Server, Oracle and MySQL 5.7+ with ONLY_FULL_GROUP_BY enforce the "every non-aggregate must be in GROUP BY" rule strictly. MySQL with ONLY_FULL_GROUP_BY off lets you put random columns in SELECT and returns a non-deterministic value — every modern install should leave the strict mode on.
  • GROUP BY runs after WHERE and before HAVING. WHERE filters individual rows; HAVING filters groups. You cannot use an aggregate in WHERE — that is what HAVING is for.
  • GROUP BY is allowed without any aggregates in SELECT. `SELECT country FROM customers GROUP BY country` is exactly equivalent to `SELECT DISTINCT country FROM customers` — the planner usually produces the same plan.

HAVING: filtering groups after aggregation

WHERE filters rows before they reach the aggregate; HAVING filters groups after they are aggregated. If your filter condition references an aggregate function, it must go in HAVING. If it references only ordinary columns, it should go in WHERE — pushing filters as early as possible lets the engine throw away rows before the expensive aggregation step.

-- Customers who placed 5+ orders in 2026
SELECT customer_id, COUNT(*) AS order_count, SUM(total) AS spend
FROM orders
WHERE order_date >= '2026-01-01'                  -- filter rows first
GROUP BY customer_id
HAVING COUNT(*) >= 5 AND SUM(total) > 1000        -- filter groups after
ORDER BY spend DESC;
  • HAVING can use aggregate expressions even when they are not in the SELECT list — `HAVING COUNT(*) > 5` is fine even if you do not select COUNT(*).
  • A non-aggregate predicate that only references GROUP BY columns can technically go in HAVING but should go in WHERE for performance. The planner cannot always rewrite the other direction.
  • HAVING runs after GROUP BY but before SELECT-list expression evaluation, so column aliases defined in SELECT are not visible in HAVING on most engines (PostgreSQL is a permissive exception). Reach for the underlying expression or wrap the query in a CTE.

COUNT vs COUNT DISTINCT vs DISTINCT vs GROUP BY

These four feel similar and are constantly confused in interviews. Each does something subtly different.

  • COUNT(*) — counts rows including all-NULL ones. Cheapest aggregate; the planner can use index-only scans.
  • COUNT(column) — counts rows where column IS NOT NULL. Use when "missing data" should not contribute to the count.
  • COUNT(DISTINCT column) — counts unique non-NULL values. Requires a sort or hash; on a 100M-row table this can dominate query time.
  • SELECT DISTINCT a, b — returns one row per unique (a, b) combination; no aggregation. Equivalent to GROUP BY a, b with no aggregates in SELECT.
  • GROUP BY a, b — partitions for aggregation. Combined with COUNT(*), gives you "how many rows fall in each group" — a different question from "how many distinct values".

Conditional aggregation: SUM(CASE WHEN ...) and FILTER

Conditional aggregation is how you build pivot tables and cohort-style reports without a self-join. The trick: an aggregate over a CASE expression counts (or sums) only the rows where the condition is true. PostgreSQL adds the cleaner FILTER clause, which does the same thing in standard SQL syntax.

-- Per-customer breakdown: lifetime, last 30 days, refunds
-- Portable form (every dialect)
SELECT
  customer_id,
  COUNT(*)                                                AS total_orders,
  SUM(total)                                              AS lifetime_value,
  COUNT(*) FILTER (WHERE order_date >= CURRENT_DATE - 30) AS orders_30d,  -- Postgres / SQLite
  SUM(CASE WHEN order_date >= CURRENT_DATE - 30 THEN total ELSE 0 END) AS spend_30d,
  SUM(CASE WHEN status = 'refunded' THEN total ELSE 0 END) AS refunded_total,
  COUNT(*) FILTER (WHERE status = 'refunded')             AS refund_count
FROM orders
GROUP BY customer_id;

-- Pivot: months as columns
SELECT
  customer_id,
  SUM(CASE WHEN EXTRACT(MONTH FROM order_date) = 1 THEN total ELSE 0 END) AS jan,
  SUM(CASE WHEN EXTRACT(MONTH FROM order_date) = 2 THEN total ELSE 0 END) AS feb,
  SUM(CASE WHEN EXTRACT(MONTH FROM order_date) = 3 THEN total ELSE 0 END) AS mar
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2026
GROUP BY customer_id;
  • For COUNT, use COUNT(CASE WHEN cond THEN 1 END) — NULLs are not counted, so the unmatched branch can be omitted (no ELSE needed).
  • For SUM, use SUM(CASE WHEN cond THEN value ELSE 0 END) — without an ELSE 0, unmatched rows produce NULL, which SUM ignores; the result is the same numerically but the explicit ELSE is clearer.
  • PostgreSQL's FILTER (WHERE ...) clause is standard SQL and significantly more readable than CASE for this pattern. SQLite supports it; MySQL and SQL Server do not (use CASE).
  • Conditional aggregation is the right tool for "wide" reports where each row is a key and each column is a metric. For "tall" reports (each row a metric, each column a key), use UNPIVOT or a UNION ALL.

NULL handling in aggregates: the rules everyone forgets

  • COUNT(*) counts NULL rows; COUNT(column) does not. This is the #1 reason aggregate queries return unexpected counts.
  • SUM, AVG, MIN, MAX skip NULL inputs. SUM of all-NULLs is NULL, not 0 — wrap in COALESCE if you need 0.
  • AVG drops NULLs from both numerator and denominator. AVG({10, 20, NULL}) = 15, not 10. If NULL should mean "treat as zero", use AVG(COALESCE(x, 0)).
  • COUNT of zero rows returns 0; SUM/AVG/MIN/MAX of zero rows return NULL. This asymmetry is built into the SQL standard and is responsible for many "report shows blank instead of 0" bugs.
  • ARRAY_AGG and STRING_AGG include NULLs by default in some dialects (PostgreSQL) and skip them in others. Always test or use FILTER (WHERE x IS NOT NULL) to be explicit.

ROLLUP, CUBE and GROUPING SETS: subtotals and grand totals

GROUPING SETS (and its shortcuts ROLLUP and CUBE) compute multiple GROUP BY combinations in a single query, with NULL marking the "all values" rows. They are the SQL-standard way to produce subtotals and grand totals without a UNION ALL of three queries.

-- Revenue by country, by region, and grand total — one query
SELECT
  country,
  region,
  SUM(total) AS revenue,
  GROUPING(country) AS country_is_total,
  GROUPING(region)  AS region_is_total
FROM orders o
JOIN customers c USING (customer_id)
GROUP BY ROLLUP (country, region)
ORDER BY country NULLS LAST, region NULLS LAST;

-- Equivalent to GROUP BY GROUPING SETS ((country, region), (country), ())
-- ROLLUP is the hierarchical case; CUBE generates every subset.
SELECT country, region, SUM(total)
FROM orders o JOIN customers c USING (customer_id)
GROUP BY GROUPING SETS ((country, region), (country), (region), ());
  • ROLLUP (a, b, c) generates the hierarchy (a, b, c), (a, b), (a), () — 4 grouping sets, 4 row types.
  • CUBE (a, b) generates every subset: (a, b), (a), (b), () — 2^N grouping sets for N columns; use only when you genuinely need every combination.
  • GROUPING(col) returns 1 for the "total" rows where col was rolled up (NULL in the data) and 0 for ordinary group rows. Use it in HAVING or in CASE expressions to label total rows distinctly from real NULLs.
  • PostgreSQL supports ROLLUP, CUBE and GROUPING SETS since 9.5. SQL Server, Oracle and MySQL 8.0.1+ all support them. SQLite does not.

String and array aggregation: STRING_AGG, GROUP_CONCAT, ARRAY_AGG

When you need to collapse a group of rows into a single concatenated string or array — comma-separated tag lists, the JSON object of dependent rows, an audit trail of values — every dialect has its own function.

-- PostgreSQL / SQL Server 2017+ / SQLite 3.44+ : STRING_AGG (standard SQL)
SELECT customer_id,
       STRING_AGG(product_name, ', ' ORDER BY purchased_at) AS purchases
FROM order_items
GROUP BY customer_id;

-- MySQL: GROUP_CONCAT
SELECT customer_id,
       GROUP_CONCAT(product_name ORDER BY purchased_at SEPARATOR ', ') AS purchases
FROM order_items
GROUP BY customer_id;

-- Oracle: LISTAGG
SELECT customer_id,
       LISTAGG(product_name, ', ') WITHIN GROUP (ORDER BY purchased_at) AS purchases
FROM order_items
GROUP BY customer_id;

-- PostgreSQL: ARRAY_AGG / JSON_AGG / JSONB_AGG for structured aggregation
SELECT customer_id, ARRAY_AGG(product_id ORDER BY purchased_at) AS product_ids
FROM order_items
GROUP BY customer_id;
  • STRING_AGG (PostgreSQL 9.0+, SQL Server 2017+, SQLite 3.44+) is the SQL-standard form. Prefer it where supported.
  • GROUP_CONCAT in MySQL has a default 1024-byte cutoff governed by group_concat_max_len. Long lists silently truncate — increase the limit or switch to JSON_ARRAYAGG.
  • LISTAGG in Oracle has a default 4000-character return limit; ON OVERFLOW TRUNCATE is supported in 12cR2+.
  • ARRAY_AGG / JSON_AGG (PostgreSQL) are the right choice when downstream code will parse the result — pass structured data instead of CSV.

Aggregate function vs window function: same names, different shape

Every aggregate function can also be used as a window function by adding an OVER clause. The math is identical; the shape of the result is different. SUM(x) collapses rows; SUM(x) OVER (...) keeps every input row and adds a running, partition or whole-set total to each one.

-- Aggregate: one row per customer
SELECT customer_id, SUM(total) AS lifetime_value
FROM orders
GROUP BY customer_id;

-- Window: every order row gets the customer's lifetime total alongside it
SELECT order_id, customer_id, total,
       SUM(total) OVER (PARTITION BY customer_id) AS customer_lifetime_value,
       total / SUM(total) OVER (PARTITION BY customer_id) AS share_of_lifetime
FROM orders;

Choose the aggregate form for "summary" queries (one row per group) and the window form for "detail with context" queries (every input row plus its group statistics). They compose cleanly: a CTE that computes per-customer lifetime via window functions can feed an outer GROUP BY for further reporting.

Performance: indexing and plan-reading for aggregates

  • COUNT(*) on InnoDB / heap tables requires a full table or index scan; PostgreSQL also requires a heap or index scan because of MVCC visibility. There is no engine where COUNT(*) is free, but a covering index on a small column makes it nearly so.
  • A composite index on (group_by_columns) lets the planner stream rows in group order and use a streaming aggregate (no hash, no sort). EXPLAIN ANALYZE will show "GroupAggregate" on Postgres or "Stream Aggregate" on SQL Server when this works.
  • Hash aggregation builds a hash table keyed by the GROUP BY columns; memory cost is O(distinct_keys). If work_mem (Postgres) or hash_aggregation_size is too small the engine spills to disk and performance falls off a cliff.
  • COUNT(DISTINCT column) is the most expensive common aggregate. For approximate answers on huge tables, APPROX_COUNT_DISTINCT (SQL Server, BigQuery) and the postgres_hll extension trade < 1% error for orders-of-magnitude speed.
  • SUM and AVG on integer / numeric columns are extremely cheap — the cost is dominated by I/O, not by the arithmetic. Aggregating a wide column or row pulls extra bytes from disk; project narrowly.

Dialect notes: PostgreSQL, MySQL, SQL Server, Oracle, SQLite

  • PostgreSQL: full ANSI compliance, FILTER clause, GROUPING SETS / ROLLUP / CUBE since 9.5, native ARRAY_AGG and JSON_AGG, percentile_cont and other ordered-set aggregates. ONLY_FULL_GROUP_BY semantics by default.
  • MySQL: 8.0+ has GROUPING SETS, ROLLUP, JSON_ARRAYAGG, JSON_OBJECTAGG. Pre-8.0 lacks ROLLUP without WITH ROLLUP shorthand and lacks JSON aggregation. ONLY_FULL_GROUP_BY is on by default since 5.7.
  • SQL Server: full GROUPING SETS / ROLLUP / CUBE since 2008, STRING_AGG since 2017, OVER clause for aggregates since 2012. STRING_AGG's separator must be a literal in some compatibility levels.
  • Oracle: long-standing support for everything in this guide; LISTAGG instead of STRING_AGG, MEDIAN as a built-in. Has had GROUPING SETS / ROLLUP / CUBE for decades.
  • SQLite: COUNT, SUM, AVG, MIN, MAX, GROUP_CONCAT, FILTER (3.30+), STRING_AGG (3.44+, 2023). No GROUPING SETS, no ROLLUP, no CUBE, no percentile aggregates.

Practice: 11 hands-on lessons in the Aggregates module

Each topic above maps to a hands-on lesson in the Learning Mode curriculum. Lessons include runnable PostgreSQL examples, exercises against the HR, E-commerce and Banking schemas, and progressive difficulty from beginner to expert.

Practice this in the editor

Frequently asked questions

What are aggregate functions in SQL?

Aggregate functions take a set of rows and return a single summary value. The five universal aggregates are COUNT (how many rows), SUM (total), AVG (average), MIN (smallest), and MAX (largest). Without GROUP BY they collapse the entire result set into one row; with GROUP BY they partition rows into groups and produce one row per group.

What is the difference between COUNT(*) and COUNT(column)?

COUNT(*) counts every row, including rows where every column is NULL. COUNT(column) counts only the rows where that column IS NOT NULL. So if a table has 100 rows but 20 of them have NULL in customer_id, COUNT(*) returns 100 and COUNT(customer_id) returns 80. This is the most common source of "off by N" bugs in aggregate queries.

What is the difference between WHERE and HAVING?

WHERE filters individual rows before aggregation; HAVING filters groups after aggregation. WHERE cannot reference aggregate functions; HAVING can. Always push as many predicates as possible into WHERE — filtering rows before they reach the aggregate is dramatically cheaper than filtering groups afterwards.

Does GROUP BY ignore NULL values?

No — GROUP BY treats NULL as its own group. All rows with NULL in the grouping column collapse into a single "NULL" group, and aggregate functions run over them like any other group. This differs from = comparisons, where NULL = NULL is UNKNOWN, not TRUE.

How do I count distinct values in SQL?

Use COUNT(DISTINCT column). For example, COUNT(DISTINCT customer_id) returns the number of unique non-NULL customer IDs. On very large tables COUNT(DISTINCT) is expensive because the engine must build a hash set or sort; for approximate answers use APPROX_COUNT_DISTINCT (SQL Server, BigQuery) or the postgres_hll extension.

What is conditional aggregation?

Conditional aggregation lets you sum or count only the rows that match a condition, in a single aggregate query. The portable form is SUM(CASE WHEN cond THEN value ELSE 0 END) or COUNT(CASE WHEN cond THEN 1 END). PostgreSQL and SQLite support the cleaner FILTER clause: SUM(value) FILTER (WHERE cond). It is the standard tool for building pivot tables and cohort reports without self-joins.

Why does SUM return NULL instead of 0?

When the input set is empty (or all-NULL), SUM, AVG, MIN and MAX return NULL by SQL standard. Only COUNT returns 0 in that case. To force a zero, wrap the aggregate in COALESCE: COALESCE(SUM(amount), 0). This is the fix for "dashboard shows blank instead of 0" bugs after a filter excludes every row.

When should I use a window function instead of GROUP BY?

Use GROUP BY when you want one output row per group (a summary). Use a window function (the same SUM/AVG/COUNT plus an OVER clause) when you want to keep every input row and add group-level statistics alongside each row. Window functions also handle running totals, ranking, and top-N-per-group, which GROUP BY cannot express directly.