Aggregate Query Optimization: Concept

Module: Query Optimization & Performance

Aggregate optimization is about making GROUP BY, COUNT, SUM, AVG queries fast on large datasets. The problem: aggregates must process every row in a group, which means scanning millions of rows. The solution: reduce rows early with WHERE, use indexes to avoid sorting, and pre-compute expensive aggregates.

Think of it like counting votes. Bad approach: count all 10 million votes every time someone asks for results (slow aggregate). Good approach: keep running totals as votes come in (incremental aggregation). The difference: 10 million operations vs instant lookup.

**Why Aggregates Are Slow:**

1. **Full Table Scans** - Must read all rows to compute aggregates

- COUNT(*) on 100M rows = scan 100M rows

- No way to skip rows without filtering

- Time: O(n) where n = table size

2. **Sorting for GROUP BY** - Database must sort or hash to group

- GROUP BY without index = sort entire table

- 10M rows × log(10M) = 230M comparisons

- Memory: May spill to disk if too large

3. **Multiple Passes** - Some aggregates require multiple scans

- DISTINCT COUNT = scan + deduplicate

- Percentiles = scan + sort + calculate

- Time: 2-3x slower than simple COUNT

**Optimization Strategies:**

**1. Index GROUP BY Columns**

Without index:

SELECT department, COUNT(*) FROM employees GROUP BY department;

- Full table scan: 10M rows

- Sort by department: 230M comparisons

- Time: 12 seconds

With index on (department):

CREATE INDEX idx_dept ON employees(department);

- Index scan: rows already sorted by department

- No sorting needed

- Time: 0.8 seconds (15x faster)

How it works: Index stores rows in department order, so database can read sequentially and count as it goes. No sorting step needed.

**2. Covering Indexes (Index-Only Scans)**

Regular index:

CREATE INDEX idx_dept ON employees(department);

SELECT department, AVG(salary) FROM employees GROUP BY department;

- Index scan for grouping: 10M rows

- Table lookup for salary: 10M lookups

- Time: 8 seconds

Covering index:

CREATE INDEX idx_dept_salary ON employees(department, salary);

SELECT department, AVG(salary) FROM employees GROUP BY department;

- Index scan only: no table access needed

- All data in index: department + salary