Aggregate Query Optimization: Concept
Module: Query Optimization & Performance
Aggregate optimization is about making GROUP BY, COUNT, SUM, AVG queries fast on large datasets. The problem: aggregates must process every row in a group, which means scanning millions of rows. The solution: reduce rows early with WHERE, use indexes to avoid sorting, and pre-compute expensive aggregates.
Think of it like counting votes. Bad approach: count all 10 million votes every time someone asks for results (slow aggregate). Good approach: keep running totals as votes come in (incremental aggregation). The difference: 10 million operations vs instant lookup.
**Why Aggregates Are Slow:**
1. **Full Table Scans** - Must read all rows to compute aggregates
- COUNT(*) on 100M rows = scan 100M rows
- No way to skip rows without filtering
- Time: O(n) where n = table size
2. **Sorting for GROUP BY** - Database must sort or hash to group
- GROUP BY without index = sort entire table
- 10M rows × log(10M) = 230M comparisons
- Memory: May spill to disk if too large
3. **Multiple Passes** - Some aggregates require multiple scans
- DISTINCT COUNT = scan + deduplicate
- Percentiles = scan + sort + calculate
- Time: 2-3x slower than simple COUNT
**Optimization Strategies:**
**1. Index GROUP BY Columns**
Without index:
SELECT department, COUNT(*) FROM employees GROUP BY department;
- Full table scan: 10M rows
- Sort by department: 230M comparisons
- Time: 12 seconds
With index on (department):
CREATE INDEX idx_dept ON employees(department);
- Index scan: rows already sorted by department
- No sorting needed
- Time: 0.8 seconds (15x faster)
How it works: Index stores rows in department order, so database can read sequentially and count as it goes. No sorting step needed.
**2. Covering Indexes (Index-Only Scans)**
Regular index:
CREATE INDEX idx_dept ON employees(department);
SELECT department, AVG(salary) FROM employees GROUP BY department;
- Index scan for grouping: 10M rows
- Table lookup for salary: 10M lookups
- Time: 8 seconds
Covering index:
CREATE INDEX idx_dept_salary ON employees(department, salary);
SELECT department, AVG(salary) FROM employees GROUP BY department;
- Index scan only: no table access needed
- All data in index: department + salary