SQL Practice Logo

SQLPractice Online

Aggregate Query Optimization: Interview

Module: Query Optimization & Performance

Explain the difference between WHERE and HAVING in aggregate queries. When would using HAVING instead of WHERE cause performance problems?

WHERE filters rows BEFORE aggregation. HAVING filters groups AFTER aggregation.

**Execution Order:**

1. FROM - Get tables

2. WHERE - Filter rows (reduces data early)

3. GROUP BY - Group remaining rows

4. Aggregate functions - Compute COUNT, SUM, etc.

5. HAVING - Filter groups

6. SELECT - Return results

**Performance Impact:**

Bad (HAVING for row filtering):

SELECT department, AVG(salary)

FROM employees

GROUP BY department

HAVING department IN ('Sales', 'Engineering');

- Aggregates all 50 departments (10M employees)

- Then filters to 2 departments

- Wasted work: aggregated 48 unnecessary departments

- Time: 12 seconds

Good (WHERE for row filtering):

SELECT department, AVG(salary)

FROM employees

WHERE department IN ('Sales', 'Engineering')

GROUP BY department;

- Filters to 2 departments first (400K employees)

- Aggregates only relevant rows

- Time: 0.5 seconds (24x faster)

**When to Use Each:**

WHERE:

- Filter individual rows

- Conditions on non-aggregated columns

- Example: WHERE active = true, WHERE salary > 50000

- Benefit: Reduces rows before expensive aggregation

HAVING:

- Filter aggregated results

- Conditions on aggregate functions

- Example: HAVING COUNT(*) > 10, HAVING AVG(salary) > 75000

- Use case: "Show departments with more than 100 employees"

**Rule:** Use WHERE whenever possible. Only use HAVING for conditions that require aggregation.

What is a covering index and why is it particularly effective for aggregate queries? Explain with an example.