Explain incremental aggregation. When would you use it instead of materialized views? Provide a real-world scenario.

Incremental aggregation maintains running totals that update with each transaction, rather than recomputing from raw data. **Concept:** Recompute approach: - Every query: scan all data, compute aggregate - Cost: O(n) per query where n = data size - Example: COUNT orders today = scan all orders Incremental approach: - Maintain counter: orders_today = 1247 - On new order: orders_today++ - Query: SELECT orders_today (instant) - Cost: O(1) per query, O(1) per write **Implementation:** -- Summar

Aggregate Query Optimization: Interview

Q: What is a covering index and why is it particularly effective for aggregate queries? Explain with an example.

A covering index contains ALL columns needed by a query, allowing the database to satisfy the query entirely from the index without accessing the table ("index-only scan"). **Why Effective for Aggregates:** 1. **Eliminates Table Lookups** - Regular index: provides row locations, must lookup table for data - Covering index: contains all needed data, no table access - Savings: Eliminates millions of random I/O operations 2. **Sequential I/O** - Index stored sequentially on disk -

Module: Query Optimization & Performance

Explain the difference between WHERE and HAVING in aggregate queries. When would using HAVING instead of WHERE cause performance problems?

WHERE filters rows BEFORE aggregation. HAVING filters groups AFTER aggregation.

**Execution Order:**

1. FROM - Get tables

2. WHERE - Filter rows (reduces data early)

3. GROUP BY - Group remaining rows

4. Aggregate functions - Compute COUNT, SUM, etc.

5. HAVING - Filter groups

6. SELECT - Return results

**Performance Impact:**

Bad (HAVING for row filtering):

SELECT department, AVG(salary)

FROM employees

GROUP BY department

HAVING department IN ('Sales', 'Engineering');

- Aggregates all 50 departments (10M employees)

- Then filters to 2 departments

- Wasted work: aggregated 48 unnecessary departments

- Time: 12 seconds

Good (WHERE for row filtering):

SELECT department, AVG(salary)

FROM employees

WHERE department IN ('Sales', 'Engineering')

GROUP BY department;

- Filters to 2 departments first (400K employees)

- Aggregates only relevant rows

- Time: 0.5 seconds (24x faster)

**When to Use Each:**

WHERE:

- Filter individual rows

- Conditions on non-aggregated columns

- Example: WHERE active = true, WHERE salary > 50000

- Benefit: Reduces rows before expensive aggregation

HAVING:

- Filter aggregated results

- Conditions on aggregate functions

- Example: HAVING COUNT(*) > 10, HAVING AVG(salary) > 75000

- Use case: "Show departments with more than 100 employees"

**Rule:** Use WHERE whenever possible. Only use HAVING for conditions that require aggregation.

What is a covering index and why is it particularly effective for aggregate queries? Explain with an example.