SQL Practice Logo

SQLPractice Online

Index Design & Selection: Interview

Module: Query Optimization & Performance

Explain why column order matters in composite indexes. How do you determine the optimal order?

Column order in composite indexes is critical because indexes work like a phone book - sorted by first column, then second, then third.

**How Composite Indexes Work:**

Index on (last_name, first_name, age):

- Sorted by last_name first

- Within each last_name, sorted by first_name

- Within each first_name, sorted by age

Can efficiently find:

- WHERE last_name = 'Smith' (uses first column)

- WHERE last_name = 'Smith' AND first_name = 'John' (uses first + second)

- WHERE last_name = 'Smith' AND first_name = 'John' AND age = 30 (uses all three)

Cannot efficiently find:

- WHERE first_name = 'John' (doesn't start with last_name)

- WHERE age = 30 (doesn't start with last_name)

This is the "leftmost prefix" rule.

**Determining Optimal Order:**

1. **Equality before range:**

- Equality: WHERE status = 'active'

- Range: WHERE created_at > '2024-01-01'

- Put equality conditions first

2. **Most selective first:**

- Selectivity = COUNT(DISTINCT column) / COUNT(*)

- Higher selectivity = fewer rows

- Example: user_id (100K unique) before status (5 unique)

3. **Consider query patterns:**

- If queries always filter by user_id, put it first

- Even if less selective than other columns

**Example:**

Query: WHERE user_id = 123 AND status = 'pending' AND created_at > '2024-01-01'

Data:

- user_id: 100K unique (high selectivity)

- status: 5 unique (low selectivity)

- created_at: continuous (range)

Optimal order: (user_id, status, created_at)

- user_id first: Equality, most selective (narrows to 100 rows)

- status second: Equality (narrows to 20 rows)

- created_at last: Range (narrows to 5 rows)

Wrong order: (created_at, status, user_id)

- created_at first: Range, scans 5M rows

- Then filters by status and user_id