How do you choose the right column order for a multi-column index?

Multi-column indexes work left-to-right. Index on (A, B, C) helps queries filtering by: - A only ✓ - A and B ✓ - A, B, and C ✓ - B only ✗ (cannot use index) - C only ✗ (cannot use index) Column order rules: 1. Equality columns first (WHERE col = ?) 2. Range columns second (WHERE col > ?) 3. Sort columns last (ORDER BY col) 4. High selectivity first (filters most rows) Example - Order search: Bad order: CREATE INDEX idx ON orders(order_date, customer_id, status); Query: WHERE customer_id = 12

Index Types Deep Dive: Interview

Module: Query Optimization & Performance

Explain the difference between B-tree and Hash indexes. When would you choose each?

B-tree is a balanced tree structure that maintains sorted order. Supports =, <, >, BETWEEN, LIKE 'prefix%', ORDER BY. O(log n) lookup. Use for 90% of cases.

Hash applies hash function to key, maps to bucket. Supports only = (equality). O(1) lookup, 30% faster than B-tree for exact matches. But no range queries, no sorting, no partial matches.

Choose Hash when: 1) Only equality queries, 2) High cardinality (millions of unique values), 3) Never need ranges or sorting. Example: session_id, UUID lookups.

Choose B-tree when: Any range queries, sorting, or prefix matching needed. Default choice.

What is a GIN index and when should you use it? How does it differ from B-tree?

GIN (Generalized Inverted Index) is an inverted index that maps values to row locations. Like a book index: "postgresql" → pages 5, 23, 67.

Structure: Stores each token/element separately with list of rows containing it. One entry per unique value pointing to all matching rows.

Use cases:

1. Full-text search: to_tsvector/to_tsquery

2. JSONB queries: @>, ?, ?&, ?|

3. Array contains: @>, &&

4. Multiple values per row

Difference from B-tree:

- B-tree: One row → one value. Good for single-value columns.

- GIN: One value → many rows. Good for multi-value columns.

Performance: 40-100x faster for full-text search. But slower writes (updates posting lists) and larger storage (20-50% of table size).

Example: Product search with "laptop wireless" - B-tree with LIKE: 2000ms, GIN: 20ms.

Explain covering indexes. Why are they faster and what are the trade-offs?

Covering index includes all columns needed by query, enabling index-only scan without accessing table.

How it works:

1. Regular index: Read index → Get row pointer → Read table (2 I/O)

2. Covering index: Read index → Done (1 I/O)

Syntax:

PostgreSQL: CREATE INDEX idx ON table(col1) INCLUDE (col2, col3);

MySQL: CREATE INDEX idx ON table(col1, col2, col3);

Performance: 3-5x faster by eliminating table access.

Trade-offs:

+ Faster reads (3-5x)

- Slower writes (20-40% overhead)

- Larger storage (50-100% bigger index)

- More maintenance (VACUUM, reindex)

When to use:

1. Read-heavy queries (100:1 read:write ratio)

2. Hot queries (run 1000s of times/sec)

3. Queries selecting few columns

4. Latency-critical endpoints

Example: User dashboard query runs 10K times/sec. Adding covering index: 45ms → 12ms. Worth the storage cost.

What is a partial index? Give a real-world example where it provides significant benefit.

Partial index indexes only subset of rows matching WHERE condition. Smaller, faster, cheaper to maintain.