Index Types Deep Dive: Interview
Module: Query Optimization & Performance
Explain the difference between B-tree and Hash indexes. When would you choose each?
B-tree is a balanced tree structure that maintains sorted order. Supports =, <, >, BETWEEN, LIKE 'prefix%', ORDER BY. O(log n) lookup. Use for 90% of cases.
Hash applies hash function to key, maps to bucket. Supports only = (equality). O(1) lookup, 30% faster than B-tree for exact matches. But no range queries, no sorting, no partial matches.
Choose Hash when: 1) Only equality queries, 2) High cardinality (millions of unique values), 3) Never need ranges or sorting. Example: session_id, UUID lookups.
Choose B-tree when: Any range queries, sorting, or prefix matching needed. Default choice.
What is a GIN index and when should you use it? How does it differ from B-tree?
GIN (Generalized Inverted Index) is an inverted index that maps values to row locations. Like a book index: "postgresql" → pages 5, 23, 67.
Structure: Stores each token/element separately with list of rows containing it. One entry per unique value pointing to all matching rows.
Use cases:
1. Full-text search: to_tsvector/to_tsquery
2. JSONB queries: @>, ?, ?&, ?|
3. Array contains: @>, &&
4. Multiple values per row
Difference from B-tree:
- B-tree: One row → one value. Good for single-value columns.
- GIN: One value → many rows. Good for multi-value columns.
Performance: 40-100x faster for full-text search. But slower writes (updates posting lists) and larger storage (20-50% of table size).
Example: Product search with "laptop wireless" - B-tree with LIKE: 2000ms, GIN: 20ms.
Explain covering indexes. Why are they faster and what are the trade-offs?
Covering index includes all columns needed by query, enabling index-only scan without accessing table.
How it works:
1. Regular index: Read index → Get row pointer → Read table (2 I/O)
2. Covering index: Read index → Done (1 I/O)
Syntax:
PostgreSQL: CREATE INDEX idx ON table(col1) INCLUDE (col2, col3);
MySQL: CREATE INDEX idx ON table(col1, col2, col3);
Performance: 3-5x faster by eliminating table access.
Trade-offs:
+ Faster reads (3-5x)
- Slower writes (20-40% overhead)
- Larger storage (50-100% bigger index)
- More maintenance (VACUUM, reindex)
When to use:
1. Read-heavy queries (100:1 read:write ratio)
2. Hot queries (run 1000s of times/sec)
3. Queries selecting few columns
4. Latency-critical endpoints
Example: User dashboard query runs 10K times/sec. Adding covering index: 45ms → 12ms. Worth the storage cost.
What is a partial index? Give a real-world example where it provides significant benefit.
Partial index indexes only subset of rows matching WHERE condition. Smaller, faster, cheaper to maintain.