Index Design & Selection: Mistakes

Module: Query Optimization & Performance

CREATE INDEX idx_category ON products(category); CREATE INDEX idx_price ON products(price); CREATE INDEX idx_rating ON products(rating);

CREATE INDEX idx_composite ON products(category, rating, price);

Three single-column indexes for query with WHERE category AND rating AND price. Optimizer uses one index (category), scans 500K rows, filters rest in memory. Composite index uses all three columns, scans only 5K rows. 8 seconds → 0.3 seconds (27x faster). Single indexes also waste storage (3 indexes vs 1) and slow writes (update 3 indexes vs 1).

Use composite indexes for multi-column WHERE clauses. Column order: equality before range, most selective first.

High

Optimizer picks one index, filters other conditions in memory

CREATE INDEX idx_wrong_order ON orders(created_at, user_id, status);

CREATE INDEX idx_right_order ON orders(user_id, status, created_at);

Query: WHERE user_id = 123 AND status = "pending" AND created_at > "2024-01-01". Wrong order (created_at first) scans all orders after 2024-01-01 (5M rows), then filters. Right order (user_id first) scans user 123 orders (10K rows), then filters. Selectivity: user_id (0.01%), status (20%), created_at (50%). Most selective first. 15 seconds → 0.5 seconds (30x faster).

Column order in composite index: most selective first, equality before range. Test with EXPLAIN to verify.

Critical

Index not used efficiently - scans too many rows

CREATE INDEX idx_gender ON users(gender);

No index needed - use table scan or composite index with other columns

Gender has 2 values in 1M rows (0.0002% selectivity). Index scans 500K rows (50% of table). Table scan is faster. Rule: Index columns with >1% selectivity. Low-selectivity columns only useful in composite indexes with selective columns first. Example: (user_id, gender) works, (gender) alone doesn't.

Only index columns with >1% selectivity. Check: SELECT COUNT(DISTINCT column) / COUNT(*) FROM table.

Medium

Index not selective enough - not used by optimizer

CREATE INDEX idx_users ON users(user_id, email, name, created_at, status); -- Never used

Monitor usage and DROP INDEX idx_users;

Index created "just in case" but never used (idx_scan = 0). Every INSERT/UPDATE must update this index (+0.005s overhead). With 1M writes/day, wastes 5000 seconds (1.4 hours) of CPU. Also wastes 500MB storage. Check: SELECT indexname, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0. Remove unused indexes to improve write performance by 10-50%.

Monitor index usage monthly. Remove indexes with idx_scan = 0 after 30 days. Document index purpose.

High

Unused index slows writes and wastes storage

SELECT user_id, email, name FROM users WHERE user_id = 123; -- Table lookup

CREATE INDEX idx_covering ON users(user_id, email, name); -- Index-only scan

Query runs 10K times/second. Index on user_id finds row (fast), then looks up email and name from table (random I/O). With covering index including email and name, all data in index (index-only scan). No table access. 0.5 seconds → 0.02 seconds (25x faster). Trade-off: Larger index (300MB vs 100MB), but worth it for hot queries.

Add covering indexes for frequently-run queries. Include all SELECT columns. Check EXPLAIN for "Index Only Scan".

Medium

Index scan + table lookup (random I/O)