Index Types Deep Dive: Real-World
Module: Query Optimization & Performance
At Uber, choosing the right index type reduced geospatial query time from 800ms to 12ms. At Reddit, GIN indexes made full-text search 40x faster. Index type selection is the difference between a slow app and a fast one.
Reddit - Full-Text Search Optimization
Full-text search across 50M posts and comments was taking 5-10 seconds. Users were frustrated with slow search results. Challenge: Migrate from B-tree LIKE queries to GIN indexes.
Migrated from B-tree with LIKE queries to GIN indexes with PostgreSQL full-text search (to_tsvector/to_tsquery). Added partial indexes for recent content (last 90 days) which covers 80% of searches.
-- Before: B-tree with LIKE (slow)
CREATE INDEX idx_content_old ON posts(content);
SELECT * FROM posts WHERE content LIKE '%postgresql%';
-- Time: 5000ms (full index scan)
-- After: GIN with full-text search (fast)
CREATE INDEX idx_content_fts ON posts
USING gin(to_tsvector('english', title || ' ' || content));
-- Partial index for recent posts (80% of queries)
CREATE INDEX idx_content_recent ON posts
USING gin(to_tsvector('english', title || ' ' || content))
WHERE created_at > NOW() - INTERVAL '90 days';
-- Query with ranking
SELECT
post_id,
title,
ts_rank(to_tsvector('english', title || ' ' || content),
to_tsquery('english', 'postgresql & performance')) as rank
FROM posts
WHERE to_tsvector('english', title || ' ' || content)
@@ to_tsquery('english', 'postgresql & performance')
AND created_at > NOW() - INTERVAL '90 days'
ORDER BY rank DESC
LIMIT 50;
-- Time: 120ms (40x faster!)
40x performance improvement (5000ms to 120ms)
Search relevance improved with ranking
Partial index reduced storage by 75%
User satisfaction increased significantly
Lessons: GIN essential for full-text, partial indexes save storage, ranking improves relevance, VACUUM regularly
PostgreSQL
Uber - Geospatial Driver Matching
Driver-rider matching was taking 800ms for "find drivers within 5km" query. With 10K requests/sec during peak, this was causing timeouts and poor user experience. Challenge: Optimize location-based queries.
Implemented GiST spatial indexes with PostGIS for geospatial queries. Added covering indexes to avoid table lookups. Used partial indexes for online drivers only.
-- Before: B-tree on lat/lon (very slow)
CREATE INDEX idx_lat ON drivers(latitude);
CREATE INDEX idx_lon ON drivers(longitude);