SQL Practice Logo

SQLPractice Online

Index Types Deep Dive: Mistakes

Module: Query Optimization & Performance

CREATE INDEX idx_status ON orders USING hash(status);

CREATE INDEX idx_status ON orders(status); -- B-tree

Status column likely has few values (pending, completed, cancelled). Hash provides no benefit over B-tree for low-cardinality columns, and B-tree supports more operations.

Use Hash only for high-cardinality columns (millions of unique values) with equality-only queries

Medium

Hash index cannot be used for range queries or sorting

CREATE INDEX idx_search ON products(name); -- B-tree for full-text

CREATE INDEX idx_search ON products USING gin(to_tsvector('english', name));

B-tree with LIKE '%keyword%' scans entire index. GIN tokenizes text and creates inverted index for instant lookups. For "laptop wireless" search, B-tree takes 2000ms, GIN takes 20ms.

Always use GIN with to_tsvector for full-text search. Use B-tree only for prefix matching (LIKE 'prefix%')

High

B-tree with LIKE is 100x slower than GIN for full-text search

CREATE INDEX idx_location ON restaurants(latitude, longitude); -- B-tree

CREATE INDEX idx_location ON restaurants USING gist(location); -- PostGIS

B-tree on lat/lon requires checking both columns separately. GiST uses spatial indexing with bounding boxes. For 5km radius search on 100K rows: B-tree = 3000ms, GiST = 12ms (250x faster).

Use GiST with PostGIS geography/geometry types for any distance-based queries

Critical

B-tree cannot efficiently handle "find nearby" queries

CREATE INDEX idx_multi ON orders(order_date, customer_id); -- Wrong order

CREATE INDEX idx_multi ON orders(customer_id, order_date); -- Equality first

Multi-column indexes work left-to-right. Index on (order_date, customer_id) helps queries filtering by order_date, but not customer_id alone. Put equality columns before range columns.

Index column order: 1) Equality (=), 2) Range (<, >), 3) Sort (ORDER BY)

High

Index cannot be used efficiently for WHERE customer_id = ?

CREATE INDEX idx_email ON users(email); -- Missing covering columns

CREATE INDEX idx_email ON users(email) INCLUDE (name, created_at);

Query: SELECT name, created_at FROM users WHERE email = ? Without covering: reads index + table (2 I/O). With covering: reads only index (1 I/O). 3-5x faster for hot queries.

Add INCLUDE for columns in SELECT that aren't in WHERE. Huge win for read-heavy queries.

Medium

Query still needs to access table for name and created_at

CREATE INDEX idx_all ON orders(order_date); -- Indexes all 10M rows

CREATE INDEX idx_pending ON orders(order_date) WHERE status = 'pending';

If 90% of orders are completed and you only query pending orders, full index wastes space. Partial index: 500MB → 50MB. Same query speed, 90% less storage, faster writes.

Use partial indexes when querying specific subset (active users, pending orders, recent data)

Medium

Index is 10x larger than needed and slower to maintain

CREATE INDEX idx_gender ON users(gender); -- Low cardinality

-- No index needed, or use partial: WHERE gender = 'F' AND status = 'active'

Gender has only 2-3 values. Index scan reads 50% of table anyway. Table scan is faster. Rule: Index useful only if returns <10% of rows. Low cardinality = wasted space.

Don't index boolean, gender, or any column with <10 distinct values