Real-World Performance Tuning Cases: Mistakes

Module: Query Optimization & Performance

Adding indexes randomly without analyzing query plans

Use EXPLAIN ANALYZE to identify Seq Scans, then add targeted indexes

Random indexing wastes time and resources. Indexes have cost: storage, write overhead, maintenance. Must identify actual bottlenecks with EXPLAIN ANALYZE first. Look for Seq Scan on large tables, high "Rows Removed by Filter". Then add index on filter columns. Verify with EXPLAIN ANALYZE that index is used.

Always run EXPLAIN ANALYZE first. Look for Seq Scan on tables >10K rows. Check "Rows Removed by Filter" >90%. Add index on WHERE/JOIN columns. Verify index is used. Monitor index usage: pg_stat_user_indexes.

High

Wasted effort, index bloat, slower writes, no improvement

Optimizing single query in isolation without considering overall system impact

Measure total impact (execution time × call frequency) and consider effect on other queries

Query A: 10 seconds, called 10 times/day = 100 seconds total. Query B: 1 second, called 10,000 times/day = 10,000 seconds total. Optimizing Query A saves 100 seconds, Query B saves 10,000 seconds. Must prioritize by total impact. Also, some optimizations (indexes, configuration changes) affect all queries. Must verify no regressions.

Calculate impact: execution_time × call_frequency. Prioritize high total impact. After optimization, check pg_stat_statements for regressions in other queries. Monitor overall system metrics (CPU, I/O, throughput).

High

Optimize rarely-called query, ignore frequently-called slow query. Or optimization hurts other queries

Not updating statistics after bulk data changes, causing bad query plans

Run ANALYZE after bulk INSERT/DELETE/UPDATE (>10% of table)

After bulk load of 1M rows into 10M row table, statistics still show 10M rows. Optimizer underestimates result size, chooses nested loop instead of hash join. Query 50x slower. ANALYZE updates statistics (row counts, distributions, histograms). Takes seconds, prevents hours of slow queries.

Always ANALYZE after: bulk INSERT (>10% of table), bulk DELETE (>10%), schema changes, data distribution changes. Automate: add ANALYZE to ETL scripts. Monitor: pg_stat_user_tables shows last_analyze, n_mod_since_analyze.

Critical

Optimizer has stale statistics, chooses wrong plan, query 10-50x slower

Rewriting query without validating results match original

Compare row counts and sample results before and after optimization

Rewriting LEFT JOIN to INNER JOIN changes results (excludes NULLs). Rewriting DISTINCT to GROUP BY may change order. Rewriting subquery to JOIN may produce duplicates. Must validate: (1) Row count matches, (2) Sample results match, (3) Edge cases tested (NULLs, empty sets, duplicates).

Before optimization: SELECT COUNT(*), save sample results. After optimization: verify COUNT matches, compare sample results. Test edge cases: WHERE col IS NULL, empty result set, duplicate values. Use EXCEPT to find differences: (original EXCEPT optimized) UNION (optimized EXCEPT original).

Critical

Optimization changes results, breaks application logic, data corruption

Not monitoring after optimization - regressions and new issues go unnoticed

Set up monitoring and alerts for query performance, resource usage, errors

After optimization, query is fast. But: (1) Data grows, query slows again, (2) Statistics become stale, plan degrades, (3) New queries added, cause contention, (4) Configuration changes affect performance. Without monitoring, issues go unnoticed until users complain.

Set up: (1) Query performance monitoring (pg_stat_statements, slow query log), (2) Alerts for slow queries (>threshold), (3) Resource monitoring (CPU, I/O, memory), (4) Regular statistics updates (daily ANALYZE), (5) Index usage tracking (drop unused indexes).

Medium

Optimization regresses over time, new slow queries appear, no visibility