Real-World Performance Tuning Cases: Mistakes
Module: Query Optimization & Performance
Adding indexes randomly without analyzing query plans
Use EXPLAIN ANALYZE to identify Seq Scans, then add targeted indexes
Random indexing wastes time and resources. Indexes have cost: storage, write overhead, maintenance. Must identify actual bottlenecks with EXPLAIN ANALYZE first. Look for Seq Scan on large tables, high "Rows Removed by Filter". Then add index on filter columns. Verify with EXPLAIN ANALYZE that index is used.
Always run EXPLAIN ANALYZE first. Look for Seq Scan on tables >10K rows. Check "Rows Removed by Filter" >90%. Add index on WHERE/JOIN columns. Verify index is used. Monitor index usage: pg_stat_user_indexes.
High
Wasted effort, index bloat, slower writes, no improvement
Optimizing single query in isolation without considering overall system impact
Measure total impact (execution time × call frequency) and consider effect on other queries
Query A: 10 seconds, called 10 times/day = 100 seconds total. Query B: 1 second, called 10,000 times/day = 10,000 seconds total. Optimizing Query A saves 100 seconds, Query B saves 10,000 seconds. Must prioritize by total impact. Also, some optimizations (indexes, configuration changes) affect all queries. Must verify no regressions.
Calculate impact: execution_time × call_frequency. Prioritize high total impact. After optimization, check pg_stat_statements for regressions in other queries. Monitor overall system metrics (CPU, I/O, throughput).
High
Optimize rarely-called query, ignore frequently-called slow query. Or optimization hurts other queries
Not updating statistics after bulk data changes, causing bad query plans
Run ANALYZE after bulk INSERT/DELETE/UPDATE (>10% of table)
After bulk load of 1M rows into 10M row table, statistics still show 10M rows. Optimizer underestimates result size, chooses nested loop instead of hash join. Query 50x slower. ANALYZE updates statistics (row counts, distributions, histograms). Takes seconds, prevents hours of slow queries.
Always ANALYZE after: bulk INSERT (>10% of table), bulk DELETE (>10%), schema changes, data distribution changes. Automate: add ANALYZE to ETL scripts. Monitor: pg_stat_user_tables shows last_analyze, n_mod_since_analyze.
Critical
Optimizer has stale statistics, chooses wrong plan, query 10-50x slower
Rewriting query without validating results match original
Compare row counts and sample results before and after optimization
Rewriting LEFT JOIN to INNER JOIN changes results (excludes NULLs). Rewriting DISTINCT to GROUP BY may change order. Rewriting subquery to JOIN may produce duplicates. Must validate: (1) Row count matches, (2) Sample results match, (3) Edge cases tested (NULLs, empty sets, duplicates).
Before optimization: SELECT COUNT(*), save sample results. After optimization: verify COUNT matches, compare sample results. Test edge cases: WHERE col IS NULL, empty result set, duplicate values. Use EXCEPT to find differences: (original EXCEPT optimized) UNION (optimized EXCEPT original).
Critical
Optimization changes results, breaks application logic, data corruption
Not monitoring after optimization - regressions and new issues go unnoticed
Set up monitoring and alerts for query performance, resource usage, errors
After optimization, query is fast. But: (1) Data grows, query slows again, (2) Statistics become stale, plan degrades, (3) New queries added, cause contention, (4) Configuration changes affect performance. Without monitoring, issues go unnoticed until users complain.
Set up: (1) Query performance monitoring (pg_stat_statements, slow query log), (2) Alerts for slow queries (>threshold), (3) Resource monitoring (CPU, I/O, memory), (4) Regular statistics updates (daily ANALYZE), (5) Index usage tracking (drop unused indexes).
Medium
Optimization regresses over time, new slow queries appear, no visibility