Query Optimization Fundamentals: Interview

Q: What is the difference between a table scan and an index scan, and when is each appropriate?

A table scan reads every row sequentially from disk - appropriate when retrieving most rows (>20-30% of table) or when no suitable index exists. An index scan uses an index to locate specific rows - appropriate when retrieving a small percentage of rows. Index scans have overhead (reading index + table), so for large result sets, table scans can be faster. The optimizer chooses based on estimated selectivity. For small tables (<1000 rows), table scans are often faster than index scans due to low

Module: Query Optimization & Performance

Explain the query optimization process from identifying a slow query to deploying the fix.

The process involves: 1) Measure baseline performance using timing tools, 2) Analyze execution plan with EXPLAIN ANALYZE to identify bottlenecks (table scans, missing indexes, inefficient joins), 3) Apply targeted optimizations (add indexes, rewrite query, update statistics), 4) Test changes in a staging environment and measure improvements, 5) Validate that results remain correct, 6) Document the changes and reasoning, 7) Deploy to production, 8) Monitor to ensure sustained performance. Always measure before and after to quantify improvements.

What are the three main factors that affect query performance and how do you optimize each?

The three factors are: 1) I/O Operations - minimize disk reads through proper indexing, covering indexes, and caching. One disk read equals ~10,000 memory operations. 2) CPU Processing - reduce computational complexity by filtering early, avoiding unnecessary sorting, and using efficient join algorithms. 3) Memory Usage - ensure working sets fit in memory through proper buffer pool sizing, avoiding large sorts, and using appropriate data types. Optimize by analyzing execution plans to identify which factor is the bottleneck.

How does the database optimizer choose an execution plan?

The optimizer is cost-based: it generates multiple possible execution plans, estimates the cost of each using statistics about table sizes, index selectivity, and data distribution, then chooses the plan with the lowest estimated cost. Costs include I/O operations, CPU processing, and memory usage. The optimizer considers factors like available indexes, join order, join algorithms (nested loop, hash, merge), and whether to use indexes or table scans. Outdated statistics can cause poor plan selection, which is why regular ANALYZE/UPDATE STATISTICS is crucial.

Why do functions on indexed columns prevent index usage?

When you apply a function to an indexed column (e.g., WHERE YEAR(date_col) = 2024), the database must evaluate the function for every row before comparing values. The index stores the original column values, not the function results, so it cannot be used for direct lookups. This forces a full table scan. The solution is to rewrite the condition without functions (e.g., WHERE date_col >= '2024-01-01' AND date_col < '2025-01-01') or create a function-based index if your database supports it.

What is the difference between a table scan and an index scan, and when is each appropriate?

A table scan reads every row sequentially from disk - appropriate when retrieving most rows (>20-30% of table) or when no suitable index exists. An index scan uses an index to locate specific rows - appropriate when retrieving a small percentage of rows. Index scans have overhead (reading index + table), so for large result sets, table scans can be faster. The optimizer chooses based on estimated selectivity. For small tables (<1000 rows), table scans are often faster than index scans due to lower overhead.

Given a query taking 30 seconds, walk through your optimization approach.

Step 1: Run EXPLAIN ANALYZE to get baseline and execution plan. Step 2: Identify bottlenecks - look for Seq Scans on large tables, high cost operations, large row count differences between estimated and actual. Step 3: Check for missing indexes on WHERE/JOIN columns. Step 4: Look for functions on indexed columns, SELECT *, or correlated subqueries. Step 5: Add appropriate indexes: CREATE INDEX idx_name ON table(col1, col2). Step 6: Rewrite problematic patterns (e.g., replace correlated subqueries with JOINs). Step 7: Re-run EXPLAIN ANALYZE to measure improvement. Step 8: Validate results are correct. Step 9: Test with production-like data volumes. Step 10: Document changes and deploy.

This systematic approach ensures you identify the actual bottleneck rather than guessing. Execution plans reveal the truth about query performance. Most optimizations come from adding missing indexes or rewriting inefficient query patterns. Always measure to confirm improvements.

You have a query with multiple JOINs that is slow. How do you optimize it?

Step 1: Analyze execution plan to see join order and algorithms. Step 2: Ensure all JOIN columns are indexed on both sides. Step 3: Check join selectivity - join smaller tables first to reduce intermediate result sets. Step 4: Verify join types are correct (INNER vs LEFT). Step 5: Look for Cartesian products (missing join conditions). Step 6: Consider denormalization if tables are frequently joined. Step 7: Use covering indexes to avoid table lookups. Step 8: Filter early with WHERE clauses before JOINs. Step 9: For complex queries, break into CTEs for clarity and potential optimization. Step 10: Consider materialized views for frequently executed complex joins.

Join optimization focuses on reducing the number of rows processed at each step. Proper indexes enable efficient join algorithms. The optimizer usually chooses good join order, but you can help by filtering early and ensuring statistics are current.