Understanding EXPLAIN Plans: Interview

Module: Query Optimization & Performance

Explain how to read an execution plan and identify the main bottleneck.

Read execution plans bottom-to-top or inside-out (innermost operations execute first). Look for: 1) Sequential/Table scans on large tables (indicates missing indexes), 2) High row count discrepancies between estimated and actual (indicates outdated statistics), 3) Expensive operations like sorts or hash joins on large datasets, 4) Nested loops with high row counts (indicates missing indexes on join columns). The bottleneck is usually the operation with the highest actual time or the one processing the most rows unnecessarily.

What is the difference between EXPLAIN and EXPLAIN ANALYZE?

EXPLAIN shows the estimated execution plan without running the query - it displays what the optimizer plans to do based on statistics. EXPLAIN ANALYZE actually executes the query and shows both estimated and actual metrics (rows, time, buffers). Use EXPLAIN for quick plan checks without execution overhead. Use EXPLAIN ANALYZE to see real performance and identify discrepancies between estimates and reality. ANALYZE is essential for optimization because estimates can be wrong due to outdated statistics or data skew.

What does it mean when estimated rows differ significantly from actual rows in an execution plan?

Large discrepancies (10x or more) indicate outdated statistics or data skew. The optimizer uses statistics to estimate row counts and choose execution strategies. When estimates are wrong, it may choose inefficient join methods (nested loop instead of hash join) or wrong join order. This causes poor performance. Fix by running ANALYZE (PostgreSQL), ANALYZE TABLE (MySQL), or UPDATE STATISTICS (SQL Server) to refresh statistics. For data skew, consider creating filtered statistics or histograms.

Explain the difference between Index Seek, Index Scan, and Table Scan.

Index Seek (or Index Scan with filter) is a direct lookup using an index - best performance, O(log n). Example: WHERE id = 123. Index Scan reads the index sequentially - good for range queries, O(n) but only index pages. Example: WHERE date BETWEEN x AND y. Table Scan (Sequential Scan) reads the entire table - worst for large tables, O(n) for all data pages. Acceptable for small tables or when retrieving >20-30% of rows. The optimizer chooses based on selectivity and available indexes.

What are the three main join methods and when is each appropriate?

Nested Loop: Best for small outer table (<1000 rows) with indexed inner table. For each outer row, does index lookup on inner table. O(outer × log(inner)). Hash Join: Best for large tables with equality joins. Builds hash table from smaller table, probes with larger. O(outer + inner) but requires memory. Merge Join: Best when both tables are already sorted on join key. Simultaneously scans both sorted inputs. O(outer + inner) with minimal memory. The optimizer chooses based on table sizes, available indexes, memory, and join conditions.

Given this execution plan showing a sequential scan, what would you do?

Step 1: Identify which table has the sequential scan and what filter conditions are applied. Step 2: Check if the filter is selective (removes >90% of rows) - if yes, an index would help significantly. Step 3: Create an index on the columns used in WHERE clause: CREATE INDEX idx_name ON table(col1, col2). Step 4: For composite filters, put the most selective column first in the index. Step 5: Re-run EXPLAIN ANALYZE to verify the index is used and measure improvement. Step 6: If the scan remains, check if statistics are outdated (run ANALYZE) or if the query retrieves too many rows (>20-30% of table) where a scan is actually optimal.

Sequential scans are the #1 cause of slow queries. The key is identifying whether an index would help (selective filter) or if the scan is actually optimal (retrieving most rows). Always verify the index is used after creating it - sometimes the optimizer still chooses a scan if statistics suggest it's faster.

You see "Using temporary; Using filesort" in MySQL EXPLAIN. What does this mean and how do you fix it?

"Using temporary" means MySQL created a temporary table to process the query - usually for GROUP BY or DISTINCT. "Using filesort" means MySQL had to sort the results - expensive for large datasets. Fixes: 1) For GROUP BY: Add index on GROUP BY columns to enable index-based grouping. 2) For ORDER BY: Add index on ORDER BY columns to provide pre-sorted data. 3) For both: Create composite index covering both GROUP BY and ORDER BY columns in the right order. 4) Increase sort_buffer_size if sorts are unavoidable. 5) Consider if DISTINCT is necessary or if GROUP BY would be more efficient. Re-run EXPLAIN to verify the temporary table and filesort are eliminated.

Temporary tables and filesorts are expensive operations that can often be eliminated with proper indexing. The key is understanding what causes them (GROUP BY, DISTINCT, ORDER BY, complex joins) and providing indexes that allow the database to process data in the required order without additional sorting or temporary storage.