Query Rewriting Patterns: Interview
Module: Query Optimization & Performance
Explain why OR conditions on different columns prevent index usage. How does rewriting to UNION ALL solve this problem?
OR on different columns creates a dilemma for the query optimizer:
**The Problem:**
Query: WHERE category = 'Electronics' OR brand = 'Apple'
Optimizer choices:
1. Use index on category?
- Fast for category = 'Electronics'
- But misses rows where brand = 'Apple' and category != 'Electronics'
- Must scan table anyway
2. Use index on brand?
- Fast for brand = 'Apple'
- But misses rows where category = 'Electronics' and brand != 'Apple'
- Must scan table anyway
3. Use both indexes?
- Would need to merge results from two index scans
- Complex, not supported by most databases
Result: Optimizer gives up and does full table scan.
**How UNION ALL Fixes It:**
Rewritten query:
SELECT * FROM products WHERE category = 'Electronics'
UNION ALL
SELECT * FROM products WHERE brand = 'Apple' AND category != 'Electronics'
Now each query is simple:
- First query: Uses index on category (500K rows)
- Second query: Uses index on brand (200K rows)
- UNION ALL: Concatenates results (no sorting)
Total: 700K rows scanned vs 5M (full table)
**Key Insight:**
By splitting one complex query into two simple queries, we enable index usage on each branch. The AND category != 'Electronics' in the second query prevents duplicates.
**When It Works:**
- OR on different columns (not same column)
- Each column has an index
- Conditions are selective (<20% of rows)
- Can identify exclusion condition
**When It Doesn't Work:**
- OR on same column: WHERE status IN ('A', 'B') is fine
- No indexes on columns
- Conditions not selective (>50% of rows)
- Can't avoid duplicates
Why does NOT IN fail with NULL values? Explain the NULL handling difference between NOT IN, NOT EXISTS, and LEFT JOIN.