Join Optimization Strategies: Performance
Module: Query Optimization & Performance
**Index Impact on Join Performance:**
1. **No Index on Join Column:**
- Algorithm: Nested Loop with full table scan OR Hash Join
- Time: O(n * m) - quadratic
- Example: 1000 orders × 100K customers = 100M row comparisons
- Result: 45 seconds for simple join
2. **Index on Join Column:**
- Algorithm: Nested Loop with index lookup
- Time: O(n * log m) - logarithmic
- Example: 1000 orders × log(100K) ≈ 17K lookups
- Result: 0.2 seconds (225x faster)
3. **Covering Index:**
- Algorithm: Index-only nested loop
- Time: O(n * log m) but no table access
- Benefit: 3-5x faster than regular index (no disk I/O for table)
- Result: 0.05 seconds (900x faster than no index)
**Join Algorithm Selection:**
| Scenario | Algorithm | Speed | Memory |
|----------|-----------|-------|--------|
| Small outer (<1K), indexed inner | Nested Loop | Fast | Low |
| Large tables, equality join | Hash Join | Fast | High |
| Both sorted/indexed | Merge Join | Fast | Low |
| No indexes, small tables | Nested Loop | Slow | Low |
| No indexes, large tables | Hash Join | Medium | High |
**Filter Early - Critical Optimization:**
Filtering before join reduces intermediate result size:
- 5M orders, 2M customers
- Filter: orders from last 30 days (500K rows)
- Without early filter: Join 5M × 2M, then filter = 10 trillion comparisons
- With early filter: Filter to 500K, join 500K × 2M = 1 billion comparisons
- Result: 10,000x fewer comparisons
Database optimizers do this automatically, but you can help:
- Put filters in WHERE, not in JOIN ON
- Use specific date ranges, not open-ended
- Filter on indexed columns
**Join Order Impact:**
Example: orders (1M) JOIN order_items (5M) JOIN products (100K)
Bad order (left-to-right):
1. orders × order_items = 5M intermediate rows
2. 5M × products = 500B comparisons