Query Rewriting Patterns: Real-World

Module: Query Optimization & Performance

E-commerce search with OR conditions taking 15 seconds - rewrite to UNION ALL for 0.5 seconds. Report queries with subqueries taking 2 minutes - convert to JOINs for 5 seconds. NOT IN queries returning wrong results - fix with LEFT JOIN pattern.

Amazon: Product Search Query Rewriting

Amazon product search allowed filtering by multiple criteria with OR conditions. Query took 15 seconds, causing 40% search abandonment. Rewrote using UNION ALL pattern to enable index usage. Reduced to 0.5 seconds, increasing conversion by 25%.

Original query had OR conditions on category, brand, price_range, and rating. Each condition prevented index usage, forcing full table scan of 500M products. Rewrote as UNION ALL with 4 branches, each using its index. Added exclusion conditions to prevent duplicates. First branch: category index (50M rows). Second: brand index excluding first category (20M rows). Third: price_range index excluding previous (10M rows). Fourth: rating index excluding all previous (5M rows). Total scanned: 85M vs 500M. Time: 15s → 0.5s (30x faster).

SELECT * FROM products WHERE category = "Electronics" UNION ALL SELECT * FROM products WHERE brand = "Apple" AND category != "Electronics" UNION ALL SELECT * FROM products WHERE price BETWEEN 100 AND 500 AND category != "Electronics" AND brand != "Apple";

30x faster search queries (15s → 0.5s)

Reduced search abandonment from 40% to 15%

Increased conversion rate by 25%

Saved $2M/year in infrastructure costs

Enabled real-time search suggestions

MySQL

Netflix: Recommendation Query Optimization

Netflix recommendation engine used correlated subqueries to calculate user viewing statistics. Query took 2 minutes per user, limiting recommendations to batch processing. Rewrote subqueries as window functions for real-time recommendations in 3 seconds.

Original query had 5 correlated subqueries in SELECT calculating: average watch time, genre preferences, completion rate, binge score, and recency. Each subquery scanned viewing_history table (10B rows) for each user. 100M users × 5 subqueries × 10B rows = catastrophic. Rewrote using window functions: AVG() OVER (PARTITION BY user_id), COUNT() OVER (PARTITION BY user_id, genre), etc. Single pass over data with windowing. Added materialized view refreshed hourly for frequently accessed stats. Time: 120s → 3s (40x faster).

SELECT user_id, AVG(watch_duration) OVER (PARTITION BY user_id) as avg_watch, COUNT(*) OVER (PARTITION BY user_id, genre) as genre_count FROM viewing_history;

40x faster per-user queries (120s → 3s)

Enabled real-time recommendations (was batch-only)

Improved recommendation accuracy by 15%

Reduced database load by 95%

Saved $5M/year in compute costs

PostgreSQL

Uber: Driver Matching Query Rewrite

Uber driver matching used NOT IN to find available drivers (not currently on trip). Query returned 0 drivers during peak hours due to NULL trip_ids. Critical production bug. Rewrote with LEFT JOIN for correct results and 20x speedup.

Original query: SELECT * FROM drivers WHERE driver_id NOT IN (SELECT driver_id FROM active_trips). Problem: Some active_trips had NULL driver_id (system glitch), causing NOT IN to return 0 rows. During peak hours, no drivers matched, breaking the app. Rewrote: FROM drivers d LEFT JOIN active_trips t ON d.driver_id = t.driver_id WHERE t.driver_id IS NULL. This correctly handled NULLs and used hash anti-join instead of nested loop. Added monitoring to alert on NULL driver_ids. Time: 30s → 1.5s (20x faster). More importantly: correct results.

SELECT d.* FROM drivers d LEFT JOIN active_trips t ON d.driver_id = t.driver_id WHERE t.driver_id IS NULL AND d.status = "available";

20x faster driver matching (30s → 1.5s)

Fixed critical bug causing 0 available drivers

Prevented $10M+ in lost revenue during peak hours

Improved driver utilization by 30%

Added NULL monitoring to prevent recurrence

PostgreSQL