SQL Practice Logo

SQLPractice Online

Query Rewriting Techniques: Mistakes

Module: Query Optimization & Performance

SELECT * FROM (SELECT o.*, c.name FROM orders o JOIN customers c ON o.customer_id = c.id) t WHERE t.status = "completed";

SELECT o.*, c.name FROM (SELECT * FROM orders WHERE status = "completed") o JOIN customers c ON o.customer_id = c.id;

Original joins all 10M orders with customers (10M rows), then filters to 1M completed. Optimized filters to 1M completed first, then joins (1M rows). Predicate pushdown reduces rows processed from 10M to 1M. 45 seconds → 5 seconds (9x faster). Rule: Push WHERE conditions as close to data source as possible.

Move WHERE filters to innermost subquery. Filter before joins, not after.

High

Filters after expensive join instead of before

SELECT o.order_id, o.total FROM orders o JOIN customers c ON o.customer_id = c.id JOIN products p ON o.product_id = p.id WHERE o.status = "completed";

SELECT order_id, total FROM orders WHERE status = "completed";

Query joins with customers and products but uses no columns from these tables. Each join adds overhead: scan table, build hash table, probe. Removing 2 unnecessary joins: 15 seconds → 2 seconds (7x faster). Check: Are joined table columns used in SELECT, WHERE, or GROUP BY? If not, remove join.

Remove joins if no columns used from joined table. Check SELECT, WHERE, GROUP BY, ORDER BY.

High

Unnecessary joins with customers and products - no columns used

SELECT o1.* FROM orders o1 JOIN orders o2 ON o1.customer_id = o2.customer_id WHERE o2.total > o1.total;

SELECT * FROM orders o1 WHERE EXISTS (SELECT 1 FROM orders o2 WHERE o2.customer_id = o1.customer_id AND o2.total > o1.total);

Self-join for comparison creates O(n²) complexity. 10K orders = 100M comparisons. EXISTS short-circuits at first match, O(n). Better: window function RANK() OVER (PARTITION BY customer_id ORDER BY total DESC). 120 seconds → 5 seconds (24x faster). Rule: Avoid self-joins, use window functions or EXISTS.

Replace self-joins with window functions (RANK, ROW_NUMBER) or EXISTS for existence checks.

Critical

Self-join creates cartesian product, very slow

WITH expensive AS (SELECT user_id, complex_calc() FROM users) SELECT * FROM expensive WHERE user_id = 123 UNION ALL SELECT * FROM expensive WHERE user_id = 456;

WITH expensive AS MATERIALIZED (SELECT user_id, complex_calc() FROM users) SELECT * FROM expensive WHERE user_id IN (123, 456);

Without MATERIALIZED, CTE may be inlined and computed per use. Expensive calculation runs twice. With MATERIALIZED, computed once and reused. PostgreSQL 12+ supports MATERIALIZED hint. 20 seconds → 10 seconds (2x faster). Rule: Use MATERIALIZED for expensive calculations used multiple times.

Add MATERIALIZED hint to CTEs with expensive calculations used multiple times (PostgreSQL 12+).

Medium

CTE computed twice without MATERIALIZED hint

SELECT * FROM orders WHERE (status = "pending" OR status = "processing") AND (priority = "high" OR priority = "urgent");

SELECT * FROM orders WHERE status IN ("pending", "processing") AND priority IN ("high", "urgent");

Multiple OR conditions can prevent index usage or create complex execution plans. IN is clearer for optimizer and may enable better index strategies. Not always faster, but clearer. Test both. Simplification helps optimizer make better decisions. Rule: Simplify expressions when possible.

Convert multiple OR conditions to IN for clarity. Test if performance improves.

Low

Complex OR conditions harder for optimizer to handle