UPDATE with Joins: Real-World
Module: Data Modification & Transactions
UPDATE with JOIN is used everywhere: (1) E-commerce: Update order totals when items change (Amazon recalculates totals for millions of orders daily). (2) Inventory: Sync stock levels from warehouse system (Shopify updates inventory across 1M+ products). (3) Denormalization: Update customer lifetime value from orders (Stripe calculates customer stats for 10M+ customers). (4) Data cleanup: Fix inconsistent data using reference tables (banks correct millions of transaction records). (5) Cache updates: Refresh materialized data from source tables (Netflix updates viewing stats for billions of records). Trade-offs: JOIN updates are fast but syntax varies by database. Subqueries are portable but sometimes slower.
Amazon: Update Order Totals from Order Items
Amazon processes millions of orders daily. When customers add/remove items, order totals must be recalculated. Challenge: Row-by-row updates would take hours and lock tables. Solution: Use UPDATE...FROM to recalculate all order totals in single query (100x faster). Architecture: (1) Order items stored in separate table (normalized), (2) Order totals cached in orders table (denormalized for performance), (3) Background job runs every 5 minutes to sync totals, (4) Uses UPDATE...FROM with aggregation subquery. Performance: Row-by-row: 2 hours for 10M orders. UPDATE...FROM: 2 minutes for 10M orders (60x faster). Benefits: Fast recalculation, no table locks, atomic operation.
Amazon uses UPDATE...FROM with batching: (1) Aggregate order_items by order_id in subquery, (2) JOIN orders with aggregated data, (3) UPDATE totals in single query, (4) Batch by order_id range (1M orders per batch), (5) Run during off-peak hours. Architecture: Order Items → Aggregate → UPDATE...FROM → Orders. Performance: 2 minutes per 10M orders. Result: 60x faster than row-by-row, handles peak traffic.
-- Amazon order total update pattern (simplified)
-- Runs every 5 minutes for orders modified in last 10 minutes
-- Step 1: Create temp table with aggregated totals
CREATE TEMP TABLE order_totals AS
SELECT
order_id,
SUM(quantity * price) as item_total,
COUNT(*) as item_count,
SUM(quantity * price * tax_rate) as tax_amount
FROM order_items
WHERE updated_at > NOW() - INTERVAL '10 minutes'
GROUP BY order_id;
-- Step 2: Create index for fast JOIN
CREATE INDEX idx_order_totals_order_id ON order_totals(order_id);
-- Step 3: Update orders using UPDATE...FROM
UPDATE orders o
SET
subtotal = ot.item_total,
tax = ot.tax_amount,
total = ot.item_total + ot.tax_amount,
item_count = ot.item_count,
updated_at = NOW()
FROM order_totals ot
WHERE o.order_id = ot.order_id
AND o.status IN ('pending', 'processing');
-- Performance metrics:
-- Orders updated: 100,000 per run
-- Time: 2 seconds (vs 200 seconds row-by-row)
-- Speedup: 100x faster
-- Runs: Every 5 minutes, 288 times per day
Updates 10M orders daily (100x faster than row-by-row)
Runs every 5 minutes without locking tables
Handles peak traffic (Black Friday: 50K orders/min)
Atomic operation ensures data consistency
Lesson: UPDATE with JOIN essential for high-volume applications
PostgreSQL