UPDATE with Joins: Real-World

Module: Data Modification & Transactions

UPDATE with JOIN is used everywhere: (1) E-commerce: Update order totals when items change (Amazon recalculates totals for millions of orders daily). (2) Inventory: Sync stock levels from warehouse system (Shopify updates inventory across 1M+ products). (3) Denormalization: Update customer lifetime value from orders (Stripe calculates customer stats for 10M+ customers). (4) Data cleanup: Fix inconsistent data using reference tables (banks correct millions of transaction records). (5) Cache updates: Refresh materialized data from source tables (Netflix updates viewing stats for billions of records). Trade-offs: JOIN updates are fast but syntax varies by database. Subqueries are portable but sometimes slower.

Amazon: Update Order Totals from Order Items

Amazon processes millions of orders daily. When customers add/remove items, order totals must be recalculated. Challenge: Row-by-row updates would take hours and lock tables. Solution: Use UPDATE...FROM to recalculate all order totals in single query (100x faster). Architecture: (1) Order items stored in separate table (normalized), (2) Order totals cached in orders table (denormalized for performance), (3) Background job runs every 5 minutes to sync totals, (4) Uses UPDATE...FROM with aggregation subquery. Performance: Row-by-row: 2 hours for 10M orders. UPDATE...FROM: 2 minutes for 10M orders (60x faster). Benefits: Fast recalculation, no table locks, atomic operation.

Amazon uses UPDATE...FROM with batching: (1) Aggregate order_items by order_id in subquery, (2) JOIN orders with aggregated data, (3) UPDATE totals in single query, (4) Batch by order_id range (1M orders per batch), (5) Run during off-peak hours. Architecture: Order Items → Aggregate → UPDATE...FROM → Orders. Performance: 2 minutes per 10M orders. Result: 60x faster than row-by-row, handles peak traffic.

-- Amazon order total update pattern (simplified)

-- Runs every 5 minutes for orders modified in last 10 minutes

-- Step 1: Create temp table with aggregated totals

CREATE TEMP TABLE order_totals AS

SELECT

order_id,

SUM(quantity * price) as item_total,

COUNT(*) as item_count,

SUM(quantity * price * tax_rate) as tax_amount

FROM order_items

WHERE updated_at > NOW() - INTERVAL '10 minutes'

GROUP BY order_id;

-- Step 2: Create index for fast JOIN

CREATE INDEX idx_order_totals_order_id ON order_totals(order_id);

-- Step 3: Update orders using UPDATE...FROM

UPDATE orders o

SET

subtotal = ot.item_total,

tax = ot.tax_amount,

total = ot.item_total + ot.tax_amount,

item_count = ot.item_count,

updated_at = NOW()

FROM order_totals ot

WHERE o.order_id = ot.order_id

AND o.status IN ('pending', 'processing');

-- Performance metrics:

-- Orders updated: 100,000 per run

-- Time: 2 seconds (vs 200 seconds row-by-row)

-- Speedup: 100x faster

-- Runs: Every 5 minutes, 288 times per day

Updates 10M orders daily (100x faster than row-by-row)

Runs every 5 minutes without locking tables

Handles peak traffic (Black Friday: 50K orders/min)

Atomic operation ensures data consistency

Lesson: UPDATE with JOIN essential for high-volume applications

PostgreSQL