DELETE & TRUNCATE: Real-World
Module: Data Modification & Transactions
Instagram deletes 500M+ spam comments daily using batch DELETE with WHERE clauses. Stripe truncates temporary payment processing tables every hour (100x faster than DELETE). Amazon archives old orders then deletes using DELETE with date filters to maintain 7-year retention policy. Netflix uses soft deletes (UPDATE status = deleted) instead of hard DELETE to enable data recovery. Uber batch-deletes 1B+ old location records monthly using DELETE with LIMIT to avoid long locks. DELETE and TRUNCATE are critical for data lifecycle management, GDPR compliance (right to be forgotten), storage optimization, and test data cleanup.
Instagram: Batch Delete 500M Spam Comments Daily
Instagram detects and removes 500 million spam comments daily using machine learning. Traditional approach: DELETE each spam comment individually. Problem: 500M individual DELETEs take 14 hours (10,000 rows/sec), blocks comment table, impacts user experience. Solution: Batch delete in chunks of 10,000 comments with 50ms pause between batches.
Batch delete pattern: Delete 10,000 spam comments per transaction with 50ms pause between batches. Benefits: (1) 7x faster (70K rows/sec vs 10K rows/sec), (2) Short locks (0.3 sec per batch, users not impacted), (3) Progress monitoring (track completion, estimate ETA), (4) Can pause/resume (stop if issues detected). Index on (is_spam, detected_at) enables fast filtering (10x speedup).
-- Instagram batch delete pattern (simplified)
DO $$
DECLARE
deleted_count INT;
total_deleted INT := 0;
batch_num INT := 0;
start_time TIMESTAMP := CLOCK_TIMESTAMP();
target_time INTERVAL := INTERVAL '2 hours';
BEGIN
LOOP
-- Delete batch of 10,000 spam comments
DELETE FROM comments
WHERE comment_id IN (
SELECT comment_id
FROM comments
WHERE is_spam = true
AND detected_at < CURRENT_TIMESTAMP - INTERVAL '1 hour'
LIMIT 10000
);
GET DIAGNOSTICS deleted_count = ROW_COUNT;
total_deleted := total_deleted + deleted_count;
batch_num := batch_num + 1;
-- Log progress every 100 batches (1M comments)
IF batch_num % 100 = 0 THEN
RAISE NOTICE 'Progress: % batches, % comments deleted, % elapsed, ETA: %',
batch_num, total_deleted,
CLOCK_TIMESTAMP() - start_time,
(CLOCK_TIMESTAMP() - start_time) * (500000000.0 / total_deleted) - (CLOCK_TIMESTAMP() - start_time);
END IF;
EXIT WHEN deleted_count = 0;
-- Pause 50ms between batches (let user queries run)
PERFORM pg_sleep(0.05);
-- Safety: Stop if taking too long (prevent runaway deletion)
IF CLOCK_TIMESTAMP() - start_time > target_time THEN
RAISE NOTICE 'Stopping: Time limit reached. Deleted % of 500M comments', total_deleted;
EXIT;