SQL Practice Logo

SQLPractice Online

DELETE & TRUNCATE: Real-World

Module: Data Modification & Transactions

Instagram deletes 500M+ spam comments daily using batch DELETE with WHERE clauses. Stripe truncates temporary payment processing tables every hour (100x faster than DELETE). Amazon archives old orders then deletes using DELETE with date filters to maintain 7-year retention policy. Netflix uses soft deletes (UPDATE status = deleted) instead of hard DELETE to enable data recovery. Uber batch-deletes 1B+ old location records monthly using DELETE with LIMIT to avoid long locks. DELETE and TRUNCATE are critical for data lifecycle management, GDPR compliance (right to be forgotten), storage optimization, and test data cleanup.

Instagram: Batch Delete 500M Spam Comments Daily

Instagram detects and removes 500 million spam comments daily using machine learning. Traditional approach: DELETE each spam comment individually. Problem: 500M individual DELETEs take 14 hours (10,000 rows/sec), blocks comment table, impacts user experience. Solution: Batch delete in chunks of 10,000 comments with 50ms pause between batches.

Batch delete pattern: Delete 10,000 spam comments per transaction with 50ms pause between batches. Benefits: (1) 7x faster (70K rows/sec vs 10K rows/sec), (2) Short locks (0.3 sec per batch, users not impacted), (3) Progress monitoring (track completion, estimate ETA), (4) Can pause/resume (stop if issues detected). Index on (is_spam, detected_at) enables fast filtering (10x speedup).

-- Instagram batch delete pattern (simplified)

DO $$

DECLARE

deleted_count INT;

total_deleted INT := 0;

batch_num INT := 0;

start_time TIMESTAMP := CLOCK_TIMESTAMP();

target_time INTERVAL := INTERVAL '2 hours';

BEGIN

LOOP

-- Delete batch of 10,000 spam comments

DELETE FROM comments

WHERE comment_id IN (

SELECT comment_id

FROM comments

WHERE is_spam = true

AND detected_at < CURRENT_TIMESTAMP - INTERVAL '1 hour'

LIMIT 10000

);

GET DIAGNOSTICS deleted_count = ROW_COUNT;

total_deleted := total_deleted + deleted_count;

batch_num := batch_num + 1;

-- Log progress every 100 batches (1M comments)

IF batch_num % 100 = 0 THEN

RAISE NOTICE 'Progress: % batches, % comments deleted, % elapsed, ETA: %',

batch_num, total_deleted,

CLOCK_TIMESTAMP() - start_time,

(CLOCK_TIMESTAMP() - start_time) * (500000000.0 / total_deleted) - (CLOCK_TIMESTAMP() - start_time);

END IF;

EXIT WHEN deleted_count = 0;

-- Pause 50ms between batches (let user queries run)

PERFORM pg_sleep(0.05);

-- Safety: Stop if taking too long (prevent runaway deletion)

IF CLOCK_TIMESTAMP() - start_time > target_time THEN

RAISE NOTICE 'Stopping: Time limit reached. Deleted % of 500M comments', total_deleted;

EXIT;