INSERT Patterns: Interview

Module: Data Modification & Transactions

Why is bulk INSERT 100x faster than single INSERT in a loop? Explain the technical reasons.

Bulk INSERT is 100x faster because it eliminates three major bottlenecks: (1) Transaction overhead: Single inserts create N separate transactions (each with BEGIN/COMMIT overhead). Bulk INSERT uses 1 transaction for all rows. For 10,000 rows: 10,000 commits vs 1 commit = 100x less overhead. (2) Network round-trips: Each single INSERT requires network round-trip to database (latency adds up). Bulk INSERT sends all data in 1 round-trip. For 10,000 rows with 5ms latency: 50 seconds vs 0.05 seconds = 1000x faster. (3) Index updates: Single inserts update indexes 10,000 times (incremental updates slow). Bulk INSERT updates indexes once after all rows inserted (batch update faster). Real-world example: Shopify processes 50K orders/min during Black Friday using bulk inserts. Without bulk inserts, would take 100x longer (500 orders/min). Technical details: (1) Transaction log: 1 log entry vs 10,000 log entries, (2) Lock acquisition: 1 lock vs 10,000 locks, (3) Constraint checks: Batched vs per-row. Lesson: Always use bulk INSERT for multiple rows. Never loop with single INSERTs in production code.

When should you use INSERT...SELECT vs COPY/LOAD DATA? What are the trade-offs?

Use INSERT...SELECT when: (1) Data is already in database (copying between tables), (2) Need to transform data during copy (aggregations, JOINs, calculations), (3) Need to filter data (WHERE clause), (4) Source is query result (not file). Benefits: Server-side operation (no network transfer), can transform during copy, works with any query. Speed: 100x faster than row-by-row, processes millions of rows efficiently. Use COPY/LOAD DATA when: (1) Data is in external file (CSV, text), (2) Initial data load (millions of rows), (3) No transformation needed (raw data import), (4) Maximum speed required (10-100x faster than INSERT). Benefits: Bypasses SQL parsing (direct data load), minimal logging, parallel loading possible. Speed: 100,000+ rows/second. Trade-offs: INSERT...SELECT: More flexible (transformations, filters), but slower than COPY. COPY/LOAD DATA: Fastest method, but requires file access, no transformations. Real-world examples: (1) Amazon Redshift: Uses COPY for data warehouse loads (billions of rows daily from S3). (2) Stripe: Uses INSERT...SELECT for ETL pipelines (transform and aggregate data). (3) Banks: Use BULK INSERT for historical data migration (millions of transactions). Lesson: Use INSERT...SELECT for database-to-database copies with transformations. Use COPY/LOAD DATA for file imports when speed is critical.

Explain RETURNING clause in PostgreSQL. Why is it better than separate SELECT to get inserted IDs?

RETURNING clause returns inserted data in same statement, eliminating need for separate SELECT. Benefits: (1) No extra query: RETURNING returns data immediately, no second round-trip to database. Saves network latency and query execution time. (2) Atomic operation: INSERT and return happen in single atomic operation. No race condition where another INSERT happens between your INSERT and SELECT. (3) Works with bulk inserts: Returns all inserted rows, not just one. Can get IDs for 1000 inserted rows in single statement. (4) Can return expressions: Not just columns, but calculated values (total * 0.1 as tax). (5) Use in CTEs: Can chain operations using returned data in subsequent queries. Example without RETURNING: INSERT INTO orders (user_id, total) VALUES (1, 999.99); SELECT order_id FROM orders WHERE user_id = 1 ORDER BY created_at DESC LIMIT 1; -- Race condition: Another order might be inserted between INSERT and SELECT. Example with RETURNING: INSERT INTO orders (user_id, total) VALUES (1, 999.99) RETURNING order_id, created_at; -- Returns order_id immediately, atomic, no race condition. Advanced usage with CTE: WITH inserted_order AS (INSERT INTO orders (user_id, total) VALUES (1, 999.99) RETURNING order_id) INSERT INTO order_items (order_id, product_id, quantity) SELECT order_id, 101, 1 FROM inserted_order; -- Use returned order_id in subsequent INSERT. Real-world: Stripe uses RETURNING for payment processing. INSERT payment, get payment_id immediately, use for payment_items. No extra SELECT needed. Lesson: Always use RETURNING (PostgreSQL) or OUTPUT (SQL Server) to get inserted IDs. Faster, safer, cleaner code.

Write a query to bulk insert 5 products with different categories, then verify the insert was successful by counting products per category.

-- Bulk insert 5 products

INSERT INTO products (name, price, category, stock)

VALUES

('Laptop Pro 15"', 1299.99, 'Electronics', 50),

('Wireless Mouse', 29.99, 'Electronics', 200),

('Office Desk', 399.99, 'Furniture', 30),

('Ergonomic Chair', 249.99, 'Furniture', 45),

('USB-C Cable', 12.99, 'Electronics', 500);

-- Verify insert: Count products per category

SELECT

category,

COUNT(*) as product_count,

AVG(price) as avg_price,

SUM(stock) as total_stock

FROM products

GROUP BY category

ORDER BY product_count DESC;

-- Expected output:

-- category | product_count | avg_price | total_stock

-- Electronics | 3 | 443.32 | 750

-- Furniture | 2 | 324.99 | 75

Bulk INSERT adds 5 rows in 1 statement (1 transaction, 1 network round-trip). Verification query uses GROUP BY to count products per category and calculate averages. This pattern is 100x faster than 5 separate INSERT statements. Real-world: E-commerce sites use bulk inserts for product imports during catalog updates.

Write an INSERT...SELECT query to copy all orders from 2023 into an archive table, including customer names (requires JOIN). Then verify the archive contains the correct data.

-- Create archive table (if not exists)

CREATE TABLE archive_orders (

order_id INTEGER,

customer_name VARCHAR(100),

order_total DECIMAL(10, 2),

order_date DATE,

archived_at TIMESTAMP DEFAULT NOW()

);

-- INSERT...SELECT with JOIN to copy 2023 orders with customer names

INSERT INTO archive_orders (order_id, customer_name, order_total, order_date)