Transaction Control (ACID): Interview
Module: Data Modification & Transactions
Explain the four ACID properties with real-world examples. Why is each property critical for database reliability?
ACID guarantees database reliability through four properties: (1) Atomicity (All-or-Nothing): Transaction executes completely or not at all. Example: Bank transfer $100 from Account A to Account B - both debit and credit must succeed together. If credit fails, debit is rolled back (money does not disappear). Critical because: Prevents partial operations that violate business rules. Real-world failure: Knight Capital lost $440M (2012) due to atomicity violations (buy orders without sell orders). (2) Consistency (Valid → Valid): Transaction moves database from one valid state to another. Example: Account balance must be ≥ 0. If debit causes negative balance, transaction is rolled back. Critical because: Enforces business rules and constraints. Real-world failure: TSB Bank outage (2018) - consistency violations during migration caused wrong balances for 1.9M customers. (3) Isolation (Concurrent Safety): Concurrent transactions do not interfere. Example: Two users transfer from same account - locks prevent race condition (lost update). Critical because: Prevents race conditions in multi-user systems. Real-world failure: Ticket booking system double-books last seat due to isolation violation. (4) Durability (Persist Forever): Committed changes survive crashes. Example: Transfer committed, database crashes immediately - changes still present after restart. Critical because: Guarantees data is not lost. Real-world failure: British Airways (2017) lost transaction data due to power outage, 75K passengers stranded. Summary: ACID prevents data loss, inconsistencies, and race conditions. Every production database must guarantee ACID for reliability.
What is a deadlock? How does it occur? What are the prevention strategies?
Deadlock: Two transactions wait for each other to release locks, neither can proceed. Example: Transaction 1 locks Account A, waits for Account B. Transaction 2 locks Account B, waits for Account A. Both wait forever (circular dependency). How deadlock occurs: (1) Transaction 1: UPDATE accounts WHERE id = A (locks A), then UPDATE accounts WHERE id = B (waits for B). (2) Transaction 2: UPDATE accounts WHERE id = B (locks B), then UPDATE accounts WHERE id = A (waits for A). (3) Circular wait: T1 waits for T2, T2 waits for T1. (4) Database detects deadlock (wait-for graph has cycle), chooses victim (transaction with least work), rolls back victim. Prevention strategies: (1) Lock ordering: Always acquire locks in same order (lock A before B). Prevents circular wait. Example: All transactions lock accounts in ascending ID order. (2) Timeout: Set lock timeout (5-30 seconds), rollback if lock not acquired. Prevents infinite wait. (3) Retry logic: Catch deadlock error, wait random time (100-500ms), retry transaction. Handles deadlocks gracefully. (4) Minimize lock duration: Keep transactions short (< 1 second). Less time for deadlock to occur. (5) Lower isolation level: Use Read Committed instead of Serializable. Fewer locks, less deadlock risk. (6) Optimistic locking: Check version before update, retry if changed. No locks held during read. Real-world: Stripe retries transactions 3 times on deadlock (99% success after retries). Amazon uses lock ordering (always lock in ID order). Deadlock frequency target: < 0.01% (1 per 10,000 transactions).
Compare transaction isolation levels (Read Uncommitted, Read Committed, Repeatable Read, Serializable). When would you use each?
Isolation levels trade consistency for performance: (1) Read Uncommitted (weakest, fastest): Can see uncommitted changes from other transactions (dirty reads). Example: Transaction 1 updates balance to $500 (not committed), Transaction 2 reads $500 (dirty read), Transaction 1 rolls back, Transaction 2 saw invalid data. Use case: Rarely used (dirty reads violate consistency). Only for approximate counts where accuracy not critical. Performance: Fastest (no locks). (2) Read Committed (default, balanced): Can only see committed changes. No dirty reads. Example: Transaction 1 updates balance to $500 (not committed), Transaction 2 reads old value $1000 (committed), Transaction 1 commits, Transaction 2 now sees $500. Use case: Most applications (99% of transactions). Balance between consistency and performance. Performance: Fast (short locks). Real-world: Stripe, Amazon use Read Committed for most transactions. (3) Repeatable Read (stronger, slower): Same query returns same results within transaction. Prevents non-repeatable reads. Example: Transaction 1 reads balance $1000, Transaction 2 updates to $500 and commits, Transaction 1 reads again - still sees $1000 (repeatable). Use case: Reports requiring consistent snapshot (balance sheet, financial reports). Performance: Slower (more locks, longer duration). (4) Serializable (strongest, slowest): Transactions execute as if sequential. Full isolation. Prevents phantom reads (new rows appearing). Example: Transaction 1 counts orders (10 orders), Transaction 2 inserts new order, Transaction 1 counts again - still sees 10 (no phantom). Use case: Critical financial calculations, complex business logic requiring full isolation. Performance: Slowest (most locks, highest deadlock risk). Real-world: Stripe uses Serializable for critical financial calculations only (< 1% of transactions). Trade-off: Stronger isolation = more locks = slower performance. Choose based on consistency requirements vs performance needs.
Design a transaction for e-commerce order placement that: (1) Creates order record, (2) Reserves inventory for multiple items, (3) Charges payment. Handle errors: out of stock items should not cancel entire order (partial fulfillment), payment failure should cancel entire order. Include error handling and savepoints.
-- E-commerce order placement with savepoints
CREATE OR REPLACE FUNCTION place_order_with_partial_fulfillment(
customer_id_param INT,
items JSONB, -- [{"product_id": 100, "quantity": 2}, ...]
payment_method VARCHAR(50)
) RETURNS TABLE(
order_id INT,
items_added INT,
items_failed INT,
total_amount DECIMAL(10,2),
status TEXT
) AS $$
DECLARE
order_id_var INT;
item JSONB;
product_id_var INT;
quantity_var INT;
price_var DECIMAL(10,2);
available_qty INT;
order_total DECIMAL(10,2) := 0;
items_added_count INT := 0;
items_failed_count INT := 0;
BEGIN
-- Step 1: Create order record
INSERT INTO orders (customer_id, status, created_at)
VALUES (customer_id_param, 'pending', CURRENT_TIMESTAMP)
RETURNING orders.order_id INTO order_id_var;
-- Step 2: Process each item with savepoint (partial fulfillment)
FOR item IN SELECT * FROM jsonb_array_elements(items)
LOOP
product_id_var := (item->>'product_id')::INT;
quantity_var := (item->>'quantity')::INT;
-- Savepoint before processing item