Isolation Levels: Interview
Module: Data Modification & Transactions
Explain the four isolation levels and their trade-offs. When would you use each?
Four isolation levels from weakest to strongest: (1) Read Uncommitted: Allows dirty reads (see uncommitted data). Fastest (no locks). Use case: Approximate counts only (dashboard showing "~1000 orders"). Never for financial data. (2) Read Committed (default): Prevents dirty reads, allows non-repeatable reads. Fast (short locks). Use case: Most applications (99% of transactions). Stripe uses for payment processing (30K+ TPS). (3) Repeatable Read: Prevents dirty and non-repeatable reads, allows phantom reads (in SQL Server). Slower (long locks). Use case: Reports requiring consistent snapshot (financial reports, analytics). Amazon uses for inventory reports. (4) Serializable: Prevents all anomalies. Slowest (range locks, high deadlock risk). Use case: Critical operations requiring full isolation (< 1% of transactions). Banks use for account balance calculations. Trade-off: Stronger isolation = more consistency = slower performance. Choose based on consistency requirements vs performance needs.
What are the three concurrency anomalies? Provide examples and prevention strategies.
Three concurrency anomalies: (1) Dirty Read: Transaction reads uncommitted data from another transaction. Example: T1 updates balance to $2000 (not committed), T2 reads $2000, T1 rolls back, T2 saw invalid data. Prevention: Use Read Committed or higher (default). Impact: Data corruption, wrong business decisions. (2) Non-repeatable Read: Transaction reads same row twice, gets different values. Example: T1 reads balance $1000, T2 updates to $1500 and commits, T1 reads again sees $1500. Prevention: Use Repeatable Read or Serializable. Impact: Inconsistent reports (calculations do not match). (3) Phantom Read: Transaction executes same query twice, gets different number of rows. Example: T1 counts 10 orders, T2 inserts new order and commits, T1 counts again sees 11 orders. Prevention: Use Serializable (or Repeatable Read in PostgreSQL/MySQL). Impact: Overbooking, inventory allocation errors. Real-world: Trading system lost millions due to dirty reads, inventory system oversold due to phantom reads.
Compare PostgreSQL, MySQL, and SQL Server isolation level implementations. What are the key differences?
Key differences: (1) Default levels: PostgreSQL (Read Committed), MySQL (Repeatable Read), SQL Server (Read Committed), Oracle (Read Committed). (2) Phantom read prevention: PostgreSQL Repeatable Read prevents phantoms (uses MVCC), MySQL Repeatable Read prevents phantoms (uses gap locks), SQL Server Repeatable Read allows phantoms (must use Serializable). (3) Serializable implementation: PostgreSQL uses Serializable Snapshot Isolation (SSI) - optimistic approach, detects conflicts at commit, better performance. SQL Server uses traditional Serializable - range locks, pessimistic approach. (4) Syntax: PostgreSQL (BEGIN; SET TRANSACTION ISOLATION LEVEL level;), MySQL (SET TRANSACTION ISOLATION LEVEL level; START TRANSACTION;), SQL Server (SET TRANSACTION ISOLATION LEVEL level; BEGIN TRANSACTION;). (5) Table hints: SQL Server supports query-level hints (WITH READUNCOMMITTED), others do not. Recommendation: Use Read Committed for most applications (good balance), use Repeatable Read for reports, use Serializable sparingly (< 1% of transactions).
Design a financial report function that calculates total revenue, average order value, and order count. Ensure consistent calculations even with concurrent order insertions. Choose appropriate isolation level and explain why.
-- Financial report with Repeatable Read for consistent snapshot
CREATE OR REPLACE FUNCTION generate_financial_report(
start_date DATE,
end_date DATE
) RETURNS TABLE(
total_revenue DECIMAL(10,2),
avg_order_value DECIMAL(10,2),
order_count BIGINT,
min_order DECIMAL(10,2),
max_order DECIMAL(10,2),
report_timestamp TIMESTAMP
) AS $$
BEGIN
-- Use Repeatable Read for consistent snapshot
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
RETURN QUERY
SELECT
SUM(total)::DECIMAL(10,2) as total_revenue,
AVG(total)::DECIMAL(10,2) as avg_order_value,
COUNT(*)::BIGINT as order_count,
MIN(total)::DECIMAL(10,2) as min_order,
MAX(total)::DECIMAL(10,2) as max_order,
CURRENT_TIMESTAMP as report_timestamp
FROM orders
WHERE status = 'completed'
AND order_date BETWEEN start_date AND end_date;
END;
$$ LANGUAGE plpgsql;
-- Generate report (consistent snapshot)
SELECT * FROM generate_financial_report('2024-01-01', '2024-12-31');
-- Why Repeatable Read:
-- 1. Prevents non-repeatable reads (same query returns same results)
-- 2. All calculations use consistent snapshot (total, average, count match)