Database Engines & ACID Properties: Interview
Module: Foundational Concepts
Explain the four ACID properties and give a concrete failure mode that each one prevents.
Atomicity prevents partial writes. Without it, a bank debit can succeed while the credit fails — money disappears. With it, both steps are one unit: either both commit or both roll back.
Consistency prevents invalid states. Without it, a constraint violation mid-transaction leaves the database with data that violates business rules (negative balance, orphaned foreign key). With it, any violation aborts the whole transaction.
Isolation prevents concurrent interference. Without it, two customers can simultaneously read "1 seat available", both decide to book, and both commit — double booking. With it, the first writer's lock forces the second to wait and see the updated state.
Durability prevents post-commit data loss. Without it, a confirmed order can vanish after a server crash. With it, WAL is fsynced to disk before COMMIT returns, so recovery always restores committed transactions.
Structure your answer as: property → the specific failure it prevents → one concrete example. Interviewers want to see you connect theory to production failure modes.
What is the internal mechanism behind Durability?
How does MVCC relate to Isolation?
What is the cost of ACID and how do you manage it?
What is the difference between READ COMMITTED and SERIALIZABLE? When would you choose each?
READ COMMITTED: each statement in a transaction sees only data committed before that statement ran. Two SELECT statements in the same transaction can return different results if a concurrent transaction commits between them (non-repeatable read). This is the default in PostgreSQL and Oracle.
SERIALIZABLE: the full transaction sees a snapshot consistent with a single point in time. No concurrent transaction can insert, update, or delete data that would change any row your transaction has read. This prevents phantom reads and ensures the result is identical to running all transactions one-by-one in serial order.
Choose READ COMMITTED for: typical OLTP queries — reading user profiles, displaying product listings, recording events. Any operation where data changing between two reads is acceptable.
Choose SERIALIZABLE for: inventory reservation (cannot oversell), seat/room booking (cannot double-book), financial totals that are immediately acted upon (cannot read a stale balance and then transfer against it).
Mention the performance cost: SERIALIZABLE uses predicate locks that can increase contention. Quantify when possible — READ COMMITTED handles 10× more throughput on the same hardware for most workloads.
What is a phantom read? Can REPEATABLE READ prevent it?
How does PostgreSQL implement SERIALIZABLE (SSI)?
When would you use SNAPSHOT isolation in SQL Server?
How does Write-Ahead Logging make Durability practical without writing to disk on every change?
The data file write is expensive and often random I/O. The WAL write is sequential and much cheaper. The engine separates the two:
1. Changes are first written to the WAL (sequential file write, fsynced to disk)
2. COMMIT returns to the application — at this point Durability is guaranteed
3. The actual data file pages are updated later, in the background
On crash: the data files may be stale, but the WAL is complete. The engine replays the WAL from the last checkpoint forward, applying every committed change to the data files. The database recovers to exactly the state at the last committed transaction.
This is why COMMIT can be fast (one sequential fsync) while still being durable. The data file writes are batched and done asynchronously in the background (the checkpointer process in PostgreSQL).
Mention redo vs undo logs: WAL / redo log is used for crash recovery (replay forward). Undo log is used for rollback (replay backward). They are separate structures in most engines.
What is a checkpoint and why does it matter for recovery time?
How does synchronous_commit=off in PostgreSQL trade Durability for performance?
What is group commit and how does it improve WAL throughput?
What is ACID versus BASE and when is each the right choice?
ACID (Atomicity, Consistency, Isolation, Durability):
• Strong, immediate consistency
• All nodes see the same data at the same time
• Used by: PostgreSQL, MySQL, Oracle, SQL Server
• Best for: financial systems, inventory, reservations, healthcare records
• CAP theorem position: favours Consistency over Availability
BASE (Basically Available, Soft state, Eventually consistent):
• Eventual consistency — all nodes converge to the same value eventually
• Higher availability under network partitions
• Used by: DynamoDB, Cassandra, CouchDB