Database System Fundamentals: Concept

Module: Database-Specific Features

Every SQL database has the same core architecture: query processor (parses SQL, creates execution plan), storage engine (manages data on disk), transaction manager (ensures ACID), and buffer pool (caches data in memory). Understanding these components helps you diagnose performance issues, choose the right database, and configure it correctly. For example: slow query? Check execution plan (query processor). Random I/O? Check storage engine and indexes. Deadlocks? Check transaction isolation level. Out of memory? Check buffer pool size. The difference between databases is HOW they implement these components. PostgreSQL uses MVCC (multi-version concurrency control) for non-blocking reads. MySQL InnoDB uses clustered indexes for fast primary key lookups. SQL Server uses page-based storage with extent allocation. Oracle uses redo logs and flashback for point-in-time recovery.

**RDBMS Architecture - 4 Core Components:**

1. **Query Processor** (SQL → Execution Plan)

- Parser: Validates SQL syntax, builds parse tree

- Optimizer: Generates execution plan using statistics and cost model

- Executor: Runs the plan, fetches data from storage engine

- Example: SELECT * FROM users WHERE city = 'NYC' → Parser validates → Optimizer chooses index scan vs table scan → Executor fetches rows

2. **Storage Engine** (Data on Disk)

- **MySQL InnoDB**: Clustered index (data stored in B+tree by primary key), secondary indexes store primary key, fast PK lookups, slower secondary key lookups

- **PostgreSQL Heap**: Heap storage (data stored in insertion order), all indexes store row pointer (TID), fast secondary key lookups, slower PK lookups

- **SQL Server Pages**: Page-based storage (8KB pages), clustered index (like InnoDB), extent allocation (8 pages), fast range scans

- Trade-off: InnoDB fast for PK lookups (e-commerce orders by order_id), PostgreSQL fast for secondary key lookups (search by email)

3. **Transaction Manager** (ACID Guarantees)

- **Atomicity**: All or nothing. Uses WAL (write-ahead log) or redo log. If crash during transaction, replay log to complete or rollback.

- **Consistency**: Enforces constraints (foreign keys, unique, check). Validates before commit.

- **Isolation**: MVCC (PostgreSQL, Oracle) vs Locking (MySQL, SQL Server). MVCC allows non-blocking reads, locking blocks readers.

- **Durability**: Writes to WAL/redo log first (sequential I/O, fast), then to data files (random I/O, slow). Crash recovery replays log.

4. **Buffer Pool** (Memory Cache)

- Caches frequently accessed pages in memory (RAM)

- Reduces disk I/O (1000x faster: RAM 100ns, SSD 100μs, HDD 10ms)

- Size: MySQL innodb_buffer_pool_size, PostgreSQL shared_buffers, SQL Server buffer pool

- Hit ratio: >95% good, <90% increase buffer pool size

- Example: 1TB database, 64GB buffer pool, 95% hit ratio → 95% queries served from RAM (fast), 5% from disk (slow)

**MVCC vs Locking - Concurrency Control:**

**MVCC (PostgreSQL, Oracle):**

- Each transaction sees snapshot of data at transaction start time

- Writers don't block readers, readers don't block writers

- Old row versions kept until no transaction needs them (vacuum/cleanup)

- Trade-off: High concurrency, but bloat if vacuum doesn't run

- Use case: High read concurrency (analytics dashboard, reporting)

**Locking (MySQL InnoDB, SQL Server):**

- Readers acquire shared locks, writers acquire exclusive locks

- Writers block readers, readers block writers (unless READ UNCOMMITTED)

- No old row versions, no bloat

- Trade-off: Lower concurrency, but simpler cleanup

- Use case: Write-heavy workloads (order processing, inventory updates)

**Storage Engine Comparison:**

|---------|--------------|-----------------|------------------|