Database System Fundamentals: Mistakes

Q: Common mistake in Database System Fundamentals

PostgreSQL MVCC keeps old row versions for concurrent readers. After transactions commit, old versions become "dead tuples". VACUUM removes dead tuples and reclaims space. Without VACUUM, tables grow 10x larger: 1GB table with 90% dead tuples = 10GB on disk. Queries scan all pages (including dead tuples) = 10x slower. Check bloat: SELECT schemaname, tablename, n_live_tup, n_dead_tup, ROUND(100.0 * n_dead_tup / (n_live_tup + n_dead_tup), 2) AS dead_pct FROM pg_stat_user_tables WHERE n_dead_tup >

Module: Database-Specific Features

Using MySQL for high read concurrency workload (analytics dashboard with 1000 concurrent users)

Use PostgreSQL with MVCC for high read concurrency (writers don't block readers)

MySQL InnoDB uses locking: writers acquire exclusive locks, blocking readers. With 1000 concurrent readers and frequent writes, readers wait for locks (100-1000ms). PostgreSQL MVCC allows writers and readers to work simultaneously without blocking. Writers create new row versions, readers see old versions. Uber migrated from MySQL to PostgreSQL for ride matching: 50K writes/sec, zero blocking reads. Check lock waits: MySQL (SHOW ENGINE INNODB STATUS, look for "LOCK WAIT"), PostgreSQL (SELECT * FROM pg_stat_activity WHERE wait_event_type = 'Lock').

Analyze workload before choosing database. High read concurrency (>100 concurrent readers with frequent writes) = use PostgreSQL MVCC. Write-heavy with few readers = MySQL InnoDB is fine.

High

Slow queries, lock waits, timeout errors

Setting buffer pool to 90% of RAM on shared server (database + application on same server)

Set buffer pool to 50-60% of RAM on shared server, 70-80% on dedicated database server

Buffer pool caches database pages in RAM. Setting too high causes OS to swap (1000x slower than RAM). On shared server, application needs RAM too. Example: 64GB RAM server, buffer pool 90% (58GB), application needs 10GB, OS needs 2GB = 70GB total > 64GB RAM = swapping. Correct: buffer pool 50% (32GB), application 10GB, OS 2GB, free 20GB = no swapping. Monitor: Linux (free -h, check "Swap" usage), MySQL (SHOW VARIABLES LIKE 'innodb_buffer_pool_size'), PostgreSQL (SHOW shared_buffers). If swapping occurs, reduce buffer pool by 10-20%.

Start conservative (50% RAM), monitor buffer pool hit ratio (should be >95%), increase gradually if hit ratio is low and no swapping occurs. Dedicated database server: 70-80% RAM. Shared server: 50-60% RAM.

Critical

Out of memory, swapping, system crash

Using SERIALIZABLE isolation level for all queries ("maximum safety")

Use READ COMMITTED for most queries, REPEATABLE READ for financial transactions, SERIALIZABLE only for critical data with known conflicts

SERIALIZABLE provides full isolation but is 10x slower than READ COMMITTED. It detects conflicts and aborts transactions with "serialization failure" errors. Example: Two transactions read account balance ($1000), both try to withdraw $600, one succeeds, other aborts. Application must retry. For most queries, READ COMMITTED is sufficient: see committed data, no dirty reads. Use REPEATABLE READ for financial transactions: same query returns same results within transaction. Use SERIALIZABLE only for critical data with known conflicts (inventory updates, seat reservations). Check isolation level: PostgreSQL (SHOW transaction_isolation), MySQL (SELECT @@transaction_isolation).

Default isolation levels are good: PostgreSQL (READ COMMITTED), MySQL (REPEATABLE READ). Only change if you have specific requirements. Monitor serialization failures: PostgreSQL (SELECT * FROM pg_stat_database WHERE datname = 'mydb', check "serialization_failures").

Medium

Slow queries (10x slower), frequent serialization failures, deadlocks

Creating new database connection for every HTTP request (no connection pooling)

Use connection pooling: PgBouncer (PostgreSQL), ProxySQL (MySQL), or application-level pooling

Creating database connection is expensive: TCP handshake, authentication, session initialization = 100ms overhead. With 1000 req/sec, that's 100 seconds of wasted time per second (impossible). Connection pooling reuses connections: 1ms overhead. Example: Without pooling: 1000 req/sec × 100ms = 100 connections/sec created, max_connections=100 exhausted in 1 second. With pooling: 10 pooled connections handle 1000 req/sec, each connection serves 100 req/sec. Setup: PgBouncer (PostgreSQL): pool_mode=transaction, default_pool_size=25, max_client_conn=1000. ProxySQL (MySQL): max_connections=1000, mysql-max_connections=100.

Always use connection pooling in production. Application-level pooling (HikariCP, pgbouncer) is easier to setup. Database-level pooling (PgBouncer, ProxySQL) enables connection sharing across multiple applications. Monitor: active connections should be <50% of max_connections.

Critical

Slow response times (100ms overhead per request), connection limit exhausted, database crash

Ignoring PostgreSQL MVCC bloat (never running VACUUM, n_dead_tup growing)

Run VACUUM regularly: autovacuum enabled (default), manual VACUUM on high-update tables, monitor n_dead_tup

PostgreSQL MVCC keeps old row versions for concurrent readers. After transactions commit, old versions become "dead tuples". VACUUM removes dead tuples and reclaims space. Without VACUUM, tables grow 10x larger: 1GB table with 90% dead tuples = 10GB on disk. Queries scan all pages (including dead tuples) = 10x slower. Check bloat: SELECT schemaname, tablename, n_live_tup, n_dead_tup, ROUND(100.0 * n_dead_tup / (n_live_tup + n_dead_tup), 2) AS dead_pct FROM pg_stat_user_tables WHERE n_dead_tup > 10000 ORDER BY n_dead_tup DESC. If dead_pct > 20%, run VACUUM. Autovacuum runs automatically but may be too slow for high-update tables. Manual VACUUM: VACUUM ANALYZE tablename (runs in background, doesn't block queries).

Enable autovacuum (default): autovacuum = on. Monitor dead tuples weekly: alert if dead_pct > 20%. For high-update tables (>10K updates/min), run manual VACUUM daily. VACUUM FULL reclaims space but locks table (avoid in production).

High

Table bloat (10x larger), slow queries (10x slower), disk space exhausted