SQL Practice Logo

SQLPractice Online

Real-World Database Selection: Concept

Module: Database-Specific Features

Database selection is choosing the right database for your specific use case. The challenge: Every database has trade-offs. PostgreSQL has better features but MySQL is simpler. SQL Server has great tooling but expensive licensing. Oracle is most powerful but costs $47K/core/year. SQLite is embedded but single-writer. No perfect database exists. Solution: (1) Analyze requirements (workload type, scale, budget, team skills), (2) Compare databases (features, performance, cost, ecosystem), (3) Build proof of concept (test with realistic data), (4) Make informed decision (document reasoning). Real-world: Instagram chose PostgreSQL for better concurrency (MVCC, non-blocking reads) - scaled to 2B users. Uber migrated MySQL → PostgreSQL for same reason (50K writes/sec without blocking reads). Netflix uses MySQL for transactions + Cassandra for time-series data (right tool for each job). Airbnb uses MySQL for transactions + PostgreSQL for analytics (different workloads). Trade-off: Single database (simpler) vs multiple databases (better fit for each workload).

**Database Selection Framework:**

**Step 1: Analyze Requirements**

**Workload Type:**

- **OLTP (Online Transaction Processing)**: Many small transactions (INSERT, UPDATE, DELETE). Example: E-commerce orders, user registrations, payments. Best: PostgreSQL, MySQL, SQL Server. Need: ACID transactions, row-level locking, indexes on primary keys.

- **OLAP (Online Analytical Processing)**: Few large queries (complex JOINs, aggregations). Example: Business intelligence, reporting, data warehousing. Best: PostgreSQL (window functions, CTEs), Redshift, BigQuery. Need: Column-store indexes, parallel query execution, materialized views.

- **Mixed (HTAP)**: Both OLTP and OLAP. Example: Real-time analytics, dashboards. Best: PostgreSQL (handles both), specialized HTAP databases. Trade-off: Optimizing for one hurts the other.

**Scale Requirements:**

- **Small (<1M rows, <100 users)**: Any database works. SQLite, MySQL, PostgreSQL all fine. Choose based on team skills.

- **Medium (1M-100M rows, 100-10K users)**: PostgreSQL or MySQL. Both handle this easily. Add read replicas if needed.

- **Large (100M-1B rows, 10K-1M users)**: PostgreSQL or MySQL with sharding. Instagram: PostgreSQL with 2B users, 1000+ servers. Uber: PostgreSQL with 50K writes/sec.

- **Very Large (>1B rows, >1M users)**: Specialized databases or heavy sharding. Netflix: MySQL + Cassandra. Facebook: MySQL with custom sharding.

**Consistency Requirements:**

- **Strong consistency (ACID)**: Every read sees latest write. Example: Banking, payments, inventory. Best: PostgreSQL, MySQL, SQL Server, Oracle. All provide ACID transactions.

- **Eventual consistency**: Reads may see stale data temporarily. Example: Social media feeds, recommendations. Best: Cassandra, DynamoDB. Trade-off: Higher availability, lower consistency.

- **Causal consistency**: Reads see writes in order. Example: Chat messages, comments. Best: PostgreSQL (with proper isolation levels).

**Budget:**

- **Free/Open-source**: PostgreSQL, MySQL, SQLite. $0 licensing. Pay for hosting (AWS RDS, Google Cloud SQL).

- **Commercial**: SQL Server ($14K-$47K/core), Oracle ($47K/core/year). Expensive but includes support.

- **Managed services**: AWS RDS ($100-$1000/month), Google Cloud SQL, Azure SQL. Pay for convenience.

**Team Skills:**

- **MySQL**: Easier to learn, simpler features, huge community. Good for beginners.

- **PostgreSQL**: More complex, advanced features, steeper learning curve. Good for experienced teams.

- **SQL Server**: Windows-focused, .NET integration, great tooling. Good for Microsoft shops.

- **Oracle**: Most complex, enterprise features, expensive training. Good for large enterprises with budget.

**Step 2: Database Comparison**

**PostgreSQL vs MySQL:**

| Feature | PostgreSQL | MySQL |

|---------|-----------|-------|

| ACID compliance | Full ACID | Full ACID (InnoDB) |

| Concurrency | MVCC (non-blocking reads) | Locking (blocking reads) |

| Advanced features | JSONB, arrays, full-text search, window functions, CTEs | Basic features, JSON (limited) |

| Performance | Better for complex queries | Better for simple queries |

| Replication | Streaming replication, logical replication | Binary log replication |

| Ecosystem | Smaller but growing | Huge ecosystem |

| Learning curve | Steeper | Easier |

| Use cases | Complex apps, analytics, JSON data | Simple apps, read-heavy workloads |

**When to choose PostgreSQL:**

- Complex queries (JOINs, subqueries, window functions)

- JSON data (JSONB is faster than MySQL JSON)