Real-World Database Selection: Real-World
Module: Database-Specific Features
Database selection is the most critical technical decision for any project. Wrong choice = expensive migration later. Real examples: (1) Startups: Choose PostgreSQL or MySQL (proven, cheap, good ecosystem). Instagram started with PostgreSQL, scaled to 2B users. (2) Analytics: Choose PostgreSQL (window functions, CTEs) or specialized databases (Redshift, BigQuery). Airbnb uses PostgreSQL for analytics. (3) Mobile apps: Choose SQLite (embedded, no server, offline-first). WhatsApp uses SQLite for 2B users. (4) Enterprise: Choose SQL Server (Windows integration, .NET ecosystem) or Oracle (mission-critical, 24/7 support). Banks use Oracle. (5) Microservices: Choose PostgreSQL or MySQL (one database per service). Uber uses PostgreSQL for 50+ microservices. Trade-offs: PostgreSQL has better features (JSONB, arrays, full-text search) but MySQL is simpler. SQL Server has great tooling but expensive licensing. Oracle is most powerful but costs $47K/core/year. Decision factors: Budget (PostgreSQL free, Oracle $47K/core), team skills (MySQL easier than PostgreSQL), features needed (PostgreSQL for complex queries), scale (both handle billions of rows).
Instagram: PostgreSQL for 2 Billion Users
Instagram is a photo-sharing app with 2B users, 100M photos/day, 4.2B likes/day. Challenge: Choose database that scales to billions of users while maintaining fast response times. Requirements: High write throughput, high read throughput, ACID transactions, complex queries. Decision: PostgreSQL.
Instagram chose PostgreSQL over MySQL for MVCC (non-blocking reads during writes). Architecture: 1000+ PostgreSQL servers with sharding (2000 shards by user_id), read replicas (3-5 per shard), PgBouncer connection pooling (50K connections to 100 database connections), Redis caching (90% hit rate). Performance: 1.2K photos/sec writes, billions of reads/day, response time under 100ms for 99% of requests.
-- Instagram sharding by user_id
CREATE TABLE photos (
photo_id BIGSERIAL PRIMARY KEY,
user_id BIGINT,
image_url TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_photos_user_id ON photos(user_id);
-- Feed query with MVCC (non-blocking)
SELECT p.photo_id, p.image_url, u.username
FROM photos p
JOIN follows f ON p.user_id = f.following_id
JOIN users u ON p.user_id = u.user_id
WHERE f.follower_id = 123
ORDER BY p.created_at DESC LIMIT 20;
2B users, 100M photos/day, 4.2B likes/day
PostgreSQL MVCC for non-blocking reads
1000+ servers with sharding, 3-5 read replicas per shard
PgBouncer: 50K connections to 100 database connections
Redis: 90% cache hit rate
Performance: under 100ms for 99% of requests
PostgreSQL
Uber: MySQL to PostgreSQL Migration
Uber migrated from MySQL to PostgreSQL in 2016. Problem: MySQL locking caused 100-500ms delays in ride matching. Solution: PostgreSQL MVCC provides non-blocking reads. Result: 10x faster ride matching (10-50ms).
Uber migration process: Schema conversion (AUTO_INCREMENT to SERIAL), pgloader for data migration (1TB in 4 hours), dual-write to both databases, gradual cutover service by service, kept MySQL as backup for 2 weeks. Duration: 6 months. Result: 50K writes/sec without blocking reads, 10x faster ride matching.
-- MySQL problem: locking blocks reads
UPDATE drivers SET latitude = 37.7749 WHERE driver_id = 123;
-- Blocks: SELECT * FROM drivers WHERE status = 'available';
-- PostgreSQL solution: MVCC non-blocking
UPDATE drivers SET latitude = 37.7749 WHERE driver_id = 123;
-- Doesn't block: SELECT * FROM drivers WHERE status = 'available';
100M users, 15M trips/day, 50K writes/sec
Problem: MySQL locking caused 100-500ms delays
Solution: PostgreSQL MVCC
Migration: 6 months with dual-write
Result: 10x faster (10-50ms vs 100-500ms)