Real-World Database Selection: Mistakes
Module: Database-Specific Features
Choosing database based on hype or trends (NoSQL is hot, let's use MongoDB) without analyzing if it fits requirements
Analyze requirements first (workload type, scale, consistency needs), then choose database that best fits those requirements
Database selection should be based on requirements, not trends. Example: Many startups chose MongoDB during NoSQL hype, then migrated to PostgreSQL when they needed ACID transactions and complex queries. Migration cost: 3-6 months of development time. Better approach: Analyze requirements (OLTP? OLAP? ACID needed? Complex queries?), compare databases (PostgreSQL vs MySQL vs MongoDB), test with realistic data, choose best fit. Real-world: Uber started with MySQL, migrated to PostgreSQL for better concurrency (MVCC). Took 6 months but worth it. Lesson: Choose right database from start, avoid expensive migration.
Use decision framework: (1) Analyze requirements, (2) Compare databases, (3) Build proof of concept, (4) Use decision matrix, (5) Document reasoning. Don't follow trends blindly.
Critical
Wrong database choice leads to expensive migration later (rewrite application, migrate data, retrain team)
Testing with toy data (1000 rows) when production will have 100M rows, then discovering performance issues in production
Test with realistic data size (same as production), realistic query patterns, realistic concurrent users
Database performance depends heavily on data size. Query that takes 10ms with 1000 rows may take 10 seconds with 100M rows (1000x slower). Example: SELECT * FROM users WHERE email = 'test@example.com' works fine with 1000 users, but with 100M users it does full table scan (slow). Solution: Add index on email column. Testing with realistic data reveals these issues early. Real-world: Many startups launch with toy data, then face performance crisis when they get real users. Better approach: Load realistic data (100M rows if that's production size), test critical queries, measure performance, add indexes where needed. Tools: pg_bench (PostgreSQL), sysbench (MySQL) for load testing.
Always test with realistic data size. If production will have 100M rows, test with 100M rows. Use EXPLAIN to verify indexes are used. Measure query performance (should be <100ms for most queries).
High
Queries that work fine with 1000 rows become slow with 100M rows (missing indexes, full table scans, memory issues)
Ignoring total cost of ownership (Oracle is free to download, let's use it) without considering licensing costs
Calculate total cost: Licensing + hosting + bandwidth + support + developer time (learning curve)
Database cost is not just licensing. Total cost includes: (1) Licensing (Oracle $47K/core/year, SQL Server $14K-$47K/core, PostgreSQL/MySQL $0), (2) Hosting (AWS RDS $100-$1000/month), (3) Bandwidth (data transfer costs), (4) Support (Oracle support $10K-$50K/year), (5) Developer time (learning curve, training). Example: Oracle total cost for small company: $376K/year licensing + $50K/year hosting + $20K/year support = $446K/year. PostgreSQL total cost: $0 licensing + $50K/year hosting + $0 support (community) = $50K/year. Savings: $396K/year. Real-world: Many companies migrate from Oracle to PostgreSQL to save costs. Lesson: Calculate total cost before choosing database.
Calculate 5-year total cost of ownership. Include licensing, hosting, support, developer time. For most companies, PostgreSQL or MySQL is more cost-effective than Oracle or SQL Server.
Critical
Oracle licensing costs $47K/core/year. For 8-core server, that's $376K/year. Plus hosting, support, training.
Choosing PostgreSQL when team only knows MySQL, without considering learning curve and productivity loss
Consider team skills in decision. If team knows MySQL, either choose MySQL or budget 2-4 weeks for PostgreSQL training
Team skills matter. If team knows MySQL, they are productive immediately. If team must learn PostgreSQL, productivity drops for 2-4 weeks (learning syntax, tools, debugging). Cost: 2-4 weeks * team size * salary. Example: 5 developers * $100K/year salary * 4 weeks = $38K cost. Trade-off: PostgreSQL has better features (JSONB, window functions, MVCC), but MySQL is simpler (team is productive immediately). Decision: If features are critical (complex queries, JSON data, high concurrency), PostgreSQL is worth learning curve. If features are not critical (simple CRUD), MySQL is better choice. Real-world: Many startups choose MySQL because team knows it, migrate to PostgreSQL later when they need advanced features (Uber did this).
Consider team skills in decision matrix. If team knows MySQL, either choose MySQL or budget time/money for PostgreSQL training. Don't underestimate learning curve.
High
Team struggles with PostgreSQL (different syntax, different tools, different debugging), productivity drops 50% for 2-4 weeks
Premature optimization: Choosing specialized database (Cassandra, MongoDB) for scale you don't have yet
Start with proven database (PostgreSQL, MySQL), scale when needed (read replicas, sharding)
Premature optimization is root of all evil. Many startups choose Cassandra or MongoDB for scale they don't have yet (we might have 1B users someday). Reality: Most apps never reach that scale. Cost: Specialized databases are complex (eventual consistency, no JOINs, limited transactions), team struggles, productivity drops 50%. Better approach: Start with PostgreSQL or MySQL (proven, simple, full ACID), scale when needed (read replicas, sharding). Instagram scaled PostgreSQL to 2B users. Uber scaled PostgreSQL to 50K writes/sec. Both started simple, scaled when needed. Real-world: Many startups chose MongoDB for scale, then migrated to PostgreSQL when they needed ACID transactions and complex queries. Lesson: Start simple, scale when needed.
Start with PostgreSQL or MySQL. Both scale to millions of users. Add read replicas when needed (10x read capacity). Add sharding when needed (100x capacity). Don't choose specialized database until you actually need it.
Medium
Specialized databases are complex (eventual consistency, no JOINs, limited transactions), team struggles, productivity drops