Real-World Database Selection: Interview
Module: Database-Specific Features
You are building a new e-commerce platform. How would you choose between PostgreSQL and MySQL? What factors would you consider?
Database selection for e-commerce requires analyzing specific requirements: (1) Workload: E-commerce is OLTP (many small transactions: orders, payments, inventory updates). Both PostgreSQL and MySQL handle OLTP well. (2) Features needed: Product search (full-text search + JSON attributes), real-time inventory (LISTEN/NOTIFY), analytics (sales reports with window functions). PostgreSQL advantage: Better full-text search, JSONB faster than MySQL JSON, LISTEN/NOTIFY for real-time updates, window functions for analytics. (3) Concurrency: E-commerce has high read concurrency (many users browsing products). PostgreSQL MVCC provides non-blocking reads (better concurrency). MySQL uses locking (reads block during writes). (4) Team skills: If team knows MySQL, learning curve for PostgreSQL is 1-2 weeks. (5) Cost: Same ($100-$500/month AWS RDS for both). Decision framework: Score each database on key criteria. PostgreSQL: Features (9/10), Concurrency (9/10), Cost (10/10), Team skills (7/10 if team knows MySQL). MySQL: Features (6/10), Concurrency (7/10), Cost (10/10), Team skills (10/10 if team knows MySQL). Recommendation: PostgreSQL for better features (JSONB, full-text search, LISTEN/NOTIFY, window functions). Worth 1-2 week learning curve. Alternative: Start with MySQL (team knows it), migrate to PostgreSQL later if needed (Uber did this). Real-world: Shopify uses PostgreSQL for e-commerce. Instagram uses PostgreSQL for 2B users.
When would you choose SQL Server over PostgreSQL? What are the key trade-offs?
SQL Server vs PostgreSQL decision depends on environment and requirements: (1) Platform: SQL Server is best for Windows/.NET environments (C#, ASP.NET). Tight integration with Visual Studio, Azure, Active Directory. PostgreSQL is cross-platform (Linux, macOS, Windows). (2) Tooling: SQL Server has excellent tooling (SSMS, Azure Data Studio, Visual Studio integration). PostgreSQL has good tooling (pgAdmin, DBeaver) but not as polished. (3) Cost: SQL Server licensing is $14K-$47K/core (expensive). PostgreSQL is free (open-source). For 8-core server, SQL Server costs $112K-$376K vs PostgreSQL $0. (4) Features: Both have advanced features (window functions, CTEs, JSON, full-text search). SQL Server has columnstore indexes (better for analytics). PostgreSQL has JSONB (faster than SQL Server JSON). (5) Cloud: SQL Server has tight Azure integration. PostgreSQL works on all clouds (AWS, Google, Azure). (6) Support: SQL Server includes enterprise support. PostgreSQL has community support + paid support options. When to choose SQL Server: (1) Windows/.NET environment, (2) Budget for licensing, (3) Need enterprise support, (4) Azure deployment, (5) Team knows SQL Server. When to choose PostgreSQL: (1) Cross-platform, (2) Budget constraints, (3) Open-source preference, (4) Cloud-agnostic, (5) Similar features at $0 cost. Trade-offs: SQL Server has better tooling and support but expensive licensing. PostgreSQL has similar features and free licensing but less polished tooling. Real-world: Stack Overflow uses SQL Server (Windows/.NET shop). Uber uses PostgreSQL (cost savings, cross-platform).
Explain the trade-offs between using a single database (monolith) vs multiple databases (microservices). When would you choose each approach?
Single database vs multiple databases is architectural decision with significant trade-offs: **Single Database (Monolith):** All services share one database. Pros: (1) Simple (one database to manage, backup, monitor), (2) ACID transactions across all data (can update users + orders + inventory in single transaction), (3) Easy JOINs (can join users + orders + products in single query), (4) Lower cost (one database server vs many). Cons: (1) Tight coupling (all services depend on same schema), (2) Single point of failure (database down = entire system down), (3) Scaling bottleneck (all services compete for same database resources), (4) Schema changes affect all services (risky deployments). **Multiple Databases (Microservices):** Each service has own database. Pros: (1) Loose coupling (services are independent, can change schema without affecting others), (2) Technology diversity (can use PostgreSQL for one service, MongoDB for another), (3) Independent scaling (scale each database independently), (4) Fault isolation (one database down = only one service affected). Cons: (1) Complex (many databases to manage, backup, monitor), (2) No ACID transactions across services (need distributed transactions or eventual consistency), (3) No JOINs across services (need API calls or data duplication), (4) Higher cost (many database servers). When to choose single database: (1) Small team (<10 developers), (2) Simple application (CRUD operations), (3) Need ACID transactions across all data, (4) Budget constraints, (5) Early stage (premature to split). When to choose multiple databases: (1) Large team (>10 developers, need independence), (2) Complex application (different services have different data needs), (3) Need independent scaling (some services are high-traffic, others low-traffic), (4) Mature product (proven need for microservices). Migration path: Start with single database (simpler), split into multiple databases when needed (when team grows, when scaling becomes bottleneck). Real-world: Uber started with single MySQL database, split into 50+ PostgreSQL databases as they grew. Netflix uses multiple databases (MySQL for billing, Cassandra for viewing history). Lesson: Start simple (single database), split when needed (when complexity of monolith exceeds complexity of microservices).
You are CTO of a startup building a SaaS analytics platform. Requirements: (1) Ingest 1M events/day, (2) Complex analytics queries (window functions, CTEs), (3) Real-time dashboards, (4) Budget: $1000/month. Choose database and justify.
-- Requirement Analysis:
-- 1. Ingest 1M events/day
-- = 1M / 86400 seconds = 11.6 events/second (low write rate)
-- Both PostgreSQL and MySQL can handle this easily
-- 2. Complex analytics queries
-- Need: Window functions, CTEs, aggregations
-- PostgreSQL: Excellent (mature window functions, CTEs, materialized views)
-- MySQL: Good (window functions added in 8.0, less mature)
-- Specialized: Redshift, BigQuery (better for very large data, but overkill for 1M events/day)
-- 3. Real-time dashboards
-- Need: Fast queries (<1 second), concurrent users
-- PostgreSQL: MVCC = non-blocking reads (good for dashboards)
-- MySQL: Locking = reads may block (worse for dashboards)
-- 4. Budget: $1000/month
-- AWS RDS PostgreSQL: db.t3.large (2 vCPU, 8GB RAM) = $280/month
-- AWS RDS MySQL: db.t3.large (2 vCPU, 8GB RAM) = $280/month
-- Redshift: $180/month (dc2.large) but overkill for 1M events/day
-- BigQuery: Pay per query, ~$100-500/month for this workload
-- DECISION: PostgreSQL
-- Schema design:
CREATE TABLE events (
event_id BIGSERIAL PRIMARY KEY,
user_id INTEGER,
event_type VARCHAR(50),
event_data JSONB, -- Flexible schema for different event types
created_at TIMESTAMP DEFAULT NOW()
);
-- Indexes for fast queries:
CREATE INDEX idx_events_user_id ON events(user_id);
CREATE INDEX idx_events_type ON events(event_type);
CREATE INDEX idx_events_created_at ON events(created_at DESC);
CREATE INDEX idx_events_data ON events USING GIN(event_data); -- JSONB index
-- Partitioning for large data (optional, for future scale):