Database Migration Strategies: Interview

Q: What are the main challenges when migrating from MySQL to PostgreSQL? How do you handle schema conversion, data migration, and application changes?

MySQL → PostgreSQL challenges: (1) Schema conversion: AUTO_INCREMENT → SERIAL, DATETIME → TIMESTAMP, ENUM → CHECK constraint, backticks → double quotes. (2) Data migration: Export (mysqldump), transform (encoding, data types), import (pgloader, COPY). (3) Application changes: SQL syntax (LIMIT same, but || vs CONCAT), connection strings, ORM configuration. (4) Stored procedures: MySQL uses SQL syntax, PostgreSQL uses PL/pgSQL (must rewrite). (5) Case sensitivity: MySQL case-insensitive, PostgreS

Module: Database-Specific Features

Compare migration strategies: Big bang vs Phased vs Parallel run vs Blue-green. When would you use each?

Big bang (single cutover): Stop app → export → import → start app. Pros: Simple, clean. Cons: Downtime (hours), high risk. Use case: Small databases (<100GB), low-traffic apps, maintenance windows OK. Example: Startup migrating over weekend. Phased (gradual): Migrate tables one by one, app reads from both. Pros: Lower risk per phase. Cons: Complex dual-read logic, longer total time. Use case: Large databases (>1TB), complex schemas. Example: E-commerce migrating orders first, then products. Parallel run (dual-write): Write to both databases, read from old, validate new, switch reads. Pros: Zero downtime, safe validation. Cons: Complex dual-write, 2x write overhead. Use case: High-availability systems, zero-downtime requirement. Example: GitLab MySQL → PostgreSQL used dual-write for 3 months. Blue-green: Setup new database (green), sync data, switch traffic instantly, keep old (blue) for rollback. Pros: Instant cutover (seconds), easy rollback. Cons: 2x infrastructure cost. Use case: Mission-critical systems, financial apps. Example: Stack Overflow SQL Server upgrade with zero downtime. Recommendation: Blue-green for production systems (best safety/downtime trade-off), big bang for small databases, parallel run for complex migrations.

How do you achieve zero-downtime database migration? Explain the dual-write pattern and its challenges.

Zero-downtime migration uses dual-write pattern: (1) Setup new database, (2) Application writes to BOTH old and new databases, (3) Application reads from old database, (4) Validate new database (row counts, checksums, sample queries), (5) Switch reads to new database, (6) Stop writing to old database. Challenges: (1) Dual-write complexity: Application must write to both databases, handle failures (what if write to new database fails?), maintain consistency. (2) Performance overhead: 2x writes, slower response times. (3) Data sync: Writes may arrive out of order, need transaction coordination. (4) Rollback: If issues, switch reads back to old database. Implementation: (1) Application layer: Add dual-write logic (write to old, then new), (2) CDC (Change Data Capture): Capture changes from old database, apply to new (Debezium, AWS DMS), (3) Triggers: Database triggers write to both databases. Real-world: GitLab used dual-write for MySQL → PostgreSQL migration, ran for 3 months, validated new database, switched reads gradually (1% → 10% → 100%), zero downtime. Uber used CDC (Debezium) for ride matching migration, 50K writes/sec, <5 second lag, zero downtime.

What are the main challenges when migrating from MySQL to PostgreSQL? How do you handle schema conversion, data migration, and application changes?

MySQL → PostgreSQL challenges: (1) Schema conversion: AUTO_INCREMENT → SERIAL, DATETIME → TIMESTAMP, ENUM → CHECK constraint, backticks → double quotes. (2) Data migration: Export (mysqldump), transform (encoding, data types), import (pgloader, COPY). (3) Application changes: SQL syntax (LIMIT same, but || vs CONCAT), connection strings, ORM configuration. (4) Stored procedures: MySQL uses SQL syntax, PostgreSQL uses PL/pgSQL (must rewrite). (5) Case sensitivity: MySQL case-insensitive, PostgreSQL case-sensitive (add LOWER() indexes). Schema conversion: Use pgloader (automates 90%), manual fixes for ON UPDATE CURRENT_TIMESTAMP (create trigger), ENUM (verify CHECK constraints). Data migration: (1) Full dump: mysqldump --single-transaction --default-character-set=utf8mb4, (2) Incremental sync: Debezium CDC for continuous sync, (3) Validation: Row counts, checksums, sample queries. Application changes: (1) Update connection strings, (2) Replace MySQL-specific SQL (CONCAT() works on both), (3) Test thoroughly (unit tests, integration tests). Real-world: GitLab migration took 6 months, 100+ schema fixes, 3 months dual-write, 2-3x performance improvement. Key lesson: Test on production-like data, not small sample.

You need to migrate a 500GB MySQL database to PostgreSQL with <1 hour downtime. Design the migration strategy including schema conversion, data migration, validation, and rollback plan.

-- Migration Strategy: Blue-Green with Incremental Sync

-- Goal: <1 hour downtime for 500GB database

-- Phase 1: Assessment (Week 1)

-- Analyze schema compatibility

SHOW CREATE TABLE users;

SHOW CREATE TABLE orders;

-- Identify issues: AUTO_INCREMENT, ENUM, DATETIME, stored procedures

-- Estimate data transfer time

-- 500GB / 1Gbps network = 4000 seconds = 67 minutes (too slow for 1 hour)

-- Solution: Incremental sync (initial full dump + CDC)

-- Phase 2: Setup Green (PostgreSQL) - Week 2

-- Convert schema with pgloader

pgloader mysql://user:pass@mysql-host/myapp postgresql://user:pass@postgres-host/myapp

-- Manual schema fixes

-- 1. ON UPDATE CURRENT_TIMESTAMP → trigger

CREATE TRIGGER update_users_updated_at

BEFORE UPDATE ON users

FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();

-- 2. ENUM → CHECK constraint

ALTER TABLE users ADD CONSTRAINT chk_status

CHECK (status IN ('active', 'inactive', 'banned'));

-- Phase 3: Initial Data Sync (Week 3)

-- Full dump (takes 2-3 hours, done off-peak)

mysqldump --single-transaction --quick --default-character-set=utf8mb4 myapp > dump.sql

-- Import to PostgreSQL

psql myapp < dump.sql

-- Or use pgloader (faster)

pgloader mysql://... postgresql://...

-- Verify row counts

-- MySQL: SELECT COUNT(*) FROM users; -- 10M

-- PostgreSQL: SELECT COUNT(*) FROM users; -- 10M

-- Phase 4: Incremental Sync (Week 4)

-- Setup Debezium CDC