Database Migration Strategies: Mistakes
Module: Database-Specific Features
Testing migration on small sample data (1000 rows) instead of production-like data (1M+ rows)
Test on anonymized production dump with full data volume, catches encoding issues, performance problems, edge cases
Small sample data doesn't reveal real issues: (1) Encoding problems (MySQL latin1 → PostgreSQL UTF8 with special characters), (2) Performance issues (queries fast on 1K rows, slow on 1M rows), (3) Edge cases (NULL handling, date ranges, large TEXT fields). Example: GitLab migration tested on full production dump (100GB), found 50+ issues not visible in small sample. Always test with: (1) Full data volume (anonymize sensitive data), (2) Real query patterns (slow query log), (3) Peak load (load testing). Validation: Row counts match, checksums match, sample queries return same results, performance acceptable.
Use production dump (anonymized) for testing. Test with full data volume, not sample. Run load tests to catch performance issues. Validate thoroughly: row counts, checksums, sample queries, performance.
Critical
Production migration fails with encoding errors, performance regression, data corruption
No rollback plan, assuming migration will succeed ("we'll fix issues as they come")
Always have rollback plan: full backup before migration, keep old database running 24-48 hours, set rollback deadline
Migrations fail for unexpected reasons: (1) Data corruption (encoding issues, data type mismatches), (2) Performance regression (new database slower than old), (3) Application bugs (SQL syntax differences), (4) Infrastructure issues (network, disk space). Without rollback plan: Extended downtime (hours to days), data loss (if old database deleted), panic (no clear recovery path). Rollback plan: (1) Full backup before migration (restore if needed), (2) Keep old database running 24-48 hours (switch back if issues), (3) Blue-green deployment (instant rollback), (4) Set rollback deadline (e.g., 24 hours - if issues not resolved, rollback automatically). Example: Stripe migration had rollback plan, discovered performance issue 12 hours after cutover, rolled back instantly, fixed issue, re-migrated successfully.
Always have rollback plan. Take full backup before migration. Keep old database running 24-48 hours. Set rollback deadline. Test rollback procedure before migration.
Critical
Migration fails, no way to rollback, extended downtime, data loss, panic
Ignoring character encoding differences (MySQL latin1 → PostgreSQL UTF8 without conversion)
Convert encoding explicitly: mysqldump --default-character-set=utf8mb4, verify special characters after import
MySQL default encoding is latin1 (Western European), PostgreSQL default is UTF8 (Unicode). Without conversion: Special characters corrupted. Example: User name "José" becomes "Jos?" or "José". Chinese characters "中文" become "??" or mojibake. Fix: (1) Export with correct encoding: mysqldump --default-character-set=utf8mb4, (2) Verify PostgreSQL encoding: SHOW SERVER_ENCODING (should be UTF8), (3) Test special characters: INSERT test data with é, ñ, 中文, verify after import. Real-world: E-commerce site migrated MySQL → PostgreSQL, forgot encoding conversion, 10K product names corrupted, had to rollback and re-migrate.
Always specify encoding: mysqldump --default-character-set=utf8mb4. Verify PostgreSQL encoding: SHOW SERVER_ENCODING. Test special characters before full migration. Validate: Query rows with special characters, verify they display correctly.
High
Data corruption: Special characters (é, ñ, 中文) become ?, mojibake (ã), data loss
Not monitoring replication lag during incremental sync (CDC), cutting over while lag is high (5 minutes)
Monitor replication lag continuously, wait until lag <5 seconds before cutover, have alerting for lag spikes
Incremental sync (CDC with Debezium, AWS DMS) has replication lag: time between write to old database and apply to new database. Normal lag: 1-5 seconds. High lag: minutes to hours (network issues, high write volume, slow new database). Cutting over with high lag: New database missing recent writes, data loss. Example: E-commerce site cut over with 5 minute lag, 1000 orders placed in last 5 minutes not in new database, had to rollback. Monitor lag: (1) Check last synced timestamp: SELECT MAX(created_at) FROM orders (should be within 5 seconds), (2) CDC tool metrics: Debezium lag, AWS DMS replication lag, (3) Alert if lag >10 seconds. Wait for lag <5 seconds before cutover.
Monitor replication lag continuously. Wait until lag <5 seconds before cutover. Set up alerting for lag spikes. Verify: SELECT MAX(created_at) should be within 5 seconds of current time.
High
Data loss: New database missing recent writes, users see stale data, transactions lost
Not testing performance after migration, assuming new database will be faster ("PostgreSQL is faster than MySQL")
Run load tests on new database before cutover, compare performance with old database, optimize indexes if needed
New database may be slower despite being "better": (1) Different query planner (PostgreSQL planner different from MySQL), (2) Missing indexes (not all indexes converted), (3) Different configuration (default settings not optimized), (4) Different hardware (new server slower than old). Example: Company migrated MySQL → PostgreSQL, assumed PostgreSQL faster, didn't test performance, after cutover queries 10x slower, had to rollback. Performance testing: (1) Run load tests on new database (simulate production traffic), (2) Compare query times with old database (should be similar or faster), (3) Check slow queries: pg_stat_statements (PostgreSQL), slow query log (MySQL), (4) Optimize: Add missing indexes, tune configuration, analyze query plans. Real-world: GitLab tested performance extensively, found PostgreSQL 2-3x faster than MySQL for their workload.
Always test performance before cutover. Run load tests on new database. Compare query times with old database. Check slow queries, optimize indexes. Validate: P99 latency should be similar or better than old database.
High
Performance regression: New database slower than old, queries timeout, users complain, rollback needed