Cross-Database Compatibility: Interview
Module: Database-Specific Features
Explain the trade-offs between using ANSI SQL standard vs database-specific features. When would you choose each approach?
ANSI SQL standard provides portability (works across MySQL, PostgreSQL, SQL Server, Oracle) but is limited (no advanced features like PostgreSQL arrays, MySQL JSON functions). Database-specific features provide better performance and functionality but create vendor lock-in (can't switch databases without rewriting SQL). Trade-offs: (1) Portability vs Performance: ANSI SQL is slower than optimized database-specific SQL. Example: PostgreSQL RETURNING clause (1 query) vs ANSI SQL INSERT + SELECT (2 queries) = 2x slower. (2) Features vs Compatibility: Advanced features (PostgreSQL arrays, full-text search) not available in ANSI SQL. (3) Simplicity vs Flexibility: ANSI SQL is simpler (one codebase), database-specific SQL requires conditional logic (more complex). When to use ANSI SQL: (1) SaaS products supporting multiple databases (customers choose MySQL or PostgreSQL). (2) Early-stage products (database choice uncertain). (3) Open-source projects (wider adoption). When to use database-specific features: (1) Single database deployment (no need for portability). (2) Performance-critical queries (use database-specific optimizations). (3) Mature products (database choice stable). Hybrid approach: Use ANSI SQL for 90% of queries, database-specific SQL for 10% critical queries (with conditional logic). Real-world: GitLab uses PostgreSQL-specific features (arrays, JSONB) and documents that PostgreSQL is required (MySQL support is legacy).
How do ORMs (Hibernate, SQLAlchemy) achieve cross-database compatibility? What are the trade-offs?
ORMs achieve cross-database compatibility through SQL dialect abstraction: (1) Dialect system: Each database has a dialect (MySQL dialect, PostgreSQL dialect, SQL Server dialect) that generates database-specific SQL. (2) Automatic SQL generation: ORM translates object operations (save, find, update) to appropriate SQL for each database. (3) Type mapping: ORM maps programming language types (Java Integer, Python int) to database types (MySQL INT, PostgreSQL INTEGER). (4) Feature detection: ORM checks database version/features, uses appropriate syntax (LIMIT vs OFFSET/FETCH). Example: Hibernate query.setFirstResult(20).setMaxResults(10) generates: MySQL: SELECT ... LIMIT 10 OFFSET 20, PostgreSQL: SELECT ... LIMIT 10 OFFSET 20, SQL Server: SELECT ... OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY, Oracle: SELECT ... OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY. Trade-offs: (1) Performance overhead: ORMs add 5-30% overhead vs raw SQL (object mapping, SQL generation). (2) Limited to ORM features: Can't use advanced database-specific features (PostgreSQL arrays, MySQL JSON functions). (3) Complex queries difficult: ORMs struggle with complex joins, subqueries, window functions. (4) Learning curve: Must learn ORM API, not just SQL. When to use ORMs: (1) Multi-database support required. (2) CRUD operations (simple queries). (3) Rapid development (less boilerplate). When to avoid ORMs: (1) Performance-critical queries (use raw SQL). (2) Complex queries (use query builder or raw SQL). (3) Database-specific features needed. Real-world: GitLab uses ActiveRecord ORM (Ruby) to support PostgreSQL and MySQL. Hibernate supports 20+ databases (Java).
What are the main challenges when migrating from MySQL to PostgreSQL? How do you handle them?
MySQL to PostgreSQL migration challenges: (1) Schema differences: AUTO_INCREMENT → SERIAL/IDENTITY, TINYINT → SMALLINT, ENUM → CHECK constraint or custom type. (2) SQL syntax differences: LIMIT → LIMIT (same) or OFFSET/FETCH, || is string concat in PostgreSQL but bitwise OR in MySQL, identifier quotes differ (MySQL uses backticks, PostgreSQL uses double quotes). (3) Data type differences: MySQL TEXT (65KB) → PostgreSQL TEXT (unlimited), MySQL DATETIME → PostgreSQL TIMESTAMP. (4) Stored procedures: MySQL uses SQL syntax, PostgreSQL uses PL/pgSQL (must rewrite). (5) Case sensitivity: MySQL is case-insensitive by default (Windows), PostgreSQL is case-sensitive. (6) Transaction isolation: MySQL default is REPEATABLE READ, PostgreSQL default is READ COMMITTED. Migration process: (1) Schema migration: Convert DDL (CREATE TABLE) using tools (pgloader, ora2pg) or manual conversion. (2) Data migration: Export from MySQL (mysqldump), transform (fix encoding, data types), import to PostgreSQL (COPY). (3) Application migration: Update connection strings, replace MySQL-specific SQL (LIMIT stays same, but check other syntax), test thoroughly. (4) Testing: Unit tests, integration tests, performance tests, data validation (row counts, checksums). (5) Gradual rollout: Blue-green deployment (run both databases in parallel), canary deployment (migrate 1% of traffic first). Real-world: GitLab migrated from MySQL to PostgreSQL (2017) - took 6 months, improved performance 2-3x. Uber migrated from MySQL to PostgreSQL for ride matching - MVCC enabled non-blocking reads.
You're building a SaaS product that must support both MySQL and PostgreSQL. Write portable SQL for: (1) Pagination (page 3, 20 items per page), (2) Get users created in last 7 days, (3) Full name (first + last name).
-- Challenge: Write SQL that works on both MySQL and PostgreSQL
-- (1) Pagination: Page 3, 20 items per page (skip 40, take 20)
-- MySQL / PostgreSQL: LIMIT (both support same syntax)
SELECT user_id, username, email, created_at
FROM users
ORDER BY user_id
LIMIT 20 OFFSET 40;
-- Works on both MySQL and PostgreSQL
-- Alternative: OFFSET/FETCH (PostgreSQL only, not MySQL)
SELECT user_id, username, email, created_at
FROM users
ORDER BY user_id
OFFSET 40 ROWS
FETCH NEXT 20 ROWS ONLY;
-- Works on PostgreSQL 8.4+, NOT on MySQL
-- PORTABLE SOLUTION: Use LIMIT (works on both)
SELECT user_id, username, email, created_at
FROM users
ORDER BY user_id
LIMIT 20 OFFSET 40;
-- (2) Get users created in last 7 days
-- MySQL: DATE_SUB or INTERVAL
SELECT user_id, username, created_at
FROM users
WHERE created_at >= DATE_SUB(NOW(), INTERVAL 7 DAY);
-- MySQL-specific
-- PostgreSQL: INTERVAL
SELECT user_id, username, created_at
FROM users
WHERE created_at >= NOW() - INTERVAL '7 days';
-- PostgreSQL-specific
-- PORTABLE SOLUTION: Use CURRENT_TIMESTAMP and date arithmetic
-- Option 1: Use CURRENT_DATE (ANSI SQL)