PostgreSQL Features Deep Dive: Interview
Module: Database-Specific Features
Explain the difference between JSON and JSONB in PostgreSQL. When would you use each?
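PostgreSQL has two JSON types: JSON (stored as text) and JSONB (stored in a decomposed binary format).
Differences:
(1) Storage: JSON stores an exact copy of the input text. JSONB stores a parsed, normalized binary representation.
(2) Performance: JSON is cheap to insert but must be reparsed every time a query processes it. JSONB costs slightly more on insert (it is converted once, on write) and is much faster to query.
(3) Indexing: JSONB has GIN operator classes, so containment and key-existence queries can use an index. JSON has none; it can only be indexed through expression indexes on extracted values.
(4) Operators: Both types support the extraction operators (->, ->>, #>, #>>). JSONB adds containment and existence operators (@>, <@, ?, ?|, ?&) as well as || for concatenation and - for deleting keys.
(5) Formatting: JSON preserves whitespace, key order, and duplicate keys. JSONB discards insignificant whitespace, does not preserve key order, and keeps only the last value of a duplicate key.
When to use JSON: Rarely. Only if you must preserve the input exactly as received, or you store and return documents without ever querying inside them.
When to use JSONB: In almost every other case.
Example: E-commerce product attributes. Use JSONB for a flexible schema and fast queries backed by GIN indexes. Catalogs with a large number of products and per-category attributes are a typical fit.
Performance (illustrative figures; actual numbers depend on data, query, and hardware): a containment query over 100K rows takes on the order of 5 ms with a GIN index on a JSONB column, versus on the order of 500 ms for an unindexed scan of a JSON column.
Lesson: Default to JSONB unless you have a specific reason to preserve the original formatting.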
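The storage and formatting differences above can be seen directly by casting the same literal to both types. This is a minimal sketch; the literal and the column aliases are made up for illustration.

```sql
-- Same input text, cast to each type.
SELECT '{"b": 1,   "a": 2, "a": 3}'::json  AS as_json,   -- kept verbatim: spacing, key order, duplicate key
       '{"b": 1,   "a": 2, "a": 3}'::jsonb AS as_jsonb;  -- normalized to {"a": 3, "b": 1}

-- Containment and key-existence operators are defined for JSONB only.
SELECT '{"brand": "Sony", "color": "black"}'::jsonb @> '{"brand": "Sony"}'::jsonb AS contains_brand,
       '{"brand": "Sony", "color": "black"}'::jsonb ? 'color'                     AS has_color_key;
```

Running the second query with ::json instead of ::jsonb fails with an "operator does not exist" error, which is the practical meaning of point (4).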
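How does PostgreSQL MVCC work? What does Uber's migration between PostgreSQL and MySQL show about its trade-offs?
MVCC (Multi-Version Concurrency Control) is PostgreSQL's concurrency model.
How it works:
(1) Each row version (tuple) carries two transaction IDs in system columns: xmin is the transaction that created the version, xmax is the transaction that deleted or superseded it.
(2) UPDATE writes a new row version and sets xmax on the old one; it does not modify the old version in place.
(3) Readers see a snapshot: only row versions committed before the snapshot was taken are visible. Under READ COMMITTED (the default) a snapshot is taken for each statement; under REPEATABLE READ and SERIALIZABLE it is taken at the first query of the transaction.
(4) Writers create new versions. Two writers updating the same row still serialize on a row lock.
(5) Readers do not block writers and writers do not block readers.
MVCC vs MySQL: MySQL's InnoDB engine also implements MVCC. It updates rows in place and keeps older versions in undo logs, so a plain SELECT is a non-locking consistent read and does not wait for a concurrent UPDATE. The real difference is where old versions live and how they are cleaned up: PostgreSQL keeps them in the table itself and relies on VACUUM, while InnoDB keeps them in the undo log and purges them in the background.
Uber: Uber's widely discussed 2016 migration went from PostgreSQL to MySQL, not the other way around. The reasons its engineering blog gave included write amplification (an UPDATE writes a whole new tuple, and unless it qualifies as a HOT update every index on the table gets a new entry), the replication traffic this generates, and the difficulty of major-version upgrades. The case is a useful illustration of the cost side of PostgreSQL's MVCC design on update-heavy tables with many indexes.
MVCC trade-offs: Pros: Non-blocking reads, high concurrency. Cons: Dead tuples (old versions) accumulate and need VACUUM to clean up; table and index bloat if vacuuming falls behind; write amplification on heavily indexed tables.
Real-world: Instagram has publicly described running PostgreSQL at very large scale.
Lesson: MVCC is what makes high read/write concurrency practical, but plan for vacuuming and for write amplification on update-heavy workloads.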
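Row versioning can be observed through the system columns ctid, xmin, and xmax. The sketch below assumes a scratch table named mvcc_demo (a hypothetical name) and autocommit mode; the exact transaction IDs and tuple locations will differ on every system.

```sql
-- Scratch table for the demonstration (hypothetical name).
CREATE TABLE mvcc_demo (id int PRIMARY KEY, val text);
INSERT INTO mvcc_demo VALUES (1, 'first');

-- ctid is the physical location of the row version,
-- xmin the inserting transaction, xmax the deleting/updating one (0 if none).
SELECT ctid, xmin, xmax, id, val FROM mvcc_demo;

-- The UPDATE writes a new row version instead of overwriting the old one.
UPDATE mvcc_demo SET val = 'second' WHERE id = 1;

-- The visible version now has a different ctid and a newer xmin.
SELECT ctid, xmin, xmax, id, val FROM mvcc_demo;

-- The old version is now a dead tuple. This counter comes from the
-- statistics system, so it may lag behind the UPDATE by a moment.
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'mvcc_demo';

-- VACUUM reclaims the space held by dead tuples (run outside a transaction block).
VACUUM mvcc_demo;
```

To see the non-blocking behavior itself, run the UPDATE inside an open transaction in one session and a SELECT in a second session: the SELECT returns the old value immediately instead of waiting.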
When should you use PostgreSQL arrays vs junction tables for many-to-many relationships?
The choice between arrays and junction tables depends on how complex the relationship is.
Use arrays when: (1) The data is a simple list (tags, categories, hashtags), (2) No per-item metadata is needed (no created_at, position, etc.), (3) Order matters (arrays preserve element order), (4) You mostly query from one side (find posts by tag, not post counts per tag), (5) Referential integrity is not needed (array elements cannot carry foreign keys).
Use junction tables when: (1) The relationship is a true many-to-many with its own attributes, (2) You need metadata on each link (created_at, created_by, position), (3) You need referential integrity (foreign keys), (4) You query from both sides (find posts by tag AND post counts per tag), (5) Links are added and removed frequently (one row per link instead of rewriting the whole array).
Example: Blog posts with tags. Arrays: a simple list, queried as "posts with this tag". Junction table: needed once you want tag statistics (post count per tag) or a created_at per link.
Performance (illustrative figures; actual numbers depend on data size and indexing): a tag lookup on an array column with a GIN index takes on the order of 5 ms, while the equivalent junction-table query with JOINs takes on the order of 50 ms. The junction table buys integrity and flexibility in exchange.
Illustrative cases: Hashtags on a photo fit an array (simple list, query photos by hashtag). Repository stars fit a junction table (need starred_at, queried from both sides).
Lesson: Use arrays for simple lists, junction tables for relationships that carry metadata or need integrity.
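The two designs for the blog example can be sketched side by side. All table, column, and index names below (posts_a, posts_b, tags, post_tags, and so on) are hypothetical and chosen only for illustration.

```sql
-- Option A: tags stored as an array on the post.
CREATE TABLE posts_a (
    post_id SERIAL PRIMARY KEY,
    title   TEXT NOT NULL,
    tags    TEXT[] NOT NULL DEFAULT '{}'
);
CREATE INDEX posts_a_tags_gin ON posts_a USING GIN (tags);

-- Find posts carrying a given tag; @> (contains) can use the GIN index.
SELECT post_id, title
FROM posts_a
WHERE tags @> ARRAY['postgres'];

-- Option B: junction table with metadata and foreign keys.
CREATE TABLE posts_b (
    post_id SERIAL PRIMARY KEY,
    title   TEXT NOT NULL
);
CREATE TABLE tags (
    tag_id SERIAL PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE
);
CREATE TABLE post_tags (
    post_id    INT NOT NULL REFERENCES posts_b ON DELETE CASCADE,
    tag_id     INT NOT NULL REFERENCES tags ON DELETE CASCADE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (post_id, tag_id)
);
-- The primary key covers lookups by post; this index covers lookups by tag.
CREATE INDEX post_tags_tag_id_idx ON post_tags (tag_id);

-- Query from the other side: post count per tag.
SELECT t.name, count(*) AS post_count
FROM tags t
JOIN post_tags pt USING (tag_id)
GROUP BY t.name
ORDER BY post_count DESC;
```

The per-tag count is also possible with Option A (by unnesting the array), but the array design has no place to record when a tag was attached and no way to enforce that a tag exists.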
You have a products table with a JSONB attributes column. Write queries to: (1) Find products by brand, (2) Find products with a specific feature, (3) Update a product's rating, (4) Create appropriate indexes.
-- Schema
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name       VARCHAR(200),
    price      DECIMAL(10,2),
    attributes JSONB
);
-- Sample data
INSERT INTO products (name, price, attributes) VALUES (
    'Wireless Headphones',
    99.99,
    '{
        "brand": "Sony",
        "color": "black",
        "features": ["noise-canceling", "bluetooth"],
        "rating": 4.5
    }'
);
-- (1) Find products by brand
-- Option 1: Using the ->> operator (returns text)
SELECT product_id, name, price,
       attributes->>'brand' AS brand
FROM products
WHERE attributes->>'brand' = 'Sony';
-- Fast only if an expression index on (attributes->>'brand') exists
-- Option 2: Using the @> operator (contains)
SELECT product_id, name, price
FROM products
WHERE attributes @> '{"brand": "Sony"}';
-- Can use a GIN index on the attributes column
-- (2) Find products with specific feature
-- Check if array contains value
SELECT product_id, name, price