Normalization & Database Design Principles: Interview
Module: Foundational Concepts
Explain 1NF, 2NF, 3NF with examples
1NF (First Normal Form): Atomic values, no repeating groups
Example: Split "Laptop, Mouse, Keyboard" into separate rows
2NF (Second Normal Form): 1NF + no partial dependencies
Example: Move customer_email from ORDER_ITEMS to ORDERS table
3NF (Third Normal Form): 2NF + no transitive dependencies
Example: Move product_price from ORDER_ITEMS to PRODUCTS table
Result: Each fact stored once. Update price in PRODUCTS, affects all orders automatically.
When would you denormalize a database?
Denormalize only after measuring performance bottleneck:
When to denormalize:
• Read-heavy workload (10:1 read:write ratio)
• Profiling shows JOIN bottleneck
• Reporting/analytics queries
• Caching layer not sufficient
When NOT to denormalize:
• Write-heavy workload
• No measured performance issue
• Premature optimization
Rule: Normalize first, denormalize only when profiling proves necessity.
What are update/delete/insert anomalies?
Anomalies caused by unnormalized data:
Update Anomaly: Change customer email in multiple rows (error-prone)
Example: John's email in 100 order rows - must update all 100
Insert Anomaly: Cannot add customer without order
Example: Cannot store customer until they place first order
Delete Anomaly: Delete order loses customer data
Example: Delete last order, lose customer information
Solution: Normalization eliminates all three anomalies.
Design schema for social media platform
USERS (user_id PK, username, email, created_at)
POSTS (post_id PK, user_id FK, content, timestamp)
COMMENTS (comment_id PK, post_id FK, user_id FK, content, timestamp)
LIKES (like_id PK, post_id FK, user_id FK, timestamp)
FRIENDSHIPS (friendship_id PK, user_id_1 FK, user_id_2 FK, status)
Relationships:
• Users → Posts (1:N)
• Posts → Comments (1:N)
• Users ↔ Friends (M:N via FRIENDSHIPS)
• Users ↔ Liked Posts (M:N via LIKES)