PostgreSQL: Arrays & JSONB: Mistakes
Module: Database-Specific Features
Using JSON type instead of JSONB type for storing JSON data
Use JSONB type: CREATE TABLE table (data JSONB), 2-3x faster than JSON
JSON type stores data as text (must parse on every query). JSONB type stores data in binary format (pre-parsed, faster). Performance: JSON: Parse on every query (slow). JSONB: Pre-parsed binary (2-3x faster). Indexing: JSON: Not indexable. JSONB: GIN indexes enable fast queries. Storage: JSON: Preserves formatting/whitespace (larger). JSONB: Compressed binary (smaller). Example: 1M rows with JSON: 100ms query time. JSONB: 30ms query time (3x faster). Real-world: Stripe uses JSONB for metadata (2-3x faster than JSON). Migration: ALTER TABLE table ALTER COLUMN data TYPE JSONB USING data::jsonb. Lesson: Always use JSONB, not JSON (unless you need to preserve formatting).
Always use JSONB for JSON data. Only use JSON if you need to preserve exact formatting/whitespace (rare). JSONB is faster, indexable, and smaller.
High
Slow queries (text parsing), not indexable, wasted storage
Not creating GIN indexes on arrays or JSONB columns
Create GIN indexes: CREATE INDEX USING GIN (column), 10-100x faster queries
Without GIN index, PostgreSQL does sequential scan (checks every row). With GIN index, PostgreSQL uses index scan (checks only matching rows). Performance: Without GIN: Sequential scan, 1000ms for 1M rows. With GIN: Index scan, 10ms for 1M rows (100x faster). GIN index works for: Arrays: @>, &&, ANY queries. JSONB: ->, @>, ? queries. Example: Query without GIN: SELECT * FROM products WHERE 'electronics' = ANY(tags); -- Sequential scan, slow. Query with GIN: Same query, index scan, 100x faster. Real-world: GitLab uses GIN indexes on issue metadata (10x faster queries). Lesson: Always create GIN indexes on arrays/JSONB columns.
Create GIN indexes on all arrays and JSONB columns. Essential for performance. Use: CREATE INDEX idx_name ON table USING GIN (column).
Critical
Slow queries (sequential scan), poor performance on large tables
Using arrays for large lists (> 100 items) or complex relationships
Use junction tables for large lists or complex relationships
Arrays are fast for small lists (< 100 items) but slow for large lists. Problem with large arrays: (1) Sequential scan even with GIN index (slow), (2) Updates require rewriting entire array (slow), (3) Can't query relationship attributes. Example: Product with 1000 tags. Array: Slow queries/updates. Junction table: Fast queries/updates. When to use arrays: Simple lists (< 100 items), no relationship attributes. When to use junction tables: Large lists (> 100 items), complex relationships, need to query attributes. Real-world: Amazon uses arrays for product categories (< 50 categories). Uses junction tables for product reviews (1000+ reviews). Lesson: Arrays for small simple lists, junction tables for large complex relationships.
Use arrays for small lists (< 100 items). Use junction tables for large lists or when you need to query relationship attributes.
Medium
Slow queries/updates, can't query relationship attributes
Deep JSONB nesting (3+ levels) for frequently queried data
Flatten JSONB structure or use normalized tables for frequently queried nested data
Deep JSONB nesting (3+ levels) adds overhead: (1) Parsing overhead (must traverse multiple levels), (2) Complex query syntax (data -> 'a' -> 'b' -> 'c'), (3) Harder to index (GIN index less effective). Example: Deep nesting: {"user": {"profile": {"settings": {"theme": "dark"}}}}. Query: data -> 'user' -> 'profile' -> 'settings' ->> 'theme' (slow, complex). Flattened: {"user_profile_settings_theme": "dark"}. Query: data ->> 'user_profile_settings_theme' (faster, simpler). Alternative: Normalized tables for frequently queried nested data. Real-world: Stripe flattens JSONB for frequently queried fields (faster queries). Lesson: Keep JSONB nesting shallow (1-2 levels), flatten or normalize for frequently queried data.
Keep JSONB nesting shallow (1-2 levels). Flatten structure or use normalized tables for frequently queried nested data.
Medium
Slow queries (parsing overhead), complex query syntax
Using arrays when you need to query relationship attributes
Use junction tables when you need to query relationship attributes
Arrays store only values, not attributes. Problem: Can't query when tag was added, tag priority, tag metadata. Example: Product tags with created_at. Array: tags TEXT[] -- Can't store created_at. Junction table: products_tags (product_id, tag_id, created_at) -- Can query by created_at. When to use arrays: Simple lists (no attributes needed). When to use junction tables: Need to query attributes (created_at, priority, metadata). Real-world: GitHub uses junction tables for repository topics (can query when topic was added). Lesson: Arrays for simple lists, junction tables for relationships with attributes.
Use arrays for simple lists (no attributes). Use junction tables when you need to query relationship attributes (created_at, priority, etc.).
Medium
Can't query attributes (created_at, priority), limited flexibility