PostgreSQL: Arrays & JSONB: Mistakes

Q: Common mistake in PostgreSQL: Arrays & JSONB

Deep JSONB nesting (3+ levels) adds overhead: (1) Parsing overhead (must traverse multiple levels), (2) Complex query syntax (data -> 'a' -> 'b' -> 'c'), (3) Harder to index (GIN index less effective). Example: Deep nesting: {"user": {"profile": {"settings": {"theme": "dark"}}}}. Query: data -> 'user' -> 'profile' -> 'settings' ->> 'theme' (slow, complex). Flattened: {"user_profile_settings_theme": "dark"}. Query: data ->> 'user_profile_settings_theme' (faster, simpler). Alternative: Normalized

Module: Database-Specific Features

Using JSON type instead of JSONB type for storing JSON data

Use JSONB type: CREATE TABLE table (data JSONB), 2-3x faster than JSON

JSON type stores data as text (must parse on every query). JSONB type stores data in binary format (pre-parsed, faster). Performance: JSON: Parse on every query (slow). JSONB: Pre-parsed binary (2-3x faster). Indexing: JSON: Not indexable. JSONB: GIN indexes enable fast queries. Storage: JSON: Preserves formatting/whitespace (larger). JSONB: Compressed binary (smaller). Example: 1M rows with JSON: 100ms query time. JSONB: 30ms query time (3x faster). Real-world: Stripe uses JSONB for metadata (2-3x faster than JSON). Migration: ALTER TABLE table ALTER COLUMN data TYPE JSONB USING data::jsonb. Lesson: Always use JSONB, not JSON (unless you need to preserve formatting).

Always use JSONB for JSON data. Only use JSON if you need to preserve exact formatting/whitespace (rare). JSONB is faster, indexable, and smaller.

High

Slow queries (text parsing), not indexable, wasted storage

Not creating GIN indexes on arrays or JSONB columns

Create GIN indexes: CREATE INDEX USING GIN (column), 10-100x faster queries

Without GIN index, PostgreSQL does sequential scan (checks every row). With GIN index, PostgreSQL uses index scan (checks only matching rows). Performance: Without GIN: Sequential scan, 1000ms for 1M rows. With GIN: Index scan, 10ms for 1M rows (100x faster). GIN index works for: Arrays: @>, &&, ANY queries. JSONB: ->, @>, ? queries. Example: Query without GIN: SELECT * FROM products WHERE 'electronics' = ANY(tags); -- Sequential scan, slow. Query with GIN: Same query, index scan, 100x faster. Real-world: GitLab uses GIN indexes on issue metadata (10x faster queries). Lesson: Always create GIN indexes on arrays/JSONB columns.

Create GIN indexes on all arrays and JSONB columns. Essential for performance. Use: CREATE INDEX idx_name ON table USING GIN (column).

Critical

Slow queries (sequential scan), poor performance on large tables

Using arrays for large lists (> 100 items) or complex relationships

Use junction tables for large lists or complex relationships

Arrays are fast for small lists (< 100 items) but slow for large lists. Problem with large arrays: (1) Sequential scan even with GIN index (slow), (2) Updates require rewriting entire array (slow), (3) Can't query relationship attributes. Example: Product with 1000 tags. Array: Slow queries/updates. Junction table: Fast queries/updates. When to use arrays: Simple lists (< 100 items), no relationship attributes. When to use junction tables: Large lists (> 100 items), complex relationships, need to query attributes. Real-world: Amazon uses arrays for product categories (< 50 categories). Uses junction tables for product reviews (1000+ reviews). Lesson: Arrays for small simple lists, junction tables for large complex relationships.

Use arrays for small lists (< 100 items). Use junction tables for large lists or when you need to query relationship attributes.

Medium

Slow queries/updates, can't query relationship attributes

Deep JSONB nesting (3+ levels) for frequently queried data

Flatten JSONB structure or use normalized tables for frequently queried nested data

Deep JSONB nesting (3+ levels) adds overhead: (1) Parsing overhead (must traverse multiple levels), (2) Complex query syntax (data -> 'a' -> 'b' -> 'c'), (3) Harder to index (GIN index less effective). Example: Deep nesting: {"user": {"profile": {"settings": {"theme": "dark"}}}}. Query: data -> 'user' -> 'profile' -> 'settings' ->> 'theme' (slow, complex). Flattened: {"user_profile_settings_theme": "dark"}. Query: data ->> 'user_profile_settings_theme' (faster, simpler). Alternative: Normalized tables for frequently queried nested data. Real-world: Stripe flattens JSONB for frequently queried fields (faster queries). Lesson: Keep JSONB nesting shallow (1-2 levels), flatten or normalize for frequently queried data.

Keep JSONB nesting shallow (1-2 levels). Flatten structure or use normalized tables for frequently queried nested data.

Medium

Slow queries (parsing overhead), complex query syntax

Using arrays when you need to query relationship attributes

Use junction tables when you need to query relationship attributes

Arrays store only values, not attributes. Problem: Can't query when tag was added, tag priority, tag metadata. Example: Product tags with created_at. Array: tags TEXT[] -- Can't store created_at. Junction table: products_tags (product_id, tag_id, created_at) -- Can query by created_at. When to use arrays: Simple lists (no attributes needed). When to use junction tables: Need to query attributes (created_at, priority, metadata). Real-world: GitHub uses junction tables for repository topics (can query when topic was added). Lesson: Arrays for simple lists, junction tables for relationships with attributes.

Use arrays for simple lists (no attributes). Use junction tables when you need to query relationship attributes (created_at, priority, etc.).

Medium

Can't query attributes (created_at, priority), limited flexibility