PostgreSQL: Arrays & JSONB: Interview

Q: Explain the difference between JSON and JSONB. When would you use each?

JSON vs JSONB difference is storage format and performance. JSON: Text format (stores as string), must parse on every query (slow), preserves formatting/whitespace/key order, not indexable. JSONB: Binary format (pre-parsed), no parsing needed (2-3x faster), removes formatting/whitespace, reorders keys, indexable with GIN. Performance: JSON: Parse on every query (100ms for 1M rows). JSONB: Pre-parsed binary (30ms for 1M rows, 3x faster). Indexing: JSON: Not indexable (sequential scan). JSONB: GIN

Q: How do GIN indexes work for arrays and JSONB? What queries do they speed up?

GIN (Generalized Inverted Index) is inverted index that maps values to rows (like book index). How it works: (1) Index creation: Extract all values from arrays/JSONB, create mapping (value -> rows). (2) Query execution: Look up value in index, get matching rows (fast). (3) Index structure: B-tree of values, each value points to list of rows. Example: products table with tags array. Without GIN: Sequential scan (check every row). With GIN: Index scan (look up "electronics" in index, get matching

Module: Database-Specific Features

When would you use arrays vs junction tables? What are the trade-offs?

Arrays vs junction tables decision depends on data complexity and query patterns. Use arrays when: (1) Simple lists (tags, categories, phone numbers), (2) Small lists (< 100 items), (3) No relationship attributes needed, (4) Read-heavy workloads (no frequent updates), (5) Order matters (arrays preserve order). Use junction tables when: (1) Complex relationships (many-to-many), (2) Large lists (> 100 items), (3) Need to query relationship attributes (created_at, priority), (4) Write-heavy workloads (frequent add/remove), (5) Need flexible queries (JOIN with other tables). Trade-offs: Arrays: 2-3x faster (no JOIN), simpler schema (1 table vs 3 tables), but limited queries (can't query attributes). Junction tables: Slower (requires JOIN), complex schema (3 tables), but flexible queries (can query attributes, JOIN with other tables). Example: Product tags. Arrays: Fast queries (no JOIN), but can't query when tag was added. Junction tables: Slower queries (JOIN), but can query tag metadata. Real-world: Amazon uses arrays for product categories (< 50 categories, no attributes). Uses junction tables for product reviews (1000+ reviews, need to query review date/rating). Lesson: Arrays for small simple lists, junction tables for large complex relationships.

Explain the difference between JSON and JSONB. When would you use each?

JSON vs JSONB difference is storage format and performance. JSON: Text format (stores as string), must parse on every query (slow), preserves formatting/whitespace/key order, not indexable. JSONB: Binary format (pre-parsed), no parsing needed (2-3x faster), removes formatting/whitespace, reorders keys, indexable with GIN. Performance: JSON: Parse on every query (100ms for 1M rows). JSONB: Pre-parsed binary (30ms for 1M rows, 3x faster). Indexing: JSON: Not indexable (sequential scan). JSONB: GIN indexes enable fast queries (10-100x faster). Storage: JSON: Larger (preserves formatting). JSONB: Smaller (compressed binary). Use JSON when: (1) Need to preserve exact formatting/whitespace (rare), (2) Need to preserve key order (rare), (3) Write-once, never query (logs). Use JSONB when: (1) Need to query data (99% of cases), (2) Need fast queries, (3) Need to index data. Real-world: Stripe uses JSONB for metadata (2-3x faster, indexable). Almost all production systems use JSONB, not JSON. Lesson: Always use JSONB unless you have specific reason to preserve formatting (rare).

How do GIN indexes work for arrays and JSONB? What queries do they speed up?

GIN (Generalized Inverted Index) is inverted index that maps values to rows (like book index). How it works: (1) Index creation: Extract all values from arrays/JSONB, create mapping (value -> rows). (2) Query execution: Look up value in index, get matching rows (fast). (3) Index structure: B-tree of values, each value points to list of rows. Example: products table with tags array. Without GIN: Sequential scan (check every row). With GIN: Index scan (look up "electronics" in index, get matching rows). Performance: Without GIN: 1000ms for 1M rows (sequential scan). With GIN: 10ms for 1M rows (index scan, 100x faster). Queries sped up by GIN: Arrays: @> (contains), && (overlaps), ANY (exists). JSONB: -> (get key), @> (contains), ? (key exists), ?| (any key), ?& (all keys). Queries NOT sped up: Array element access (array[1]), JSONB path traversal without operators. Trade-offs: GIN indexes: Fast queries (10-100x), but slower updates (index must be updated), larger storage (index size). Real-world: GitLab uses GIN indexes on issue metadata (10x faster queries). Lesson: Always create GIN indexes on arrays/JSONB columns for fast queries.

Design a schema for an e-commerce product catalog that stores: (1) Product tags (electronics, portable, etc.), (2) Product specifications (flexible schema, varies by category). Use appropriate PostgreSQL types.

-- Design: Use arrays for tags, JSONB for specifications

CREATE TABLE products (

product_id SERIAL PRIMARY KEY,

name VARCHAR(200) NOT NULL,

description TEXT,

price DECIMAL(10, 2) NOT NULL,

-- Arrays for tags (simple list, < 100 items)

tags TEXT[],

-- JSONB for specifications (flexible schema, varies by category)

specifications JSONB,

created_at TIMESTAMP DEFAULT NOW(),

updated_at TIMESTAMP DEFAULT NOW()

);

-- Create GIN indexes for fast queries

CREATE INDEX idx_products_tags ON products USING GIN (tags);

CREATE INDEX idx_products_specifications ON products USING GIN (specifications);

-- Insert laptop with tags and specifications

INSERT INTO products (name, price, tags, specifications)

VALUES (

'MacBook Pro 16"',

2499.00,

ARRAY['electronics', 'computers', 'portable', 'apple'],

"brand": "Apple",

"processor": "M3 Pro",

"ram": "16GB",

"storage": "512GB SSD",

"screen_size": "16 inch",

"weight": "2.1 kg"

);

-- Insert phone with different specifications

INSERT INTO products (name, price, tags, specifications)