PostgreSQL: Advanced Data Types: Interview

Module: Database-Specific Features

When would you use UUID vs INT for primary keys? What are the trade-offs?

UUID vs INT decision depends on system architecture. Use UUID when: (1) Distributed systems (multiple databases, no coordination needed), (2) Public IDs (hide sequential IDs from users), (3) Merging databases (no ID conflicts), (4) Client-side generation (generate before insert). Use INT when: (1) Single database (no distribution), (2) Performance critical (INT joins are faster), (3) Sequential IDs acceptable, (4) Storage critical (INT is 4x smaller). Trade-offs: UUID: 16 bytes vs INT: 4 bytes (4x larger). UUID joins are slower (random order) vs INT (sequential, cache-friendly). UUID generation is fast (no database round-trip) vs INT (requires database). Real-world: Instagram uses UUID for photo IDs (distributed across 1000+ shards, no coordination). GitHub uses INT for repository IDs (single database, sequential IDs). Lesson: Use UUID for distributed systems, INT for single database.

Explain how range types with exclusion constraints prevent double-booking. Why is this better than application-level validation?

Range types with exclusion constraints prevent overlaps at database level. How it works: (1) TSRANGE stores time range [start, end), (2) Exclusion constraint: EXCLUDE USING GIST (user_id WITH =, time_slot WITH &&), (3) Database checks overlap on INSERT, (4) If overlap exists, INSERT fails. Why better than application: (1) No race conditions (database serializes checks), (2) No application-level locking needed, (3) Simpler code (database handles validation). Example: Scheduling app. Application-level: Request 1 checks (no overlap), Request 2 checks (no overlap), both insert (double-booking). Database-level: Request 1 inserts (success), Request 2 inserts (fails, overlap detected). Real-world: Calendly uses exclusion constraints for appointment scheduling (no race conditions, database-level validation). Lesson: Use exclusion constraints for scheduling, prevents race conditions at database level.

When should you use ENUM vs VARCHAR with CHECK constraint for status fields?

ENUM vs VARCHAR with CHECK decision depends on type safety needs. Use ENUM when: (1) Fixed set of values (status, priority, category), (2) Type safety important (can't insert invalid value), (3) Efficient storage needed (4 bytes vs 20+ bytes), (4) Values rarely change. Use VARCHAR with CHECK when: (1) Values change frequently (ENUM requires ALTER TYPE), (2) Need to store arbitrary strings, (3) Portability important (ENUM is PostgreSQL-specific). Trade-offs: ENUM: Type-safe (database rejects invalid values), 4 bytes storage, integer comparison (fast). VARCHAR with CHECK: Not type-safe (can bypass with ALTER TABLE), 20+ bytes storage, string comparison (slower). Example: Order status (pending, processing, shipped). ENUM: 4 bytes, type-safe. VARCHAR(20): 20+ bytes, not type-safe. Real-world: Shopify uses ENUM for order status (type-safe, efficient storage). Lesson: Use ENUM for fixed status fields, VARCHAR for dynamic values.

Design a schema for a scheduling app that prevents double-booking. Use appropriate PostgreSQL types.

-- Enable btree_gist extension (required for exclusion constraint)

CREATE EXTENSION btree_gist;

-- Create ENUM for appointment status

CREATE TYPE appointment_status AS ENUM (

'scheduled',

'confirmed',

'cancelled',

'completed'

);

-- Create appointments table with TSRANGE

CREATE TABLE appointments (

appointment_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

user_id UUID NOT NULL,

client_name VARCHAR(100),

time_slot TSRANGE NOT NULL,

status appointment_status DEFAULT 'scheduled',

created_at TIMESTAMP DEFAULT NOW(),

-- Exclusion constraint: Prevent overlapping appointments

EXCLUDE USING GIST (

user_id WITH =,

time_slot WITH &&

)

);

-- Create index for fast queries

CREATE INDEX idx_appointments_user ON appointments(user_id);

CREATE INDEX idx_appointments_time ON appointments USING GIST (time_slot);

-- Insert appointment (success)

INSERT INTO appointments (user_id, client_name, time_slot)

VALUES (

'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11',

'John Doe',

'[2024-01-15 10:00, 2024-01-15 11:00)'

);