Table Partitioning Strategies: Concept
Module: Query Optimization & Performance
Table partitioning divides a large table into smaller, manageable pieces called partitions. Each partition is a separate physical table, but queries treat them as one logical table.
Key benefit: Partition pruning. When you query with a filter on the partition key, the database scans only relevant partitions, ignoring the rest. This can make queries 10-100x faster.
Example: Orders table partitioned by month. Query for January orders scans only January partition (8% of data), not all 12 months.
When to partition: Tables >100GB, time-series data, data with natural boundaries (date, region, category).
**Partition Types:**
**1. Range Partitioning**
Partition by continuous ranges (dates, numbers).
Use case: Time-series data (logs, orders, events)
Example:
- orders_2023_q1: Jan-Mar 2023
- orders_2023_q2: Apr-Jun 2023
- orders_2023_q3: Jul-Sep 2023
- orders_2023_q4: Oct-Dec 2023
Query: WHERE created_at >= '2023-04-01' AND created_at < '2023-07-01'
Pruning: Scans only orders_2023_q2 (25% of data)
**2. List Partitioning**
Partition by discrete values (categories, regions).
Use case: Data with natural categories
Example:
- orders_us: country = 'US'
- orders_eu: country IN ('UK', 'FR', 'DE')
- orders_asia: country IN ('JP', 'CN', 'IN')
- orders_other: all other countries
Query: WHERE country = 'US'
Pruning: Scans only orders_us partition
**3. Hash Partitioning**
Partition by hash of column value (distributes evenly).
Use case: Distribute data evenly, no natural boundaries
Example:
- users_p0: hash(user_id) % 4 = 0
- users_p1: hash(user_id) % 4 = 1
- users_p2: hash(user_id) % 4 = 2
- users_p3: hash(user_id) % 4 = 3
Query: WHERE user_id = 12345
Pruning: Scans only one partition (25% of data)
Note: Hash partitioning doesn't help range queries.
**Partition Pruning:**
Partition pruning is the optimizer's ability to skip irrelevant partitions.
Without pruning:
- Query scans all partitions