PARTITION BY vs GROUP BY: Concept
Module: Window Functions
Think of GROUP BY and PARTITION BY as two different ways to organize a library. GROUP BY is like creating summary cards - "Fiction: 500 books, average 300 pages" - you get the big picture but lose individual book details. PARTITION BY is like adding shelf tags to every book - each book keeps its title, author, and pages, but also shows "Fiction section average: 300 pages." GROUP BY answers "What's the summary?" while PARTITION BY answers "How does this individual item compare to its group?" The choice depends on whether you need to see the forest (GROUP BY) or the trees with forest context (PARTITION BY).
**
**GROUP BY - The Data Summarizer:**
GROUP BY collapses rows into groups, creating summary statistics. It's perfect when you only need aggregate information.
```sql
-- GROUP BY: Creates summary rows
SELECT
department,
COUNT(*) as employee_count,
AVG(salary) as avg_salary,
SUM(salary) as total_payroll,
MIN(salary) as min_salary,
MAX(salary) as max_salary
FROM employees
GROUP BY department;
-- Result: 3 rows (one per department)
-- Sales | 2 | 55000 | 110000 | 50000 | 60000
-- IT | 2 | 75000 | 150000 | 70000 | 80000
-- HR | 1 | 45000 | 45000 | 45000 | 45000
```
**PARTITION BY - The Context Provider:**
PARTITION BY keeps all rows but adds group-level calculations to each row. Perfect for comparative analysis.
```sql
-- PARTITION BY: Keeps all rows, adds context
SELECT
employee_name,
department,
salary,
COUNT(*) OVER (PARTITION BY department) as dept_size,
AVG(salary) OVER (PARTITION BY department) as dept_avg,
salary - AVG(salary) OVER (PARTITION BY department) as vs_dept_avg,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;
-- Result: 5 rows (all original employees with context)
-- Alice | Sales | 50000 | 2 | 55000 | -5000 | 2
-- Bob | Sales | 60000 | 2 | 55000 | 5000 | 1
-- Carol | IT | 70000 | 2 | 75000 | -5000 | 2
-- Dave | IT | 80000 | 2 | 75000 | 5000 | 1
-- Eve | HR | 45000 | 1 | 45000 | 0 | 1
```