PARTITION BY vs GROUP BY: Real-World
Module: Window Functions
This is the most fundamental decision in SQL analytics - whether to collapse data into summaries or preserve individual rows with analytical context. Netflix uses GROUP BY to create daily viewing summaries for executives (total hours watched per genre), but PARTITION BY to show individual user sessions with genre averages for personalization algorithms. Amazon uses GROUP BY for sales reports (revenue per product category), but PARTITION BY for individual order analysis with category context for recommendations. Goldman Sachs uses GROUP BY for regulatory reporting (total trading volume per desk), but PARTITION BY for individual trade analysis with desk performance context for trader evaluation. The choice determines whether you lose or preserve granular data - a critical business decision.
Netflix Content Strategy - Executive vs Operational Analytics
Netflix needs both high-level content performance summaries for executives and detailed viewer analytics for content recommendation algorithms. Executives need genre performance summaries for strategic decisions, while recommendation engines need individual viewing sessions with genre context for personalization.
-- EXECUTIVE DASHBOARD: GROUP BY for strategic insights
-- "How are our content genres performing overall?"
SELECT
content_genre,
COUNT(DISTINCT content_id) as total_titles,
COUNT(*) as total_viewing_sessions,
SUM(watch_duration_minutes) as total_watch_time,
AVG(watch_duration_minutes) as avg_session_length,
COUNT(DISTINCT user_id) as unique_viewers,
-- Strategic metrics
SUM(watch_duration_minutes) / COUNT(DISTINCT content_id) as avg_watch_time_per_title,
COUNT(*) / COUNT(DISTINCT content_id) as avg_sessions_per_title,
-- Performance categorization
CASE
WHEN AVG(watch_duration_minutes) > 45 THEN 'High Engagement'
WHEN AVG(watch_duration_minutes) > 25 THEN 'Medium Engagement'
ELSE 'Low Engagement'
END as engagement_category
FROM viewing_sessions
WHERE session_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY content_genre
HAVING COUNT(*) >= 1000 -- Only genres with significant volume
ORDER BY total_watch_time DESC;
-- RECOMMENDATION ENGINE: PARTITION BY for personalization
-- "How does each viewing session compare to genre patterns?"
SELECT
user_id,
content_id,
content_genre,
watch_duration_minutes,
session_date,
user_age_group,
-- Genre context for recommendations
AVG(watch_duration_minutes) OVER (PARTITION BY content_genre) as genre_avg_duration,
STDDEV(watch_duration_minutes) OVER (PARTITION BY content_genre) as genre_duration_stddev,
-- User behavior within genre
AVG(watch_duration_minutes) OVER (PARTITION BY user_id, content_genre) as user_genre_avg,