SQL Practice Logo

SQLPractice Online

PARTITION BY vs GROUP BY: Real-World

Module: Window Functions

This is the most fundamental decision in SQL analytics - whether to collapse data into summaries or preserve individual rows with analytical context. Netflix uses GROUP BY to create daily viewing summaries for executives (total hours watched per genre), but PARTITION BY to show individual user sessions with genre averages for personalization algorithms. Amazon uses GROUP BY for sales reports (revenue per product category), but PARTITION BY for individual order analysis with category context for recommendations. Goldman Sachs uses GROUP BY for regulatory reporting (total trading volume per desk), but PARTITION BY for individual trade analysis with desk performance context for trader evaluation. The choice determines whether you lose or preserve granular data - a critical business decision.

Netflix Content Strategy - Executive vs Operational Analytics

Netflix needs both high-level content performance summaries for executives and detailed viewer analytics for content recommendation algorithms. Executives need genre performance summaries for strategic decisions, while recommendation engines need individual viewing sessions with genre context for personalization.

-- EXECUTIVE DASHBOARD: GROUP BY for strategic insights

-- "How are our content genres performing overall?"

SELECT

content_genre,

COUNT(DISTINCT content_id) as total_titles,

COUNT(*) as total_viewing_sessions,

SUM(watch_duration_minutes) as total_watch_time,

AVG(watch_duration_minutes) as avg_session_length,

COUNT(DISTINCT user_id) as unique_viewers,

-- Strategic metrics

SUM(watch_duration_minutes) / COUNT(DISTINCT content_id) as avg_watch_time_per_title,

COUNT(*) / COUNT(DISTINCT content_id) as avg_sessions_per_title,

-- Performance categorization

CASE

WHEN AVG(watch_duration_minutes) > 45 THEN 'High Engagement'

WHEN AVG(watch_duration_minutes) > 25 THEN 'Medium Engagement'

ELSE 'Low Engagement'

END as engagement_category

FROM viewing_sessions

WHERE session_date >= CURRENT_DATE - INTERVAL '30 days'

GROUP BY content_genre

HAVING COUNT(*) >= 1000 -- Only genres with significant volume

ORDER BY total_watch_time DESC;

-- RECOMMENDATION ENGINE: PARTITION BY for personalization

-- "How does each viewing session compare to genre patterns?"

SELECT

user_id,

content_id,

content_genre,

watch_duration_minutes,

session_date,

user_age_group,

-- Genre context for recommendations

AVG(watch_duration_minutes) OVER (PARTITION BY content_genre) as genre_avg_duration,

STDDEV(watch_duration_minutes) OVER (PARTITION BY content_genre) as genre_duration_stddev,

-- User behavior within genre

AVG(watch_duration_minutes) OVER (PARTITION BY user_id, content_genre) as user_genre_avg,