Real-World Window Function Patterns: Interview

Q: How would you detect gaps in sequential data, and why is proper partitioning crucial for this pattern?

Gap detection uses LAG() or LEAD() to compare current values with previous/next values in a sequence. Pattern: 1) Use LAG(date_column) OVER (PARTITION BY group_column ORDER BY date_column) to get previous date, 2) Calculate difference between current and previous dates, 3) Flag unusual gaps. Proper partitioning is crucial because without it, you compare across different logical groups. Example: analyzing customer order patterns - without PARTITION BY customer_id, you'd compare one customer's las

Q: What is cohort analysis and how do window functions enable this type of analysis?

Cohort analysis tracks groups of users over time to measure retention, engagement, or behavior changes. It groups users by a common characteristic (signup month) and tracks their behavior across subsequent periods. Window functions enable this by: 1) Using DATE_TRUNC to create cohort groups, 2) Calculating period numbers with date arithmetic, 3) Using COUNT(DISTINCT user_id) with appropriate partitioning to measure active users per cohort per period, 4) Calculating retention rates by comparing t

Module: Window Functions

Explain the top-N per group pattern and provide a real-world business scenario where you would use it.

The top-N per group pattern finds the best/worst performers within each category using RANK() or ROW_NUMBER() with PARTITION BY, then filtering the results. Business scenario: An e-commerce company wants to find the top 3 best-selling products in each category for their homepage. Query structure: 1) Use RANK() OVER (PARTITION BY category ORDER BY sales DESC) to rank products within categories, 2) Use CTE or subquery to filter WHERE rank <= 3, 3) This gives exactly 3 products per category. This pattern is used everywhere: top employees per department, highest revenue customers per region, most popular content per genre.

How would you detect gaps in sequential data, and why is proper partitioning crucial for this pattern?

Gap detection uses LAG() or LEAD() to compare current values with previous/next values in a sequence. Pattern: 1) Use LAG(date_column) OVER (PARTITION BY group_column ORDER BY date_column) to get previous date, 2) Calculate difference between current and previous dates, 3) Flag unusual gaps. Proper partitioning is crucial because without it, you compare across different logical groups. Example: analyzing customer order patterns - without PARTITION BY customer_id, you'd compare one customer's last order with another customer's first order, giving meaningless results. Always partition by the logical grouping (customer, product, account) to analyze sequences within each group.

What is cohort analysis and how do window functions enable this type of analysis?

Cohort analysis tracks groups of users over time to measure retention, engagement, or behavior changes. It groups users by a common characteristic (signup month) and tracks their behavior across subsequent periods. Window functions enable this by: 1) Using DATE_TRUNC to create cohort groups, 2) Calculating period numbers with date arithmetic, 3) Using COUNT(DISTINCT user_id) with appropriate partitioning to measure active users per cohort per period, 4) Calculating retention rates by comparing to initial cohort size. This analysis is crucial for subscription businesses, mobile apps, and any service tracking user lifecycle - it reveals which user groups are most valuable and when churn typically occurs.

Write a query to find the top 2 highest-paid employees in each department, including their rank and percentage of department payroll.

WITH department_rankings AS (

SELECT

employee_name,

department,

salary,

-- Rank within department

RANK() OVER (

PARTITION BY department

ORDER BY salary DESC

) as salary_rank,

-- Department total for percentage calculation

SUM(salary) OVER (PARTITION BY department) as dept_total_payroll,

-- Percentage of department payroll

ROUND(

100.0 * salary / SUM(salary) OVER (PARTITION BY department), 2

) as pct_of_dept_payroll

FROM employees

)

SELECT

department,

employee_name,

salary,

salary_rank,

pct_of_dept_payroll

FROM department_rankings

WHERE salary_rank <= 2

ORDER BY department, salary_rank;

This demonstrates the top-N per group pattern with additional percentage calculations, showing how to combine ranking with analytical functions for comprehensive reporting.

Create a query to analyze customer purchase patterns: identify customers with gaps longer than 60 days between orders and calculate their average order frequency.

WITH customer_gaps AS (

SELECT

customer_id,

order_date,