Window Frames (ROWS vs RANGE): Interview
Module: Window Functions
Explain the fundamental difference between ROWS and RANGE frame types. Give a practical example where each would be appropriate.
ROWS counts physical database rows regardless of their values - "give me exactly the 5 rows before this one". RANGE groups by ORDER BY values - "give me all rows with ORDER BY values in this range, including ties". Example: For a 7-day moving average of stock prices, use ROWS BETWEEN 6 PRECEDING AND CURRENT ROW to get exactly 7 trading days. For daily sales totals where multiple orders occur per day, use RANGE BETWEEN CURRENT ROW AND CURRENT ROW to group all orders on the same date. ROWS is predictable and fast, RANGE handles business logic requiring tie grouping.
How do ROWS and RANGE handle tied values differently, and why does this matter for business applications?
ROWS ignores tied values and counts physical rows in processing order. RANGE includes ALL rows with tied ORDER BY values as a group. This matters because: 1) Financial moving averages need exact day counts (ROWS), 2) Daily business summaries need all same-day transactions grouped (RANGE), 3) ROWS can arbitrarily split tied values, 4) RANGE ensures logical business grouping. Example: If 3 orders occur on Jan 1st, ROWS BETWEEN 1 PRECEDING AND CURRENT ROW might include 2 of the 3 orders arbitrarily, while RANGE includes all 3 orders logically. Wrong choice leads to incorrect business calculations.
What are the performance implications of ROWS vs RANGE, and how would you optimize queries using each?
ROWS is 20-40% faster because it uses simple row counting. RANGE is slower because it must compare ORDER BY values to detect ties. Optimization strategies: 1) Use ROWS for moving averages and sequential analysis, 2) Index ORDER BY columns for RANGE queries, 3) Use covering indexes to avoid key lookups, 4) Pre-aggregate data for complex RANGE operations, 5) Avoid RANGE on high-cardinality unique columns, 6) Monitor memory usage - RANGE uses more memory for tie detection. For large datasets, consider pre-aggregating daily data then using ROWS instead of RANGE on raw transactions.
When would you choose RANGE over ROWS despite the performance cost?
Choose RANGE when business logic requires logical grouping of tied values: 1) Daily/monthly summaries where multiple events per period must be grouped, 2) Regulatory reporting requiring all same-timestamp transactions together, 3) Time-based analysis with INTERVAL (PostgreSQL), 4) Quality control where all items in same batch must be analyzed together, 5) Financial reporting where all trades at same price must be grouped. The key is when splitting tied values would violate business rules or produce incorrect results. Performance cost is justified when correctness requires tie-inclusive behavior.
How do you ensure deterministic results when using ROWS with tied ORDER BY values?
Add secondary sort columns to break ties: ORDER BY date, product_id instead of just ORDER BY date. Without tie-breakers, ROWS can return different physical row orders on each execution, causing inconsistent results. Best practices: 1) Include unique columns (ID, timestamp with microseconds) in ORDER BY, 2) Use multiple columns to create deterministic ordering, 3) Test with tied data to verify consistency, 4) Document sort logic for maintenance. Example: ORDER BY order_date, customer_id, order_id ensures consistent row ordering even when multiple orders occur on the same date.
Write two queries for the same dataset: one using ROWS for exact 5-day moving average, another using RANGE for daily summaries. Explain when each is appropriate.
-- ROWS: Exact 5-day moving average (financial analysis)
SELECT
trade_date,
closing_price,
AVG(closing_price) OVER (
ORDER BY trade_date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
) as sma_5_days,
-- Always exactly 5 days (or fewer at start)
COUNT(*) OVER (
ORDER BY trade_date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
) as days_in_average
FROM stock_prices
ORDER BY trade_date;
-- RANGE: Daily summaries (business reporting)
SELECT
trade_date,
closing_price,
-- All trades on same date
COUNT(*) OVER (
ORDER BY trade_date
RANGE BETWEEN CURRENT ROW AND CURRENT ROW
) as trades_per_day,
-- Daily high/low (all same-date trades)
MAX(closing_price) OVER (
ORDER BY trade_date
RANGE BETWEEN CURRENT ROW AND CURRENT ROW
) as daily_high