Window Functions Fundamentals: Interview

Module: Window Functions

What is the fundamental difference between window functions and GROUP BY?

GROUP BY collapses rows into summary groups (fewer output rows), while window functions preserve all original rows and add calculated columns alongside existing data. GROUP BY is for summarization, window functions are for analysis with context. For example, GROUP BY gives you "total sales per month" (12 rows for a year), while window functions give you "each individual sale plus monthly totals" (all original rows preserved).

flowchart LR

subgraph "GROUP BY Behavior"

A1["1000 rows"] --> B1["GROUP BY dept"]

B1 --> C1["5 summary rows"]

C1 --> D1["Lost individual details"]

end

subgraph "Window Function Behavior"

A2["1000 rows"] --> B2["OVER(PARTITION BY dept)"]

B2 --> C2["1000 rows + dept totals"]

C2 --> D2["Preserved all details"]

end

Explain the OVER clause and its components.

The OVER clause defines the "window" of rows for calculation. It has three optional components: 1) PARTITION BY - divides data into groups (like GROUP BY but doesn't collapse rows), 2) ORDER BY - defines row sequence for cumulative calculations, 3) Frame specification - defines exact row range (ROWS/RANGE BETWEEN). Empty OVER() means entire result set. Example: OVER(PARTITION BY department ORDER BY salary) creates separate windows per department, ordered by salary.

flowchart TB

A["OVER Clause"] --> B["PARTITION BY (optional)"]

A --> C["ORDER BY (optional)"]

A --> D["Frame Spec (optional)"]

B --> E["Divides into groups<br/>Like GROUP BY but no collapse"]

C --> F["Defines row sequence<br/>Enables running calculations"]

D --> G["Specifies row range<br/>ROWS/RANGE BETWEEN"]

When should you use window functions instead of correlated subqueries?

Use window functions when you need multiple related calculations or when performance matters. Window functions are typically 10-50x faster than correlated subqueries because they execute once per query instead of once per row. They're also more readable and maintainable. Use correlated subqueries only when you need complex conditional logic that window functions cannot express. Example: Instead of multiple subqueries for rankings, averages, and totals, one query with multiple window functions is much more efficient.

flowchart TB

subgraph "Correlated Subquery (Slow)"

A1["Main Query Row 1"] --> B1["Execute Subquery"]

A2["Main Query Row 2"] --> B2["Execute Subquery"]

A3["Main Query Row N"] --> B3["Execute Subquery"]

B1 --> C1["N subquery executions"]

B2 --> C1

B3 --> C1

end

subgraph "Window Function (Fast)"

D1["Single Query"] --> E1["Window Function"]

E1 --> F1["One execution for all rows"]

end

What are the performance implications of window functions?

Window functions are generally efficient but require proper indexing. Index PARTITION BY and ORDER BY columns for optimal performance. They use more memory than simple queries due to sorting and partitioning. Multiple window functions in one query are processed together efficiently. Avoid window functions on very large result sets without proper WHERE filtering. They're much faster than equivalent correlated subqueries but slower than simple GROUP BY summaries.

flowchart TB