Ranking Functions Deep Dive: Interview

Module: Window Functions

Explain the key differences between ROW_NUMBER, RANK, and DENSE_RANK with a practical example.

These functions handle ties differently: ROW_NUMBER assigns unique sequential numbers (1,2,3,4,5,6) regardless of ties - perfect for pagination. RANK gives tied values the same rank but skips subsequent ranks (1,2,2,4,4,6) - perfect for competitions like Olympics where tied silver medalists mean no bronze. DENSE_RANK gives tied values the same rank but continues sequentially (1,2,2,3,3,4) - perfect for business tiers where you want continuous levels. Example: For salaries $100k, $90k, $90k, $80k - ROW_NUMBER gives ranks 1,2,3,4; RANK gives 1,2,2,4; DENSE_RANK gives 1,2,2,3.

flowchart TB

A["Sample: $100k, $90k, $90k, $80k"] --> B["ROW_NUMBER: 1,2,3,4"]

A --> C["RANK: 1,2,2,4"]

A --> D["DENSE_RANK: 1,2,2,3"]

B --> E["Use for: Pagination Unique IDs"]

C --> F["Use for: Competitions Contests"]

D --> G["Use for: Tiers Categories"]

When would you choose RANK over DENSE_RANK in a business scenario?

Choose RANK when gaps after ties make business sense, typically in competitive scenarios. Examples: sales contests where tied 2nd place means no 3rd place bonus tier; sports rankings where tied silver medalists mean the next medal is bronze; academic competitions where tied winners affect subsequent rankings. Choose DENSE_RANK for business categories where continuous levels matter: product tiers (Premium, Gold, Silver, Bronze), employee performance bands, or customer loyalty levels. The key question: "If two people tie for 2nd place, should the next person be 3rd (DENSE_RANK) or 4th (RANK)?"

flowchart LR

A["Business Decision"] --> B{"Do gaps after ties make sense?"}

B -->|Yes| C["RANK() Competitions Contests"]

B -->|No| D["DENSE_RANK() Tiers Categories"]

How do you optimize ranking function performance on large datasets?

Key optimizations: 1) Index PARTITION BY and ORDER BY columns in a composite index matching the exact order and direction. 2) Filter data with WHERE before ranking when possible. 3) Use ROW_NUMBER when you need unique positions (fastest). 4) Consider materialized views for frequently-used rankings. 5) Use QUALIFY clause for direct rank filtering (PostgreSQL). Example index: CREATE INDEX idx_sales ON sales(region, total_sales DESC) for RANK() OVER(PARTITION BY region ORDER BY total_sales DESC). This can improve performance from 45 seconds to 2 seconds on million-row tables.

flowchart TB

A["Large Dataset Performance"] --> B["1. Index Strategy"]

A --> C["2. Filter Early"]

A --> D["3. Choose Right Function"]

B --> E["Composite index: (partition_cols, order_cols)"]

C --> F["WHERE before ranking Reduce dataset size"]

D --> G["ROW_NUMBER fastest DENSE_RANK slowest"]

Explain how PARTITION BY affects ranking functions and provide a use case.

PARTITION BY creates separate ranking competitions within each group, resetting ranks for each partition. Without PARTITION BY, ranking is global across all rows. With PARTITION BY, each group gets its own ranking starting from 1. Use case: "Top 3 salespeople per region" - RANK() OVER(PARTITION BY region ORDER BY sales DESC) creates separate leaderboards for each region. Each region has its own 1st, 2nd, 3rd place. This is essential for fair comparisons within similar groups rather than global rankings that might be dominated by one high-performing group.

flowchart TB

subgraph "Without PARTITION BY"

A1["Global Ranking All employees together 1,2,3,4,5,6..."]

end

subgraph "With PARTITION BY region"

B1["East: 1,2,3... West: 1,2,3... North: 1,2,3..."]

end

C["Use Case: Fair regional comparisons"] --> B1

Write a query to find the top 3 highest-paid employees in each department, handling ties appropriately.

WITH ranked_employees AS (

SELECT

employee_name,

department,

salary,