SELECT & Data Retrieval: Concept
Module: SQL Fundamentals
SELECT retrieves data from database tables. Think of it as asking the database a question: "Show me these columns from this table." The database returns rows that match your request. Understanding the projection phase - how databases read full rows then project only requested columns - is key to writing efficient queries.
SELECT is deceptively simple but has important nuances:
**Basic Structure:**
SELECT column1, column2 FROM table_name;
**Key Concepts:**
1. **Column Selection**: Specify exactly which columns you need
2. **Wildcard (*)**: Selects all columns - convenient but dangerous in production
3. **Aliases**: Rename columns in output using AS keyword
4. **Expressions**: Calculate values, concatenate strings, format data
5. **DISTINCT**: Remove duplicate rows from results
**Execution Flow & Projection Phase:**
1. Database locates the table
2. Scans rows (full table scan or index scan)
3. **Reads full rows from disk** (all columns, even if not selected)
4. **Projection phase**: Filters to requested columns in memory
5. Returns result set to client
**Critical Insight**: Database reads entire rows from disk, then projects columns. However, with covering indexes, only index columns are read (10-100x faster).
**Performance Impact - Real Metrics:**
- **SELECT * on 30-column table**: Transfers 10x more data than 3-column SELECT
- **Example**: 1M rows, 30 columns (10KB/row) = 10GB transferred
- **Optimized**: 1M rows, 3 columns (1KB/row) = 1GB transferred
- **Result**: 10x less I/O, 10x less network bandwidth, 10x faster query
- **Cost**: At AWS pricing, 10GB transfer = $0.90, 1GB = $0.09 (10x savings)
- Large text/blob columns significantly slow queries if not needed
**Cross-Database LIMIT Syntax:**
- **MySQL/PostgreSQL/SQLite**: LIMIT n OFFSET m
- **SQL Server**: TOP n (no OFFSET in older versions), OFFSET-FETCH in 2012+
- **Oracle**: FETCH FIRST n ROWS ONLY (12c+), ROWNUM (legacy)
- **Standard SQL**: FETCH FIRST n ROWS ONLY (ANSI SQL:2008)
SELECT is the #1 most used SQL command (95% of all queries). Mastering it is essential for data analysts, backend engineers, BI developers, and anyone working with databases. Poor SELECT queries are the leading cause of slow applications. Companies like Netflix and Amazon optimize every SELECT query because milliseconds matter at billions of requests per day.
Every database interaction starts with SELECT. Used to generate reports, populate dashboards, export data, validate records, and power every application that reads from a database. A typical web application makes 100-1000 SELECT queries per second. At scale, SELECT * on a 30-column table transfers 10x more data than selecting 3 needed columns - costing thousands in bandwidth and latency.