SQL Server - Query Optimization Techniques
Optimizing SQL Server queries is a balance of art and science. It involves reducing I/O operations, minimizing CPU usage, and managing memory effectively.
I. Data Retrieval & Selection Strategies
The most common performance bottlenecks occur simply because the query asks for too much data.
1. Avoid SELECT *
Fetching all columns consumes unnecessary I/O and network bandwidth. It also prevents the use of "Covering Indexes" (indexes that contain all the data required for a query).
Bad Practice:
SELECT * FROM Sales.Orders WHERE OrderDate > '2023-01-01';Best Practice:
SELECT OrderID, CustomerID, TotalDue FROM Sales.Orders WHERE OrderDate > '2023-01-01';
2. Use TOP or OFFSET FETCH (Pagination)
If you only need a sample or the first $N$ rows, do not retrieve the whole dataset.
Edge Case: Using
TOPwithout anORDER BYis non-deterministic (you will get random rows). Always define a sort order.Example:
-- Standard Pagination SELECT OrderID, TotalDue FROM Sales.Orders ORDER BY OrderDate DESC OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;
3. UNION ALL vs. UNION
UNION removes duplicate records, which requires a hidden sorting operation (expensive). UNION ALL simply concatenates datasets.
- Optimization: If you know the datasets do not overlap, or you don't care about duplicates, always use
UNION ALL.
II. SARGability (Search ARGument ABle)
This is the most critical concept for Indexing. For an index to be used (an "Index Seek"), the column in the WHERE clause must be left alone.
4. Avoid Functions on Indexed Columns
wrapping a column in a function prevents SQL Server from using the index b-tree effectively, forcing an "Index Scan" (reading every row).
Bad Practice (Non-SARGable):
-- SQL has to calculate YEAR() for every single row in the table SELECT * FROM Employees WHERE YEAR(HireDate) = 2023;Best Practice (SARGable):
-- SQL can jump directly to the start of the range SELECT * FROM Employees WHERE HireDate >= '2023-01-01' AND HireDate < '2024-01-01';
5. Avoid Leading Wildcards
Bad Practice:
WHERE Name LIKE '%Smith'(Must scan the whole table).Best Practice:
WHERE Name LIKE 'Smith%'(Can use the index alphabetically).
6. Watch for Implicit Conversions
If you compare a VARCHAR column to an NVARCHAR parameter (Unicode), SQL Server must convert the entire column to match the parameter, ignoring the index.
Edge Case:
DECLARE @Phone NVARCHAR(20) = '555-0199'; -- If PhoneNumber column is VARCHAR, this causes an implicit conversion scan SELECT * FROM Customers WHERE PhoneNumber = @Phone;Fix: Ensure variable data types match column data types exactly.
III. Logical Operators & Joins
7. EXISTS vs. IN
For checking existence, EXISTS is often faster because it short-circuits (stops looking once it finds a match). IN can be risky with NULL values.
Edge Case (The
NOT INTrap): If the subquery insideNOT INreturns even a singleNULL, the entire query returns zero rows.Best Practice:
SELECT * FROM Orders o WHERE EXISTS (SELECT 1 FROM Customers c WHERE c.CustomerID = o.CustomerID AND c.Region = 'North');
8. Join Elimination & Subqueries
Modern SQL optimizers are good at rewriting subqueries as joins, but explicit Joins are generally preferred for readability and ensuring the optimizer doesn't get confused by deep nesting.
- Optimization: Avoid joining tables you don't need columns from if you just want to filter. Use
EXISTSinstead.
9. Avoid DISTINCT as a "Band-aid"
Developers often use DISTINCT to fix a query that is returning duplicates due to a bad JOIN (e.g., a one-to-many join where you only wanted the "one" side). This causes an expensive Sort operation. Fix the Join logic instead.
IV. Indexing Architecture
10. The Covering Index
This is the "Holy Grail" of optimization. If an index contains (includes) all columns requested in the SELECT list, SQL Server never has to touch the actual table (Heap/Clustered Index). It serves data entirely from the smaller index file.
Example:
-- Query SELECT EmailAddress FROM Customers WHERE LastName = 'Jones'; -- Optimized Index CREATE NONCLUSTERED INDEX IX_Customers_LastName ON Customers (LastName) INCLUDE (EmailAddress); -- 'EmailAddress' is stored at the leaf level
11. Clustered vs. Non-Clustered
Clustered: The physical sorting of the data (usually Primary Key).
Non-Clustered: A separate look-up list.
Tip: If you frequently sort by a column (e.g.,
DateCreated), consider if that should be your Clustered Index, or if it needs a Non-Clustered index to support sorting.
V. Advanced & Server-Level Edge Cases
12. Parameter Sniffing
Sometimes SQL Server creates an execution plan based on the first parameter passed (e.g., a Date range with 1 row). Later, you pass a parameter that returns 1,000,000 rows, but it uses the "small" plan, crashing performance.
Solution:
-- Forces a new plan to be generated for every execution SELECT * FROM Sales WHERE Region = @Region OPTION (RECOMPILE);
13. Transactions and Locking (NOLOCK)
Locking blocks other users. If you are running a heavy reporting query and data precision (exact real-time consistency) isn't 100% required, use NOLOCK.
Warning: This allows "Dirty Reads" (reading data currently being updated but not committed).
SELECT * FROM HugeTable WITH (NOLOCK) WHERE ID > 1000;
14. Temporary Tables vs. Table Variables
Table Variables (
@Table): Stored in memory (mostly). Good for small datasets (<100 rows). Do not have statistics, so the optimizer assumes they always have 1 row (bad for joins).Temp Tables (
#Table): Stored intempdb. Good for larger datasets. Can be indexed and have statistics.
Summary Checklist for Optimization
| Technique | Area | Impact |
| Select Specific Columns | I/O | High |
| Remove Leading Wildcards | Indexing | High |
| Fix Implicit Conversions | CPU/Index | High |
Use EXISTS over IN | Logic | Medium |
| Covering Indexes (INCLUDE) | I/O | Very High |
| SARGable Dates | CPU/Index | Very High |