Skip to main content

Command Palette

Search for a command to run...

SQL Server - Full-Text Search

Published
5 min read

While standard indexes (B-Trees) are like the index at the back of a book (sorted lists), Full-Text Indexes are like a search engine (Google/Bing) built inside your database. They are designed to query large amounts of unstructured text.

1. The Problem: Why not use LIKE?

Standard SQL uses the LIKE operator for text searching.

  • Query: SELECT * FROM Products WHERE Description LIKE '%phone%'

  • The Issue: The leading wildcard (%) prevents the database from using a standard index. The engine must scan every single row in the table and read the text to see if it contains "phone."

  • The Limitation: It is literal. A search for "run" will not find "ran" or "running."

2. The Solution: The Inverted Index

Full-Text Search works by creating an Inverted Index. Instead of storing the full text, it breaks the text into "tokens" (words) and lists every location where that token appears.

Concept: If Row 1 contains: "The quick brown fox" The Inverted Index stores:

  • brown: Found in Row 1, position 3

  • fox: Found in Row 1, position 4

  • quick: Found in Row 1, position 2

  • (Note: "The" is usually ignored as a "Stop Word")


3. Setting it Up (The Architecture)

FTS requires a specific hierarchy of objects. You cannot just "add" it to a table; you must build the container first.

Step A: Create a Full-Text Catalog

This is a logical container for your full-text indexes.

CREATE FULLTEXT CATALOG FTCatalog AS DEFAULT;

Step B: Create the Full-Text Index

You define which columns are indexed.

  • Requirement: The table must have a unique, non-null key (usually the Primary Key).

  • Language: You specify the language (LCID) so SQL knows how to break words (e.g., English handles "driving" differently than German).

CREATE FULLTEXT INDEX ON dbo.ProductDocuments
(
    -- Column to index      -- Language ID (1033 is English)
    DocumentSummary         LANGUAGE 1033,
    DocumentBody            LANGUAGE 1033
)
KEY INDEX PK_ProductDocuments -- The Unique Key on the table
ON FTCatalog; -- The catalog created in Step A
WITH CHANGE_TRACKING AUTO; -- Automatically update index when data changes

4. Querying Strategies (The Predicates)

You do not use LIKE with FTS. You use specific predicates in the WHERE clause.

Used for exact matches, boolean logic, and proximity.

1. Boolean Logic (AND, OR, NOT) Find rows containing "monitor" AND "4k", but NOT "refurbished".

SELECT Name
FROM Products
WHERE CONTAINS(Description, '"monitor" AND "4k" AND NOT "refurbished"');

2. Prefix Search (Wildcards) Find "ride", "riding", "rider". (Note: The asterisk must be inside the quotes).

SELECT Name
FROM Products
WHERE CONTAINS(Description, '"rid*"');

3. Proximity Search (NEAR) This is a superpower of FTS. Find documents where "database" is near "failure" (e.g., within 10 words).

SELECT Title
FROM SupportTickets
WHERE CONTAINS(Notes, 'NEAR((database, failure), 10)');

4. Inflectional Generation (Stemming) Find different tenses of a verb. Searching for "Drive" will automatically find "Driving", "Drove", and "Driven".

SELECT Title
FROM BlogPosts
WHERE CONTAINS(Content, 'FORMSOF(INFLECTIONAL, drive)');

Used when you want the engine to "guess" the intent. It tokenizes the search string and finds matches based on meaning, not exact wording. It uses an internal thesaurus.

If you search for "Software Engineer":

  • CONTAINS looks for that exact phrase.

  • FREETEXT might find "Computer Programmer" or "Developer" depending on the thesaurus.

SELECT Title
FROM Resumes
WHERE FREETEXT(Summary, 'Software Engineer expert in SQL');
-- Matches rows containing "Developer", "SQL", "Expertise", "Coding", etc.

5. Ranking Results (Relevance)

In a Google search, you want the most relevant result at the top. CONTAINS only returns True/False. To get a "Relevance Score," use CONTAINSTABLE or FREETEXTTABLE.

Scenario: Search for "Performance Tuning" and sort by how often those words appear in the document.

SELECT
    T.Title,
    T.Body,
    FT_Table.RANK -- The calculated relevance score (0 to 1000)
FROM Documents AS T
INNER JOIN CONTAINSTABLE(Documents, Body, 'Performance OR Tuning') AS FT_Table
    ON T.ID = FT_Table.[KEY] -- Join back to the main table
ORDER BY FT_Table.RANK DESC;

6. Advanced Capability: Indexing Binary Files (PDFs, Word)

SQL Server can index text inside binary files (stored in VARBINARY(MAX) columns) if you store the file extension in a separate column.

How it works: SQL Server uses iFilters (installed on the OS). If it sees a .pdf extension, it uses the Adobe iFilter to extract the text from the binary blob and index it.

CREATE FULLTEXT INDEX ON dbo.Documents
(
    FileContent                         -- The binary column
    TYPE COLUMN FileExtension           -- The column holding '.pdf', '.docx'
    LANGUAGE 1033
)
KEY INDEX PK_Documents;
  • Result: You can run CONTAINS on a PDF file stored in the database!

7. Maintenance & Edge Cases

Stop Lists (Noise Words)

FTS ignores common words to save space (e.g., "the", "is", "at", "www").

  • Issue: If a user searches WHERE CONTAINS(Text, 'at'), they get zero results.

  • Fix: You can create custom STOPLISTS if you need to index specific system words usually considered noise.

The "Updates" Lag

Unlike standard indexes which update synchronously (immediately) when you insert a row, FTS is often asynchronous.

  • There might be a slight delay (seconds) between inserting a row and it appearing in search results, as the "IFilter Daemon" processes the text in the background.

Thesaurus Files

You can edit XML files on the server to define synonyms.

  • Example: Map "IE", "Explorer", and "Edge" to "Browser".

  • If a user searches "Browser" using FREETEXT, it will find rows containing "IE".

More from this blog

S

SQL Insights

31 posts