Step-by-Step: Enabling Semantic Search on SQL Server 2012

Semantic Search is a powerful new addition to SQL Server 2012. Semantic Search builds upon the full-text index feature and adds the ability for SQL Server to determine key words in documents. Uses can include analysis of text to determine similarity between two documents, extract key words from documents, or find documents that contain key words.

Step-by-step, enabling Semantic Search on a column in SQL Server 2012 involves:

  1. Install the Full-Text and Semantic Extractions for Search feature
  2. Install the Microsoft Office 2010 Filter Packs and Service Pack 1
  3. Install, attach and register the semantic language database
  4. Create a full-text catalog
  5. Create a full-text index with the Statistical_Semantics option enabled

1. Install the Full-Text and Semantic Extractions for Search Feature

If you already enabled the Full-Text and Semantic Extractions for Search feature of the Database Engine during SQL Server 2012 setup, no additional actions are necessary.

If you did not choose to include that feature, you will need to use the SQL Server Installation Center or the installation media to launch SQL Server setup again and select that feature:

Figure 1: SQL Server 2012 Setup screen showing Full-Text and Semantic Extractions for Search

2. Install the Microsoft Office 2010 Filter Packs and Service Pack 1

The Microsoft Office 2010 Filter Packs contain IFilters (the DLLs used by SQL Server’s full-text indexing to extract text from various file formats, including those from Office 2010, but also other formats) that are not included by default. Without these additional IFilters, SQL Server Full-Text (and by extension, Semantic Search) only understands 50 file formats. After adding the additional filter packs, the number goes up to 157 formats.

The Microsoft Office 2010 Filter Packs can be downloaded from http://www.microsoft.com/download/en/details.aspx?id=17062 and their Service Pack 1 from http://www.microsoft.com/download/en/details.aspx?id=26606 (for 32-bit operating systems) or http://www.microsoft.com/download/en/details.aspx?id=26604 (for 64-bit operating systems).

If you want to index and search PDF files, you will need the Adobe IFilter from http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025.

After installing the filter packs, you will need to execute a few T-SQL commands to actually let SQL Server know that additional IFilters are available:

EXEC sp_fulltext_service @action='load_os_resources', @value=1;
EXEC sp_fulltext_service 'update_languages';
EXEC sp_fulltext_service 'restart_all_fdhosts'; 

After running these commands, you can obtain a list of supported formats using this SQL command:

EXEC sp_help_fulltext_system_components 'filter'

3. Install, Attach and Register the Semantic Language Database

The semantic analysis performed by SQL Server is essentially statistical analysis of the words in the column contents that have already been indexed by Full-Text Indexing. This statistical analysis relies on base data which is provided as a SQL Server database. This database however is not installed on your system by default; it requires executing a separate installation as well as manually attaching the database files to your SQL Server instance.

The installation is found on the SQL Server installation media in a folder for your processor type:

  • For x86, the setup is \x86\Setup\SemanticLanguageDatabase.msi
  • For x64, the setup is \x64\Setup\SemanticLanguageDatabase.msi

The setup file will simply extract the MDF and LDF database files to a location of your choice. After the installation, you will attach the database to your SQL Server instance. You would do this like you would any other database file. You can use SSMS or T-SQL script.

Finally, you must register that database as the language statistics database using the following T-SQL command:

EXEC sp_fulltext_semantic_register_language_statistics_db @dbname = '_SemanticsDB'

4. Create a Full-Text Catalog

For each database where you want to perform Full-Text Indexing and semantic search, you must create at least one full-text catalog. This T-SQL command creates a Full-Text Catalog and sets it as the default catalog for new Full-Text Indices:

CREATE FULLTEXT CATALOG CatalogName AS DEFAULT

5. Create a Full-Text Index with the Statistical_Semantics Option Enabled

The last step is to create a Full-Text Index and specify that you want to use statistical semantics also:

CREATE FULLTEXT INDEX ON TableName
(
	ColumnName
		LANGUAGE 1033
		Statistical_Semantics
)
KEY INDEX PK__PrimaryKey

This code creates a full-text index in the default full-text catalog for the table with name “TableName” and includes only one column: ColumnName. This code assumes that the column selected for the index contains the actual text data. If the column contains binary data (for example, a PDF file), you must also specify the TYPE COLUMN clause with the name of the column that stores the file extension. This will allow proper selection of the IFilter for that row.

The LANGUAGE clause uses a standard language identifier to indicate the language of the text. In SQL Server 2012, Full-Text Indexing supports 58 languages, but semantic search only supports 15 languages.

The Statistical_Semantics clause enables the statistical semantics analysis on that column. You can enable semantic search on a column-by-column basis, even within a single table.

You can also use the SSMS GUI to create the Full-Text Catalog and Index.

Finally…

Now that you have set up your semantic search, you can start using the Table-Valued Functions in SQL Server 2012 to analyze the data in your selected columns. There are several semantic tables that address most scenarios. A future blog post will show an example of resume analysis using this functions.

Advertisements

5 thoughts on “Step-by-Step: Enabling Semantic Search on SQL Server 2012

  1. Hallo
    Im busy studying SQL, ms training kit book. I please need help attaching the scematicdb.
    Keep on getting the error
    Msg 5120, Level 16, State 101, Line 1
    Unable to open the physical file “C:\Program Files\Microsoft Semantic Language Database\semanticsdb.
    mdf”. Operating system error 123: “123(The filename, directory name, or volume label syntax is incorrect.)”.

    Can you help or direct me to website that offer help in these matters?

    1. You should confirm that the semanticsdb.mdf file actually exists in that location. If it does, it would seem that the account under which SQL Server is running does not have permission to access that location. You will need to check the service account and allow it full control (NTFS permission) over the MDF and LDF files.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s