Putting Quality Into Your Data – Part 2


Part 1 can be found at: https://sharepointmike.wordpress.com/2013/01/26/

If you previously installed SQL Server 2012 on a test machine but did not specifically select Data Quality Services (DQS) (the server component) from the Database Engine Services group or the Data Quality Client Integration Services and SQL Server Data Tools from the Shared Features group, you need to go back to your install disk to include these three components on the feature selection page of the 2012 Installation wizard.  You probably should also include Management Tools – Complete.  These can be installed on different machines, but I am going to assume that your test environment, like mine, is all on the same machine.

Before you can run Data Quality Client, you need to run the installer.  You should find the installer in the Microsoft SQL Server 2012 group of your Start menu in the Data Quality Services folder.  The installer name is cleverly named Data Quality Server Installer.  The installer will run in a Command window.  It first prompts for a password for the Database Master Key.  This password must be at least 8 characters long and contain a mix of upper and lower case characters, numbers and special characters.  After entering the password, you must re-enter it to confirm it.  Upon completion of the install, you should see the message: Press any key to continue.

Next you have to define Data Quality Service Roles for users through SQL Server Management Studio.  For test purposes, you at least need a role for yourself.  In the Security folder of the database server create a new login or use your existing one if present.  You must use Windows authentication for this login account.  Next select User Mapping from the Login Properties dialog.  Select the database DQS_MAIN and set the role to dqs.administrator.  Then close the dialog.

You can now make your data available for DQS Operations. You can do this by copying your data to the DQS_STAGIONG_DATA database.  To do this you need an account with db_datawriter, db_datareader, and db_ddladmin rights on this database.  However, you can also simply create a copy of your database in the same server instance as the Data Quality Server which in this case is my test server.  Because I am running both Data Quality Server and Data Quality Client on the same machine, I do not have to worry about installing and configuring TCP/IP Network Protocol to remotely access the Data Quality Server.

I can now start running the Data Quality Client from the Microsoft SQL Server 2012 group of the Start menu selecting localhost as my server.  After connecting to the Data Quality Server, I will see the Data Quality Client home screen shown below.  Now the fun can begin.

DQS001

Obviously, the Data Quality Client has three major sections; Knowledge Base Manager, Data Quality Projects, and Administration.  These are arranged left to right in the order that most people use them.  In other words, to begin you must have a knowledge base against which you will evaluate your data.  Out-of-the-box, SQL Server 2012 includes a sample knowledge base that includes eight domains which can be used to correct and validate data.  These domains include the following:

Country/Region – This domain includes various abbreviations for the countries of the world including the international three-digit country identifier also known as the ISO 3166-1 numeric code.

DQS002

DQS003

It also includes the 2-character ISO 3166-1 alpha-2 code and the 3-character ISO 3166 alpha-3 code.  It each of these cases, it translates the value to the full country name.

Country/Region (three-letter leading) – This domain corrects data to the ISO 3166-1 alpha-3 code.

Country/Region (two-letter leading) – This domain corrects data to the ISO 3166-1 alpha-2 code.

US Counties – This domain lists all counties in United States (including parishes, municipalities, boroughs, etc.)

US Last Name – This domain consists of over 151,000 last names which occurred 100 times or more in the 2000 census.

US Places – This domain includes the names of places within the 50 states of United States, District of Columbia, and Puerto Rico pulled from the 2010 census.

US – States – This domain allows you to convert state abbreviations to display the full state name.

US – State (2-letter leading) – This domain allows you to convert state names to display the two-character state abbreviation.

You can also create new knowledge databases, a topic I will cover at a later time.

The Data Quality Project section allows you to create and manage your data quality projects.  This is where you will go to clean your data and perform matching.  DQS will also make suggestions in this section to correct your data.  Specifically, Data Cleansing uses rules to analyze the data and make changes to the data.  Data Matching can be used to group similar records to help you identify and remove duplicate records.

The Administration section is where you can monitor your data quality activities, configure the server, and reference data providers.

That’s all for this week, Until next time, set up your Data Quality Server and Data Quality Client so we can begin using and creating knowledge databases and set up our first data quality project.

C’ya next time.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s