First, you need to know that the MDS I’m talking about here is Master Data Services introduced in SQL Server 2008 R2. I’ve already talked about Data Quality Services (DQS) which I believe was only introduced in SQL Server 2012. So the natural question is what is MDS and why do I need it together with the newer DQS or is DQS just a replacement for MDS and can we forget about learning MDS now that we have DQS?
Previously I talked about DQS and showed how you can use it to create individual knowledgebase domains that can then be used to clean data. The application of DQS on a regular based to clean data going into a production database should improve your overall efficiency compared to manually cleaning the same data. This is because DQS remember the correction you made previously. You can also define rules for ‘fixing’ individual domains. Efficiency improves over time because the knowledge base understands how to handle an increasing number of problems without requiring manual intervention. However, at the same time, users have to be realistic. There will probably never come a time when absolute all the possible data issues have been addressed allowing for fully automatic data cleansing.
You could think of DQS as the detergent that helps you clean bad data from a data source getting rid of invalid and misspelled data. It can also help you reduce duplicate records within that individual data source. It can even output a data file of cleansed information with duplicate records removed or at least marked. However DQS provides no assistance in merging the clean output data file with another data source. The ability to combine a freshly cleaned data source with a master data file is what Master Data Services can provide. To some extent, both products have the ability to cleanse and match data from an input data source. However, the rules in DQS are domain (column) centric while the rules in MDS are typically more record centric. Also MDS as the name implies helps you import data from new sources into a common master table. Another way of saying this is that MDS might be the gatekeeping of your enterprise data system that controls what data can or cannot be added to your data warehouse.
Because both DQS and MDS provide the ability to define cleansing rules, users may jump to the conclusion that these two products are mutually exclusive. However, the purpose of the cleansing rules between these two products needs to be clarified. The purpose of the cleansing rules in DQS is primarily to correct and validate values added to each domain. DQS rules can determine which values are allowed or not allo0wed in the domain. These rules are not record based, but column based.
On the other hand, the business rules created in MDS primarily exist to prevent invalid records from being added to a master data table. In other words, it keeps new records from entering or updating the master set of records which may sit in your enterprise data warehouse.
Based on this description, you should see that DQS and MDS are clearly different products. Furthermore, you may not always need to use both of them. If your purpose is merely to clean bad data from a data entry system and possibly remove duplicate records, you may be perfectly satisfied with using only DQS. Similarly, if your task is to merge clean data from multiple data sources such as sales data from different countries, you may be more interested in the abilities of MDS to combine records based on business rules that prevent record duplication while allowing data to be merged. Of course if the data sources that you want to merge have not first been cleaned, you may need to first run them through a DQS project with a corresponding knowledgebase to ‘fix’ any domain issues such as different product codes for the same product in different countries or perhaps even differences in the way customer address information is recorded.
In future weeks, I will take you an exploration of MDS in between some other technical topics that I want to cover. Before I end today, I want to remind all my readers of two events coming up very soon where I can meet you.
The first is the SQL Saturday Orlando event coming up on September 14th. SQL Saturday events started right here in Orlando back in November 2007 by Andy Warren. I’ve been fortunate to have been a speaker at every Orlando event. This year I will be presenting an introduction to DQS that I call, ‘Putting Quality Back Into Your Data’.
The second big event coming up is the 2013 PASS Summit that this year is being held in Charlotte, North Carolina during the week of October 15-18. This will be my second year speaking at the PASS Summit. If you have never been to a PASS Summit, you are missing the opportunity to meet, talk with and listen to presentations from SQL professionals from around the world. I feel very honored to be able to give two presentations this year, one on DQS and the other on using SSAS’s Tabular Model to build Cubes that you can manipulate as Excel PowerPivot models.
If you have the opportunity to get to either of these events, please stop by to say Hello. I would love to meet you all.
C’ya next time.