Links from the SSIS Roadmap Session

Here are some of the resources I mentioned in the SSIS Roadmap session at the PASS Summit.

SSIS Reporting Pack from Jamie Thomson



DQS Matching Transform from OH22 Data



DQS Domain Value Import Destination from OH22 Data


There is also a great series of blog posts (part 1, part 2, part 3) on using these transforms on the DQS team blog.

Data Feed Publishing Components


Quick Tip: How can I tell when my DQS knowledge base has been updated?

You can improve data quality in your SSIS data flows by using the DQS Cleansing transform (new in SQL 2012). When you’re using DQS, the data is cleansed according to the rules you’ve built up in the DQS knowledge base. As you update the knowledge base with better rules, your overall data quality goes up. Therefore, you might find that you want to re-process data that’s already been through DQS anytime you make a significant update to the knowledge base.

Think of your data as “water”, and your knowledge base as a “water filter”… as your filter improves in quality (you improve your knowledge base), the water (data) you run through it gets cleaner.

I usually recommend storing the date on which the data was processed with DQS. This could be a column value, if you want to track things at the row level, or an entry in a separate file/table, if you’re tracking it at the table/DB level. (I describe this pattern in detail in the SSIS Design Patterns book.)

While there is no official notification system or API to know when a DQS knowledge base has been updated, the DQS database does contain a PUBLISH_DATE field that you can query. This won’t tell you which individual domains were touched, or what was actually modified, but given that you probably won’t be publishing a new version of your knowledge base without a good reason, it should be good enough for most automated data cleansing scenarios.

Query the Publish Date for a DQS Knowledge Base

We’re interested in the [dbo].[A_KNOWLEDGEBASE] table in the [DQS_MAIN] database. This table lists all of the knowledge bases in your system, as well as any active projects. You can use the following query to get the last time the knowledge base was updated (replace the value in the [Name] clause with the name of your KB).
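A minimal sketch of such a query is shown here. The database, table, and PUBLISH_DATE column are the ones named above; the exact spelling of the [NAME] column and the knowledge base name 'My Knowledge Base' are assumptions for illustration, so check them against your own DQS_MAIN database before relying on this.

```sql
-- Sketch only: returns the publish date for one knowledge base.
-- [DQS_MAIN], [dbo].[A_KNOWLEDGEBASE] and PUBLISH_DATE come from the text above;
-- the [NAME] column spelling is assumed.
SELECT [NAME], [PUBLISH_DATE]
FROM [DQS_MAIN].[dbo].[A_KNOWLEDGEBASE]
WHERE [NAME] = N'My Knowledge Base';  -- hypothetical KB name; replace with yours
```

Comparing the returned date with the stored date of your last DQS run tells you whether the data is due for re-processing.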

EIM presentation material from DevTeach Montreal

I presented an Enterprise Information Management (EIM) talk earlier this week at the DevTeach Montreal conference. Unlike my previous talk from TechEd North America, which tackled the problem from the Data Curation (DQS/MDS) side, this talk focuses on using SSIS to integrate and automate your solution. The demo files are now available from my SkyDrive share, and the slides are embedded below.

Slides from DQS presentation

A big thank you to everyone who came to tonight’s DQS presentation for the New England SQL Server User Group. As promised, you can find the slides on my SkyDrive share (as well as embedded below). I also wanted to list some of the great DQS resources mentioned in the talk:


Speaking at the New England SQL Server User Group

I’ve been invited to do a presentation about Data Quality Services for the New England SQL Server User Group on October 18th. I’ll be presenting an in-depth look at DQS and what some of our customers are currently doing with the product, and I might even have time to show how DQS fits into larger Enterprise Information Management solutions as well.

The abstract:

Microsoft’s SQL Server Data Quality Services (DQS) is a unique solution that is based on the creation and maintenance of Data Quality Knowledge Bases (DQKB) and the ability to use them efficiently for a variety of Data Quality improvements. In this session we’ll walk through the creation of a DQS solution, discuss the main concepts behind the creation of the DQKB, and show how to use DQS in various scenarios and activities.

For more details, please see the event posting on the NESQL site. I’m told that to attend, you must be on the group’s mailing list (the registration link is available on their website) so you can RSVP when the invitation email goes out.

Hope to see you there!

SSIS & EIM Talks From TechEd North America 2012 Now Available

The recordings of the SSIS sessions I presented at TechEd North America 2012 are now available online.

Incremental ETL Using CDC for SQL and Oracle with SQL Server Integration Services (SSIS) 2012

Enterprise Information Management (EIM): Bringing Together SSIS, DQS, and MDS

SQL Server 2012 Case Studies for DQS

I’ve had a lot of people ask me recently for real-life examples of how customers are using Data Quality Services (DQS). Even though SQL Server 2012 has been out less than a month, we already have a number of case studies published which describe how DQS plays a key role within a customer’s infrastructure. Most of the studies involve end-to-end Enterprise Information Management (EIM) solutions which include SSIS and Master Data Services (MDS) as well.

Here are the five DQS case studies that are currently available:

  • Areva – Energy Firm Speeds the Delivery of Reliable, Centralized Master Data to Customers
  • China Guangdong Nuclear Power Holding Corporation – Chinese Energy Utility Builds BI Solution to Improve Information Sharing and Efficiency
  • Super 8 Hotels Co., Ltd. – Hotel Chain Uses Business Intelligence Tools to Guide Rapid Growth Across China
  • Great Western Bank – Fast-Growing Bank Gains Customers and Maximizes Profits with Microsoft BI Tools
  • RealtyTrac – Real Estate Website Helps Customers Make Better Decisions with Higher Quality Data

DQS – Using the Changes History Panel

The DQS client will automatically sort the list of Domain Values, which can make adding new values to a big list tricky. Once you enter the new value at the bottom of the list, it gets automatically sorted, and you need to scroll up the list to find it again. However, there is a better way!

At the far end of the toolbar, there is a drop-down button which exposes the lesser-used commands. One of these is the Show/Hide domain values changes history panel button, which turns out to be incredibly useful.

Domain Values Changes History panel button

When the panel is enabled, it is displayed at the bottom of the Domain Values page. When a value is added or modified, the change is displayed in the log window, with a hyperlink to the domain value. Clicking this link automatically scrolls the list to the target value and gives it focus.


Getting Started with DQS and MDS

If you’re looking to get started with Data Quality Services (DQS) and Master Data Services (MDS), there are some fantastic resources available on TechNet. The site includes videos and slides for full-day training sessions on both products.

Data Quality Services for SQL Server 2012

  • Data Quality Basics and Introducing DQS: Video | Slides
  • Knowledge Management and Data Cleansing in DQS: Video | Slides
  • Data Matching in DQS: Video | Slides
  • DQS Integration with SSIS: Data Cleansing using SSIS: Video | Slides
  • DQS Integration with MDS: Data Matching using MDS: Video | Slides

Master Data Services for SQL Server 2012

  • Master Data Services Overview: Video | Slides
  • Managing Data Warehousing Dimensions with MDS, Part 1: Video | Slides
  • Managing Data Warehousing Dimensions with MDS, Part 2: Video
  • Data Loading via Entity Based Staging (EBS): Video | Slides
  • MDS Hierarchies and Collections: Video | Slides
  • Business Rules and Workflow in MDS: Video | Slides
  • MDS Model Migration and Upgrade: Video | Slides
  • Security Features and Guidelines in MDS: Video | Slides
  • Eliminate Duplicate Data with the MDS Add-In for Excel: Video | Slides

Data Quality Services Performance Best Practices Guide – now available!

The Data Quality Services (DQS) Performance Best Practices Guide (or DQSPBPG for short) is now available on the Microsoft Download Center. It covers hardware and setup considerations, how matching policies will impact your performance, and some best practices when using the DQS Cleansing transform in SSIS. I was also happy to see a section in there about the impact of using advanced DQS functionality, such as Composite Domains, Term-Based Relations, and Reference Data Services. A must-read for all DQS users…