You can improve data quality in your SSIS data flows by using the DQS Cleansing transform (new in SQL 2012). When you’re using DQS, the data is cleansed according to the rules you’ve built up in the DQS knowledge base. As you update the knowledge base with better rules, your overall data quality goes up. Therefore, you might find that you want to re-process data that’s already been through DQS anytime you make a significant update to the knowledge base.
Think of your data as “water”, and your knowledge base as a “water filter”… as your filter improves in quality (you improve your knowledge base), the water (data) you run through it gets cleaner.
I usually recommend storing the date value of when the data was processed with DQS. This could be a column value, if you want to track things at the row level, or a entry in a separate file/table, if you’re tracking it at the table/DB level. (I describe this pattern in detail in the SSIS Design Patterns book)
While there is no official notification system or API to know when a DQS knowledge base has been updated, the DQS database does contain a PUBLISH_DATE field that you can query. This won’t tell you which individual domains were touched, or what was actually modified, but given that you probably won’t be publishing a new version of your knowledge base without a good reason, it should be good enough for most automated data cleansing scenarios.
Query the Publish Date for a DQS Knowledge Base
We’re interested in the [dbo].[A_KNOWLEDGEBASE] table in the [DQS_MAIN] database. This table lists all of the knowledge bases in your system, as well as any active projects. You can use the following query to get the last time the knowledge base was updated (replace the [Name] clause with the name of your KB).
WHERE [TYPE] = 1 -- knowledge bases only (no projects)
AND [NAME] = N'DQS DATA'