Links from the SSIS Roadmap Session

Here are some of the resources I mentioned in the SSIS Roadmap session at the PASS Summit.

SSIS Reporting Pack from Jamie Thomson



DQS Matching Transform from OH22 Data



DQS Domain Value Import Destination from OH22 Data


There is also a great series of blog posts (part 1, part 2, part 3) on using these transforms on the DQS team blog.

Data Feed Publishing Components


Quick Tip: How can I tell when my DQS knowledge base has been updated?

You can improve data quality in your SSIS data flows by using the DQS Cleansing transform (new in SQL 2012). When you’re using DQS, the data is cleansed according to the rules you’ve built up in the DQS knowledge base. As you update the knowledge base with better rules, your overall data quality goes up. Therefore, you might find that you want to re-process data that’s already been through DQS anytime you make a significant update to the knowledge base.

Think of your data as “water”, and your knowledge base as a “water filter”… as your filter improves in quality (you improve your knowledge base), the water (data) you run through it gets cleaner.

I usually recommend storing the date value of when the data was processed with DQS. This could be a column value, if you want to track things at the row level, or an entry in a separate file/table, if you’re tracking it at the table/DB level. (I describe this pattern in detail in the SSIS Design Patterns book.)
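
As a rough sketch of what this could look like at each level (the table and column names below are placeholders for illustration, not the exact pattern from the book):

-- Row-level tracking: stamp each row as it passes through the DQS Cleansing transform
ALTER TABLE dbo.Customer ADD DqsProcessedDate datetime2 NULL;

-- Table/DB-level tracking: a single entry per table in a small log table
CREATE TABLE dbo.DqsProcessingLog
(
    TableName     sysname   NOT NULL PRIMARY KEY,
    ProcessedDate datetime2 NOT NULL
);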

While there is no official notification system or API to know when a DQS knowledge base has been updated, the DQS database does contain a PUBLISH_DATE field that you can query. This won’t tell you which individual domains were touched, or what was actually modified, but given that you probably won’t be publishing a new version of your knowledge base without a good reason, it should be good enough for most automated data cleansing scenarios.

Query the Publish Date for a DQS Knowledge Base

We’re interested in the [dbo].[A_KNOWLEDGEBASE] table in the [DQS_MAIN] database. This table lists all of the knowledge bases in your system, as well as any active projects. You can use the following query to get the last time the knowledge base was updated (replace the [Name] clause with the name of your KB).
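Here’s a minimal sketch of that query. The table isn’t documented, so treat the [NAME] and [PUBLISH_DATE] column names shown here as assumptions to verify against your own [DQS_MAIN] catalog:

SELECT [NAME], [PUBLISH_DATE]
FROM [DQS_MAIN].[dbo].[A_KNOWLEDGEBASE]
WHERE [NAME] = 'My Knowledge Base'   -- replace with the name of your KB

Comparing the returned PUBLISH_DATE against the processed date you stored (per the pattern above) tells you whether the data is due for another pass through the DQS Cleansing transform.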

SSIS, DQS and MDS Training at SQL Saturday #229 | Dublin, Ireland

I’m honored to be invited back to this year’s SQL Saturday Dublin event. I must have done a decent job last year on my SSIS Design Patterns pre-conference session, because I’ve been asked to do another one this time around as part of their Training Day (June 21st).

This year I’ll be doing a full day training session on Enterprise Information Management, which greatly expands upon one of my more popular talks about combining SSIS, DQS and MDS. The session will include some advanced SSIS topics (such as automation and dynamic package generation), and some of the main SSIS design patterns for data warehousing. The abstract is included below:

Enterprise Information Management (EIM) is an industry term for technologies to manage your data for integration, quality, and governance. This full day training session will show you how Integration Services (SSIS), Data Quality Services (DQS), and Master Data Services (MDS) work together to provide a comprehensive EIM solution in SQL Server 2012. Focusing on Data Warehousing scenarios, we’ll explore the key steps needed to build such a solution, from the beginning (planning, benchmarking), to data curation (cleansing, matching, managing dimensions), to data loading using SSIS. We’ll also touch on some advanced data warehousing ETL design patterns, such as change data capture (CDC), slowly changing dimension processing, and automation.


Course Modules

  • Data cleansing and matching
  • Reference data management
  • SSIS design patterns
  • Incremental data loading
  • SSIS deployment, management, and monitoring
  • Automation

Registration details can be found on the Prodata site.

The session schedule hasn’t been posted yet, but a number of people submitted interesting SSIS sessions to the conference. I see a lot of big names from the SQL community, and expect there will be an awesome turnout (just like last time).

I submitted a brand new session entitled Cats, Facebook, and Online Dating with Microsoft BI that I hope will be accepted. I’ll show the SSIS demos I’ve done for the BI Power Hour (2011 | 2012), how I built them, and the lessons that I learned throughout. I’ll try not to make it too egotistical (“Hey look at me and how I made people laugh that one time!”), and try to provide content that attendees will find useful.

Oh, who am I kidding? I’ll totally be showing off Mr. Wiggles.

Hope to see you there!

Presenting to the .NET Montreal community group | May 27th

The fine folks over at Communauté .NET Montréal have invited me to speak at their meeting on May 27th, 2013. I’ll be using this as a practice run for my sessions at the DevTeach Toronto event later that week. Come join us at the Microsoft Montreal office if you’re interested in finding out more about:

Data Quality Services (DQS) – End to End
Microsoft’s SQL Server Data Quality Services (DQS) is a unique solution that is based on the creation and maintenance of Data Quality Knowledge Bases (DQKB) and the ability to use them efficiently for a variety of Data Quality improvements. In this session we’ll walk through the creation of a DQS solution, discuss the main concepts behind the creation of the DQKB, and cover how to use DQS in various scenarios and activities.


Developing Extensions for SSIS
This code-heavy session walks you through the creation of a custom SQL Server Integration Services (SSIS) Task, and a custom Data Flow component. If you’ve ever thought of customizing SSIS functionality beyond what you can do with scripting, this talk is for you! All examples will be coded in C#, and made available following the session.


Registration details can be found here.

Slides from Bringing Together SSIS, DQS and MDS | TechDays Hong Kong 2013

A big thank you to everyone who attended my 2nd session at the TechDays Hong Kong event, Enterprise Information Management (EIM): Bringing Together SQL Server Integration Services (SSIS), Data Quality Services (DQS) and Master Data Services (MDS) (wow that’s a long title). The slides are available below. The files for my integrated demo are available on my SkyDrive share.

Note, this session is very similar to the one I did at DevTeach Montreal and the 2012 PASS Summit (when I co-presented with Matthew Roche). I keep meaning to retire it, but I keep getting asked to present it. I think the integrated demo, which shows how all of the products can work together, really resonates with people. If you are interested in this content, be sure to check out the other EIM session that Matthew Roche and I presented at TechEd North America 2012 – the slides are similar, but that session approaches the problem from the DQS/MDS perspective, rather than focusing on SSIS.

Dynamic Lookups With Data Explorer

In my previous post, I used Data Explorer to grab the list of upcoming events from the SQL Saturday website. Excel has had the ability to grab data from HTML pages for a while, but using Data Explorer provided a richer set of options, and allowed me to apply a series of transforms to get the data exactly how I wanted it.

Using this events data, I want to do two things:

  • Enrich the data with some more details about the event venue’s location
  • Plot the events on a map/chart so I can see which ones are close to me

This post will cover the first step, and lets me show off some of the more advanced Data Explorer functionality.

Address Lookups

There are a number of web services that allow you to validate an address, or fetch additional details (such as latitude and longitude). I just happen to have a subscription to Melissa Data’s Address Check service, provided through the Windows Azure Marketplace. While the service was designed to work with Data Quality Services, it can also be invoked programmatically, and returns a whole bunch of additional information for the address you provide. It will even return suggestions when an exact match for the address isn’t found.


I previously covered how to retrieve data from the Windows Azure Marketplace, but unfortunately that method won’t help us here because we’re not dealing with a pre-defined set of data (I’m not sure how long it would take to pull in all possible US and Canadian address combinations into Excel, but it’s probably not something I’d want to try on my laptop). The Address Check service I want to use is an OData Service (in fact, all Marketplace offerings have OData interfaces), and it exposes a single function with three parameters:

  • Address
  • MaximumSuggestions
  • MinimumConfidence

I found all of the details (including the service URL) on the service description page in the marketplace.

Dynamic Lookups

I’ll use the list of events I retrieved in my previous post as a starting point. To keep the original query intact, I’ll Duplicate my query using the option on the QUERY tab.


Next, I click on the Filter & Shape button to bring up my query.


I want to invoke the address lookup function on my OData Service for each row of my data set, and pass in the value of the Location field, but I couldn’t find an option that would allow this in the query editor UI. Instead, we can turn to the Data Explorer formula language, and start adding some custom transformations.

First, we need to add a new transformation step by clicking the formula button.


By default, the step we just created displays the same data as the previous step. We can change this by modifying the value in the formula bar. Note that at the moment there is no IntelliSense or syntax highlighting for the formulas, but I’m hoping we’ll see this in a future update. The formula to invoke the external OData Service took some trial and error, but I eventually got it working.

First we define the lookup as an auxiliary query. We’ll name the step Lookup to make it easy to reference.

= OData.Feed(""){0}[Data]

This formula is accessing the OData Service at the given URL, and returning the data ([Data]) at the first member ({0}), which in this case is the Melissa Data SuggestAddress function that we’re looking for.


When the function is returned, Data Explorer provides a UI that shows us the arguments, and indicates that the result will come back in a table format. Pressing the Invoke button lets us enter static values (shown below for demonstration purposes), but as we’re interested in dynamic lookups using the data from our existing query, we’ll skip this for now.


We then add a second step which invokes the Lookup function we’ve defined for each row in our data set. We can click the formula button again to create the new step, and use this value:

= Table.AddColumn(RenamedColumns, "Address", each Lookup([Location], 1, 0))

Let’s break down what this formula is doing:

  • Add a new column to the result of RenamedColumns step
  • Name the column “Address”
  • For each row in the data set, call the function defined by the Lookup step
  • When calling the function, pass in the [Location] field (with two hard coded values – 1, 0 – for the remaining arguments)

After entering the formula, we get a notification that credentials are required to access this data source.


Clicking Edit credentials brings up the Data Explorer credential manager. The Windows Azure Marketplace uses the concept of a Feed Key (similar to an “API key” used by many web/REST based APIs). Clicking the link on the page brings you directly to the Marketplace page which displays your key – you can copy & paste it from there. (Be careful with the copy & paste! On my first couple of tries, I included extra whitespace before and after the key, which ended up giving me a permission error when I tried to connect)


After entering the key, we can Save and go back to our query. We immediately see another error:


Formula.Firewall: Query ‘SqlSaturdayEvents’ is accessing data sources which can not be used together.


By default, it appears that Data Explorer doesn’t allow you to pass data between services, as it could result in a “leak”. For example, in this case I’m sending my original Location data (coming from the SQL Saturday website) to the OData Service (run by Melissa Data). Thankfully, there is a way to allow this behavior using functionality called “Fast Combine”.

We can click Done to save the query we have at this point (even though it’s in an error state), and go back to Excel. On the DATA EXPLORER tab, we can see the Fast Combine option.


Clicking it brings up a warning dialog explaining that enabling Fast Combine may improve query performance, but may introduce privacy issues.


Click Enable, go back to Filter & Shape, and now the data is displayed.


The results of the OData Service come back as a table, which we can expand inline by clicking the small icon next to the column title.


After selecting the columns we want, we can see the new values in our data set.


The Melissa Data service we’re using is for US and Canadian addresses, so we can see that the non-US locations aren’t getting Latitude and Longitude results. Addresses that aren’t matched should come back with a confidence level of 0, so we can use this to filter out these unmatched locations from our list. Our filter will only include rows that have a Confidence value greater than 0.



The remaining results look pretty good. We can click Done to save the query and get the results into Excel.



Wow – I’m impressed with Data Explorer so far. I really like the idea of mashing up data sources like this, and browsing through the formula language gave me an idea of what’s possible.

In my next post, I’ll take a look at importing the data into PowerPivot, and then doing some visualizing with Power View.

For reference, here are the queries I used at each step:


Source = Table.TransformColumns(Data0,{},Value.FromText)
HiddenColumns = Table.RemoveColumns(ChangedType,{"Column1", "Column3"})
SplitColumnDelimiter = Table.SplitColumn(HiddenColumns,"Column4",Splitter.SplitTextByEachDelimiter({"-"}, null, false),{"Column4.1", "Column4.2"})
SplitColumnPos = Table.SplitColumn(SplitColumnDelimiter,"Column4.1",Splitter.SplitTextByPositions({0, 13}, false),{"Column4.1.1", "Column4.1.2"})
HiddenColumns1 = Table.RemoveColumns(SplitColumnPos,{"Column4.1.1"})

= Table.RenameColumns(HiddenColumns1,{{"Column2", "Date"}, {"Column4.1.2", "Number"}, {"Column4.2", "Name"}, {"Column5", "Location"}})

Lookup = OData.Feed(""){0}[Data]

= Table.AddColumn(RenamedColumns, "Address", each Lookup([Location], 1, 0))

Expand Address

= Table.ExpandTableColumn(AddressLookup, "Address", {"Confidence", "Latitude", "Longitude", "City", "State", "Country"}, {"Address.Confidence", "Address.Latitude", "Address.Longitude", "Address.City", "Address.State", "Address.Country"})

= Table.SelectRows(#"Expand Address", each [Address.Confidence] > 0)

DevTeach Toronto – May 27-31

I’ll be heading to Toronto in May to present at the DevTeach conference. I’ll be doing an intermediate (300-level) talk about DQS, and an advanced (400-level) talk about developing custom extensions for SSIS (tasks, components, connection managers, and loggers).

Data Quality Services (DQS) – End to End
Microsoft’s SQL Server Data Quality Services (DQS) is a unique solution that is based on the creation and maintenance of Data Quality Knowledge Bases (DQKB) and the ability to use them efficiently for a variety of Data Quality improvements. In this session we’ll walk through the creation of a DQS solution, discuss the main concepts behind the creation of the DQKB, and cover how to use DQS in various scenarios and activities.

Developing Extensions for SSIS
This code-heavy session walks you through the creation of a custom SQL Server Integration Services (SSIS) Task, and a custom Data Flow component. If you’ve ever thought of customizing SSIS functionality beyond what you can do with scripting, this talk is for you! All examples will be coded in C#, and made available following the session.




EIM presentation material from DevTeach Montreal

I presented an Enterprise Information Management (EIM) talk earlier this week at the DevTeach Montreal conference. Unlike my previous talk from TechEd North America, which tackled the problem from the Data Curation (DQS/MDS) side, this talk focuses on using SSIS to integrate and automate your solution. The demo files are now available from my SkyDrive share, and the slides are embedded below.

Slides from DQS presentation

A big thank you to everyone who came to tonight’s DQS presentation for the New England SQL Server User Group. As promised, you can find the slides on my SkyDrive share (as well as embedded below). I also wanted to list some of the great DQS resources mentioned in the talk: