Bulk Loading into MDS using SSIS

Each entity in SQL Server 2012 Master Data Services (MDS) will have it’s own staging table (stg.<name>_Leaf). Using this staging table, you can create, update, deactivate and delete left members in bulk. This post describes how to bulk load into an entity staging table and trigger the stored procedure to start the batch import process.

Staging Tables and Stored Procedures

The new entity based staging tables are an excellent feature in MDS 2012, and make it very easy to bulk load into MDS from SSIS. If you take a look at the SQL database used by your MDS instance, you’ll see at least one table in the stg schema for each entity. For this example I’ve created a Suppliers entity and I see a matching table called [stg].[Suppliers_Leaf]. If your entity is using hierarchies, you will have three staging tables (see BOL for details). If we expand the columns, we’ll see all of the attributes have their own columns, as well as some system columns that every staging table will have.

image

Each staging table will also have a stored procedure that is used to tell MDS that new data is ready to load. Details of the arguments can be found in BOL.

image

Import Columns

To load into this table from SSIS, our data flow will need to do the following:

  • Set a value for ImportType (see below)
  • Set a value for BatchTag
  • Map the column values in the data flow to the appropriate attribute columns

See the Leaf Member Staging Table BOL entry for details on the remaining system columns. If your Code value isn’t set to be generated automatically, then you’d also need to specify it in your data flow. Otherwise, the default fields can be safely ignored when we’re bulk importing.

The BatchTag column is used as an identifier in the UI – it can be any string value, as long as it’s unique (and under 50 characters).

MDS uses the same staging table for creating, updating and deleting entities. The ImportType column indicates which action you want to perform. The possible values are listed in the table below.

 


Value Description
0 Create new members. Replace existing MDS data with staged data, but only if the staged data is not NULL. NULL values are ignored. To change a string attribute value to NULL, set it ~NULL~. To change a number attribute value to NULL, set it to -98765432101234567890. To change a datetime attribute value to NULL, set it to 5555-11-22T12:34:56.
1 Create new members only. Any updates to existing MDS data fail.
2 Create new members. Replace existing MDS data with staged data. If you import NULL values, they will overwrite existing MDS values.
3 Deactivate the member, based on the Code value. All attributes, hierarchy and collection memberships, and transactions are maintained but no longer available in the UI. If the member is used as a domain-based attribute value of another member, the deactivation will fail. See ImportType 5 for an alternative.
4 Permanently delete the member, based on the Code value. All attributes, hierarchy and collection memberships, and transactions are permanently deleted. If the member is used as a domain-based attribute value of another member, the deletion will fail. See ImportType 6 for an alternative.
5 Deactivate the member, based on the Code value. All attributes, hierarchy and collection memberships, and transactions are maintained but no longer available in the UI. If the member is used as a domain-based attribute value of other members, the related values will be set to NULL. ImportType 5 is for leaf members only.
6 Permanently delete the member, based on the Code value. All attributes, hierarchy and collection memberships, and transactions are permanently deleted. If the member is used as a domain-based attribute value of other members, the related values will be set to NULL. ImportType 6 is for leaf members only.

When you are bulk loading data into MDS, you’ll use 0, 1 or 2 as the ImportType. To summarize the different modes:

  • Use 0 or 2 when you are adding new members and/or updating existing ones (i.e. doing a merge)
    • The difference between 0 and 2 is the way they handle NULLs when updating an existing member. With 0, NULL values are ignored (and require special handling if you actually want to set a NULL value). With 2, all values are replaced, even when the values are NULL.
  • Use 1 when you are only inserting new members. If you are specifying a code, then a duplicate value will cause the import to fail.

Package Design

You control flow will have at least two tasks:

  1. A Data Flow Task that loads your incoming data into the MDS staging table for your entity
  2. An Execute SQL Task which runs the staging table’s stored procedure which tells MDS to start processing the batch

image

Your data flow will have (at least) three steps:

  1. Read the values you want to load into MDS
  2. Add the BatchTag and ImportType column values (using a derived column transform)
  3. Load into the MDS staging table

image

As noted above, in your OLE DB Destination you’ll need to map your data flow columns to your member attributes (including Code if it’s not auto-generated), the BatchTag value (which can be automatically generated via expression), and the ImportType.

image

After the Data Flow, you’ll run the staging table stored procedure.

The first three parameters are required:

  1. The version name (i.e. VERSION_1)
  2. Whether this operation should be logged as an MDS transaction (i.e. do you want to record the change history, and make the change reversible?)
  3. The BatchTag value that you specified in your data flow

 

Additional resources:

SSIS, DQS and MDS Training at SQL Saturday #229 | Dublin, Ireland

I’m honored to be invited back to this year’s SQL Saturday Dublin event. I must have done a decent job last year on my SSIS Design Patterns pre-conference session, because I’ve been asked to do another one this time around as part of their Training Day (June 21st).

This year I’ll be doing a full day training session on Enterprise Information Management, which greatly expands upon one of my more popular talks about combining SSIS, DQS and MDS. The session will include some advanced SSIS topics (such as automation and dynamic package generation), and some of the main SSIS design patterns for data warehousing. The abstract is included below:

Enterprise Information Management (EIM) is an industry term for technologies to manage your data for integration, quality, and governance. This full day training session will show you how Integration Services (SSIS), Data Quality Services (DQS), and Master Data Services (MDS) work together to provide a comprehensive EIM solution in SQL Server 2012. Focusing on Data Warehousing scenarios, we’ll explore the key steps needed to build such a solution, from the beginning (planning, benchmarking), to data curation (cleansing, matching, managing dimensions), to data loading using SSIS. We’ll also touch on some advanced data warehousing ETL design patterns, such as change data capture (CDC), slowly changing dimension processing, and automation.

 

Course Modules

  • Data cleansing and matching
  • Reference data management
  • SSIS design patterns
  • Incremental data loading
  • SSIS deployment, management, and monitoring
  • Automation

Registration details can be found on the Prodata site.

The session schedule hasn’t been posted yet, but I see that a number of people submitted interesting SSIS sessions to the conference. I see a lot of big names from the SQL community, and expect there will be an awesome turnout (just like last time).

I submitted a brand new session entitled Cats, Facebook, and Online Dating with Microsoft BI that I hope will be accepted. I’ll show the SSIS demos I’ve done for the BI Power Hour (2011 | 2012), how I built them, and the lessons that I learned throughout. I’ll try not to make it too egotistical (“Hey look at me and how I made people laugh that one time!”), and try to provide content that attendees will find useful.

Oh, who I am kidding? I’ll totally be showing off Mr. Wiggles.

Hope to see you there!

Slides from Bringing Together SSIS, DQS and MDS | TechDays Hong Kong 2013

A big thank you to everyone who attended my 2nd session at the TechDays Hong Kong event, Enterprise Information Management (EIM): Bringing Together SQL Server Integration Services (SSIS), Data Quality Services (DQS) and Master Data Services (MDS) (wow that’s a long title). The slides are available below. The files for my integrated demo are available on my skydrive share.

Note, this session is very similar to the one I did at DevTeach Montreal and the 2012 PASS Summit (when I co-presented with Matthew Roche). I keep meaning to retire it, but I keep getting asked to present it. I think the integrated demo which shows how all of the products can work together really resonates well with people. If you are interested in this content, be sure to check out the other EIM session that Matthew Roche and I presented at TechEd North America 2012 – the slides are similar, but approaches the problem from the DQS/MDS perspective, rather than focusing on SSIS.

EIM presentation material from DevTeach Montreal

I presented an Enterprise Information Manager talk earlier this week at the DevTeach Montreal conference. Unlike my previous talk from TechEd North America, which tackled the problem from the Data Curation (DQS/MDS) side, this talk has a focus on using SSIS to integrate and automate your solution. The demo files are now available from my Skydrive share, and the slides are embedded below.

Presenting at TechEd North America 2012

TechEd is my favorite Microsoft event – the size of the conference, and the number of sessions you can choose from always amazes me. I’m honored to have been selected to present once again. This year I’ll be doing three sessions:

 

Enterprise Information Management (EIM): Bringing Together SSIS, DQS, and MDS
DBI310 | Advanced – 300 | Monday, June 11 @ 4:45 PM

Enterprise Information Management (EIM) is an industry term for managing your data for data integration, quality, and governance, and is an important part of the Microsoft SQL Server 2012 release. This session revolves around a demo which brings together our EIM functionality in SQL Server 2012 and which tells our Credible, Consistent Data story. We show you how SQL Server Integration Services (SSIS), Data Quality Services (DQS), Master Data Services (MDS) and other Microsoft technologies work together to provide a comprehensive EIM solution.

 

Incremental ETL Using CDC for SQL and Oracle with SQL Server Integration Services (SSIS) 2012
DBI322 | Advanced – 300 | Tuesday, June 12 @ 10:15 AM

Exploring real-world customer scenarios, we take a look at how the new Change Data Capture (CDC) components for SSIS in Microsoft SQL Server 2012 simplify incremental ETL and Data Warehouse loads. We demo the new functionality in SSIS, and show off how the new tools make it easy to set up and configure CDC against both SQL Server and Oracle sources. We present the different CDC configuration options, and demonstrate design patterns that can be used to overcome common problems you’ll face while doing incremental loading of data. If your ETL solution needs to handle incremental loads from data sources that were designed explicitly to support them, then this is a session you can’t afford to miss.

 

BI Power Hour
DBI207 | Intermediate – 200 | Tuesday, June 12 @ 5:00 PM

The legacy continues. The Microsoft Business Intelligence team is excited to once again present the BI PowerHour. In this entertaining session, we dazzle you with new demonstrations that expose the lighter side of Business Intelligence exposing the flexibility and power of the Microsoft BI Platform. If you’ve never been to a Power Hour session, you definitely don’t want to miss it. We promise you’ll walk away impressed.

 

techEdHeaderLogo[1] 

SQL Server 2012 Case Studies for DQS

I’ve had a lot of people ask me recently for real-life examples of how customers are using Data Quality Services (DQS). Even though SQL Server 2012 has been out less than a month, we already have a number of case studies published which describe how DQS plays a key role within a customer’s infrastructure. Most of the studies involve end-to-end Enterprise Information Management (EIM) solutions which include SSIS and Master Data Services (MDS) as well.

Here are the five DQS case studies that are currently available on Microsoft.com:

  • Areva – Energy Firm Speeds the Delivery of Reliable, Centralized Master Data to Customers
  • China Guangdong Nuclear Power Holding Corporation – Chinese Energy Utility Builds BI Solution to Improve Information Sharing and Efficiency
  • Super 8 Hotels Co., Ltd. – Hotel Chain Uses Business Intelligence Tools to Guide Rapid Growth Across China
  • Great Western Bank – Fast-Growing Bank Gains Customers and Maximizes Profits with Microsoft BI Tools
  • RealtyTrac – Real Estate Website Helps Customers Make Better Decisions with Higher Quality Data

Getting Started with DQS and MDS

If you’re looking to get started with Data Quality Services (DQS) and Master Data Services (MDS), there are some fantastic resources available on Technet. The site includes videos and slides for full day training sessions on both products.

Data Quality Services for SQL Server 2012

  • Data Quality Basics and Introducing DQS: Video | Slides
  • Knowledge Management and Data Cleansing in DQS: Video | Slides
  • Data Matching in DQS: Video | Slides
  • DQS Integration with SSIS: Data Cleansing using SSIS: Video | Slides
  • DQS Integration with MDS: Data Matching using MDS: Video | Slides

Master Data Services for SQL Server 2012

  • Master Data Services Overview: Video | Slides
  • Managing Data Warehousing Dimensions with MDS, Part 1: Video | Slides
  • Managing Data Warehousing Dimensions with MDS, Part 2: Video
  • Data Loading via Entity Based Staging (EBS): Video | Slides
  • MDS Hierarchies and Collections: Video | Slides
  • Business Rules and Workflow in MDS: Video | Slides
  • MDS Model Migration and Upgrade: Video | Slides
  • Security Features and Guidelines in MDS: Video | Slides
  • Eliminate Duplicate Data with the MDS Add-In for Excel: Video | Slides