Bulk Loading into MDS using SSIS

Each entity in SQL Server 2012 Master Data Services (MDS) will have it’s own staging table (stg.<name>_Leaf). Using this staging table, you can create, update, deactivate and delete left members in bulk. This post describes how to bulk load into an entity staging table and trigger the stored procedure to start the batch import process.

Staging Tables and Stored Procedures

The new entity based staging tables are an excellent feature in MDS 2012, and make it very easy to bulk load into MDS from SSIS. If you take a look at the SQL database used by your MDS instance, you’ll see at least one table in the stg schema for each entity. For this example I’ve created a Suppliers entity and I see a matching table called [stg].[Suppliers_Leaf]. If your entity is using hierarchies, you will have three staging tables (see BOL for details). If we expand the columns, we’ll see all of the attributes have their own columns, as well as some system columns that every staging table will have.

image

Each staging table will also have a stored procedure that is used to tell MDS that new data is ready to load. Details of the arguments can be found in BOL.

image

Import Columns

To load into this table from SSIS, our data flow will need to do the following:

  • Set a value for ImportType (see below)
  • Set a value for BatchTag
  • Map the column values in the data flow to the appropriate attribute columns

See the Leaf Member Staging Table BOL entry for details on the remaining system columns. If your Code value isn’t set to be generated automatically, then you’d also need to specify it in your data flow. Otherwise, the default fields can be safely ignored when we’re bulk importing.

The BatchTag column is used as an identifier in the UI – it can be any string value, as long as it’s unique (and under 50 characters).

MDS uses the same staging table for creating, updating and deleting entities. The ImportType column indicates which action you want to perform. The possible values are listed in the table below.

 


Value Description
0 Create new members. Replace existing MDS data with staged data, but only if the staged data is not NULL. NULL values are ignored. To change a string attribute value to NULL, set it ~NULL~. To change a number attribute value to NULL, set it to -98765432101234567890. To change a datetime attribute value to NULL, set it to 5555-11-22T12:34:56.
1 Create new members only. Any updates to existing MDS data fail.
2 Create new members. Replace existing MDS data with staged data. If you import NULL values, they will overwrite existing MDS values.
3 Deactivate the member, based on the Code value. All attributes, hierarchy and collection memberships, and transactions are maintained but no longer available in the UI. If the member is used as a domain-based attribute value of another member, the deactivation will fail. See ImportType 5 for an alternative.
4 Permanently delete the member, based on the Code value. All attributes, hierarchy and collection memberships, and transactions are permanently deleted. If the member is used as a domain-based attribute value of another member, the deletion will fail. See ImportType 6 for an alternative.
5 Deactivate the member, based on the Code value. All attributes, hierarchy and collection memberships, and transactions are maintained but no longer available in the UI. If the member is used as a domain-based attribute value of other members, the related values will be set to NULL. ImportType 5 is for leaf members only.
6 Permanently delete the member, based on the Code value. All attributes, hierarchy and collection memberships, and transactions are permanently deleted. If the member is used as a domain-based attribute value of other members, the related values will be set to NULL. ImportType 6 is for leaf members only.

When you are bulk loading data into MDS, you’ll use 0, 1 or 2 as the ImportType. To summarize the different modes:

  • Use 0 or 2 when you are adding new members and/or updating existing ones (i.e. doing a merge)
    • The difference between 0 and 2 is the way they handle NULLs when updating an existing member. With 0, NULL values are ignored (and require special handling if you actually want to set a NULL value). With 2, all values are replaced, even when the values are NULL.
  • Use 1 when you are only inserting new members. If you are specifying a code, then a duplicate value will cause the import to fail.

Package Design

You control flow will have at least two tasks:

  1. A Data Flow Task that loads your incoming data into the MDS staging table for your entity
  2. An Execute SQL Task which runs the staging table’s stored procedure which tells MDS to start processing the batch

image

Your data flow will have (at least) three steps:

  1. Read the values you want to load into MDS
  2. Add the BatchTag and ImportType column values (using a derived column transform)
  3. Load into the MDS staging table

image

As noted above, in your OLE DB Destination you’ll need to map your data flow columns to your member attributes (including Code if it’s not auto-generated), the BatchTag value (which can be automatically generated via expression), and the ImportType.

image

After the Data Flow, you’ll run the staging table stored procedure.

The first three parameters are required:

  1. The version name (i.e. VERSION_1)
  2. Whether this operation should be logged as an MDS transaction (i.e. do you want to record the change history, and make the change reversible?)
  3. The BatchTag value that you specified in your data flow

 

Additional resources:

SSIS, DQS and MDS Training at SQL Saturday #229 | Dublin, Ireland

I’m honored to be invited back to this year’s SQL Saturday Dublin event. I must have done a decent job last year on my SSIS Design Patterns pre-conference session, because I’ve been asked to do another one this time around as part of their Training Day (June 21st).

This year I’ll be doing a full day training session on Enterprise Information Management, which greatly expands upon one of my more popular talks about combining SSIS, DQS and MDS. The session will include some advanced SSIS topics (such as automation and dynamic package generation), and some of the main SSIS design patterns for data warehousing. The abstract is included below:

Enterprise Information Management (EIM) is an industry term for technologies to manage your data for integration, quality, and governance. This full day training session will show you how Integration Services (SSIS), Data Quality Services (DQS), and Master Data Services (MDS) work together to provide a comprehensive EIM solution in SQL Server 2012. Focusing on Data Warehousing scenarios, we’ll explore the key steps needed to build such a solution, from the beginning (planning, benchmarking), to data curation (cleansing, matching, managing dimensions), to data loading using SSIS. We’ll also touch on some advanced data warehousing ETL design patterns, such as change data capture (CDC), slowly changing dimension processing, and automation.

 

Course Modules

  • Data cleansing and matching
  • Reference data management
  • SSIS design patterns
  • Incremental data loading
  • SSIS deployment, management, and monitoring
  • Automation

Registration details can be found on the Prodata site.

The session schedule hasn’t been posted yet, but I see that a number of people submitted interesting SSIS sessions to the conference. I see a lot of big names from the SQL community, and expect there will be an awesome turnout (just like last time).

I submitted a brand new session entitled Cats, Facebook, and Online Dating with Microsoft BI that I hope will be accepted. I’ll show the SSIS demos I’ve done for the BI Power Hour (2011 | 2012), how I built them, and the lessons that I learned throughout. I’ll try not to make it too egotistical (“Hey look at me and how I made people laugh that one time!”), and try to provide content that attendees will find useful.

Oh, who I am kidding? I’ll totally be showing off Mr. Wiggles.

Hope to see you there!

Slides from Bringing Together SSIS, DQS and MDS | TechDays Hong Kong 2013

A big thank you to everyone who attended my 2nd session at the TechDays Hong Kong event, Enterprise Information Management (EIM): Bringing Together SQL Server Integration Services (SSIS), Data Quality Services (DQS) and Master Data Services (MDS) (wow that’s a long title). The slides are available below. The files for my integrated demo are available on my skydrive share.

Note, this session is very similar to the one I did at DevTeach Montreal and the 2012 PASS Summit (when I co-presented with Matthew Roche). I keep meaning to retire it, but I keep getting asked to present it. I think the integrated demo which shows how all of the products can work together really resonates well with people. If you are interested in this content, be sure to check out the other EIM session that Matthew Roche and I presented at TechEd North America 2012 – the slides are similar, but approaches the problem from the DQS/MDS perspective, rather than focusing on SSIS.

Presenting at Tech Days Hong Kong 2013

I’ve been invited to the Microsoft Tech•Days event in Hong Kong next month – March 12th to 14th. This will be my first trip to Hong Kong, and I’m really looking forward to presenting to an international audience. I’ll be doing three SSIS related sessions:

  • EIM – Bringing Together DQS, MDS and SSIS
  • SSIS Performance Design Patterns
  • Using SQL Server Integration Services with Oracle

The EIM session will be similar to the ones I presented recently at the DevTeach Montreal and TechEd North America 2012 events, with a new end to end scenario that I recently worked on. The SSIS Performance Design Patterns talk will cover some of the patterns from the book, along with some brand new patterns for SQL 2012. The final session will cover using SSIS with Oracle, which is something I’ve never presented on before. I’ll be talking about the new CDC functionality for getting data out of Oracle, as well as the different connectivity options for getting data into Oracle (such as the high speed connectors in 2008/2012).

 

image 

SQL Saturday #146 | Nashua, New Hampshire

sql_sat_nashua

I’ve been invited to speak at SQL Saturday #146 in Nashua in October. I’ll be presenting two talks related to Data Movement:

Efficient On-Premise to Cloud Data Transfer

Thinking about moving some of your operations to Azure? Have multiple remote sites, and want to use the cloud to centralize and share data between them?? Just like hearing talks about data transfer performance?! Have we got the session for you! We’ll cover some common user scenarios, and describe when and how to use the latest Microsoft data transfer technologies, including SQL Server Integration Services (SSIS), SQL Data Sync (a capability of SQL Database), and more.

EIM – Bringing Together SSIS, DQS and MDS

Enterprise Information Management (EIM) is an industry term for managing your data for data integration, quality, and governance. This session revolves around a demo which brings together the EIM functionality in SQL Server, a key part of our Credible, Consistent Data story for the 2012 release. We will show you how SQL Server Integration Services (SSIS), Data Quality Services (DQS), Master Data Services (MDS) and other Microsoft technologies work together to provide a comprehensive EIM solution.

Both are intermediate (200-300) level talks, and will cover multiple technologies. I’ll be using this as a venue to preview of some of the new content I’ll be presenting at the PASS  Summit in November. I’ll most likely end up in SSIS Design Pattern book co-author Andy Leonard’s (Blog | Twitter) two SSIS sessions, as well – I always love to see other people present SSIS topics, and Andy is one of the best SSIS presenters around.

The full schedule is now available on the event site. If you’re at the event, please come by and say hello! I’ll be happy to sign your ebook edition of my two SQL 2012 books.

The timing is perfect, as I’ll be presenting to the New England SQL Server User Group a couple of days before… Hope to see you at both events!