Advanced SSIS Catalog presentation from TechEd North America 2013

Matthew Roche (blog | twitter) and I teamed up once again to present an advanced SSIS Catalog session at TechEd North America 2013 – Deep Inside the Microsoft SQL Server Integration Services Server. The video and slide deck are now available on the Channel9 site. The slide deck actually contains 10 additional slides that we didn’t have time to cover during the regular session (with some further details about the security model).

I want to extend a big thank you to everyone who attended, and for all the great feedback we got. It can be tough doing a 400 level SQL session at TechEd, and while I could see some people’s heads spinning, it sounded like most people were able to learn something new.

The TechEd team picked an excellent preview picture for the session (below). It comes from Matthew’s intro – you’ll have to watch the video to see how he worked a picture of kittens into a 400 level SSIS session.


If you didn’t already know, Channel 9 has many TechEd presentations available online. You can see recordings of my previous events on my speaker page, and Matthew’s as well.

Can I Automate SSIS Project Deployment?

Yes, yes you can. Scripted or automated deployment can be done in a number of ways in SQL Server 2012.

Integration Services Deployment Wizard

Yes, this is the primary UI tool for SSIS project deployment, but it can also be run silently on the command line. When you run through the wizard, the Review page will actually list all of the parameters you need to do the same deployment from the command line.


Run ISDeploymentWizard.exe /? from a command prompt, and you’ll see the full list of arguments it supports.
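For example, a silent deployment of a project file to a catalog folder looks something like this (the server name and paths below are placeholders):

```
ISDeploymentWizard.exe /Silent ^
  /SourcePath:"C:\Projects\MyProject.ispac" ^
  /DestinationServer:"localhost" ^
  /DestinationPath:"/SSISDB/Demos/MyProject"
```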


SSIS Catalog T-SQL API

The SSISDB [catalog] schema has a number of public stored procedures, including one that can be used for deployment. We even provide samples on how to use it in Books Online.
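A deployment through the T-SQL API boils down to two steps – read the .ispac file, and pass it to catalog.deploy_project. As a sketch (the folder name, project name, and file path here are placeholders):

```sql
DECLARE @project_binary varbinary(max);
DECLARE @operation_id bigint;

-- Read the project deployment file (.ispac) as a single blob
SET @project_binary = (SELECT * FROM OPENROWSET(BULK N'C:\Projects\MyProject.ispac', SINGLE_BLOB) AS ispac);

-- Deploy it to an existing catalog folder
EXEC [SSISDB].[catalog].[deploy_project]
    @folder_name = N'Demos',
    @project_name = N'MyProject',
    @project_stream = @project_binary,
    @operation_id = @operation_id OUTPUT;
```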


PowerShell

All SSIS Catalog operations can be automated through PowerShell. I previously blogged about a deployment script I use to set up my demos.
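As a rough sketch of the approach (the connection string, folder, and file path are placeholders, and the script assumes the SQL Server 2012 management assemblies are installed):

```powershell
# Load the Integration Services management object model
[Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Management.IntegrationServices") | Out-Null

$connection = New-Object System.Data.SqlClient.SqlConnection "Data Source=localhost;Initial Catalog=master;Integrated Security=SSPI;"
$ssis = New-Object Microsoft.SqlServer.Management.IntegrationServices.IntegrationServices $connection

# Find the target folder in the SSISDB catalog
$folder = $ssis.Catalogs["SSISDB"].Folders["Demos"]

# DeployProject takes the project name and the raw bytes of the .ispac file
[byte[]] $projectFile = [System.IO.File]::ReadAllBytes("C:\Projects\MyProject.ispac")
$folder.DeployProject("MyProject", $projectFile) | Out-Null
```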

Custom Utility

The SSIS Catalog management object model (MOM) exposes a set of SMO classes you can use to code your own catalog utilities. You’ll want to use the CatalogFolder.DeployProject method to do the actual deployment. If SMO or .NET isn’t your thing, you can also code a custom utility which interacts directly with the T-SQL API.
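A minimal sketch of such a utility in C# (the connection string, folder, and project names are made up) might look like this:

```csharp
using System.Data.SqlClient;
using System.IO;
using Microsoft.SqlServer.Management.IntegrationServices;

class DeployTool
{
    static void Main()
    {
        // Connect to the instance hosting the SSIS Catalog
        var connection = new SqlConnection("Data Source=localhost;Initial Catalog=master;Integrated Security=SSPI;");
        var ssis = new IntegrationServices(connection);

        // Locate the target folder in the SSISDB catalog
        CatalogFolder folder = ssis.Catalogs["SSISDB"].Folders["Demos"];

        // Deploy the project from its .ispac file
        byte[] projectFile = File.ReadAllBytes(@"C:\Projects\MyProject.ispac");
        folder.DeployProject("MyProject", projectFile);
    }
}
```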

Data Flow Optimization Checklist


I put this together for an SSIS performance presentation I did a while ago, and thought it warranted its own post.

In addition to looking for appropriate Design Patterns, I have a checklist of things to look for when trying to optimize an SSIS Data Flow. This list is by no means exhaustive, but it contains some of the more common causes of performance issues in the packages I’ve worked with.

  • OLE DB Command Transform
    • Operates on a row by row basis. Stage the data and perform the operation in batch mode using the Execute SQL Task instead.
    • Learn MERGE, and love it.
  • Sort Transforms
    • Do you really need them? Sometimes you do. Most of the time you don’t. If you do need them (i.e. you are loading from a flat file, and need to do a merge), consider maxing out the DefaultBufferMaxRows setting so you can sort more data in memory (sort will keep at most 5 active buffers in memory before it starts swapping to disk).
  • Lookup Transforms
    • Can the lookup be avoided?
    • Are you using the right cache mode?
    • Should you use a Cache Connection Manager?

Lots of related lookup posts can be found here.
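To illustrate the MERGE suggestion above – once the data is staged, a single set-based statement (the table and column names here are hypothetical) replaces row-by-row updates:

```sql
-- Update matching rows and insert new ones in one batch operation
MERGE dbo.DimCustomer AS dst
USING staging.Customer AS src
    ON dst.CustomerKey = src.CustomerKey
WHEN MATCHED THEN
    UPDATE SET dst.Name = src.Name, dst.City = src.City
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerKey, Name, City)
    VALUES (src.CustomerKey, src.Name, src.City);
```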


Have fun optimizing!

Webcast: Developing Extensions for SSIS | 2013-05-22

Andy Leonard will be joining me to present a Developing Extensions for SSIS webcast on May 22nd. It’s been a long time since I’ve had the chance to present a code heavy session like this, and I’m really looking forward to it. I’ll be using this session as a dry run for a talk I’ll be giving at the DevTeach conference in Toronto the following week, so feedback will be much appreciated.

Andy has a great blog series about creating custom SSIS Tasks, and his company (Linchpin People) recently released an SSIS Event Task. I plan on expanding on the content in Andy’s blog posts a little bit, and talk about some of the key things you need to know when building an SSIS Data Flow component.

The abstract for the talk is available below, and the registration page can be found here.


Join Matt Masson and Andy Leonard for a discussion and demonstrations on extending SSIS with custom tasks and data flow components. This code heavy session walks you through the creation of a custom SQL Server Integration Services (SSIS) Task, and a custom Data Flow component. If you’ve ever thought of customizing SSIS functionality beyond what you can do with scripting, this talk is for you! All examples will be coded in C#, and made available following the session.

SSIS, DQS and MDS Training at SQL Saturday #229 | Dublin, Ireland

I’m honored to be invited back to this year’s SQL Saturday Dublin event. I must have done a decent job last year on my SSIS Design Patterns pre-conference session, because I’ve been asked to do another one this time around as part of their Training Day (June 21st).

This year I’ll be doing a full day training session on Enterprise Information Management, which greatly expands upon one of my more popular talks about combining SSIS, DQS and MDS. The session will include some advanced SSIS topics (such as automation and dynamic package generation), and some of the main SSIS design patterns for data warehousing. The abstract is included below:

Enterprise Information Management (EIM) is an industry term for technologies to manage your data for integration, quality, and governance. This full day training session will show you how Integration Services (SSIS), Data Quality Services (DQS), and Master Data Services (MDS) work together to provide a comprehensive EIM solution in SQL Server 2012. Focusing on Data Warehousing scenarios, we’ll explore the key steps needed to build such a solution, from the beginning (planning, benchmarking), to data curation (cleansing, matching, managing dimensions), to data loading using SSIS. We’ll also touch on some advanced data warehousing ETL design patterns, such as change data capture (CDC), slowly changing dimension processing, and automation.


Course Modules

  • Data cleansing and matching
  • Reference data management
  • SSIS design patterns
  • Incremental data loading
  • SSIS deployment, management, and monitoring
  • Automation

Registration details can be found on the Prodata site.

The session schedule hasn’t been posted yet, but a number of people submitted interesting SSIS sessions to the conference. I see a lot of big names from the SQL community, and expect there will be an awesome turnout (just like last time).

I submitted a brand new session entitled Cats, Facebook, and Online Dating with Microsoft BI that I hope will be accepted. I’ll show the SSIS demos I’ve done for the BI Power Hour (2011 | 2012), how I built them, and the lessons that I learned throughout. I’ll try not to make it too egotistical (“Hey look at me and how I made people laugh that one time!”), and try to provide content that attendees will find useful.

Oh, who am I kidding? I’ll totally be showing off Mr. Wiggles.

Hope to see you there!

Using SSIS Term Extraction to Process Search Terms

This blog uses WordPress, and I recently started using the Jetpack plugin. One of the features that Jetpack provides is a list of search terms that people used to arrive at your site. It doesn’t account for all views for your site (or even the majority of them), but it is interesting to see what people are looking for when they come to your site.

Yes, according to the site stats, most people are searching for my name when they arrive to my site. I’m not sure how I should feel about that.

From these full search queries, I wanted to see what the most common terms were. This blog post describes how I did this using the SSIS Term Extraction transform.

Extracting the Data

The first step was to actually get the list of search terms from my blog’s dashboard. When dealing with data on a webpage, I typically turn to Excel, and figured I could make use of the new Data Explorer add-in. Unfortunately, since my blog requires you to be logged in, Data Explorer ends up getting (silently) redirected to the login page, with no way to proceed. I had hit this issue before, so I tried to use the regular “From Web” functionality in Excel. This let me log into the site, but unfortunately, Excel didn’t recognize the data on the page as a table, which prevented me from creating a refreshable web query. Alas, I had to resort to the low tech approach of Copy & Paste, but it did the job.

I copied and pasted the data from the web page into Excel, and saved it as CSV.


Why CSV and not .xls/.xlsx? Simplicity. One advantage of reading from Excel is that you get some metadata and limited query support from the Access Database Engine provider (Jet/Ace). However, the intricacies of working with Excel typically make me prefer a straight CSV document. Unless I need to use a refreshable query, CSV is my preferred format for all data imports (it’s faster, too).

Processing with SSIS

I created a new SSIS package, added a flat file connection manager, and pointed it at my recently created CSV file. I specified the Text qualifier (Excel uses " (double quote) by default), and marked that the first row contains headers.


On the Advanced tab, I set the data types (DT_WSTR(500) and DT_I4).

Note, even though the text values in the CSV are DT_STR (ANSI strings – no special characters), I used DT_WSTR because that is what the Term Extraction transform supports. Alternatively, we could have used a Data Conversion or Derived Column transform to perform the conversion, but I’d rather do it directly in the Source component.

I then added a Data Flow Task, Flat File Source which used my new connection manager, and a Term Extraction transform. There’s not much configuration involved with the term extraction – you select the column you want to analyze, and (optionally) provide names for the output columns.


On the Advanced tab, I changed the Frequency threshold to 1 and Maximum length of term to 20. Changing these values can affect performance of the component, but since we’re processing a limited number of rows (about 500 total), I figured it was safe to do.


The Term Extraction transform redirects error rows by default, so you’ll get a warning if you don’t configure the output path. Instead, I configured the error output to Fail Component on error since we’re not expecting to have any errors in our list of simple terms.


I then closed off the data flow to see the results with a data viewer.


Running the package gave me a breakdown of the common terms.


The more the term appears in the input, the higher the score will be. This is really useful, but unfortunately it’s not completely representative of the actual search terms since the values had already been aggregated (the “Views” value from the original data set). To get a more accurate number, I had to expand the results before processing them.

Expanding the Search Results with a Script Component

By default Script Components are synchronous – among other things, this means they will have the same number of rows going in as they have coming out. To expand the view counts correctly, we’ll need to add more rows to the data flow. This means I’ll need to make the Script Component asynchronous.

I added a Script Component (transform) right after my Flat File Source, and opened the editor.

I selected both input columns.


I went to the Inputs and Outputs tab, selected the single output, and set the SynchronousInputID value to None.

I added a column for the Term (DT_WSTR(500)).


I opened the script editor to start entering the code.

The code is pretty simple. Using an Asynchronous Script Component isn’t that different from a Synchronous one – you just need to remember that you’re adding rows in a new buffer, rather than changing values in the current one. The only method I needed to worry about is the one that processes the incoming rows – Input0_ProcessInputRow.

For each incoming row, I want to look at the term (Search) and the number of times it appears (Views). I then programmatically add a row which contains the term to the output buffer for each time it was viewed.
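The method body ends up looking something like this (Output0Buffer is the designer-generated name for the default output, matching the column names above):

```csharp
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Emit one output row per view, so each search term is counted
    // as many times as it actually occurred
    for (int i = 0; i < Row.Views; i++)
    {
        Output0Buffer.AddRow();
        Output0Buffer.Term = Row.Search;
    }
}
```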

Running the package after this change, I see that I have a lot more rows going into the Term Extraction transform than I did before (1043 vs. 498), and more accurate scores as a result.


That’s it – thanks for reading!

Consistent Design Patterns and SSIS Frameworks

Question: How do I make sure everyone at my company is using the same SSIS framework and design patterns?

SSIS Design Patterns and frameworks are one of my favorite things to talk (and write) about. A recent search on SSIS frameworks highlighted just how many different frameworks there are out there, and making sure that everyone at your company is following what you consider to be best practices can be a challenge.

In an ideal scenario, before you do any SSIS development, your team has an experienced SSIS consultant/trainer come in. They provide you with a framework that works for your environment and establishes a set of best practices that your team sticks to for all future package development, until the end of time.

This rarely happens.

It’s Too Easy

SSIS has a learning curve, but is fairly easy to get started with. You can throw together a simple workflow & ETL running on your local machine in a short amount of time. It starts to get a bit more complicated when you need to deploy that solution to another machine, and even more so when you start adding multiple developers to the mix. This is where the best practices start to be required.

The downside to SSIS being easy enough for people to learn on their own is many people come up with their own way of doing things. Given that there are usually multiple ways to accomplish common ETL tasks, your company can end up with a mix of “best practices”, “frameworks” and ways of using SSIS.

Did you know there are at least 5 different SSIS frameworks used internally at Microsoft? Each one has its own take on how to do configuration, dynamic package execution/generation, and logging. I wouldn’t be surprised if there are a number of others that I’m not aware of.

We improved this in SQL 2012 with the addition of the SSIS Catalog (SSISDB). In 2005/2008, the biggest differences between SSIS projects would typically be the way they do deployment & configuration. The Catalog was meant to provide a “default” best practice, and remove the need for customers to rely on their own custom frameworks. There’s still more we could do here, but given the adoption rates we’re seeing, it was definitely a step in the right direction.

When we asked our MVPs to create whitepapers for SQL 2008, we ended up with four configuration related articles. Each one does things in a slightly different way.

So the question becomes, how do I make sure that the SSIS developers in my company are following best practices and using the framework we have in place?

Training plays a big role here. If you show your developers that there is a better way of doing things, they should be willing to adopt it. That said, many times you’ll find consistency is more important than doing things in a slightly better way. I know this sounds ironic coming from someone who worked on the product (*COUGH*three separate data type systems*COUGH*), but I give this advice to other people, speaking from experience.

Tools to Help Apply Best Practices

Tools can really help here, as well. There are two third party tools that come to mind:

BI xPress from Pragmatic Works has many different features that help here. The Auditing Framework helps with consistent logging, and their Templates, Snippets and Data Flow Nuggets functionality lets you apply and reuse patterns across your packages. They also recently added a Best Practices Analyzer that helps you maintain some consistency.

If you’re looking to apply design patterns and/or you require dynamic package generation, then Biml from Varigence is a great solution. It’s a powerful markup language/compiler that makes it really easy to create your SSIS packages in a consistent way. There is a free version of Biml integrated into BIDS Helper, and a more advanced version (and separate designer – Mist) available from Varigence. There is also a Biml repository site that is great for examples.
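To give a taste of what Biml looks like, here’s a trivial (made up) package definition – the compiler turns this markup into a regular .dtsx package:

```xml
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <Package Name="LoadCustomers" ConstraintMode="Linear">
      <Tasks>
        <ExecuteSQL Name="Truncate Staging" ConnectionName="Staging">
          <DirectInput>TRUNCATE TABLE staging.Customer;</DirectInput>
        </ExecuteSQL>
      </Tasks>
    </Package>
  </Packages>
</Biml>
```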

Presenting to the .NET Montreal community group | May 27th

The fine folks over at Communauté .NET Montréal have invited me to speak at their meeting on May 27th, 2013. I’ll be using this as a practice run for my sessions at the DevTeach Toronto event later that week. Come join us at the Microsoft Montreal office if you’re interested in finding out more about:

Data Quality Services (DQS) – End to End
Microsoft’s SQL Server Data Quality Services (DQS) is a unique solution that is based on the creation and maintenance of Data Quality Knowledge Bases (DQKB) and the ability to use them efficiently for a variety of Data Quality improvements. In this session we’ll walk through the creation of a DQS solution, discuss the main concepts behind the creation of the DQKB, and how to use DQS in various scenarios and activities.


Developing Extensions for SSIS
This code heavy session walks you through the creation of a custom SQL Server Integration Services (SSIS) Task, and a custom Data Flow component. If you’ve ever thought of customizing SSIS functionality beyond what you can do with scripting, this talk is for you! All examples will be coded in C#, and made available following the session.


Registration details can be found here.

Using Data Explorer with SSIS

Data Explorer is a great new add-on for Excel. It provides a simple interface over a powerful data transformation engine (powered by M), and it has become my tool of choice anytime I need to get data into Excel. In this post I show how you can use Data Explorer to do some initial data shaping, and then automate the data loading process through SSIS.

In a previous post, I showed how I used Data Explorer to grab the list of upcoming SQL Saturday events. When the Data Explorer query is executed, the data is downloaded, transformed, and stored in the Excel sheet.


We can read this data in SSIS using a Data Flow Task with an Excel Source component.

After we add the Excel Source, double click to bring up the component UI. We can create a new connection manager, and point it at the saved Excel file (.xlsx).


Note that even if you use Office 2010 or 2013 to create the file, you should select the “Microsoft Excel 2007” version (which was when the .xlsx format was introduced, replacing the .xls binary format).

On the Excel Source, we can now select “Table or view” as the data access mode. The drop down list will contain all of the sheets within our workbook. We can either select the name of the Sheet (in this case, Data), or the data query (Data$ExternalData_1). Since the result of the query is the only thing on this sheet, it will come out to the same thing in this instance.


On the Mapping tab, we can see the four columns from the workbook have been read.


We can click OK to save the settings, and then we can setup the rest of the data flow. Running the package should show us that all the rows have been read.


(Yes, that is a custom made Trash destination. I am pretty fancy.)

As we see, we can grab the data from the workbook using SSIS. The problem is that this data is a static copy – reading the data doesn’t automatically re-run the Data Explorer query that we used in the workbook. Thankfully, we can work around that using a Script Task – see my previous post about dynamically refreshing a workbook. Go ahead and read that post real quick – I’ll wait.

The refresh code in my previous post updates the excel file in place. If we watch the Task Manager, we can see the EXCEL.EXE process runs in the background.
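The gist of that refresh code – a sketch of what goes inside the Script Task’s Main method, assuming the Excel interop assemblies are available on the machine, and with a made up file path – is:

```csharp
using Microsoft.Office.Interop.Excel;

// Open the workbook, refresh every connection/query, save, and close.
var excel = new Application();
var workbook = excel.Workbooks.Open(@"C:\Data\SqlSaturdays.xlsx");
workbook.RefreshAll();
workbook.Save();
workbook.Close();
excel.Quit();
```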


Important – if you run the package with the refresh code added and receive a COMException – {“Class not registered”} error, you’re probably missing the Data Explorer add-in. Data Explorer uses a custom OLE DB provider internally, which requires you to install the add-in on the machine that is running your SSIS package.

If we open the file after the package runs, we should see the updated values from the refreshed query.


Closing Thoughts

Anyone familiar with SSIS will see some overlap between the transformations you can do in Data Explorer, and what you can do in the SSIS Data Flow. At this point Data Explorer is a tool for getting data into Excel, and not what most people would consider a full fledged ETL platform. However, underneath the fancy, easy to use interface is a powerful transformation engine (driven by the M language), and I can see a lot of potential applications for it. While the approach I describe above does get the products working together, I can imagine that closer integration between the products would be coming at some point in the future – but that’s pure speculation on my part.

What do you think? How would you like to see the products integrated?

Error: The Microsoft Access database engine cannot open or write to the file

Issue – you run an SSIS package that uses an Excel Source or Excel Destination and get the following error:

[Connection manager "Excel Connection Manager"] Error: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005.

An OLE DB record is available. Source: "Microsoft Access Database Engine" Hresult: 0x80004005 Description: "The Microsoft Access database engine cannot open or write to the file ”. It is already opened exclusively by another user, or you need permission to view and write its data.".

Steps to resolve:

  1. Verify that the file isn’t actually open (check task manager for suspicious EXCEL.EXE processes)
  2. Verify that there are no other SSIS packages (or other background processes) that might be opening this file at the same time
  3. If you are in Visual Studio when this error happens, restart VS and try again

The third issue typically happens if you click the Preview button on the component editor. This ends up loading the Excel file through the Access database engine, and can keep a reference to the file open until the designer is restarted.