Webcast: Developing Extensions for SSIS | 2013-05-22

Andy Leonard will be joining me to present a Developing Extensions for SSIS webcast on May 22nd. It’s been a long time since I’ve had the chance to present a code-heavy session like this, and I’m really looking forward to it. I’ll be using this session as a dry run for a talk I’ll be giving at the DevTeach conference in Toronto the following week, so feedback will be much appreciated.

Andy has a great blog series about creating custom SSIS Tasks, and his company (Linchpin People) recently released an SSIS Event Task. I plan on expanding on the content in Andy’s blog posts a little bit, and talking about some of the key things you need to know when building an SSIS Data Flow component.

The abstract for the talk is available below, and the registration page can be found here.


Join Matt Masson and Andy Leonard for a discussion and demonstrations on extending SSIS with custom tasks and data flow components. This code-heavy session walks you through the creation of a custom SQL Server Integration Services (SSIS) Task, and a custom Data Flow component. If you’ve ever thought of customizing SSIS functionality beyond what you can do with scripting, this talk is for you! All examples will be coded in C#, and made available following the session.

How to Localize Your Custom Connection Manager

Similar to localizing a task, you can localize the display name and description of a custom connection manager by setting the LocalizationType property of the DtsConnection attribute. SSIS will look for two properties on the class:

  • ConnectionDescription
  • ConnectionDisplayName

These must be public, static string properties on your LocalizationType class.

The code looks something like this (a minimal sketch, where the connection manager class and connection type names are placeholders, and Properties.Resources is assumed to be the resx-generated resource class):
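    using Microsoft.SqlServer.Dts.Runtime;

    // Properties.Resources is assumed to be the designer-generated class for a
    // .resx file containing ConnectionDisplayName and ConnectionDescription
    // string resources, with the Access Modifier set to Public.
    [DtsConnection(
        ConnectionType = "MYCONNECTION",
        LocalizationType = typeof(Properties.Resources))]
    public class MyConnectionManager : ConnectionManagerBase
    {
        // ... connection manager implementation ...
    }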

DevTeach Toronto – May 27-31

I’ll be heading to Toronto in May to present at the DevTeach conference. I’ll be doing an intermediate (300-level) talk about DQS, and an advanced (400-level) talk about developing custom extensions for SSIS (tasks, components, connection managers, and loggers).

Data Quality Services (DQS) – End to End
Microsoft’s SQL Server Data Quality Services (DQS) is a unique solution that is based on the creation and maintenance of Data Quality Knowledge Bases (DQKB) and the ability to use them efficiently for a variety of Data Quality improvements. In this session we’ll walk through the creation of a DQS solution, discuss the main concepts behind the creation of the DQKB, and see how to use DQS in various scenarios and activities.

Developing Extensions for SSIS
This code-heavy session walks you through the creation of a custom SQL Server Integration Services (SSIS) Task, and a custom Data Flow component. If you’ve ever thought of customizing SSIS functionality beyond what you can do with scripting, this talk is for you! All examples will be coded in C#, and made available following the session.


Encrypting Sensitive Properties in a Custom Connection Manager

SSIS provides a mechanism to automatically encrypt the sensitive properties of your Connection Manager (or Task) based on the package ProtectionLevel. To make use of this in your custom code, you’ll need to do two things:

  1. Implement the IDTSComponentPersist interface
  2. Add a Sensitive="1" attribute to one or more elements of your persisted Connection Manager (or Task)

Sample code (a minimal sketch, with a hypothetical connection manager that persists UserName and Password properties):
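    using System;
    using System.Xml;
    using Microsoft.SqlServer.Dts.Runtime;

    // [DtsConnection(...)] attribute omitted for brevity.
    public class MyConnectionManager : ConnectionManagerBase, IDTSComponentPersist
    {
        public string UserName { get; set; }
        public string Password { get; set; }

        void IDTSComponentPersist.SaveToXML(XmlDocument doc, IDTSInfoEvents infoEvents)
        {
            // Root element for this connection manager's settings.
            XmlElement root = doc.CreateElement("MyConnectionManager");
            doc.AppendChild(root);

            XmlElement userName = doc.CreateElement("UserName");
            userName.InnerText = UserName;
            root.AppendChild(userName);

            // Mark the password element as Sensitive. SSIS will encrypt or strip
            // its value according to the package ProtectionLevel.
            XmlElement password = doc.CreateElement("Password");
            password.InnerText = Password;
            XmlAttribute sensitive = doc.CreateAttribute("Sensitive");
            sensitive.Value = "1";
            password.Attributes.Append(sensitive);
            root.AppendChild(password);
        }

        void IDTSComponentPersist.LoadFromXML(XmlElement node, IDTSInfoEvents infoEvents)
        {
            foreach (XmlNode child in node.ChildNodes)
            {
                if (child.Name == "UserName")
                    UserName = child.InnerText;
                else if (child.Name == "Password")
                    Password = child.InnerText;
            }
        }
    }

When the package is saved, SSIS encrypts (or strips) the value of any element marked Sensitive according to the package ProtectionLevel; you don’t need to implement the encryption yourself.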

How to Localize Your Custom Task

The DtsTask attribute has a LocalizationType member. This should be your resource class. SSIS will look for two properties:

  • TaskDisplayName
  • TaskDescription

These must be public, static string properties on your LocalizationType class.

Your task code would look something like this (a minimal sketch, where the task class is a placeholder and Properties.Resources is assumed to be the resx-generated resource class):
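    using Microsoft.SqlServer.Dts.Runtime;

    // Properties.Resources is assumed to be the designer-generated class for a
    // .resx file containing TaskDisplayName and TaskDescription string
    // resources, with the Access Modifier set to Public.
    [DtsTask(
        TaskType = "MyTask",
        LocalizationType = typeof(Properties.Resources))]
    public class MyTask : Task
    {
        public override DTSExecResult Execute(Connections connections,
            VariableDispenser variableDispenser,
            IDTSComponentEvents componentEvents,
            IDTSLogging log, object transaction)
        {
            // ... task logic ...
            return DTSExecResult.Success;
        }
    }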

You can add localizable string resources to your class from the Resources page in the Project settings.


Note that the Access Modifier must be set to Public.

Double note (as I hit this when putting together the blog post) – in SQL 2012, when you first add a new task to your <SQL install path>\110\DTS\Tasks directory, you can right-click on the SSIS Toolbox and select Refresh Toolbox to make it show up.


If your LocalizationType settings are incorrect (for example, you forgot to set the Access Modifier to Public), your task will show up with an empty name. You must fix your class and rebuild, but you’ll need to restart Visual Studio before the changes take effect (as the class has already been instantiated, SSIS will have the old object cached…).

How Can I Improve the Performance of My Destination Component?

Question: I’ve created a custom destination component, but I’m not happy with the performance. I tried using multiple threads, but the pipeline buffer doesn’t allow free traversal. What can I do?


Answer: I’d start by profiling the component to make sure you identify the correct bottlenecks. The most common source of problems is accessing the COM interfaces at runtime – make sure that you keep a reference to any object retrieved through IDTSxxx100 members to avoid repeated COM interop calls. It’s a best practice to cache everything you need in the PreExecute phase.
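For example, here’s a minimal sketch of caching column buffer indexes up front (the component name and single-input layout are assumptions):

    using Microsoft.SqlServer.Dts.Pipeline;
    using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

    [DtsPipelineComponent(DisplayName = "My Destination",
        ComponentType = ComponentType.DestinationAdapter)]
    public class MyDestination : PipelineComponent
    {
        private int[] columnBufferIndexes;

        public override void PreExecute()
        {
            base.PreExecute();

            // Look up each input column's buffer index once, here, instead of
            // making repeated COM interop calls for every row in ProcessInput.
            IDTSInput100 input = ComponentMetaData.InputCollection[0];
            columnBufferIndexes = new int[input.InputColumnCollection.Count];
            for (int i = 0; i < columnBufferIndexes.Length; i++)
            {
                IDTSInputColumn100 column = input.InputColumnCollection[i];
                columnBufferIndexes[i] = BufferManager.FindColumnByLineageID(
                    input.Buffer, column.LineageID);
            }
        }

        public override void ProcessInput(int inputID, PipelineBuffer buffer)
        {
            while (buffer.NextRow())
            {
                for (int i = 0; i < columnBufferIndexes.Length; i++)
                {
                    object value = buffer[columnBufferIndexes[i]];
                    // ... write the value to the destination ...
                }
            }
        }
    }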

If you’ve determined that the bottleneck is at the destination, and that multiple threads will improve performance, then you’ll need to make a copy of the data in the buffer before passing it to your destination. I did something similar in the old Data Services Destination sample on CodePlex (main/DataServices/Src/Destination/Component.cs). The logic was:

  1. Loop through the buffer
  2. Copy row data into local object
  3. Add the batch of data to a thread pool work item for insertion (or use the new Task Parallel Library)
  4. When the loop completes, wait for all threads/work items to complete before returning from ProcessInput

In the sample I linked to above, a “batch” was a single row (as the destination only supported RBAR at the time), but you can configure this to be whatever batch size makes sense for your destination.
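Here’s a rough sketch of that loop using the Task Parallel Library (the class name, SendBatchToDestination, the batch size, and the cached column indexes are placeholder assumptions rather than the actual sample code):

    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Microsoft.SqlServer.Dts.Pipeline;

    public class MyBatchingDestination : PipelineComponent
    {
        private const int BatchSize = 1000;
        private int[] columnBufferIndexes; // cached in PreExecute, as shown earlier

        public override void ProcessInput(int inputID, PipelineBuffer buffer)
        {
            var pending = new List<Task>();
            var batch = new List<object[]>(BatchSize);

            // 1-2. Loop through the buffer, copying row data into local objects.
            while (buffer.NextRow())
            {
                var row = new object[columnBufferIndexes.Length];
                for (int i = 0; i < columnBufferIndexes.Length; i++)
                    row[i] = buffer[columnBufferIndexes[i]];
                batch.Add(row);

                // 3. Hand off each full batch to a worker task.
                if (batch.Count == BatchSize)
                {
                    var toSend = batch;
                    batch = new List<object[]>(BatchSize);
                    pending.Add(Task.Factory.StartNew(() => SendBatchToDestination(toSend)));
                }
            }

            if (batch.Count > 0)
                pending.Add(Task.Factory.StartNew(() => SendBatchToDestination(batch)));

            // 4. Wait for all outstanding work items before returning, since the
            // buffer can be destroyed or reused once ProcessInput completes.
            Task.WaitAll(pending.ToArray());
        }

        private void SendBatchToDestination(List<object[]> rows)
        {
            // ... insert the batch into the destination ...
        }
    }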

Note that step #4 is important, as the buffer object that is passed into ProcessInput can be destroyed/modified once the function completes. Waiting allows you to redirect error rows if insertions fail at the destination. The downside to waiting is that your destination component can’t start processing a new buffer until you’re “done” with the current one (i.e. have returned from ProcessInput). You can choose not to wait and return right after you’ve read all of the rows, but you must ensure that:

  1. You make a copy (a real copy – not just an object reference) of any data you need from the PipelineBuffer object
  2. When you receive EndOfRowset, make sure you complete all outstanding work items before returning from ProcessInput (see the sketch below)
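A rough sketch of that variant (again with placeholder names, building on the batching example above):

    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Microsoft.SqlServer.Dts.Pipeline;

    public class MyAsyncDestination : PipelineComponent
    {
        // Outstanding work items now live across ProcessInput calls.
        private readonly List<Task> pendingWork = new List<Task>();

        public override void ProcessInput(int inputID, PipelineBuffer buffer)
        {
            while (buffer.NextRow())
            {
                // Deep-copy any row data you need. No references into the
                // PipelineBuffer can be kept once this method returns.
                // ... copy rows and queue batches onto pendingWork, as above ...
            }

            // Only block once the final (end of rowset) buffer arrives.
            if (buffer.EndOfRowset)
            {
                Task.WaitAll(pendingWork.ToArray());
                pendingWork.Clear();
            }
        }
    }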

The Data Services sample also contains a really useful class called ThreadPoolWait, which makes it easier to wait for threads to complete.

Hope that helps!