Andy Leonard will be joining me to present a Developing Extensions for SSIS webcast on May 22nd. It’s been a long time since I’ve had the chance to present a code-heavy session like this, and I’m really looking forward to it. I’ll be using this session as a dry run for a talk I’ll be giving at the DevTeach conference in Toronto the following week, so feedback will be much appreciated.
Andy has a great blog series about creating custom SSIS Tasks, and his company (Linchpin People) recently released an SSIS Event Task. I plan to expand a little on the content in Andy’s blog posts and talk about some of the key things you need to know when building an SSIS Data Flow component.
The abstract for the talk is below, and the registration page can be found here.
Join Matt Masson and Andy Leonard for a discussion and demonstrations on extending SSIS with custom tasks and data flow components. This code-heavy session walks you through the creation of a custom SQL Server Integration Services (SSIS) Task and a custom Data Flow component. If you’ve ever thought of customizing SSIS functionality beyond what you can do with scripting, this talk is for you! All examples will be coded in C# and made available following the session.
Similar to localizing a task, you can localize the display name and description of a custom connection manager by setting the LocalizationType property of the DtsConnection attribute. SSIS will look for two properties on that class: one for the connection manager’s display name and one for its description. These must be public, static string properties on your LocalizationType class.
The code looks like this:
[DtsConnection(ConnectionType = "MYCONNECTION", LocalizationType = typeof(Strings))]
public class MyConnectionManager : ConnectionManagerBase
{
}
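For reference, a minimal Strings class might look like the sketch below. The property names (ConnectionDisplayName and ConnectionDescription) and the hard-coded values are my illustration; in a real component you would typically back these with a .resx resource file so each language gets its own satellite assembly.

public class Strings
{
    // SSIS reads these two public static string properties from the
    // LocalizationType class. Hard-coded here for illustration only -
    // normally these would pull from a resource file.
    public static string ConnectionDisplayName
    {
        get { return "My Connection Manager"; }
    }

    public static string ConnectionDescription
    {
        get { return "Connection manager for my custom data source"; }
    }
}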
Question: I’ve created a custom destination component, but I’m not happy with the performance. I tried using multiple threads, but the pipeline buffer doesn’t allow free traversal. What can I do?
I’d start by profiling the component to make sure you identify the correct bottlenecks. The most common source of problems is accessing the COM interfaces at runtime: make sure that you keep a reference to any object retrieved through IDTSxxx100 members to avoid repeated COM interop calls. It’s a best practice to cache everything you need in the PreExecute phase.
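For example, in a component derived from PipelineComponent, you can resolve the column-to-buffer index mapping once in PreExecute instead of once per row (a sketch; the columnIndexes field is just an illustration):

// using Microsoft.SqlServer.Dts.Pipeline;
// using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

private int[] columnIndexes;

public override void PreExecute()
{
    base.PreExecute();

    // Resolve each input column's buffer index once, up front.
    // FindColumnByLineageID is a COM interop call - doing it per row
    // inside ProcessInput gets expensive very quickly.
    IDTSInput100 input = ComponentMetaData.InputCollection[0];
    columnIndexes = new int[input.InputColumnCollection.Count];
    for (int i = 0; i < columnIndexes.Length; i++)
    {
        columnIndexes[i] = BufferManager.FindColumnByLineageID(
            input.Buffer,
            input.InputColumnCollection[i].LineageID);
    }
}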
If you’ve determined that the bottleneck is at the destination, and that multiple threads will improve performance, then you’ll need to make a copy of the data in the buffer before passing it to your destination. I did something similar in the old Data Services Destination sample on CodePlex (main/DataServices/Src/Destination/Component.cs). The logic was as follows (there’s a code sketch after the list):
1. Loop through the rows in the buffer
2. Copy each row’s data into a local object
3. Add batches of data to a work item threadpool for insertion (or use the new Task Parallel Library)
4. When the loop completes, wait for all threads/work items to complete before returning from ProcessInput
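Here’s a rough sketch of that pattern using the Task Parallel Library. BatchSize, pendingWork, and InsertBatch are placeholders standing in for your own destination logic, and columnIndexes is the cached mapping from the PreExecute example above:

// using System.Collections.Generic;
// using System.Threading.Tasks;

private const int BatchSize = 1000;   // tune for your destination
private readonly List<Task> pendingWork = new List<Task>();

public override void ProcessInput(int inputID, PipelineBuffer buffer)
{
    List<object[]> batch = new List<object[]>(BatchSize);

    while (buffer.NextRow())
    {
        // Copy the row out of the pipeline buffer into a local object -
        // the buffer can be reused once ProcessInput returns.
        object[] row = new object[columnIndexes.Length];
        for (int i = 0; i < columnIndexes.Length; i++)
        {
            row[i] = buffer[columnIndexes[i]];
        }
        batch.Add(row);

        if (batch.Count == BatchSize)
        {
            List<object[]> toInsert = batch;
            pendingWork.Add(Task.Factory.StartNew(() => InsertBatch(toInsert)));
            batch = new List<object[]>(BatchSize);
        }
    }

    if (batch.Count > 0)
    {
        List<object[]> toInsert = batch;
        pendingWork.Add(Task.Factory.StartNew(() => InsertBatch(toInsert)));
    }

    // Step 4 - wait for all outstanding inserts before returning, so we
    // can still redirect error rows for this buffer if an insert fails.
    Task.WaitAll(pendingWork.ToArray());
    pendingWork.Clear();
}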
In the sample I linked to above, a “batch” was a single row (as the destination only supported RBAR at the time), but you can configure this to be whatever batch size makes sense for your destination.
Note that step #4 is important, as the buffer object that is passed into ProcessInput can be destroyed or modified once the function completes. Waiting also allows you to redirect error rows if insertions fail at the destination. The downside to waiting is that your destination component can’t start processing a new buffer until you’re “done” with the current one (i.e. have returned from ProcessInput). You can choose not to wait and return right after you’ve read all of the rows, but you must ensure that (see the sketch after the list):
- You make a copy (a real copy – not just an object reference) of any data you need from the PipelineBuffer object
- When you receive EndOfRowset, make sure you complete all outstanding work items before returning from ProcessInput
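In code, the non-waiting variant might look something like this (again a sketch; pendingWork is the field from the earlier example and CopyRowAndQueueBatchIfFull is a hypothetical helper wrapping the copy/batch logic shown above):

public override void ProcessInput(int inputID, PipelineBuffer buffer)
{
    while (buffer.NextRow())
    {
        // Deep-copy each row out of the buffer and queue full batches,
        // exactly as in the earlier sketch - never hold references into
        // the PipelineBuffer itself.
        CopyRowAndQueueBatchIfFull(buffer);   // hypothetical helper
    }

    if (buffer.EndOfRowset)
    {
        // Last buffer for this input - drain all outstanding work items
        // before returning.
        Task.WaitAll(pendingWork.ToArray());
        pendingWork.Clear();
    }

    // Otherwise return immediately: inserts overlap with the next buffer,
    // at the cost of not being able to redirect this buffer's error rows.
}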
The Data Services sample also contains a really useful class called ThreadPoolWait, which makes it easier to wait for threads to complete.
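If you’re curious what such a helper involves, here’s a minimal sketch of the same idea built on .NET 4’s CountdownEvent (my own illustration, not the sample’s actual implementation):

// using System.Threading;

public sealed class SimpleThreadPoolWait
{
    // Start the count at 1 so the event can't be set before WaitAll runs.
    private readonly CountdownEvent pending = new CountdownEvent(1);

    public void Queue(WaitCallback callback, object state)
    {
        pending.AddCount();
        ThreadPool.QueueUserWorkItem(s =>
        {
            try { callback(s); }
            finally { pending.Signal(); }
        }, state);
    }

    public void WaitAll()
    {
        pending.Signal();   // release the initial count
        pending.Wait();     // block until every queued item has signaled
    }
}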
Hope that helps!