Feb
21

CDC Control Task Operations

The CDC Control Task is used for a number of different operations. This post lists each of those operations, and briefly describes when to use them.

CDC Control Task Editor

Mark initial load start

This operation is used when executing an initial load from an active database without a snapshot. It is invoked at the beginning of an initial-load package to record the current LSN in the source database before the initial-load package starts reading the source tables. A walkthrough of how this process works can be found in my CDC in SSIS for SQL Server 2012 post.

Mark initial load end

This operation is used when executing an initial load from an active database without a snapshot. It is invoked at the end of an initial-load package to record the current LSN in the source database after the initial-load package finished reading the source tables. This LSN is determined by recording the current time when this operation occurred and then querying the cdc.lsn_time_mapping table in the CDC database looking for a change that occurred after that time

Mark CDC start

This operation is used when then the initial load is made from a snapshot database database or from a quiescence database. It is invoked at any point within the initial load package. The operation accepts a parameter that can be a snapshot LSN, a name of a snapshot database (from which the snapshot LSN will be derived automatically) or it can be left empty, in which case the current database LSN is used as the start LSN for the change processing package. This operation is used as an alternative to the Mark Initial Load Start/End operations.

Get processing range

This operation is used in a change processing package before invoking the data flow that uses the CDC Source data flow. It establishes a range of LSNs that the CDC Source data flow reads when invoked. The range is stored in an SSIS package variable (StateVariable property) that is used by the CDC Source during data flow processing.

Mark processed range

This operation is used in a change processing package at the end of a CDC run (after the CDC data flow is completed successfully) to record the last LSN that was fully processed in the CDC run. The next time Get processing range is used, this position determines the start of the next processing range.

Reset CDC state

This operation is used to reset the persistent CDC state associated with the current CDC context. After this operation is run, the current maximum LSN from the LSN-timestamp sys.fn_cdc_get_max_lsn table becomes the start of the range for the next processing range. An example of when this operation is used is when you want to process only the newly created change records and ignore all old change records.

 

Notes

  • The following operations open a connection to the source system
    • Mark initial load start
    • Mark initial load end
    • Mark CDC start

Permanent link to this article: http://www.mattmasson.com/index.php/2012/02/cdc-control-task-operations/

Feb
17

Thoughts on Branching Strategies for SSIS Projects

A question came in recently from a customer about branching strategies for SSIS projects. Their development team has been using SSIS for years, but have not been using any form of source control software up until now. They felt that the package format improvements in SQL Server 2012, which were focused on making it possible to diff/merge package files, were a good motivation for them to adopt this best practice.

I gave the following high level guidance:

  • Treat your SSIS packages as if they were source files from a compiled language (i.e. C#)
  • Think of your SSIS project as an “Application” (I frequently refer to SSIS projects as “Data Integration Applications” in my talks)
  • There is a lot of guidance out there on software development lifecycle best practices, most of which applies to SSIS development
    • It may take some experimenting to determine what methodology works best for your project and team

I’ve always liked Jamie Thomson’s posts about his experiences with SSIS Team Development (as well as general best practices and naming conventions that apply here as well). The changes we made in SQL 2012 alleviate the need for some of these best practices, but many of them still apply.

In regards to specific branching strategies, it’s common to have three core branches.

Development

This is the branch developers work out of. Developers are responsible for testing changes, and making sure the project is fully functional in this branch. If a developer is making significant changes that would be disruptive to everyone else, or cause the project to be broken for any amount of time, then you should create a sub branch for them to work out of. This branch should be regularly merged (weekly?) to the Integration branch so that issues not caught during developer testing can be identified early.

Integration

The builds from this branch go up to your test environment. This is where you’d do signoffs before the project is moved to production. Most changes coming into this branch should be coming from Development – the only time you’d directly check something in here is during the stabilization period (and this should be rare). Once QA have signed off on the release, you’d merge this to your Main branch.

Main

This is what gets released to production. You should use build tags to identify specific versions/releases. Nothing should be checked directly into this branch without first going through Integration testing.

Note, this branching strategy assumes a single team/project. If you have multiple teams/projects, you need to decide if you are sharing a common Main, or if everyone goes into the same Integration branch (and whether there are multiple levels of Integration). Within the SQL org, for example, we’d have a “Release” branch past Main that we actually do the external builds from (it also allows us to do servicing builds post-RTM).

(The actual SQL branching strategy is a lot more complex than that, but this is the basic idea!)

Are you currently working with SSIS in team environment? Are you using source control? I’d love to hear any recommendations you might have.

Permanent link to this article: http://www.mattmasson.com/index.php/2012/02/thoughts-on-branching-strategies-for-ssis-projects/

Feb
16

Speaking at SQL Saturday #111 | Atlanta

Wow, what a lineup for SQL Saturday #111. I’m still amazed and honored anytime I’m on a speaker list of this caliber!

I’ll be co-presenting a pre-con event (full day training) with Andy Leonard on April 13th, and talking about SSIS in SQL Server 2012 as part of the main event. Rumor has it that I might even get to co-present with Patrick LeBlanc, who has been working hard on High Availability Solutions for SSIS in SQL 2012. It’s all very exciting – I hope to see you there!


If you can’t make it to Atlanta, you can also see me for SSIS Performance Design Pattern training at SQL Saturday #105 in Dublin, and SQL Bits X in London.

Permanent link to this article: http://www.mattmasson.com/index.php/2012/02/speaking-at-sql-saturday-111-atlanta/

Feb
15

Exit Codes, DTEXEC, and SSIS Catalog

DTEXEC has two execution modes when running packages in the SSIS Catalog.

By default, the process runs asynchronously – the package execution is scheduled on the server, and DTEXEC returns immediately. DTEXEC does not wait for the package execution to complete. In this case, %ERRORLEVEL% would represent whether the package execution was created successfully, and will not reflect the result of the execution.

To get the result of the package execution, you can set the $ServerOption::SYNCHRONIZED property to True. In this mode, DTEXEC will wait for the package execution to complete. If the execution was successful, the %ERRORLEVEL% will be set to 0. If it failed, it will be set to 1.

See the examples below – the first execution runs a package (2-RowCounts.dtsx) on my local instance, and completes successfully. The second execution runs a package (6-ErrorContext.dtsx) that will fail.

 

C:\>dtexec /ISserver \SSISDB\Demo\MyCatalogProject\2-RowCounts.dtsx /Par "$ServerOption::SYNCHRONIZED(Boolean)";True
Microsoft (R) SQL Server Execute Package Utility
Version 11.0.XX.XX for 64-bit
Copyright (C) Microsoft Corporation. All rights reserved.

Started:  9:49:16 AM
Execution ID: 11.
To view the details for the execution, right-click on the Integration Services Catalog,
and open the [All Executions] report
Started:  9:49:16 AM
Finished: 9:49:22 AM
Elapsed:  6.015 seconds

C:\>echo %ERRORLEVEL%
0

C:\>dtexec /ISserver \SSISDB\Demo\MyCatalogProject\6-ErrorContext.dtsx /Par "$ServerOption::SYNCHRONIZED(Boolean)";True
Microsoft (R) SQL Server Execute Package Utility
Version 11.0.XXX.XX for 64-bit
Copyright (C) Microsoft Corporation. All rights reserved.

Started:  9:49:43 AM
Package execution on IS Server failed. Execution ID: 12, Execution Status:4.
To view the details for the execution, right-click on the Integration Services Catalog,
and open the [All Executions] report
Started:  9:49:43 AM
Finished: 9:49:49 AM
Elapsed:  6.235 seconds

C:\>echo %ERRORLEVEL%
1

For more details on the new DTEXEC command line parameters for the SSIS Catalog, see:

Permanent link to this article: http://www.mattmasson.com/index.php/2012/02/exit-codes-dtexec-and-ssis-catalog/

Feb
13

Queries for Data Flow Component Performance in SQL 2012

The built-in SSIS Catalog reports in SSMS show a breakdown of task and component performance of a package execution when you run with a logging level of “Performance”.

SSIS Catalog package performance report

For a more detailed or customized view, you can query the catalog tables directly. The key table to get data flow performance information is [catalog].[execution_component_phases]. Here are two sample queries – the first returns the total time spent on each component (similar to what is displayed in the performance report), and the second returns the times broken down by phase.

 

--
-- Determine the execution id for the last package execution
--

DECLARE @execution_id bigint;
SET @execution_id = (SELECT MAX(execution_id) from [catalog].execution_component_phases)

--
-- Break down time spent per transform
--

SELECT	package_name as PackageName,
        subcomponent_name as Transform,
        SUM(DATEDIFF(ms,start_time,end_time)) as ActiveTime
FROM	[catalog].execution_component_phases
WHERE   execution_id = @execution_id
GROUP BY package_name, task_name, subcomponent_name, execution_path
ORDER BY ActiveTime DESC

--
-- Break down transforms per phase
--

SELECT  package_name as PackageName,
        subcomponent_name as Transform,
        phase as Phase,
        SUM(DATEDIFF(ms,start_time,end_time)) as ActiveTime
FROM	[catalog].execution_component_phases
WHERE   execution_id = @execution_id
GROUP BY package_name, task_name, subcomponent_name, execution_path, phase
ORDER BY ActiveTime DESC

Running these queries returns something like this:

image

Some notes:

  1. You need to run the package with a Performance logging level to capture this data
  2. You can set a default logging level for the Catalog
  3. The query above determines the execution_id for the latest run – you will need to change this value to look at the results for specific package executions

Permanent link to this article: http://www.mattmasson.com/index.php/2012/02/queries-for-data-flow-component-performance-in-sql-2012/

Jan
30

Retrieving the Value of a Sensitive Parameter in a Script

SQL Server 2012 introduces Sensitive Parameters. These parameter values are masked in the UI, protected according to the ProtectionLevel of your package/project when saved to disk, and encrypted when deployed to the SSIS Catalog. When using SSIS Expressions, sensitive parameters can only be assigned to sensitive properties (such as a connection manager’s Password property).

You may wish to use a sensitive parameter in a script (for example, if you’re doing FTP tasks using a Script Task). Retrieving a sensitive parameter’s value like any other variable/parameter will result in an error at runtime.

For example, the following ScriptTask code to retrieve the value for the “FtpPassword” parameter:

        public void Main()
        {
            try
            {
                string value = Dts.Variables["$Package::FtpPassword"].Value.ToString();
                MessageBox.Show("My super secret sensitive value is: " + value);

                Dts.TaskResult = (int)ScriptResults.Success;
            }
            catch (Exception e)
            {
                Dts.Log(e.Message, 0, null);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }

Results in the following exception at runtime:

 

Error: Accessing value of the parameter variable for the sensitive parameter "<XXX>" is not allowed. Verify that the variable is used properly and that it protects the sensitive information.

 

The correct way to retrieve the value is by using the GetSensitiveValue() function. The following ScriptTask code will return the value successfully.

        public void Main()
        {
            try
            {
                string value = Dts.Variables["$Package::FtpPassword"].GetSensitiveValue().ToString();
                MessageBox.Show("My super secret sensitive value is: " + value);

                Dts.TaskResult = (int)ScriptResults.Success;
            }
            catch (Exception e)
            {
                Dts.Log(e.Message, 0, null);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }

Note: Once you retrieve the value out of the sensitive parameter, you are responsible for ensuring it is securely handled. Be careful of unintentionally exposing the actual value through logging. See the Handling Passwords entry in books online for more information.

Permanent link to this article: http://www.mattmasson.com/index.php/2012/01/retrieving-the-value-of-a-sensitive-parameter-in-a-script/

Jan
25

SSIS Quick Tip: Copy & Paste Packages into a Visual Studio Project

The Add Existing Package option for SSIS projects is very helpful, but has one major limitation. The dialog only allows you to add a single package at a time. When working with a lot of packages, this can be a time consuming process.

image

Thankfully, there is a shortcut. You can add multiple packages to a project using Copy & Paste.

  1. Select the packages you want to add to your projects in Windows Explorer, and Copy (Ctrl-C)
  2. Switch to Visual Studio
  3. Select the Project node
  4. Paste (Ctrl-V)

The packages will be added to the SSIS project (and copied to the project directory).

image

Note: This works in all versions of SSIS.

Permanent link to this article: http://www.mattmasson.com/index.php/2012/01/ssis-quick-tip-copy-paste-packages-into-a-visual-studio-project/

Jan
17

SQL Bits X Schedule has been Posted

The SQL Bits X schedule has been posted and I am very impressed with the lineup! Most conferences will have 3-4 tracks running in parallel – Bits will have 8 sessions to choose from in most timeslots, offering you a wide variety of options.

Friday schedule
Saturday (Community Day) schedule

More details about the pre-con training event I’m doing can be found here.

Here are some of the SSIS related sessions from the conference:

Permanent link to this article: http://www.mattmasson.com/index.php/2012/01/sql-bits-x-schedule-has-been-posted/

Jan
16

Speaking at SQL Saturday #105 in Dublin, Ireland

I’m honored to be presenting at SQL Saturday #105 in Dublin, Ireland on March 24th. They recently added a 4th session track to keep up with demand, and I was more than happy to fill in one of the timeslots with a session about the developer experience enhancements in SSIS 2012.

I’ll also be covering SSIS Performance Design Patterns in a full day training session the day before the event. The content will be similar to the pre-con session I’m doing the following week at SQL Bits X, with a couple of changes – I’ll be presenting solo, and will spend the last portion of the day covering the enhancements we made to SSIS in SQL Server 2012.

Registration information and details are available here.

SQL Saturday #105

Permanent link to this article: http://www.mattmasson.com/index.php/2012/01/speaking-at-sql-saturday-105-in-dublin-ireland/

Jan
13

SSIS Quick Tip: Required Parameters

SQL Server 2012 introduces the new Parameter model. Setting a parameter’s Required property to True means that a value must be supplied at runtime by the caller of the package (dtexec, SSMS, SQL Agent, the Execute Package Task, etc). This means that although the parameter has a default value at design time, it will never be used once the project is deployed.

Example

I have a package which loads a single flat file into a destination table. I’ve defined an InputFile parameter for the path of the file I want to load, and use it to set the ConnectionString of my Flat File Connection Manager. For my package to validate, I need to give it a default value (which points to a sample data file I can use for testing).

image

Setting Required to True for this parameter ensures that I won’t accidentally use it after deploying it to my test and production environments, as the package won’t run until I specify a new value.

image

This will hopefully prevent some of the painful-yet-amusing-because-they-didn’t-happen-to-me type stories I’ve heard over the years, like:

  • Having your ETL processes running for a week before you realize that you forgot to update a connection string, and have been updating the test data warehouse with real customer data
  • Loading sample data files that developers use for testing into your production data warehouse, causing such products as “YOUR MOM” and “I HATE THIS JOB” to show up on the company wide weekly sales report

Permanent link to this article: http://www.mattmasson.com/index.php/2012/01/ssis-quick-tip-required-parameters/

Older posts «