Mar
07

AlwaysOn Multi-Subnet Failover and SSIS

Updated 2012-05-11: Updated the .NET Framework update link to point to 4.0.3, as it has some additional updates for AlwaysOn.

If you plan on using Multi-Subnet Failover Clustering, note that the SQL Native Client 11.0 OLEDB provider does not support the new connection string key words (MultiSubnetFailover=True). Instead you should use the ADO.NET or ODBC Source and Destination components.

Details can be found in the SQL Server 2012 Release Notes.

MultiSubnetFailover=True Not Supported by NET Framework 3.5 or OLEDB

If your Availability Group or Failover Cluster Instance has a listener name (known as the network name or Client Access Point in the WSFC Cluster Manager) depending on multiple IP addresses from different subnets, and you are using either ADO.NET with .NET Framework 3.5SP1 or SQL Native Client 11.0 OLEDB, potentially, 50% of your client-connection requests to the availability group listener will hit a connection timeout.

Workarounds: We recommend that you do one of the following tasks.

  • If do not have the permission to manipulate cluster resources, change your connection timeout to 30 seconds (this value results in a 20-second TCP timeout period plus a 10-second buffer).
  • Pros: If a cross-subnet failover occurs, client recovery time is short.
  • Cons: Half of the client connections will take more than 20 seconds.
  • If you have the permission to manipulate cluster resources, the more recommended approach is to set the network name of your availability group listener to RegisterAllProvidersIP=0. For more information, see “Sample PowerShell Script to Disable RegisterAllProvidersIP and Reduce TTL”, later in this section.
  • Pros: You do not need to increase your client-connection timeout value.
  • Cons: If a cross-subnet failover occurs, the client recovery time could be 15 minutes or longer, depending on your HostRecordTTL setting and the setting of your cross-site DNS/AD replication schedule.

Driver support summary for AlwaysOn:

 

Driver Multi-Subnet Failover ApplicationIntent Read-Only Routing Multi-Subnet Failover: Faster Single Subnet Endpoint Failover Multi-Subnet Failover: Named Instance Resolution For SQL Clustered Instances
SQL Native Client 11.0 ODBC Yes Yes Yes Yes Yes
SQL Native Client 11.0 OLEDB No Yes Yes No No
ADO.NET with .NET Framework 4.0 Update 4.0.3 Yes Yes Yes Yes Yes
ADO.NET with .NET Framework 3.5 No Yes Yes Yes Yes
Microsoft JDBC driver 4.0 for SQL Server Yes Yes Yes Yes Yes

 

As mentioned on the SSIS Team Blog, the SQL Native Client OLEDB provider is being deprecated, but still provides the best interoperability with SSIS components in the SQL Server 2012 release.

Permanent link to this article: http://www.mattmasson.com/index.php/2012/03/alwayson-multi-subnet-failover-and-ssis/

Mar
06

Data Quality Services Performance Best Practices Guide – now available!

The Data Quality Services (DQS) Performance Best Practices Guide (or DQSPDPG for short) is now available on the Microsoft Download Center. It covers hardware and setup considerations, how matching policies will impact your performance, and some best practices when using the DQS Cleansing transform in SSIS. I was also happy to see a section in there about the impact of using advanced DQS functionality, such as Composite Domains, Term-Based Relations, and Reference Data Services. A must read for all DQS users…

Permanent link to this article: http://www.mattmasson.com/index.php/2012/03/data-quality-services-performance-best-practices-guide-now-available/

Feb
24

Resources for SSIS Performance Best Practices

Simple post today.

Here are a number of links to resources that I reference in my Performance Design Patterns presentation.

 


You can see me for SSIS Performance Design Pattern training at SQL Saturday #105 in Dublin, SQL Bits X in London, and SQL Saturday #111 in Atlanta.

Permanent link to this article: http://www.mattmasson.com/index.php/2012/02/resources-for-ssis-performance-best-practices/

Feb
21

CDC Control Task Operations

The CDC Control Task is used for a number of different operations. This post lists each of those operations, and briefly describes when to use them.

CDC Control Task Editor

Mark initial load start

This operation is used when executing an initial load from an active database without a snapshot. It is invoked at the beginning of an initial-load package to record the current LSN in the source database before the initial-load package starts reading the source tables. A walkthrough of how this process works can be found in my CDC in SSIS for SQL Server 2012 post.

Mark initial load end

This operation is used when executing an initial load from an active database without a snapshot. It is invoked at the end of an initial-load package to record the current LSN in the source database after the initial-load package finished reading the source tables. This LSN is determined by recording the current time when this operation occurred and then querying the cdc.lsn_time_mapping table in the CDC database looking for a change that occurred after that time

Mark CDC start

This operation is used when then the initial load is made from a snapshot database database or from a quiescence database. It is invoked at any point within the initial load package. The operation accepts a parameter that can be a snapshot LSN, a name of a snapshot database (from which the snapshot LSN will be derived automatically) or it can be left empty, in which case the current database LSN is used as the start LSN for the change processing package. This operation is used as an alternative to the Mark Initial Load Start/End operations.

Get processing range

This operation is used in a change processing package before invoking the data flow that uses the CDC Source data flow. It establishes a range of LSNs that the CDC Source data flow reads when invoked. The range is stored in an SSIS package variable (StateVariable property) that is used by the CDC Source during data flow processing.

Mark processed range

This operation is used in a change processing package at the end of a CDC run (after the CDC data flow is completed successfully) to record the last LSN that was fully processed in the CDC run. The next time Get processing range is used, this position determines the start of the next processing range.

Reset CDC state

This operation is used to reset the persistent CDC state associated with the current CDC context. After this operation is run, the current maximum LSN from the LSN-timestamp sys.fn_cdc_get_max_lsn table becomes the start of the range for the next processing range. An example of when this operation is used is when you want to process only the newly created change records and ignore all old change records.

 

Notes

  • The following operations open a connection to the source system
    • Mark initial load start
    • Mark initial load end
    • Mark CDC start

Permanent link to this article: http://www.mattmasson.com/index.php/2012/02/cdc-control-task-operations/

Feb
17

Thoughts on Branching Strategies for SSIS Projects

A question came in recently from a customer about branching strategies for SSIS projects. Their development team has been using SSIS for years, but have not been using any form of source control software up until now. They felt that the package format improvements in SQL Server 2012, which were focused on making it possible to diff/merge package files, were a good motivation for them to adopt this best practice.

I gave the following high level guidance:

  • Treat your SSIS packages as if they were source files from a compiled language (i.e. C#)
  • Think of your SSIS project as an “Application” (I frequently refer to SSIS projects as “Data Integration Applications” in my talks)
  • There is a lot of guidance out there on software development lifecycle best practices, most of which applies to SSIS development
    • It may take some experimenting to determine what methodology works best for your project and team

I’ve always liked Jamie Thomson’s posts about his experiences with SSIS Team Development (as well as general best practices and naming conventions that apply here as well). The changes we made in SQL 2012 alleviate the need for some of these best practices, but many of them still apply.

In regards to specific branching strategies, it’s common to have three core branches.

Development

This is the branch developers work out of. Developers are responsible for testing changes, and making sure the project is fully functional in this branch. If a developer is making significant changes that would be disruptive to everyone else, or cause the project to be broken for any amount of time, then you should create a sub branch for them to work out of. This branch should be regularly merged (weekly?) to the Integration branch so that issues not caught during developer testing can be identified early.

Integration

The builds from this branch go up to your test environment. This is where you’d do signoffs before the project is moved to production. Most changes coming into this branch should be coming from Development – the only time you’d directly check something in here is during the stabilization period (and this should be rare). Once QA have signed off on the release, you’d merge this to your Main branch.

Main

This is what gets released to production. You should use build tags to identify specific versions/releases. Nothing should be checked directly into this branch without first going through Integration testing.

Note, this branching strategy assumes a single team/project. If you have multiple teams/projects, you need to decide if you are sharing a common Main, or if everyone goes into the same Integration branch (and whether there are multiple levels of Integration). Within the SQL org, for example, we’d have a “Release” branch past Main that we actually do the external builds from (it also allows us to do servicing builds post-RTM).

(The actual SQL branching strategy is a lot more complex than that, but this is the basic idea!)

Are you currently working with SSIS in team environment? Are you using source control? I’d love to hear any recommendations you might have.

Permanent link to this article: http://www.mattmasson.com/index.php/2012/02/thoughts-on-branching-strategies-for-ssis-projects/

Feb
16

Speaking at SQL Saturday #111 | Atlanta

Wow, what a lineup for SQL Saturday #111. I’m still amazed and honored anytime I’m on a speaker list of this caliber!

I’ll be co-presenting a pre-con event (full day training) with Andy Leonard on April 13th, and talking about SSIS in SQL Server 2012 as part of the main event. Rumor has it that I might even get to co-present with Patrick LeBlanc, who has been working hard on High Availability Solutions for SSIS in SQL 2012. It’s all very exciting – I hope to see you there!


If you can’t make it to Atlanta, you can also see me for SSIS Performance Design Pattern training at SQL Saturday #105 in Dublin, and SQL Bits X in London.

Permanent link to this article: http://www.mattmasson.com/index.php/2012/02/speaking-at-sql-saturday-111-atlanta/

Feb
15

Exit Codes, DTEXEC, and SSIS Catalog

DTEXEC has two execution modes when running packages in the SSIS Catalog.

By default, the process runs asynchronously – the package execution is scheduled on the server, and DTEXEC returns immediately. DTEXEC does not wait for the package execution to complete. In this case, %ERRORLEVEL% would represent whether the package execution was created successfully, and will not reflect the result of the execution.

To get the result of the package execution, you can set the $ServerOption::SYNCHRONIZED property to True. In this mode, DTEXEC will wait for the package execution to complete. If the execution was successful, the %ERRORLEVEL% will be set to 0. If it failed, it will be set to 1.

See the examples below – the first execution runs a package (2-RowCounts.dtsx) on my local instance, and completes successfully. The second execution runs a package (6-ErrorContext.dtsx) that will fail.

 

C:\>dtexec /ISserver \SSISDB\Demo\MyCatalogProject\2-RowCounts.dtsx /Par "$ServerOption::SYNCHRONIZED(Boolean)";True
Microsoft (R) SQL Server Execute Package Utility
Version 11.0.XX.XX for 64-bit
Copyright (C) Microsoft Corporation. All rights reserved.

Started:  9:49:16 AM
Execution ID: 11.
To view the details for the execution, right-click on the Integration Services Catalog,
and open the [All Executions] report
Started:  9:49:16 AM
Finished: 9:49:22 AM
Elapsed:  6.015 seconds

C:\>echo %ERRORLEVEL%
0

C:\>dtexec /ISserver \SSISDB\Demo\MyCatalogProject\6-ErrorContext.dtsx /Par "$ServerOption::SYNCHRONIZED(Boolean)";True
Microsoft (R) SQL Server Execute Package Utility
Version 11.0.XXX.XX for 64-bit
Copyright (C) Microsoft Corporation. All rights reserved.

Started:  9:49:43 AM
Package execution on IS Server failed. Execution ID: 12, Execution Status:4.
To view the details for the execution, right-click on the Integration Services Catalog,
and open the [All Executions] report
Started:  9:49:43 AM
Finished: 9:49:49 AM
Elapsed:  6.235 seconds

C:\>echo %ERRORLEVEL%
1

For more details on the new DTEXEC command line parameters for the SSIS Catalog, see:

Permanent link to this article: http://www.mattmasson.com/index.php/2012/02/exit-codes-dtexec-and-ssis-catalog/

Feb
13

Queries for Data Flow Component Performance in SQL 2012

The built-in SSIS Catalog reports in SSMS show a breakdown of task and component performance of a package execution when you run with a logging level of “Performance”.

SSIS Catalog package performance report

For a more detailed or customized view, you can query the catalog tables directly. The key table to get data flow performance information is [catalog].[execution_component_phases]. Here are two sample queries – the first returns the total time spent on each component (similar to what is displayed in the performance report), and the second returns the times broken down by phase.

 

--
-- Determine the execution id for the last package execution
--

DECLARE @execution_id bigint;
SET @execution_id = (SELECT MAX(execution_id) from [catalog].execution_component_phases)

--
-- Break down time spent per transform
--

SELECT	package_name as PackageName,
        subcomponent_name as Transform,
        SUM(DATEDIFF(ms,start_time,end_time)) as ActiveTime
FROM	[catalog].execution_component_phases
WHERE   execution_id = @execution_id
GROUP BY package_name, task_name, subcomponent_name, execution_path
ORDER BY ActiveTime DESC

--
-- Break down transforms per phase
--

SELECT  package_name as PackageName,
        subcomponent_name as Transform,
        phase as Phase,
        SUM(DATEDIFF(ms,start_time,end_time)) as ActiveTime
FROM	[catalog].execution_component_phases
WHERE   execution_id = @execution_id
GROUP BY package_name, task_name, subcomponent_name, execution_path, phase
ORDER BY ActiveTime DESC

Running these queries returns something like this:

image

Some notes:

  1. You need to run the package with a Performance logging level to capture this data
  2. You can set a default logging level for the Catalog
  3. The query above determines the execution_id for the latest run – you will need to change this value to look at the results for specific package executions

Permanent link to this article: http://www.mattmasson.com/index.php/2012/02/queries-for-data-flow-component-performance-in-sql-2012/

Jan
30

Retrieving the Value of a Sensitive Parameter in a Script

SQL Server 2012 introduces Sensitive Parameters. These parameter values are masked in the UI, protected according to the ProtectionLevel of your package/project when saved to disk, and encrypted when deployed to the SSIS Catalog. When using SSIS Expressions, sensitive parameters can only be assigned to sensitive properties (such as a connection manager’s Password property).

You may wish to use a sensitive parameter in a script (for example, if you’re doing FTP tasks using a Script Task). Retrieving a sensitive parameter’s value like any other variable/parameter will result in an error at runtime.

For example, the following ScriptTask code to retrieve the value for the “FtpPassword” parameter:

        public void Main()
        {
            try
            {
                string value = Dts.Variables["$Package::FtpPassword"].Value.ToString();
                MessageBox.Show("My super secret sensitive value is: " + value);

                Dts.TaskResult = (int)ScriptResults.Success;
            }
            catch (Exception e)
            {
                Dts.Log(e.Message, 0, null);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }

Results in the following exception at runtime:

 

Error: Accessing value of the parameter variable for the sensitive parameter "<XXX>" is not allowed. Verify that the variable is used properly and that it protects the sensitive information.

 

The correct way to retrieve the value is by using the GetSensitiveValue() function. The following ScriptTask code will return the value successfully.

        public void Main()
        {
            try
            {
                string value = Dts.Variables["$Package::FtpPassword"].GetSensitiveValue().ToString();
                MessageBox.Show("My super secret sensitive value is: " + value);

                Dts.TaskResult = (int)ScriptResults.Success;
            }
            catch (Exception e)
            {
                Dts.Log(e.Message, 0, null);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }

Note: Once you retrieve the value out of the sensitive parameter, you are responsible for ensuring it is securely handled. Be careful of unintentionally exposing the actual value through logging. See the Handling Passwords entry in books online for more information.

Permanent link to this article: http://www.mattmasson.com/index.php/2012/01/retrieving-the-value-of-a-sensitive-parameter-in-a-script/

Jan
25

SSIS Quick Tip: Copy & Paste Packages into a Visual Studio Project

The Add Existing Package option for SSIS projects is very helpful, but has one major limitation. The dialog only allows you to add a single package at a time. When working with a lot of packages, this can be a time consuming process.

image

Thankfully, there is a shortcut. You can add multiple packages to a project using Copy & Paste.

  1. Select the packages you want to add to your projects in Windows Explorer, and Copy (Ctrl-C)
  2. Switch to Visual Studio
  3. Select the Project node
  4. Paste (Ctrl-V)

The packages will be added to the SSIS project (and copied to the project directory).

image

Note: This works in all versions of SSIS.

Permanent link to this article: http://www.mattmasson.com/index.php/2012/01/ssis-quick-tip-copy-paste-packages-into-a-visual-studio-project/

Older posts «

» Newer posts