Advanced SSIS Catalog presentation from TechEd North America 2013

Matthew Roche (blog | twitter) and I teamed up once again to present an advanced SSIS Catalog session at TechEd North America 2013 – Deep Inside the Microsoft SQL Server Integration Services Server. The video and slide deck are now available on the Channel9 site. The slide deck actually contains 10 additional slides that we didn’t have time to cover during the regular session (with some further details about the security model).

I want to extend a big thank you to everyone who attended, and for all the great feedback we received. It can be tough doing a 400 level SQL session at TechEd, and while I could see some people’s heads spinning, it sounded like most people were able to learn something new.

The TechEd team picked an excellent preview picture for the session (below). It comes from Matthew’s intro – you’ll have to watch the video to see how he worked a picture of kittens into a 400 level SSIS session.

[Session preview image]

If you didn’t already know, Channel 9 has many TechEd presentations available online. You can see recordings of my previous events on my speaker page, and Matthew’s as well.

SSIS, DQS and MDS Training at SQL Saturday #229 | Dublin, Ireland

I’m honored to be invited back to this year’s SQL Saturday Dublin event. I must have done a decent job last year on my SSIS Design Patterns pre-conference session, because I’ve been asked to do another one this time around as part of their Training Day (June 21st).

This year I’ll be doing a full day training session on Enterprise Information Management, which greatly expands upon one of my more popular talks about combining SSIS, DQS and MDS. The session will include some advanced SSIS topics (such as automation and dynamic package generation), and some of the main SSIS design patterns for data warehousing. The abstract is included below:

Enterprise Information Management (EIM) is an industry term for technologies to manage your data for integration, quality, and governance. This full day training session will show you how Integration Services (SSIS), Data Quality Services (DQS), and Master Data Services (MDS) work together to provide a comprehensive EIM solution in SQL Server 2012. Focusing on Data Warehousing scenarios, we’ll explore the key steps needed to build such a solution, from the beginning (planning, benchmarking), to data curation (cleansing, matching, managing dimensions), to data loading using SSIS. We’ll also touch on some advanced data warehousing ETL design patterns, such as change data capture (CDC), slowly changing dimension processing, and automation.


Course Modules

  • Data cleansing and matching
  • Reference data management
  • SSIS design patterns
  • Incremental data loading
  • SSIS deployment, management, and monitoring
  • Automation

Registration details can be found on the Prodata site.

The session schedule hasn’t been posted yet, but a number of people submitted interesting SSIS sessions to the conference. I see a lot of big names from the SQL community, and I expect there will be an awesome turnout (just like last time).

I submitted a brand new session entitled Cats, Facebook, and Online Dating with Microsoft BI that I hope will be accepted. I’ll show the SSIS demos I’ve done for the BI Power Hour (2011 | 2012), how I built them, and the lessons that I learned throughout. I’ll try not to make it too egotistical (“Hey look at me and how I made people laugh that one time!”), and try to provide content that attendees will find useful.

Oh, who am I kidding? I’ll totally be showing off Mr. Wiggles.

Hope to see you there!

Slides from SSIS Performance Design Patterns | TechDays Hong Kong 2013

The slides from my 3rd presentation at the TechDays Hong Kong event are now available. As I mentioned during the talk, design patterns are one of my favorite topics, and something I really enjoyed writing about. This presentation covered a subset of the SSIS performance design patterns from the full day training sessions I’ve done in the past. See this post for a link to the full deck (over 100 slides).

Thanks for attending!

Book – Microsoft SQL Server 2012 Integration Services comes out September 17th

Update: Amazon is now reporting a release date of 9/17 although it appears you can order it immediately from the O’Reilly site.

My second SSIS book – Microsoft SQL Server 2012 Integration Services (MS Press) – comes out on September 17th. This book takes a different approach than my Design Patterns book that came out last week – it’s more of an in-depth look at the internals of SSIS in SQL Server 2012, rather than a problem/solution type of guide. My co-authors (Wee Hyong Tok, Rakesh Parida, Xiaoning Ding, and Kaarthik Sivashanmugam) are all members of the SSIS product team (or at least were when we started writing the book… Rakesh and Kaarthik have recently moved over to the Data Quality Services team), which allowed us to take a deep look into how the product was designed.

Here is the full list of chapters – a sample chapter is available from the O’Reilly website.

  • Chapter 1 : SSIS Overview
  • Chapter 2 : Understanding SSIS Concepts
  • Chapter 3 : Upgrading to SSIS 2012
  • Chapter 4 : New SSIS Designer Features
  • Chapter 5 : Team Development
  • Chapter 6 : Developing an SSIS Solution
  • Chapter 7 : Understanding SSIS Connectivity
  • Chapter 8 : Working with Change Data Capture in SSIS 2012
  • Chapter 9 : Data Cleansing Using SSIS
  • Chapter 10 : Configuration in SSIS
  • Chapter 11 : Running SSIS Packages
  • Chapter 12 : SSIS T-SQL Magic
  • Chapter 13 : SSIS PowerShell Magic
  • Chapter 14 : SSIS Reports
  • Chapter 15 : SSIS Engine Deep Dive
  • Chapter 16 : SSIS Catalog Deep Dive
  • Chapter 17 : SSIS Security
  • Chapter 18 : Understanding SSIS Logging
  • Chapter 19 : Automating SSIS
  • Chapter 20 : Troubleshooting SSIS Package Failures
  • Chapter 21 : SSIS Performance Best Practices
  • Chapter 22 : Troubleshooting SSIS Performance Issues
  • Chapter 23 : Troubleshooting Data Issues

Can I deploy a single SSIS package from my project to the SSIS Catalog?

No. No you cannot. In SQL Server 2012, you will always deploy the entire .ispac file to the Catalog.
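For reference, deployment always happens at the project level – whether you use SSDT, the Deployment Wizard, or T-SQL. Here’s a minimal T-SQL sketch using the [catalog].[deploy_project] stored procedure; the folder name, project name, and .ispac path are placeholders, and the target folder is assumed to already exist (you can create one with [catalog].[create_folder]).

```sql
-- Minimal sketch: deploy an entire project (.ispac) to the SSIS Catalog.
-- Folder name, project name, and file path are placeholders.
DECLARE @project_binary varbinary(max);
DECLARE @operation_id bigint;

-- Read the compiled .ispac produced by the SSIS project build
SELECT @project_binary = BulkColumn
FROM OPENROWSET(BULK N'C:\Builds\MyProject.ispac', SINGLE_BLOB) AS ispac;

EXEC [SSISDB].[catalog].[deploy_project]
    @folder_name    = N'MyFolder',       -- assumed to exist already
    @project_name   = N'MyProject',
    @project_stream = @project_binary,
    @operation_id   = @operation_id OUTPUT;
```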

I’ve received this question a number of times in the past couple of weeks. It’s actually a feature we considered, and one that I (initially) argued for, as I know it is a common practice with SSIS solutions built for SQL Server 2005 and 2008. However, I was quickly convinced that the scenarios that require single package or incremental deployment of a project can be solved using team development best practices and procedures – the central one being source control management of your SSIS packages.

Let’s compare developing an SSIS solution to developing a C#/.NET application.

  • A package (.dtsx) can be thought of as a source code file (.cs)
  • The project (.ispac) can be thought of as the binary the source code compiles into (.exe)

If you have a team of developers working on the application, you would not do the following:

  • Place the source code in a common directory which all developers work on simultaneously
  • Update individual source files without recompiling the project binary
  • Commit partial changes to one source file when it could break code in other source files
  • Build and deploy the project to a test/integration/production environment when you are unsure whether the changes to other parts of the code made by other developers are complete

(Ok, maybe would not is too strong – I’ve definitely seen all of the above done before. How about we use should not instead?)

When I ask for details about the scenarios that people require single package updates for, it typically turns out that they are doing one or more of these “should not” things in their environments. If all of these things are red flags for application developers, then why do people do them with SSIS solutions?

A while back, I described some ideas for SSIS branching strategies when you’re using source control. I’d like to add the following high level recommendations to that post:

  • If you have a team of developers working on SSIS packages, you should put your SSIS packages in source control
  • Developers should not commit changes until they are complete, or at least until the project is in a working state
  • If you have a developer working on a very large change that could potentially break things (or one you want to do in multiple steps), do the work on a sub-branch to minimize the impact on the rest of the team
  • Builds that get deployed to test/integration/production environments come from stable branches, not from the development branch

Now, there are probably cases where single package updates for an SSIS project deployment can come in handy (even when you’re using source control, producing regular builds off the integration branches, etc). I just haven’t been able to think of any. If you have one of these scenarios, I’d really like to hear it – either via Connect, or by posting here (or ideally, both). So far the customers I’ve worked with found that these deployment requirements went away once they started adopting some of these application development lifecycle best practices… but I’m always happy to be proved wrong!

Getting Started with DQS and MDS

If you’re looking to get started with Data Quality Services (DQS) and Master Data Services (MDS), there are some fantastic resources available on Technet. The site includes videos and slides for full day training sessions on both products.

Data Quality Services for SQL Server 2012

  • Data Quality Basics and Introducing DQS: Video | Slides
  • Knowledge Management and Data Cleansing in DQS: Video | Slides
  • Data Matching in DQS: Video | Slides
  • DQS Integration with SSIS: Data Cleansing using SSIS: Video | Slides
  • DQS Integration with MDS: Data Matching using MDS: Video | Slides

Master Data Services for SQL Server 2012

  • Master Data Services Overview: Video | Slides
  • Managing Data Warehousing Dimensions with MDS, Part 1: Video | Slides
  • Managing Data Warehousing Dimensions with MDS, Part 2: Video
  • Data Loading via Entity Based Staging (EBS): Video | Slides
  • MDS Hierarchies and Collections: Video | Slides
  • Business Rules and Workflow in MDS: Video | Slides
  • MDS Model Migration and Upgrade: Video | Slides
  • Security Features and Guidelines in MDS: Video | Slides
  • Eliminate Duplicate Data with the MDS Add-In for Excel: Video | Slides

Setting a Default Logging Level in the SSIS Catalog

One of the powerful features of the SSIS Catalog is that it can automatically capture logs for packages run on the server. The logging level is set to Basic by default, and can be changed on the Advanced tab of the run UI when the package is run (as shown below); the same setting is available when scheduling the package through SQL Agent.

Logging level on Execute Package UI
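If you start executions with T-SQL instead of the UI, the same setting is exposed as the LOGGING_LEVEL system parameter on the execution. Here’s a minimal sketch – the folder, project, and package names are placeholders:

```sql
DECLARE @execution_id bigint;

-- Create an execution for a package in the Catalog (names are placeholders)
EXEC [SSISDB].[catalog].[create_execution]
    @folder_name  = N'MyFolder',
    @project_name = N'MyProject',
    @package_name = N'MyPackage.dtsx',
    @execution_id = @execution_id OUTPUT;

-- Set the logging level for this execution (object_type 50 = system parameter)
DECLARE @logging_level smallint = 2;   -- 0 = None, 1 = Basic, 2 = Performance, 3 = Verbose
EXEC [SSISDB].[catalog].[set_execution_parameter_value]
    @execution_id    = @execution_id,
    @object_type     = 50,
    @parameter_name  = N'LOGGING_LEVEL',
    @parameter_value = @logging_level;

EXEC [SSISDB].[catalog].[start_execution] @execution_id;
```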

You can set a server-wide default logging level on the Catalog properties page (a T-SQL alternative is sketched after the steps below).

  • Connect to the SQL Server instance using SSMS
  • Expand the Integration Services Catalogs node
  • Right click on the SSISDB node, select Properties
  • Change the Server-wide Default Logging Level setting (in the Operations Log category)

Catalog Properties
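If you’d rather script this change than click through SSMS, the same property can be set with the [catalog].[configure_catalog] stored procedure. A quick sketch:

```sql
-- Set the server-wide default logging level
-- 0 = None, 1 = Basic, 2 = Performance, 3 = Verbose
EXEC [SSISDB].[catalog].[configure_catalog]
    @property_name  = N'SERVER_LOGGING_LEVEL',
    @property_value = 1;   -- Basic

-- Check the current value
SELECT property_name, property_value
FROM [SSISDB].[catalog].[catalog_properties]
WHERE property_name = N'SERVER_LOGGING_LEVEL';
```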

See What Events are Included in the SSIS Catalog Log Levels for more information.

What Events are Included in the SSIS Catalog Log Levels

Logging Levels

Packages run through the SSIS Catalog can automatically have their logs captured in SQL Server 2012. The SSIS Catalog provides four different log levels – None, Basic, Performance, Verbose. The default log level is Basic.

The table below shows which SSIS events are captured in the [catalog].[operation_messages] view in the four logging levels.

| Level | Events | Notes |
| --- | --- | --- |
| None | None | Captures enough information to indicate whether the package succeeded or failed, but does not log any messages to the [operation_messages] view. |
| Basic | OnPreValidate, OnPostValidate, OnPreExecute, OnPostExecute, OnInformation, OnWarning, OnError | Captures similar information to what dtexec displays on the console by default when a package is run. |
| Performance | OnWarning, OnError | This log level is required to track performance information for the run (how long it took to run each task / component, etc.), but does not log all of the events captured by the Basic log level. |
| Verbose | All events | Captures all log events (including performance and diagnostic events). This logging level can introduce some overhead on performance – see details below. |
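To pull these messages back for a particular run, you can query the view directly. A minimal sketch – the execution ID is a placeholder (you can look it up in the [catalog].[executions] view):

```sql
DECLARE @execution_id bigint = 12345;   -- placeholder: see [catalog].[executions]

-- message_type 120 = Error, 110 = Warning, 70 = Information
SELECT message_time, message_type, message
FROM [SSISDB].[catalog].[operation_messages]
WHERE operation_id = @execution_id
ORDER BY message_time;
```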

Note that additional information is captured and stored in other tables / views in the SSIS Catalog. For an overview of the Views related to package execution, see Jamie Thomson’s blog post – SSIS Logging in Denali.

When to Use Each Logging Level

None

While the None logging level provides slightly better performance than Basic, I don’t expect that it will be used too often in production environments. This logging level does not capture error and status messages, which makes diagnosing problems difficult. However, if you have Packages Never Fail ™, this logging level might be for you! (Hey, you can always re-run the package with a higher level if something goes wrong…)

Basic

This is the recommended logging level for every day use. It captures important events (like errors and warnings), as well as enough progress information to display what the currently active task is, and how long each task has taken (internally this information is stored in the [catalog].[executable_statistics] view).
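A quick way to see that task-level timing information is to query the view for a given execution – the execution ID below is a placeholder:

```sql
DECLARE @execution_id bigint = 12345;   -- placeholder execution ID

SELECT execution_path,
       start_time,
       end_time,
       execution_duration,   -- duration in milliseconds
       execution_result      -- 0 = Success, 1 = Failure, 2 = Completion, 3 = Cancelled
FROM [SSISDB].[catalog].[executable_statistics]
WHERE execution_id = @execution_id
ORDER BY start_time;
```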

Performance

The Performance log level should be used when you are doing benchmarking and performance tuning for your packages. While it actually logs fewer messages to the [catalog].[operation_messages] view than Basic, it captures many more events internally to analyze the performance of the data flow components. As a result, there is a bit more overhead during execution – packages run with Basic will actually run a little faster than Performance (in this case Performance means “give me all of the performance details”, not “run my packages as fast as you can”).
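The component-level details land in the [catalog].[execution_component_phases] view, which is only populated at the Performance and Verbose levels. Here’s a sketch that rolls up the active time per data flow component – the execution ID is a placeholder:

```sql
DECLARE @execution_id bigint = 12345;   -- placeholder execution ID

SELECT package_name,
       task_name,
       subcomponent_name,
       SUM(DATEDIFF(millisecond, start_time, end_time)) AS active_time_ms
FROM [SSISDB].[catalog].[execution_component_phases]
WHERE execution_id = @execution_id
GROUP BY package_name, task_name, subcomponent_name
ORDER BY active_time_ms DESC;
```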

Verbose

The Verbose logging level captures a lot of events (especially when using OLE DB connections, as they generate many DIAGNOSTIC events). Verbose should only be used when trying to debug or diagnose package failures.
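One thing you only get at Verbose is the row count data in the [catalog].[execution_data_statistics] view, which records the number of rows sent down each data flow path. A sketch – the execution ID is a placeholder:

```sql
DECLARE @execution_id bigint = 12345;   -- placeholder execution ID

SELECT package_name,
       task_name,
       source_component_name,
       destination_component_name,
       SUM(rows_sent) AS total_rows_sent
FROM [SSISDB].[catalog].[execution_data_statistics]
WHERE execution_id = @execution_id
GROUP BY package_name, task_name, source_component_name, destination_component_name
ORDER BY total_rows_sent DESC;
```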

Custom Log Events

The only log level that captures custom log events (i.e. special events raised by custom / 3rd party SSIS extensions or scripts) is Verbose. Due to the overhead that the Verbose level introduces, I recommend creating your own custom reports if you need to capture these custom events.
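If you do run at Verbose to capture custom events, they show up in the [catalog].[event_messages] view along with their event name, so a custom report can be as simple as a filtered query. A sketch – the execution ID and event name are placeholders:

```sql
DECLARE @execution_id bigint = 12345;   -- placeholder execution ID

SELECT message_time, package_name, event_name, message
FROM [SSISDB].[catalog].[event_messages]
WHERE operation_id = @execution_id
  AND event_name = N'MyCustomEvent'     -- placeholder: your custom event name
ORDER BY message_time;
```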