SELECT * From SSIS.DataFlow

If you’ve been looking through the documentation for the Power BI preview, you might have noticed a section on Publishing SSIS Packages as OData Feeds. This functionality lets you create a T-SQL View over an SSIS data flow using a new SSIS add-on called the Data Feed Publishing Components. This add-on works with SQL Server 2012, and is a free download from the Microsoft download center. While the components are useful for a number of Power BI scenarios, the components don’t require a Power BI subscription – all you need is a SQL Server/SSIS 2012 installation.

The Data Feed Publishing Components have three main components:

  1. Data Streaming Destination – A new Data Flow Destination that creates an “endpoint”, letting you stream data back to the caller (similar in concept to the DataReader Destination)
  2. OLE DB Provider for SSIS – A special OLE DB provider that allows SQL to treat an SSIS package as a Linked Server
  3. Data Feed Publishing Wizard – A wizard that deploys a project/package containing a Data Streaming Destination and creates a T-SQL View (and linked server) that kicks off the SSIS package when accessed

Publish SSIS Packages - Conceptual Diagram

If this sounds interesting to you, be sure to check out the step-by-step guide in the Power BI documentation. Note – if you’re currently not in the Power BI preview, you can stop at step #3.

Scenarios

I expect I’ll be blogging more about this in the coming months (as well as talking about it at PASS), but I wanted to briefly mention some of the main scenarios we’ve been working with customers on.

Invoking an SSIS Package from a Report

You’d do this in the case where a simple query isn’t enough – there are work flow steps (i.e. FTP files from remote server), you’re merging/transforming disparate data sources, require .NET scripting logic, or your data source requires a custom connector. Internally we’ve been referring to this scenario as “Complex Data Feeds”.

While it is possible to configure Reporting Services to read from an SSIS package, the approach has some limitations as to the account the package will execute as (and is actually removed from the default configuration file in SQL 2012). The Data Feed components provide a similar approach, but also let you leverage the logging and configuration provided by the SSIS Catalog.

On-Demand Execution of an SSIS Package

SELECT’ing from the View created by the Publishing Wizard dynamically invokes the SSIS package with a data flow, and streams back the results. While the majority of SSIS packages would run on a schedule, or write data to a fixed destination, there are cases where dynamic invocation and streaming results are preferred.

One customer we worked with had 500+ “Data Feeds” – data sets that were more than just a simple queries. This data sets were typically small and used for ad hoc reporting purposes. These feeds weren’t accessed regularly – some would not be used for months, and then be used heavily for a day or two (perhaps at the end of a quarter). Unfortunately, the access patterns weren’t predictable. Because the data had to be there when it was needed, the customer ended up with a very large infrastructure keeping every feed up to date. What they needed was something that could be run on demand (perhaps with built in caching), so the data could be refreshed only when it was needed.

Another customer was looking for a way to do dynamic auditing in their environments using SSIS. They had a set of packages with custom auditing logic that they’d deploy and run in various environments, getting a real-time snapshot of their systems.

Alternative to Linked Servers

Want to use a linked server, but don’t have an OLE DB provider for your data source? Want to enforce custom logic, or do dynamic transformations on the results of the data? No problem – use an SSIS package!

Limitations

Just like Power BI, the current release of these components is in preview, and might not have all of the functionality you’re looking for (just yet). One thing to note is that the SSIS OLE DB provider currently does not support Distributed Query optimizations. Therefore, it currently doesn’t provide statistics or push down filters like other OLE DB providers used for Linked Servers. This functionality is best suited for one time executions of an SSIS package – if you find it’s something you’re accessing over and over, then you should probably be running your package on a schedule.

Links

For more information, see one of the following:

Publish to SSIS Catalog using PowerShell

Here is a PowerShell script that I use to provision and setup SSIS Catalogs when I’m demoing SQL Server 2012. It does the following:

  1. Connects to localhost (default instance)
  2. Drops the existing SSIS Catalog
  3. Creates a new SSIS Catalog
  4. Creates a Folder
  5. Deploys a Project
  6. Creates an Environment
  7. Creates three server varaibles
  8. Creates an environment reference for the project

Be careful of step #2 above – this script is great for resetting a demo environment, but you’ll want to modify it if you’re using it for real deployments!

 

Exit Codes, DTEXEC, and SSIS Catalog

DTEXEC has two execution modes when running packages in the SSIS Catalog.

By default, the process runs asynchronously – the package execution is scheduled on the server, and DTEXEC returns immediately. DTEXEC does not wait for the package execution to complete. In this case, %ERRORLEVEL% would represent whether the package execution was created successfully, and will not reflect the result of the execution.

To get the result of the package execution, you can set the $ServerOption::SYNCHRONIZED property to True. In this mode, DTEXEC will wait for the package execution to complete. If the execution was successful, the %ERRORLEVEL% will be set to 0. If it failed, it will be set to 1.

See the examples below – the first execution runs a package (2-RowCounts.dtsx) on my local instance, and completes successfully. The second execution runs a package (6-ErrorContext.dtsx) that will fail.

 

For more details on the new DTEXEC command line parameters for the SSIS Catalog, see:

Queries for Data Flow Component Performance in SQL 2012

The built-in SSIS Catalog reports in SSMS show a breakdown of task and component performance of a package execution when you run with a logging level of “Performance”.

SSIS Catalog package performance report

For a more detailed or customized view, you can query the catalog tables directly. The key table to get data flow performance information is [catalog].[execution_component_phases]. Here are two sample queries – the first returns the total time spent on each component (similar to what is displayed in the performance report), and the second returns the times broken down by phase.

 

Running these queries returns something like this:

image

Some notes:

  1. You need to run the package with a Performance logging level to capture this data
  2. You can set a default logging level for the Catalog
  3. The query above determines the execution_id for the latest run – you will need to change this value to look at the results for specific package executions

Setting a Default Logging Level in the SSIS Catalog

One of the powerful features of the SSIS Catalog is that it can automatically capture logs for packages run on the server. The logging level is set to Basic by default, and can be changed when the package is run on the Advanced tab of the run UI (as shown below. This can also be done when scheduling the package through SQL Agent).

Logging level on Execute Package UI

You can set a server wide default logging level on the Catalog properties page.

  • Connect to the SQL Server instance using SSMS
  • Expand the Integration Services Catalogs node
  • Right click on the SSISDB node, select Properties
  • Change the Server-wide Default Logging Level setting (in the Operations Log category)

Catalog Properties

See What Events are Included in the SSIS Catalog Log Levels for more information.

What Events are Included in the SSIS Catalog Log Levels

Logging Levels

Packages run through the SSIS Catalog can automatically have their logs captured in SQL Server 2012. The SSIS Catalog provides four different log levels – None, Basic, Performance, Verbose. The default log level is Basic.

The table below shows which SSIS events are captured in the [catalog].[operation_messages] view in the four logging levels.

 

Level Events Notes
None None Captures enough information to say whether the package succeeded or failed, and does not log any messages to the [operation_messages] view.
Basic OnPreValidateOnPostValidate

OnPreExecute

OnPostExecute

OnInformation

OnWarning

OnError

Captures similar information to what is displayed on the console by the default when a package is run with dtexec.
Performance OnWarningOnError This log level is required to track the performance information for the run (how long it took to run each task / component, etc) but does not log all of the events captured by the Basic log level.
Verbose All events The Verbose log level captures all log events (including performance and diagnostic events). This logging level can introduce some overhead on performance – see details blow.

Note that additional information is captured and stored in other tables / views in the SSIS Catalog. For an overview of the Views related to package execution, see Jamie Thomson’s blog post – SSIS Logging in Denali.

When to Use Each Logging Level

None

While the None logging level provides slightly better performance than Basic, I don’t expect that it will be used too often in production environments. This logging level does not capture error and status messages, which makes diagnosing problems difficult. However, if you have Packages Never Fail ™, this logging level might be for you! (Hey, you can always re-run the package with a higher level if something goes wrong…)

Basic

This is the recommended logging level for every day use. It captures important events (like errors and warnings), as well as enough progress information to display what the currently active task is, and how long each task has taken (internally this information is stored in the [catalog].[executable_statistics] view).

Performance

The Performance log level should be used when you are doing benchmarking and performance tuning for your packages. While it actually logs less messages to the [catalog].[operation_messages] view than Basic, it captures a lot more events internally to analyze the performance of the data flow components. As a result, there is a bit more overhead during execution – packages run with Basic will actually run a little faster than Performance (in this case Performance means “give me all of the performance details”, not “run my packages as fast as you can”).

Verbose

The Verbose logging level captures a lot of events (especially when using OLE DB connections, as they generate many DIAGNOSTIC events). Verbose should only be used when trying to debug or diagnose package failures.

Custom Log Events

The only log level that captures custom log events (i.e. special events that are raised with custom / 3rd party SSIS extensions or scripts) is Verbose. Due to the overhead that the Verbose level introduces, I recommend creating your own custom reports if you need to capture these custom events.