A new OData Source for SSIS is now available. The 2012 version is available as a standalone download, and the 2014 version is part of the SQL Server 2014 Feature Pack.
The documentation for the 2012 is available in Books Online.
A new feature was added to the Raw File Destination in SQL Server 2012 which allows you to output an empty raw file from the editor UI.
This functionality is handy when you are using raw files to stage data between data flows. You can use this empty raw file in the Raw File Source to configure the metadata it needs. In previous versions, you would have had to run the package to produce the initial file.
Another note – Raw Files were improved in SQL Server 2012 to include sort information, so now the IsSorted flag is automatically set in the Raw File Source.
Here are some of the resources I mentioned in the SSIS Roadmap session at the PASS Summit.
(update 2014/11/14: added link to the VS 2013 release which came out in April)
Looking for the SSIS development tools? Formerly known as Business Intelligence Developer Studio (BIDS), the designer is now called SQL Server Data Tools – Business Intelligence (SSDT-BI) (not to be confused with the other SQL Server Data Tools). The version you want will depend on two things; your version of SQL Server, and the version of Visual Studio you want to work with.
SQL Server 2012
SQL Server 2014
If you’ve been looking through the documentation for the Power BI preview, you might have noticed a section on Publishing SSIS Packages as OData Feeds. This functionality lets you create a T-SQL View over an SSIS data flow using a new SSIS add-on called the Data Feed Publishing Components. This add-on works with SQL Server 2012, and is a free download from the Microsoft download center. While the components are useful for a number of Power BI scenarios, the components don’t require a Power BI subscription – all you need is a SQL Server/SSIS 2012 installation.
The Data Feed Publishing Components have three main components:
If this sounds interesting to you, be sure to check out the step-by-step guide in the Power BI documentation. Note – if you’re currently not in the Power BI preview, you can stop at step #3.
I expect I’ll be blogging more about this in the coming months (as well as talking about it at PASS), but I wanted to briefly mention some of the main scenarios we’ve been working with customers on.
Invoking an SSIS Package from a Report
You’d do this in the case where a simple query isn’t enough – there are work flow steps (i.e. FTP files from remote server), you’re merging/transforming disparate data sources, require .NET scripting logic, or your data source requires a custom connector. Internally we’ve been referring to this scenario as “Complex Data Feeds”.
While it is possible to configure Reporting Services to read from an SSIS package, the approach has some limitations as to the account the package will execute as (and is actually removed from the default configuration file in SQL 2012). The Data Feed components provide a similar approach, but also let you leverage the logging and configuration provided by the SSIS Catalog.
On-Demand Execution of an SSIS Package
SELECT’ing from the View created by the Publishing Wizard dynamically invokes the SSIS package with a data flow, and streams back the results. While the majority of SSIS packages would run on a schedule, or write data to a fixed destination, there are cases where dynamic invocation and streaming results are preferred.
One customer we worked with had 500+ “Data Feeds” – data sets that were more than just a simple queries. This data sets were typically small and used for ad hoc reporting purposes. These feeds weren’t accessed regularly – some would not be used for months, and then be used heavily for a day or two (perhaps at the end of a quarter). Unfortunately, the access patterns weren’t predictable. Because the data had to be there when it was needed, the customer ended up with a very large infrastructure keeping every feed up to date. What they needed was something that could be run on demand (perhaps with built in caching), so the data could be refreshed only when it was needed.
Another customer was looking for a way to do dynamic auditing in their environments using SSIS. They had a set of packages with custom auditing logic that they’d deploy and run in various environments, getting a real-time snapshot of their systems.
Alternative to Linked Servers
Want to use a linked server, but don’t have an OLE DB provider for your data source? Want to enforce custom logic, or do dynamic transformations on the results of the data? No problem – use an SSIS package!
Just like Power BI, the current release of these components is in preview, and might not have all of the functionality you’re looking for (just yet). One thing to note is that the SSIS OLE DB provider currently does not support Distributed Query optimizations. Therefore, it currently doesn’t provide statistics or push down filters like other OLE DB providers used for Linked Servers. This functionality is best suited for one time executions of an SSIS package – if you find it’s something you’re accessing over and over, then you should probably be running your package on a schedule.
For more information, see one of the following:
The SSIS team blogged about executing packages deployed to the SSIS Catalog using DTEXEC a while ago. The post mentions the $ServerOption::SYNCHRONIZED parameter as a way to control whether the execution is synchronous or asynchronous, but there are some other server options you can set as well. Phil Brammer actually blogged about the options last year. You can also see the full list of options when you view the SSIS Catalog Execution report (note the Parameters Used section in the screenshot below).
These options can be specified on the command line when you run a catalog package with DTExec. For example, to change the logging level to VERBOSE (3) for a specific execution, you’d add the following to your dtexec command:
More information on logging levels can be found here.
Each entity in SQL Server 2012 Master Data Services (MDS) will have it’s own staging table (stg.<name>_Leaf). Using this staging table, you can create, update, deactivate and delete left members in bulk. This post describes how to bulk load into an entity staging table and trigger the stored procedure to start the batch import process.
The new entity based staging tables are an excellent feature in MDS 2012, and make it very easy to bulk load into MDS from SSIS. If you take a look at the SQL database used by your MDS instance, you’ll see at least one table in the stg schema for each entity. For this example I’ve created a Suppliers entity and I see a matching table called [stg].[Suppliers_Leaf]. If your entity is using hierarchies, you will have three staging tables (see BOL for details). If we expand the columns, we’ll see all of the attributes have their own columns, as well as some system columns that every staging table will have.
Each staging table will also have a stored procedure that is used to tell MDS that new data is ready to load. Details of the arguments can be found in BOL.
To load into this table from SSIS, our data flow will need to do the following:
See the Leaf Member Staging Table BOL entry for details on the remaining system columns. If your Code value isn’t set to be generated automatically, then you’d also need to specify it in your data flow. Otherwise, the default fields can be safely ignored when we’re bulk importing.
The BatchTag column is used as an identifier in the UI – it can be any string value, as long as it’s unique (and under 50 characters).
MDS uses the same staging table for creating, updating and deleting entities. The ImportType column indicates which action you want to perform. The possible values are listed in the table below.
|0||Create new members. Replace existing MDS data with staged data, but only if the staged data is not NULL. NULL values are ignored. To change a string attribute value to NULL, set it ~NULL~. To change a number attribute value to NULL, set it to -98765432101234567890. To change a datetime attribute value to NULL, set it to 5555-11-22T12:34:56.|
|1||Create new members only. Any updates to existing MDS data fail.|
|2||Create new members. Replace existing MDS data with staged data. If you import NULL values, they will overwrite existing MDS values.|
|3||Deactivate the member, based on the Code value. All attributes, hierarchy and collection memberships, and transactions are maintained but no longer available in the UI. If the member is used as a domain-based attribute value of another member, the deactivation will fail. See ImportType 5 for an alternative.|
|4||Permanently delete the member, based on the Code value. All attributes, hierarchy and collection memberships, and transactions are permanently deleted. If the member is used as a domain-based attribute value of another member, the deletion will fail. See ImportType 6 for an alternative.|
|5||Deactivate the member, based on the Code value. All attributes, hierarchy and collection memberships, and transactions are maintained but no longer available in the UI. If the member is used as a domain-based attribute value of other members, the related values will be set to NULL. ImportType 5 is for leaf members only.|
|6||Permanently delete the member, based on the Code value. All attributes, hierarchy and collection memberships, and transactions are permanently deleted. If the member is used as a domain-based attribute value of other members, the related values will be set to NULL. ImportType 6 is for leaf members only.|
When you are bulk loading data into MDS, you’ll use 0, 1 or 2 as the ImportType. To summarize the different modes:
You control flow will have at least two tasks:
Your data flow will have (at least) three steps:
As noted above, in your OLE DB Destination you’ll need to map your data flow columns to your member attributes (including Code if it’s not auto-generated), the BatchTag value (which can be automatically generated via expression), and the ImportType.
After the Data Flow, you’ll run the staging table stored procedure.
EXEC [stg].[udp_Suppliers_Leaf] ?, ?, ?
The first three parameters are required:
Matthew Roche (blog | twitter) and I teamed up once again to present an advanced SSIS Catalog session at TechEd North America 2013 – Deep Inside the Microsoft SQL Server Integration Services Server. The video and slide deck are now available on the Channel9 site. The slide deck actually contains 10 additional slides that we didn’t have time to cover during the regular session (with some further details about the security model).
I want to extend a big thank you to everyone who attended, and to all the great feedback we got. It can be tough doing a 400 level SQL session at TechEd, and while I could see some people’s heads spinning, it sounded like most people were able to learn something new.
The TechEd team picked an excellent preview picture for the session (below). It comes from Matthew’s intro – you’ll have to watch the video to see how he worked a picture of kittens into a 400 level SSIS session.
Yes, yes you can. Scripted or automated deployment can be done in a number of ways in SQL Server 2012.
Integration Services Deployment Wizard
Yes, this is the primary UI tool for SSIS project deployment, but it can also be run silently on the command line. When you run through the wizard, the Review page will actually list all of the parameters you need to do the same deployment from the command line.
Run ISDeploymentWizard.exe /? from a command prompt, and you’ll see the full list of arguments it supports.
<img width="619" height="234" title="image" style="margin: 5px; border: 0px currentcolor; display: inline; background-image: none;" alt="image" src="https://www.mattmasson.com/wp-content/uploads/2013/05/image2.png" border="0" />
SSIS Catalog T-SQL API
DECLARE @ProjectBinary as varbinary(max)
DECLARE @operation_id as bigint
Set @ProjectBinary = (SELECT * FROM OPENROWSET(BULK 'MyProj.ispac', SINGLE_BLOB) as BinaryData)
Exec catalog.deploy_project @folder_name = 'MyFolder', @project_name = 'MyProj', @Project_Stream = @ProjectBinary, @operation_id = @operation_id out
All SSIS Catalog operations can be automated through PowerShell. I previously blogged about a deployment script I use to setup my demos.
The SSIS Catalog management object model (MOM) exposes a set of SMO classes you can use to code your own catalog utilities. You’ll want to use the CatalogFolder.DeployProject method to do the actual deployment. If SMO or .NET isn’t your thing, you can also code a custom utility which interacts directly with the T-SQL API.
I put this together for an SSIS performance presentation I did a while ago, and thought it warranted its own post.
In addition to looking for appropriate Design Patterns, I have a checklist of things to look for when trying to optimize an SSIS Data Flow. This list is by no means extensive, but it contains some of the more common causes of performance issues in the packages I’ve worked with.
OLE DB command operates on a row by row basis. Stage the data and perform the operation in a batch mode using the Execute SQL Task instead.
Learn MERGE, and love it.
Sometimes you do. Most of the time you don’t. If you do need them (i.e. you are loading from a flat file, and need to do a merge), considering maxing out the DefaultBufferMaxRows setting so you can sort more data in memory (sort will keep at most 5 active buffers in memory before it starts swapping to disk).
Lots of related lookup posts can be found here.
Have fun optimizing!