Generate an Empty Raw File Without Running your SSIS Package

A new feature was added to the Raw File Destination in SQL Server 2012 which allows you to output an empty raw file from the editor UI.

  1. Add a Raw File Destination to your data flow, and connect it to the previous transform
  2. Double click to open the Raw File Destination Editor
  3. Configure the file name
  4. Click on the Columns tab, and select the columns you want included
  5. Click on the Connection Manager tab
  6. Click the Generate initial raw file … button
  7. A dialog confirming the output columns will pop up. Click OK to create your file.

 

image

 

This functionality is handy when you are using raw files to stage data between data flows. You can use this empty raw file in the Raw File Source to configure the metadata it needs. In previous versions, you would have had to run the package to produce the initial file.

Another note – Raw Files were improved in SQL Server 2012 to include sort information, so now the IsSorted flag is automatically set in the Raw File Source.

 

image

Quick Tip: How can I tell when my DQS knowledge base has been updated?

imageYou can improve data quality in your SSIS data flows by using the DQS Cleansing transform (new in SQL 2012). When you’re using DQS, the data is cleansed according to the rules you’ve built up in the DQS knowledge base. As you update the knowledge base with better rules, your overall data quality goes up. Therefore, you might find that you want to re-process data that’s already been through DQS anytime you make a significant update to the knowledge base.

Think of your data as “water”, and your knowledge base as a “water filter”… as your filter improves in quality (you improve your knowledge base), the water (data) you run through it gets cleaner.

I usually recommend storing the date value of when the data was processed with DQS. This could be a column value, if you want to track things at the row level, or a entry in a separate file/table, if you’re tracking it at the table/DB level. (I describe this pattern in detail in the SSIS Design Patterns book)

While there is no official notification system or API to know when a DQS knowledge base has been updated, the DQS database does contain a PUBLISH_DATE field that you can query. This won’t tell you which individual domains were touched, or what was actually modified, but given that you probably won’t be publishing a new version of your knowledge base without a good reason, it should be good enough for most automated data cleansing scenarios.

Query the Publish Date for a DQS Knowledge Base

We’re interested in the [dbo].[A_KNOWLEDGEBASE] table in the [DQS_MAIN] database. This table lists all of the knowledge bases in your system, as well as any active projects. You can use the following query to get the last time the knowledge base was updated (replace the [Name] clause with the name of your KB).

Monitoring SSIS package executions

Question: When we run SSIS in development (from SSDT/BIDS), we can see the progress of the job, with all of the steps highlighted in yellow, red or green. Is there a way to display this same status information when the packages run in production?


You have a number of options when it comes to monitoring SSIS packages at runtime. In 2005/2008, you needed to create your own solution (based on SSIS logging, or custom logging approaches). In 2012, you now have built in reporting and monitoring from the SSIS Catalog (although the previous methods still work as well). There are a number of third party and open source logging frameworks and monitoring solutions, and of course you can always roll your own using the SSIS APIs.

A quick search turned up a number of options (some of which I wasn’t previously aware of):

SSIS Catalog (2012)

All versions

Installing SSIS for Visual Studio 2012

Did you hear that SQL Server Data Tools – Business Intelligence (formerly known as BIDS) has been releases for Visual Studio 2012? It was announced last month, and we’ve seen a lot of downloads so far. The download adds support for the SQL Server 2012 BI projects (Reporting Services, Analysis Services, and Integration Services) to VS 2012, or installs a standalone VS shell if you don’t already have VS 2012 installed.

Also note that the component was renamed – we added “Business Intelligence” to the end to distinguish it from the SQL Server Data Tools (Juneau). We now refer to it as SSDTBI, rather than “SSDT, no, not that one, the other one – you know, the one that comes with the SQL installation media, not the one you download”.

Installation

Download the installer from http://www.microsoft.com/en-us/download/details.aspx?id=36843.

When you run the installer, the SQL Server 2012 setup application will show up. Select the Perform a new installation of SQL Server 2012 option, and click Next.

image

On the next page, select the SQL Server Data Tools – Business Intelligence for Visual Studio 2012 option.

image

Complete the installation process, and then launch Visual Studio 2012. You’ll now be able to create new SSIS projects in all of VS 2012’s monochrome goodness.

image

Quick Tip: Resolving Oracle Destination Datatype conversion errors

When working with the Oracle Destination (the one from Microsoft/Attunity (2008 | 2012)), you might encounter generic data type conversion errors after mapping your columns.

image

image

Validation error. Data Flow Task: Data Flow Task: Datatype conversion of Oracle Destination.Inputs[Oracle Destination Input].Columns[L_QUANTITY] is not supported.

The error appears because there is not a 1:1 mapping of SSIS Data Flow types to Oracle types, and the Oracle Destination can’t do the conversion for you. A common one is for BIGINT conversions. Unfortunately, the error message doesn’t provide much guidance on what the type mismatch is, but there is an easy way to check.

Right click on the Oracle Destination component, and select Show Advanced Editor …

On the Input and Output Properties tab, expand the Oracle Destination Input list item, and view the external columns.

image

Click the column which is using you the error. On the properties page, you’ll see the SSIS data type that the destination expects.

image

In your data flow, use a Data Conversion or Derived Column transform to convert the original column to the destination data type.

Automatically Select Input Columns in a Custom Data Flow Component

The following code snippet can be used in a custom data flow component to automatically select all input columns when you attach a path.

This is the equivalent to clicking the Select All box in the Advanced Editor.

image 

Can I deploy a single SSIS package from my project to the SSIS Catalog?

No. No you cannot. In SQL Server 2012, you will always deploy the entire .ispac file to the Catalog.

I’ve received this question a number of times in the past couple of weeks. It’s actually a feature we considered, and one that I (initially) argued for, as I know it is a common practice with SSIS solutions built for SQL Server 2005 and 2008. However, I was quickly convinced that the scenarios that require single package, or incremental deployment of a project can be solved using team development best practices and procedures. The central one being source control management of your SSIS packages.

Let’s compare developing an SSIS solution to developing a C#/.NET application.

  • A package (.dtsx) can be thought of a source code file (.cs)
  • The project (.ispac) can be thought of as the binary the source code compiles into (.exe)

If you have a team of developers working on the application, you would not do the following:

  • Place the source code in a common directory which all developers work on simultaneously
  • Update individual source files without recompiling the project binary
  • Commit partial changes to one source file when it could break code in other source files
  • Build and deploy the project to a test/integration/production environment when you are unsure whether the changes to other parts of the code made by other developers are complete

(Ok, maybe would not is too strong – I’ve definitely seen all of the above done before . How about we use should not instead?)

When I ask for details about the scenarios that people require single package updates for, it typically turns out that they are doing one or more of these “should not” things in their environments. If all of these things are red flags for application developers, then why do people do them with SSIS solutions?

I described some ideas I had for SSIS branching strategies when you’re using source control a while back. I’d like to add the following high level recommendations to that post:

  • If you have a team of developers working on SSIS packages, you should put you SSIS packages in source control
  • Developers should not commit changes until they are complete, or in at least in a reasonable state where the project is in a working state
  • If you have developer working on a very large change that could potentially break things (or one you want to do in multiple steps), do the work on a sub-branch to minimize impact to the rest of the team
  • Builds that get deployed to test/integration/production environments come from stable branches, not from the development branch

Now, there are probably cases where single package updates for an SSIS project deployment can come in handy (even when you’re using source control, producing regular builds off the integration branches, etc). I just haven’t been able to think of any. If you have one of these scenarios, I’d really like to hear it – either via Connect, or by posting here (or ideally, both). So far the customers I’ve worked with found that these deployment requirements went away once they started adopting some of these application development lifecycle best practices… but I’m always happy to be proved wrong!

Publish to SSIS Catalog using PowerShell

Here is a PowerShell script that I use to provision and setup SSIS Catalogs when I’m demoing SQL Server 2012. It does the following:

  1. Connects to localhost (default instance)
  2. Drops the existing SSIS Catalog
  3. Creates a new SSIS Catalog
  4. Creates a Folder
  5. Deploys a Project
  6. Creates an Environment
  7. Creates three server varaibles
  8. Creates an environment reference for the project

Be careful of step #2 above – this script is great for resetting a demo environment, but you’ll want to modify it if you’re using it for real deployments!

 

Should I use the SQL Server Destination?

Short answer:

No.

Long answer:

Noooooooooooooooooooooooooooooooooooooooooooooooooo.

UPDATE:

I was told that some people would like an explanation instead of blindly trusting my advice (weird), so here goes:

In SQL 2005 RTM, the SQL Server Destination was the fastest way to load data into SQL Server with SSIS. Somewhere along the way (in the 2005 SP2 / 2008 timeframe), the performance of the OLE DB Destination (using Fast Load) surpassed the SQL Server Destination in many scenarios. This was due to a couple of different reasons:

  • Multiple improvements were made to the SQL Native Client OLE DB provider
  • Additional checks and constraints were added to the shared memory based loading mechanism used by the SQL Server Destination

I saw results from some performance tests we ran shortly after the 2008 release and was surprised to see that the SQL Server Destination could be 10% faster to 10% slower than loading the same data with the OLE DB Destination. The performance would vary based on the system the tests were run on, the data types involved, and the overall row width. While it does appear that the SQL Server Destination can be faster, it’s just as likely that it will be slower. In most cases, the performance is about the same.

My recommendation is that if you need every bit of performance (a 10% perf increase on a 10 hour load can be significant), try out the SQL Server Destination to see how it works for you. However – keep in mind the following limitations of the SQL Server Destination:

Given these limitations, I recommend using the OLE DB Destination even if you are seeing a performance increase with the SQL Server Destination.

DQS – Using the Changes History Panel

The DQS client will automatically sort the list of Domain Values, which can make adding new values to a big list tricky. Once you enter the new value at the bottom of the list, it gets automatically sorted, and you need to scroll up the list to find it again. However, there is a better way!

At the far end of the toolbar, there is a drop down button which exposes the lesser used commands. One of these is the ShowHide domain values changes history panel button, which turns out to be incredible useful.

Domain Values Changes History panel button

When the panel is enabled, it is displayed at the bottom of the Domain Values page. When a value is added or modified, the change will be displayed in the log window, with a hyperlink to the domain value. Clicking on this link will automatically scroll/focus on the list to the target value.

image