Refresh an Excel Workbook with a Script Task

Are you loading data from an Excel workbook with external queries using SSIS? Do you want to make sure those queries are refreshed before the load takes place? Have I got the post for you! I recently had to do this myself, and after some googling with Bing, I found an MSDN forum post that describes how to refresh all of the queries within a workbook using an SSIS Script Task. I’ve adapted the code from the post, and you can find it below.

For the code to compile, you’ll need to add a reference to Microsoft.Office.Interop.Excel within your script task’s project.


Next, select the reference under the project explorer, and open the properties window. You’ll need to disable the Embed Interop Types option. (This assumes you’re using SQL 2012, which supports .NET 4 … if you are using 2008 with .NET 2/3.5, you won’t have to do this).


Include the following code in your script task. It iterates over all of the Excel connection managers within your package, loads the workbook, and calls RefreshAll() to refresh each of the queries in the workbook. The task then sleeps for 20 seconds (to give time for the query refresh to complete) before exiting.
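What follows is a sketch along those lines rather than the forum post’s exact code – the ExcelFilePath property lookup is an assumption about where the workbook path lives, so adjust it to match your connection managers:

// Add these to the usings at the top of ScriptMain.cs:
//   using System.Threading;
//   using Excel = Microsoft.Office.Interop.Excel;

public void Main()
{
    Excel.Application excel = new Excel.Application();
    excel.Visible = false;
    excel.DisplayAlerts = false;

    try
    {
        // Loop over the package's connection managers, looking for Excel connections
        foreach (ConnectionManager cm in Dts.Connections)
        {
            if (!cm.CreationName.StartsWith("EXCEL", StringComparison.OrdinalIgnoreCase))
            {
                continue;
            }

            // Pull the workbook path from the connection manager
            string workbookPath = (string)cm.Properties["ExcelFilePath"].GetValue(cm);

            Excel.Workbook workbook = excel.Workbooks.Open(workbookPath);

            // Kick off a refresh of every query in the workbook
            workbook.RefreshAll();

            // RefreshAll() runs the refresh in the background - give it time to finish
            Thread.Sleep(20000);

            workbook.Save();
            workbook.Close();
        }

        Dts.TaskResult = (int)ScriptResults.Success;
    }
    finally
    {
        excel.Quit();
    }
}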

I kept the code simple for this example, but there are a few different optimizations you can make here:

  • Add error handling
  • Pass in the Excel file name through a variable instead of iterating over each connection manager
  • Refresh a specific query within the workbook, rather than calling RefreshAll
  • Find your query within the workbook and set its BackgroundQuery property to False to avoid the asynchronous refresh call and the need to sleep (see the sketch below)
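Building on the sketch above, that last optimization could look something like this (Excel object model property names – verify the connection types your workbook actually uses):

// Turn off background refresh so RefreshAll() blocks until the data is loaded,
// removing the need for the Thread.Sleep() call
foreach (Excel.WorkbookConnection connection in workbook.Connections)
{
    if (connection.Type == Excel.XlConnectionType.xlConnectionTypeOLEDB)
    {
        connection.OLEDBConnection.BackgroundQuery = false;
    }
    else if (connection.Type == Excel.XlConnectionType.xlConnectionTypeODBC)
    {
        connection.ODBCConnection.BackgroundQuery = false;
    }
}

workbook.RefreshAll();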

Consuming CANSIM Datasets using Data Explorer

I recently stumbled upon CANSIM – Canada’s equivalent of Data.Gov. The site has a lot of really interesting datasets that come in handy if you’re someone who frequently creates data-focused demos. I wanted to play around a bit more with data visualizations in Excel (specifically, the Scatter Chart over Time feature in Power View), and thought that Canadian population statistics would be a good place to start.

The CANSIM site provides a default view for each dataset, with a few links that let us customize the view. If this view of the dataset is all we’re looking for, we could use the From Web functionality in Excel to import the data into our workbook.


Data Explorer provides a similar From Web feature. Unfortunately, this doesn’t currently work with the CANSIM site because the site redirects you to a survey page the first time you access it. Excel’s From Web feature brings up a web browser, allowing us to click through to the dataset we want. Data Explorer simply displays the raw data, so we end up with an empty table and no way to access the view that we want. Boo.


Thankfully, the CANSIM site also allows us to download the raw data behind each dataset. This is actually better for what we want to do, as Excel has all of the processing functionality we need to slice and visualize the data the way we want it. Clicking the Download tab on the dataset page gives us a couple of different format options. CSV is the easiest format to work with in Excel, so that’s what we’ll download.


Loading the Data into Excel

After downloading and extracting the dataset (it comes zipped), we’ll load and format it using Data Explorer’s From File | From CSV option.


This dataset is pretty simple, with columns for the reference date (Ref_Date), geography (GEO), and population (Value). We’ll only need to do a couple of things to get the data in the format that we want.


First, we click the table icon and select Use First Row As Headers to specify the column names.


Next, we’ll filter out the summary rows (total population for Canada), because we can calculate those directly. We can do this by right-clicking on a value in the GEO column and selecting Text Filters | Does Not Equal from the context menu.


We’re left with a pretty simple query:
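Something along these lines (the file path is a placeholder, and your step names may differ):

Source = Csv.Document(File.Contents("C:\Temp\cansim-population.csv")),
FirstRowAsHeader = Table.PromoteHeaders(Source),
FilteredRows = Table.SelectRows(FirstRowAsHeader, each [GEO] <> "Canada")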

 

We can give the query a meaningful name (click on the “Query1” title at the top of the New Query page), and we’re good to go.


Loading the Data into PowerPivot

After clicking Done, the query is created and the data is imported into our workbook. Data Explorer gives us two main options on how we use this data – we can keep it on the worksheet, or load it into a data model. In this case, we want to perform some calculations on the data with PowerPivot before visualizing it with Power View, so it makes sense to load it directly into the data model. To do this, we turn off the Load to worksheet slider, and then click the Load to data model link. Now when the query is refreshed, the data gets loaded directly into PowerPivot. We can see the model by going to the PowerPivot ribbon menu, and clicking the Manage button.

We should do a few things before we start manipulating the data:

  1. Set the correct data types – We’ll need to set the data types for the Value (Whole Number) and Ref_Date (date) columns. PowerPivot will use whatever data types were set in the Data Explorer query. To set the Value column to a Number (rather than text), we could either do it in the DE query, or at the PowerPivot level.
  2. Set the date format – This is more preference than requirement, but since the original date values didn’t include a time value, we can remove it with formatting.
  3. Rename the table – By default this will be called “Query”. We’d usually want to rename this to something more meaningful, but for this example we’ll keep it as is.

Now that our columns are ready, we can add two calculated columns to make visualization easier.

Total Population

This column will represent the total population of Canada (SUM([Value])) at the given date ([Ref_Date]). The calculation is:
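A DAX calculated column along these lines does the trick (this sketch assumes the table kept its default name, Query):

=CALCULATE(SUM(Query[Value]), ALLEXCEPT(Query, Query[Ref_Date]))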

% of Total

This column calculates the % of the total population that this province has. The calculation is simple:
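Assuming the Total Population column above, it’s just the row’s value divided by the total for that date:

=[Value] / [Total Population]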

After adding the calculated columns (and applying percentage formatting), we end up with a model that looks like this:


Scatter Plot in Power View

What I want to do is visualize the population growth of Canada since the start of our data set (1951). I want to see which provinces grew faster than others, and if the population distribution has remained consistent over the past 60+ years. The Power View Scatter Plot graph (with time series functionality) is a great way to do this.

We start by closing the PowerPivot model, and adding a Power View report from the INSERT | Power View ribbon.

On the Power View sheet, we have a Power View Fields pane on the right. Click ALL, and select the columns from our PowerPivot table (Query).


Select the data table on the main page, and convert it into a scatter chart by clicking on DESIGN | Other Chart | Scatter Chart.


There are a few different ways we can visualize this, depending on what we’re trying to show.

Option 1 – Focus on % of Total

 

  • X Value: % of Total
  • Y Value: Total Population
  • Size: Value
  • Details: GEO
  • Play Axis: Ref_Date

This option has the % of Total across the X Axis, highlighting which provinces have made up the bulk of Canada’s population, and how it has changed over time.

1951


1985


2011


You can click on a bubble to see its progression over time (you can also ctrl-click to see more than one).


This visualization quickly shows how Quebec’s population has been making up less and less of Canada’s total population over time, while Ontario, British Columbia and Alberta have been gaining. Plus, it looks like snakes, which is pretty cool.

Option 2 – Growth over Time

  • X Value: Value
  • Y Value: Value
  • Size: % of Total
  • Details: GEO
  • Play Axis: Ref_Date

This option puts the population value on both the X and Y axis. The animation of the time series shows the relative growth speed of each province.

(Note: to add a column value twice, right-click on the column and select one of the Add as options.)

1951


1985


2011


If we play the animation, we can see that Quebec and Ontario’s populations steadily increase at about the same rate until the 1970s. Then Quebec’s growth rate visibly slows, while Ontario’s continues to grow at an even faster rate. Interesting!

Having both the X and Y Axis use the same value gives you a “linear growth” chart. We could have also made the bubbles “bounce around” a bit by adding another dimension (average population age?), but I’ll leave that as an exercise for the reader.


Using OUTPUT Parameters with the OLE DB Source

Did you know that support for OUTPUT parameters was added to the OLE DB Source in SSIS 2012? With so many features added in 2012, this one is often overlooked. (I had completely forgotten about it until someone asked the other day…). This post has a quick walkthrough of how to use them.

For this example, we’ll be using the Production.uspGetList example from Books Online. It looks like this:
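It’s the standard Books Online sample for OUTPUT parameters; a simplified version of its definition looks roughly like this (the body below is a placeholder – see Books Online for the real one):

CREATE PROCEDURE Production.uspGetList
    @Product varchar(40),
    @MaxPrice money,
    @ComparePrice money OUTPUT,
    @ListPrice money OUTPUT
AS
    -- Simplified placeholder body: return the matching products, then
    -- populate the two OUTPUT parameters
    SELECT p.Name AS Product, p.ListPrice AS 'List Price'
    FROM Production.Product AS p
    WHERE p.Name LIKE @Product AND p.ListPrice < @MaxPrice;

    SET @ListPrice = (SELECT MAX(p.ListPrice)
                      FROM Production.Product AS p
                      WHERE p.Name LIKE @Product AND p.ListPrice < @MaxPrice);

    SET @ComparePrice = @MaxPrice;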

We see there are four parameters total:

  1. Product
  2. MaxPrice
  3. ComparePrice (OUTPUT)
  4. ListPrice (OUTPUT)

In the OLE DB Source, we create a connection manager and set the Data access mode to SQL command. In the SQL statement, we can mark parameters with a question mark (?), or hard code values by entering them directly. For this example, we’ll provide a value for MaxPrice, and use variables for the others.
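The resulting SQL command looks something like this (700 is just a sample hard-coded value for MaxPrice – note the OUTPUT keyword after the parameter markers for the two output parameters):

EXEC [Production].[uspGetList] ?, 700, ? OUTPUT, ? OUTPUT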

 


We can map the parameters by clicking the Parameters… button.


Some important caveats:

  • You must use the parameter name, rather than the ordinal, when mapping the output parameters.
  • All OUTPUT parameter variables should be of type Object. This is because the provider will likely attempt to set the variable to NULL (DbNull) prior to executing the stored procedure, and only SSIS Object variables support DbNull values. This means you’ll need to cast the variable value to the appropriate type before you use it (there’s an example snippet after this list).
  • The OUTPUT value will be written to the variable when the data flow completes (i.e. in the PostExecute phase). This means you can use the variables in a control flow task after your Data Flow, but not within the same Data Flow.
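As an example of that second caveat, a Script Task that runs after the Data Flow could read one of the OUTPUT values like this (the variable name User::ListPrice is hypothetical):

// The OUTPUT parameter was mapped to an Object variable, so check for DBNull and cast
object rawValue = Dts.Variables["User::ListPrice"].Value;

if (!(rawValue is DBNull))
{
    decimal listPrice = Convert.ToDecimal(rawValue);
    // ... use listPrice ...
}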

 


Slides from SSIS Performance Design Patterns | TechDays Hong Kong 2013

The slides from my 3rd presentation at the TechDays Hong Kong event are now available. As I mentioned during the talk, design patterns are one of my favorite topics, and something I really enjoyed writing about. This presentation had a subset of the SSIS performance design patterns from the full day training sessions I’ve done in the past. See this post for a link to the full deck (over 100 slides).

Thanks for attending!

Slides from Bringing Together SSIS, DQS and MDS | TechDays Hong Kong 2013

A big thank you to everyone who attended my 2nd session at the TechDays Hong Kong event, Enterprise Information Management (EIM): Bringing Together SQL Server Integration Services (SSIS), Data Quality Services (DQS) and Master Data Services (MDS) (wow that’s a long title). The slides are available below. The files for my integrated demo are available on my SkyDrive share.

Note, this session is very similar to the one I did at DevTeach Montreal and the 2012 PASS Summit (when I co-presented with Matthew Roche). I keep meaning to retire it, but I keep getting asked to present it. I think the integrated demo, which shows how all of the products can work together, really resonates with people. If you are interested in this content, be sure to check out the other EIM session that Matthew Roche and I presented at TechEd North America 2012 – the slides are similar, but approach the problem from the DQS/MDS perspective, rather than focusing on SSIS.

Using Data Explorer with Power View

In my first Data Explorer post, I pulled a list of event data from the SQL Saturday website. I then showed how to do dynamic lookups against an address verification service to pull back location data about the event’s venue. In this post I’ll complete my experiment and plot out each location on a map so I can easily see how far away each event is.

I’ll start off with the data I had from my previous post. Within the data, we have Latitude and Longitude values which we will use to create our location map with Power View.

The first thing we’ll do is create a Data Model in Excel. We do this by selecting our data, and clicking on the Add to Data Model button on the POWERPIVOT tab.


This brings up the PowerPivot window, with all of the columns from the Data Explorer query.

The Address.Latitude and Address.Longitude columns are currently set as General/Text fields. We’re going to change this to Decimal Number by selecting the columns, and changing the Data Type field.


On the Advanced tab, we’ll set the Data Category values to Latitude and Longitude. This allows the columns to be automatically classified when we’re plotting our data.


Some other settings that will make visualization easier:

  • Setting the appropriate category values for City, State and Country
  • Fixing the “Summarize By” setting for Confidence, Latitude and Longitude

Once we’re done updating our data model, we can close the PowerPivot model and insert a Power View report.


Select the fields we want to use in our report.


Select the table that Power View placed in our report, and click the Map button to convert it.

The default categorization of the columns didn’t really make sense, but we can easily change that by dragging the fields around to where they belong.


Voila! In just a few minutes, we have a chart that maps our events across the US.


Generate Data with a Script Component

I use the following code to generate data with a script component. I’ve found this useful in the past to demonstrate how buffers are handled in the data flow, as well as for perf/load testing a destination component. This script will automatically generate data based on the columns you’ve defined for the component – no additional code modifications are needed. For optimal performance, it uses static values, but it could be updated to support randomly generated values as well (it does support random ints – look at the code for details).

To use it, do the following:

  • Add a new Script Component, and make it a Source
  • Bring up the component UI
  • On Inputs and Outputs, define your columns
  • When you are done, click OK to save the component. I know, you haven’t entered the script yet, but it is a good idea to save your component at this point. If for some reason you cancel out of the dialog, or have an issue with the VSTA editor, you could lose all of the columns you just spent time defining.


 

  • Open the UI again
  • (Optional) If you want to be able to control the number of rows that get generated, define an Integer variable called MaxRowCount. Add it as a ReadOnly variable
  • Click the Edit Script… button
  • When the editor opens, replace your main.cs file with the following code
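The downloadable main.cs is the authoritative version; the sketch below just illustrates the general approach – reflecting over the generated Output0Buffer properties and assigning a value per column type (it assumes the optional MaxRowCount variable was added):

// Add these usings near the top of main.cs:
//   using System.Linq;
//   using System.Reflection;

public override void CreateNewOutputRows()
{
    int maxRows = Variables.MaxRowCount > 0 ? Variables.MaxRowCount : 10000;
    Random random = new Random();

    // Every settable column property on the buffer, skipping the generated
    // <Column>_IsNull members
    PropertyInfo[] columns = typeof(Output0Buffer).GetProperties()
        .Where(p => p.CanWrite && !p.Name.EndsWith("_IsNull"))
        .ToArray();

    for (int i = 0; i < maxRows; i++)
    {
        Output0Buffer.AddRow();

        foreach (PropertyInfo column in columns)
        {
            if (column.PropertyType == typeof(int))
            {
                column.SetValue(Output0Buffer, random.Next(), null);  // random ints
            }
            else if (column.PropertyType == typeof(string))
            {
                column.SetValue(Output0Buffer, "Sample", null);       // static strings
            }
            else if (column.PropertyType == typeof(DateTime))
            {
                column.SetValue(Output0Buffer, DateTime.Now, null);
            }
            else if (column.PropertyType == typeof(decimal))
            {
                column.SetValue(Output0Buffer, 123.45m, null);
            }
            // ... add cases for any other data types your columns use
        }
    }
}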

Download the script file here: main.cs.zip

 

Quick Tip: Resolving Oracle Destination Datatype conversion errors

When working with the Oracle Destination (the one from Microsoft/Attunity (2008 | 2012)), you might encounter generic data type conversion errors after mapping your columns.


Validation error. Data Flow Task: Data Flow Task: Datatype conversion of Oracle Destination.Inputs[Oracle Destination Input].Columns[L_QUANTITY] is not supported.

The error appears because there is not a 1:1 mapping of SSIS Data Flow types to Oracle types, and the Oracle Destination can’t do the conversion for you. A common one is for BIGINT conversions. Unfortunately, the error message doesn’t provide much guidance on what the type mismatch is, but there is an easy way to check.

Right click on the Oracle Destination component, and select Show Advanced Editor …

On the Input and Output Properties tab, expand the Oracle Destination Input list item, and view the external columns.


Click the column which is giving you the error. On the properties page, you’ll see the SSIS data type that the destination expects.


In your data flow, use a Data Conversion or Derived Column transform to convert the original column to the destination data type.
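For example, if the external column turns out to expect DT_NUMERIC with a precision of 18 and a scale of 0 (an assumption for illustration – use whatever the advanced editor shows), a Derived Column expression like this handles it:

(DT_NUMERIC, 18, 0)[L_QUANTITY]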

Dynamic Lookups With Data Explorer

In my previous post, I used Data Explorer to grab the list of upcoming events from the SQL Saturday website. Excel has had the ability to grab data from HTML pages for a while, but using Data Explorer provided a richer set of options, and allowed me to apply a series of transforms to get the data exactly how I wanted it.

Using this events data, I want to do two things:

  • Enrich the data with some more details about the event venue’s location
  • Plot the events on a map/chart so I can see which ones are close to me

This post will cover the first step, and lets me show off some of the more advanced Data Explorer functionality.

Address Lookups

There are a number of web services that allow you to validate an address, or fetch additional details (such as latitude and longitude). I just happen to have a subscription to Melissa Data’s Address Check service, provided through the Windows Azure Marketplace. While the service was designed to work with Data Quality Services, it can also be invoked programmatically, and returns a whole bunch of additional information for the address you provide. It will even return suggestions when an exact match for the address isn’t found.


I previously covered how to retrieve data from the Windows Azure Marketplace, but unfortunately that method won’t help us here because we’re not dealing with a pre-defined set of data (I’m not sure how long it would take to pull in all possible US and Canada address combinations to Excel, but it’s probably not something I’d want to try on my laptop). The Address Check service I want to use is an OData Service (in fact, all data market offerings have OData interfaces), and it exposes a single function with three parameters:

  • Address
  • MaximumSuggestions
  • MinimumConfidence

I found all of the details (including the service URL) on the service description page in the marketplace.

Dynamic Lookups

I’ll use the list of events I retrieved in my previous post as a starting point. To keep the original query intact, I’ll Duplicate my query using the option on the QUERY tab.


Next, I click on the Filter & Shape button to bring up my query.


I want to invoke the address lookup function on my OData Service for each row of my data set, and pass in the value of the Location field, but I couldn’t find an option that would allow this on the query editor UI. Instead, we can turn to the Data Explorer formula language, and start adding some custom transformations.

First, we need to add a new transformation step by clicking the formula button.


By default, the step we just created displays the same data as the previous step. We can change this by modifying the value in the formula bar. Note, at the moment there is no intellisense or syntax highlighting for the formulas, but I’m hoping we’ll see this in a future update. The formula to invoke the external OData Service took some trial and error, but I eventually got it working.

First we define the lookup as an auxiliary query. We’ll name the step Lookup to make it easy to reference.

= OData.Feed("https://api.datamarket.azure.com/MelissaData/AddressCheck/v1/"){0}[Data]

This formula is accessing the OData Service at the given URL, and returning the data ([Data]) at the first member ({0}), which in this case is the Melissa Data SuggestAddress function that we’re looking for.


When the function is returned, Data Explorer provides a UI that shows us the arguments, and indicates that the result will come back in a table format. Pressing the Invoke button lets us enter static values (shown below for demonstration purposes), but as we’re interested in dynamic lookups using the data from our existing query, we’ll skip this for now.


We then add a second step which invokes the Lookup function we’ve defined for each row in our data set. We can click the formula button again to create the new step, and use this value:

= Table.AddColumn(RenamedColumns, "Address", each Lookup([Location], 1, 0))

Let’s break down what this formula is doing:

  • Add a new column to the result of the RenamedColumns step
  • Name the column “Address”
  • For each row in the data set, call the function defined by the Lookup step
  • When calling the function, pass in the [Location] field (with two hard coded values – 1, 0 – for the remaining arguments)

After entering the formula, we get a notification that credentials are required to access this data source.


Clicking Edit credentials brings up the Data Explorer credential manager. The Windows Azure Marketplace uses the concept of a Feed Key (similar to an “API key” used by many web/REST based APIs). Clicking the link on the page brings you directly to the Marketplace page which displays your key – you can copy & paste it from there. (Be careful with the copy & paste! On my first couple of tries, I included extra whitespace before and after the key, which ended up giving me a permission error when I tried to connect)


After entering the key, we can Save and go back to our query. We immediately see another error:


Formula.Firewall: Query ‘SqlSaturdayEvents’ is accessing data sources which can not be used together.

 

By default, it appears that Data Explorer doesn’t allow you to pass data between services, as it could result in a “leak”. For example, in this case I’m sending my original Location data (coming from the SQL Saturday website) to the OData Service (run by Melissa Data). Thankfully, there is a way to allow this behavior using functionality called “Fast Combine”.

We can click Done to save the query we have at this point (even though it’s in an error state), and go back to Excel. On the DATA EXPLORER tab, we can see the Fast Combine option.


Clicking it brings up a warning dialog that enabling Fast Combine may improve query performance, but may introduce privacy issues.


Click Enable, go back to Filter & Shape, and now the data is displayed.


The results of the OData Service come back as a table, which we can expand inline by clicking the small icon next to the column title.


After selecting the columns we want, we can see the new values in our data set.


The Melissa Data service we’re using is for US and Canadian addresses, so we can see that the non-US locations aren’t getting Latitude and Longitude results. Addresses that aren’t matched should come back with a confidence level of 0, so we can use this to filter out these unmatched locations from our list. Our filter will only include rows that have a Confidence value greater than 0.


The remaining results look pretty good. We can click Done to save the query and get the results into Excel.


Summary

Wow – I’m impressed with Data Explorer so far. I really like the idea of mashing up data sources like this, and browsing through the formula language gave me an idea of what’s possible.

In my next post, I’ll take a look at importing the data into a PowerPivot, and then doing some visualizing with Power View.

For reference, here are the queries I used at each step:

 

Source = Table.TransformColumns(Data0, {}, Value.FromText)
HiddenColumns = Table.RemoveColumns(ChangedType, {"Column1", "Column3"})
SplitColumnDelimiter = Table.SplitColumn(HiddenColumns, "Column4", Splitter.SplitTextByEachDelimiter({"-"}, null, false), {"Column4.1", "Column4.2"})
SplitColumnPos = Table.SplitColumn(SplitColumnDelimiter, "Column4.1", Splitter.SplitTextByPositions({0, 13}, false), {"Column4.1.1", "Column4.1.2"})
HiddenColumns1 = Table.RemoveColumns(SplitColumnPos, {"Column4.1.1"})
RenamedColumns = Table.RenameColumns(HiddenColumns1, {{"Column2", "Date"}, {"Column4.1.2", "Number"}, {"Column4.2", "Name"}, {"Column5", "Location"}})
Lookup = OData.Feed("https://api.datamarket.azure.com/MelissaData/AddressCheck/v1/"){0}[Data]
AddressLookup = Table.AddColumn(RenamedColumns, "Address", each Lookup([Location], 1, 0))
#"Expand Address" = Table.ExpandTableColumn(AddressLookup, "Address", {"Confidence", "Latitude", "Longitude", "City", "State", "Country"}, {"Address.Confidence", "Address.Latitude", "Address.Longitude", "Address.City", "Address.State", "Address.Country"})
FilteredRows = Table.SelectRows(#"Expand Address", each [Address.Confidence] > 0)