Consuming CANSIM Datasets using Data Explorer

I recently stumbled upon CANSIM – Canada’s equivalent of Data.Gov. The site has a lot of really interesting datasets that come in handy if you’re someone who frequently creates data focused demos. I wanted to play around a bit more with Data Visualizations in Excel (specifically, the Scatter Chart over Time feature in Power View), and thought that Canadian population statistics would be a good place to start.

The CANSIM site provides a default view for each dataset, with a few links that let us customize the view. If this view of the dataset is all we’re looking for, we could use the From Web functionality in Excel to import the data into our workbook.

image

Data Explorer provides a similar From Web feature. Unfortunately, this doesn’t currently work with the CANSIM site because the site redirects us to a survey page the first time you access it. Excel’s "From Web” feature brings up a web browser, allowing us to click through to the dataset we want. Data Explorer simply displays the raw data, so we end up with an empty table and no way to access the view that we want. Boo.

image

Thankfully, the CANSIM site also allows us to download the raw data behind each dataset. This is actually better for what we want to do, as Excel has all of the processing functionality we need to slice and visualize the data that way we want it. Clicking the Download tab on the dataset page gives us a couple of different format options. CSV is the easiest data format to work with in Excel, so that’s what we’ll download.

image

Loading the Data into Excel

After downloading and extracting the dataset (it comes zipped), we’ll load and format it using Data Explorer’s From File | From CSV option.

image

This dataset is pretty simple, with rows for the reference date (Ref_Date), geography (GEO), and population (Value). We’ll only need to do a couple of things to get the data in the format that we want.

image

First we click the table icon and select Use First Row and Headers to specify the column names.

image

Next, we’ll filter out the summary rows (total population for Canada), because we can calculate those directly. We can do this by right clicking on a value in the GEO column and selecting Text Filers | Does Not Equal from the context menu.

image

We’re left with a pretty simple query:

let
    Source = Csv.Document(File.Contents("C:\data\00510005-eng.csv")),
    ChangedType = Table.TransformColumns(Source,{},Value.FromText),
    FirstRowAsHeader = Table.PromoteHeaders(ChangedType),
    FilteredRows = Table.SelectRows(FirstRowAsHeader, each [GEO] <> "Canada")
in
    FilteredRows

 

We can give the query a meaningful name (click on the “Query1” title at the top of the New Query page), and we’re good to go.

image

Loading the Data into PowerPivot

imageAfter clicking Done, the query is created and the data is imported into our workbook. Data Explorer gives us two main options on how we use this data – we can keep it on the worksheet, or load it into a data model. In this case, we want to perform some calculations on the data with PowerPivot before visualizing it with Power View, so it makes sense to load it directly into the data model. To do this, we turn off the Load to worksheet slider, and then click the Load to data model link. Now when the query is refreshed, the data gets loaded directly into PowerPivot. We can see the model by going to the PowerPivot ribbon menu, and clicking the Manage button.

We should do a few things before we start manipulating the data:

  1. Set the correct data types – We’ll need to set the data types for the Value (Whole Number) and Ref_Date (date) columns. PowerPivot will use whatever data types were set in the Data Explorer query. To set the Value column to a Number (rather than text), we could either do it in the DE query, or at the PowerPivot level.
  2. Set the date format – This is more preference than requirement, but since the original date values didn’t include a time value, we can remove it with formatting.
  3. Rename the table – By default this will be called “Query”. We’d usually want to rename this to something more meaningful, but for this example we’ll keep it as is.

Now that our columns are ready, we can add two calculated columns to make visualization easier.

Total Population

This column will represent the total population of Canada (SUM([Value])) at the given date ([Ref_Date]). The calculation is:

=CALCULATE(SUM(Query[Value]), ALLEXCEPT(Query, Query[Ref_Date]))

% of Total

This column calculates the % of the total population that this province has. The calculation is simple:

=Query[Value] / Query[Total Population]

After adding the calculated columns (and applying a % formatting), we end up with a model that looks like this:

image

Scatter Plot in Power View

What I want to do is visualize the population growth of Canada since the start of our data set (1951). I want to see which provinces grew faster than others, and if the population distribution has remained consistent over the past 60+ years. The Power View Scatter Plot graph (with time series functionality) is a great way to do this.

We start by closing the PowerPivot model, and adding a Power View report from the INSERT | Power View ribbon.

On the Power View sheet, we have a Power View Fields pane on the right. Click ALL, and select the columns from our PowerPivot table (Query).

image

Select the data table on the main page, and convert it into a scatter chart by clicking on DESIGN | Other Chart | Scatter Chart.

image

There’s a few different ways we can visualize this, depending on what we’re trying to show.

Option 1 – Focus on % of Total

 

X Value % of Total
Y Value Total Population
Size Value
Details GEO
Play Axis Ref_Date

This option has the % of Total across the X Axis, highlighting which provinces have made up the bulk of Canada’s population, and how it has changed over time.

1951

image

1985

image

2011

image

You can click on a bubble to see its progression over time (you can also ctrl-click to see more than one).

image

This visualization quickly shows how Quebec’s population has been making up less and less of Canada’s total population over time, while Ontario, British Columbia and Albert have been gaining. Plus, it looks like snakes which is pretty cool.

Option 2 – Growth over Time

X Value Value
Y Value Value
Size % of Total
Details GEO
Play Axis Ref_Date

This option puts the population value on both the X and Y axis. The animation of the time series shows the relative growth speed of each province.

(Note, to add a column value twice, right click on the column and select one of the Add as options).

1951

image

1985

image

2011

image

If we play the animation we can see that Quebec and Ontario’s populations steadily increase at about the same rate until the 1970s. Then Quebec’s growth rate visibly shows, while Ontario’s continues to grow at an even faster rate. Interesting! 

Having both the X and Y Axis use the same value gives you a “linear growth” chart. We could have also made the bubbles “bounce around” a bit by adding another dimension (average population age?), but I’ll leave that as an exercise for the reader.

Links

Leave a Reply

You must be logged in to post a comment.