I recently stumbled upon CANSIM – Canada’s equivalent of Data.Gov. The site has a lot of really interesting datasets that come in handy if you’re someone who frequently creates data focused demos. I wanted to play around a bit more with Data Visualizations in Excel (specifically, the Scatter Chart over Time feature in Power View), and thought that Canadian population statistics would be a good place to start.
The CANSIM site provides a default view for each dataset, with a few links that let us customize the view. If this view of the dataset is all we’re looking for, we could use the From Web functionality in Excel to import the data into our workbook.
Data Explorer provides a similar From Web feature. Unfortunately, this doesn’t currently work with the CANSIM site because the site redirects us to a survey page the first time you access it. Excel’s "From Web” feature brings up a web browser, allowing us to click through to the dataset we want. Data Explorer simply displays the raw data, so we end up with an empty table and no way to access the view that we want. Boo.
Thankfully, the CANSIM site also allows us to download the raw data behind each dataset. This is actually better for what we want to do, as Excel has all of the processing functionality we need to slice and visualize the data that way we want it. Clicking the Download tab on the dataset page gives us a couple of different format options. CSV is the easiest data format to work with in Excel, so that’s what we’ll download.
Loading the Data into Excel
After downloading and extracting the dataset (it comes zipped), we’ll load and format it using Data Explorer’s From File | From CSV option.
This dataset is pretty simple, with rows for the reference date (Ref_Date), geography (GEO), and population (Value). We’ll only need to do a couple of things to get the data in the format that we want.
First we click the table icon and select Use First Row and Headers to specify the column names.
Next, we’ll filter out the summary rows (total population for Canada), because we can calculate those directly. We can do this by right clicking on a value in the GEO column and selecting Text Filers | Does Not Equal from the context menu.
We’re left with a pretty simple query:
Source = Csv.Document(File.Contents("C:data