Iterating over multiple pages of web data using Power Query

This is one of my go-to demos for showing off the Power Query Formula Language (M). This approach works well with websites that spread data over multiple pages, and have one or more query parameters to specify which page or range of values to load. It works best when all pages have the same structure. This post walks through the end to end process, which includes:

  1. Creating the initial query to access a single page of data
  2. Turning the query into a parameterized function
  3. Invoking the function for each page of data you want to retrieve

This example uses the yearly box office results provided by

You can find a video of this demo in the recording Theresa Palmer and I did at TechEd North America earlier this year.

Creating the initial query

Access one of the pages of data on the site using Power Query’s From Web functionality.


From Web actually generates two separate M functions – Web.Contents to access the URL that you enter, and then another function based on the content-type of the URL. In this case we are accessing a web page, so Web.Contents gets wrapped by a call to Web.Page. This function brings up a Navigator that lets you select one of the tables found on the page. In this case, we’re interested in the second table on the page (labeled Table 1). Selecting it and clicking Edit will bring up the Query Editor.


From here, we can filter and shape the data as we want it. Once we’re happy with the way it looks, we will convert it to a function.

Turning a Query into a parameterized Function

Open the Advanced Editor to bring up the M code behind your query.


For the sake of the demo, I’ve kept the query simple – I’m just accessing the data, and I’ve removed the “Changed Type” step that Power Query automatically inserted for me. The only shaping I did was to remove the bottom 3 summary rows on the page. My code now looks this:

Note that the url value in the call to Web.Contents contains a query parameter (page) that specifies the page of data we want to access.

To turn this query into a parameterized function, we’ll add the following line before the let statement.

(page as number) as table =>

The two as statements specify the expected data types for the page parameter (number) and the return value of the function (table). They are optional, but I like specifying types whenever I can.

We’ve now turned our query into a function, and have a parameter we can use within the code. We are going to dynamically build up the query string, replacing the existing page value in the URL with the page parameter. Since we’ve indicated that page is a number, we will need to convert the value to text using the Number.ToText function. The updated code looks like this:

Clicking Done on the advanced editor brings us back to the query editor. We now have a function expecting a parameter.


You can click on the the Invoke button and enter a page value to test it out.



Be sure to delete the Invoked Function step, then give the function a meaningful name (like GetData). Once the function has been given a good name, click Close & Load to save the query.

Invoking the function for each page of data you want to retrieve

Now that we have a function that can get the data, we’ll want to invoke it for each page we want to retrieve. M doesn’t have any concept of Loops – to perform an action multiple times, we’ll need to generate a List (or Table) of values we want to act on.

From the Power Query ribbon, select From Other Sources –> Blank Query. This brings up an empty editor page. In the formula bar, type the following formula:

= {1..7}

This gives us a list of numbers from 1 to 7.


Convert this to a table by clicking the To Table button, and click OK on the prompt.

Rename the column to something more meaningful (i.e. “Page”).

Go to the Add Column menu, and click Add Custom Column.

We can invoke our function (GetData) for each page with the following formula:


Click OK to the return to the editor. We now have a new column (Custom) with Table values. Note – clicking the whitespace next to the “Table” text (and not “Table” itself) will bring up a preview window in the bottom of the editor.


Click the Expand Columns button to expand the table inline.


The full query now looks like this

Clicking Close & Load brings us back to the workbook. After the query executes, we can scroll to the bottom of the sheet to see that we’ve pulled in 7 pages of data



Converting a Query to a Function in Power Query

You can turn a query into a function by adding a single line before your M query:

() =>

That is a pair of parenthesis ( () ), followed by a goes-to symbol ( => ). Any parameters for your function would go in between the parenthesis.


In the Query Editor, click on the View page, and then click Advanced Editor.


The Advanced Editor will display your M query. Add () => before the starting let statement.


Click Done to return to the Query Editor. You should now see a single Applied Step, and the option to Invoke your function.


Clicking Invoke simply runs your previous query, which isn’t all that interesting. Delete the Invoked step (if you clicked the button), and go back to the Advanced Editor to add a parameter to your function.

Note that when you bring up the editor, it looks a little different …


The code will now look something like this:

Power Query has automatically prepended some code based on the name of your query. You can remove the outer let statement, or simply ignore it for now. Add a parameter in-between the parenthesis (for example, “page”).

Click Done to return to the editor. Notice that the editor now displays the parameter your entered. It is typed any because we didn’t explicitly specify a type on the parameter.


Clicking Invoke will prompt us for the parameter.


Invoking the query with the parameter doesn’t change our results at all, since we never actually referenced the parameter within our M code… I’ll explain that process in an upcoming post.

Check out the links before for more information on the Power Query Formula Language (“M”):

Notepad++ language file for the Power Query formula language (“M”)

Power Query doesn’t have syntax highlighting in its advanced editor (yet?). In the meantime, here is a language file for Notepad++. It’s not perfect, but I find it useful when writing M queries. It lists all of the current library functions as keywords (as of the PQ October update), recognizes comments, strings, literals, and language keywords.

You can find it on my One Drive share – here. You can find instructions on how to install a user defined language file in Notepad++ on their wiki site.



For more information on writing queries using the Power Query formula language, you can check out the deep dive session that Theresa Palmer and I gave at TechEd North America earlier this year.

Defining Configurable Settings for your Queries

(The information in this blog post is current as of the March 2014 release of Power Query.)

The Problem

A colleague sent me a copy of the Excel workbook they used for a blog post about using Power Query with Project Online (see the fantastic article: Creating burndown charts for Project using Power Pivot and Power Query). The workbook has a query which uses the Project OData API to pull down Task information to produce a “burndown” chart. The query was configured to pull down the last 90 days worth of data, and I wanted to make it a little more configurable. The original query looked like this:

The highlighted line shows the filter that limits the data to 90 days (#duration(90, 0, 0, 0)). To change the value, we’d need to modify the formula in the Power Query editor (or open up the Advanced Editor and change it there). Making the change isn’t hard, but I wanted to make the workbook a bit more flexible so that anyone I shared it with could set the value without having to understand the script.

Using a Configuration Query

A Power Query query can reference other queries within the current workbook. We’re going to use this functionality to create a query that returns the settings we want, and then reference it from other queries to avoid hard coding anything. The steps will be:

  1. Create an Excel table that contains the settings we want to reference
  2. Create a query (named Settings) that reads the table
  3. Replace the hardcoded values with a reference to the Settings query

The Excel table is simple – a header row and a single row of data. The table should have a column for each setting we want to reference by name. In this case we have a single column the number of days we want the Project Online query to grab – we’ll call it DaysToKeep.


Next we create a query using the From Table button on the Power Query ribbon.


We’ll apply the following transformation steps:

  • Set the right data types
  • Convert the table to Records using the Table.ToRecords function
  • Select the first (and only) record
  • Disable loading the query (i.e. deselect “Load to Worksheet”)
  • Name the query something easy to remember (ex. Settings)

The query looks like this:

This gives us a query that returns a Record, which lets us refer to each field by name.


After saving the query, we’ll be able to use Settings[DaysToKeep] in any other query within our workbook, and it will evaluate to “30” at runtime. We can test this out by created a new query (Power Query –> From Other Data Sources –> Blank Query), and typing it into the formula bar.


Going back to the original burndown query, we can now replace the hard coded 90 value with the reference to the DaysToKeep setting.

After we change the query to use the Settings value, we’ll receive a prompt from Power Query’s Formula Firewall.


This prompt occurs because we are using the DaysToKeep value in a filter statement, which ends up getting folded into the query sent to the OData feed. The firewall prompt is meant to prevent unintentional data disclosure. In this case we’re not sending private information to the other data source (i.e. the number 30), but it would be a concern if we were joining a private data set (i.e. a customer list) with a public web service. Clicking Continue brings up the Privacy levels dialog.


From here we can specify that the OData feed has a privacy level of Organizational, and the settings from the Current Workbook can be considered Public (since they aren’t sensitive in any way). This satisfies the security requirements for the Power Query formula firewall, and lets the engine fold the values into the query as expected. Note, if the privacy levels didn’t allow folding (i.e. Current Workbook is set to Private, and the OData feed is Public), Power Query would do the filtering locally (in memory), rather than including the filter in the query. Once the privacy levels have been set, our query runs successfully.


After clicking apply, the query refreshes and brings a total of 110 rows into the data model.


To pull more data, we can simply change the value in the settings table, and refresh the query.



Skylanders Power View Workbook

I had a lot of requests for the Skylanders BI Application I showed off at the BI Power Hour, so I’ve made it available on skydrive. For those that missed it, the “app” is an Excel workbook which uses Power View (on top of Power Pivot) that I created to help my kids made better decisions as to which toys they should buy.


The content in the workbook comes from the Spyro wiki at Wikia and is licensed under the Creative Commons Attribution-Share Alike License.

Error: The Microsoft Access database engine cannot open or write to the file

Issue – you run an SSIS package that uses an Excel Source or Excel Destination and get the following error:

[Connection manager "Excel Connection Manager"] Error: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80004005.

An OLE DB record is available. Source: "Microsoft Access Database Engine" Hresult: 0x80004005 Description: "The Microsoft Access database engine cannot open or write to the file ”. It is already opened exclusively by another user, or you need permission to view and write its data.".

Steps to resolve:

  1. Verify that the file isn’t actual open (check task manager for suspicious EXCEL.EXE processes)
  2. Verify that there are no other SSIS packages (or other background processes) that might be opening this file at the same time
  3. If you are in Visual Studio when this error happens, restart VS and try again

The third issue typically happens if you click the Preview button on the component editor. This ends up loading the Excel file through the Access database engine, and can keep a reference to the file open until the designer is restarted.


Refresh an Excel Workbook with a Script Task

Are you loading data from an Excel workbook with external queries using SSIS? Do you want to make sure those queries are refreshed before the load takes place? Have I got the post for you! I recently had to do this myself, and after some googling with Bing, I found an MSDN forum post that describes how to refresh all of the queries within a workbook using an SSIS Script Task. I’ve adapted the code from the post, and you can find it below.

For the code to compile, you’ll need to add a reference to Microsoft.Office.Interop.Excel within your script task’s project.


Next, select the reference under the project explorer, and open the properties window. You’ll need to disable the Embed Interop Types option. (This assumes you’re using SQL 2012, which supports .NET 4 … if you are using 2008 with .NET 2/3.5, you won’t have to do this).


Include the following code in your script task. It iterates over all of the Excel connection managers within your package, loads the workbook, and calls RefreshAll() to refresh each of the queries in the workbook. The task then sleeps for 20 seconds (to give time for the query refresh to complete) before exiting.

I kept the code simple for this example, but there are a few different optimizations you can make here:

  • Add error handling
  • Pass in the excel file name through a variable instead of iterating over each connection manager
  • Refresh a specific query within the workbook, rather than calling RefreshAll
  • Find your query within the workbook and set the BackgroundQuery property to False to avoid the asynchronous refresh call, and having to sleep

Using Data Explorer with Power View

In my first Data Explorer post, I pulled a list of event data from the SQL Saturday website. I then showed how to do dynamic lookups against an address verification service to pull back location data about the event’s venue. In this post I’ll complete my experiment and plot out each location on a map so I can easily see how far away each event is.

I’ll start off with the data I had from my previous post. Within the data, we have Latitude and Longitude values which we will use to create our location map with Power View.

The first thing we’ll do is create a Data Model in Excel. We do this by selecting our data, and clicking on the Add to Data Model button on the POWERPIVOT tab.


This brings up the PowerPivot window, with all of the columns from the Data Explorer query. image

The Address.Latitude and Address.Longitude columns are currently set as General/Text fields. We’re going to change this to Decimal Number by selecting the columns, and changing the Data Type field.


On the Advanced tab, we’ll set the Data Category values to Latitude and Longitude. This allows the columns to be automatically classified when we’re plotting our data.


Some other settings that will make visualization easier would be:

  • Setting the appropriate category values for City, State and Country
  • Fixing the “Summarize By” setting for Confidence, Latitude and Longitude

Once we’re done updating our data model, we can close the PowerPivot model and insert Power View report.


Select the fields we want to use in our report.


Select the original table it put in our report, and click the Map button to convert it.

The default categorization of the columns didn’t really make sense, but we can easily change that by dragging the fields around to where they belong.


Voila! In just a few minutes, we have a chart that maps our events across the US.


Dynamic Lookups With Data Explorer

In my previous post, I used Data Explorer to grab the list of upcoming events from the SQL Saturday website. Excel has had the ability to grab data from HTML pages for awhile, but using Data Explorer provided a richer set of options, and allowed me to apply a series of transforms to get the data exactly how I wanted it.

Using this events data, I want to do two things:

  • Enrich the data with some more details about the event venue’s location
  • Plot the events on a map/chart so I can see which ones are close to me

This post will cover the first step, and let’s me show off some of the more advanced Data Explorer functionality.

Address Lookups

There are a number of web services that allow you to validate an address, or fetch additional details (such as latitude and longitude). I just happen to have a subscription to Melissa Data’s Address Check service, provided through the Windows Azure Marketplace. While the service was designed to work with Data Quality Services, it can also be invoked programmatically, and returns a whole bunch of additional information for the address you provide. It will even return suggestions when an exact match for the address isn’t found.


I previously covered how to retrieve data from the Windows Azure Marketplace, but unfortunately that method won’t help us here because we’re not dealing with a pre-defined set of data (I’m not sure how long it would take to pull in all possible US and Canada address combinations to Excel, but it’s probably not something I’d want to try on my laptop). The Address Check service I want to use is an OData Service (in fact, all data market offerings have OData interfaces), and it exposes a single function with three parameters

  • Address
  • MaximumSuggestions
  • MinimumConfidence

I found all of the details (including the service URL) on the service description page in the marketplace.

Dynamic Lookups

I’ll use the list of events I retrieved in my previous post as a starting point. To keep the original query intact, I’ll Duplicate my query using the option on the QUERY tab.


Next, I click on the Filter & Shape button to bring up my query.


I want to invoke the address lookup function on my OData Service for each row of my data set, and pass in the value of the Location field, but couldn’t find an option that would allow this on the query editor UI. Instead, we can turn to the Data Explorer formula language, and start adding some custom transformations.

First, we need to add a new transformation step by clicking the formula button.


By default, the step we just created displays the same data as the previous step. We can change this by modifying the value in the formula bar. Note, at the moment there is no intellisense or syntax highlighting for the formulas, but I’m hoping we’ll see this in a future update. The formula to invoke the external OData Service took some trial and error, but I eventually got it working.

First we define the lookup as an auxiliary query. We’ll name the step Lookup to make it easy to reference.

= OData.Feed(“”){0}[Data]

This formula is accessing the OData Service at the given URL, and returning the data ([Data]) at the first member ({0}), which in this case is the Melissa Data SuggestAddress function that we’re looking for.


When the function is returned, Data Explorer provides a UI that shows us the arguments, and indicates that the result will come back in a table format. Pressing the Invoke button lets us enter static values (shown below for demonstration purposes), but as we’re interested in dynamic lookups using the data from our existing query, we’ll skip this for now.


We then add a second step which invokes the Lookup function we’ve defined for each row in our data set. We can click the formula button again to create the new step, and use this value:

= Table.AddColumn(RenamedColumns, “Address”, each Lookup([Location], 1, 0))

Let’s break down what this formula is doing:

  • Add a new column to the result of RenamedColumns step
  • Name the column “Address”
  • For each row in the data set, call the function defined by the Lookup step
  • When calling the function, pass in the [Location] field (with two hard coded values – 1, 0 – for the remaining arguments)

After entering the formula, we get a notification that credentials are required to access this data source.


Clicking Edit credentials brings up the Data Explorer credential manager. The Windows Azure Marketplace uses the concept of a Feed Key (similar to an “API key” used by many web/REST based APIs). Clicking the link on the page brings you directly to the Marketplace page which displays your key – you can copy & paste it from there. (Be careful with the copy & paste! On my first couple of tries, I included extra whitespace before and after the key, which ended up giving me a permission error when I tried to connect)


After entering the key, we can Save and go back to our query. We immediate see another error:


Formula.Firewall: Query ‘SqlSaturdayEvents’ is accessing data sources which can not be used together.


By default, it appears that Data Explorer doesn’t allow you to pass data between services, as it could result in a “leak”. For example, in this case I’m sending my original Location data (coming from the SQL Saturday website) to the OData Service (run by Melissa Data). Thankfully, there is a way to allow this behavior using functionality called “Fast Combine”.

We can click Done to save the query we have at this point (even though it’s in an error state), and go back to Excel. On the DATA EXPLORER tab, we can see the Fast Combine option.


Clicking it brings up a warning dialog that enabling Fast Combine may improve query performance, but may introduce privacy issues.


Click Enable, go back to Filter & Shape, and now the data is displayed.


The results of the OData Service come back as a table, which we can expand inline by clicking the small icon next to the column title.


After selecting the columns we want, we can see the new values in our data set.


The Melissa Data service we’re using is for US and Canadian addresses, so we can see that the non-US locations aren’t getting Latitude and Longitude results. Addresses that aren’t matched should come back with a confidence level of 0, so we can use this to filter out these unmatched locations from our list. Our filter will only include rows that have a Confidence value greater than 0.



The remaining results look pretty good. We can click Done to save the query and get the results into Excel.



Wow – I’m impressed with Data Explorer so far. I really like the idea of mashing up data sources like this, and browsing through the formula language gave me an idea of what’s possible.

In my next post, I’ll take a look at importing the data into a PowerPivot, and then doing some visualizing with Power View.

For reference, here are the queries I used at each step:


Source = Table.TransformColumns(Data0,{},Value.FromText)
HiddenColumns = Table.RemoveColumns(ChangedType,{“Column1″, “Column3″})
SplitColumnDelimiter = Table.SplitColumn(HiddenColumns,”Column4″,Splitter.SplitTextByEachDelimiter({“-“}, null, false),{“Column4.1″, “Column4.2″})
SplitColumnPos = Table.SplitColumn(SplitColumnDelimiter,”Column4.1″,Splitter.SplitTextByPositions({0, 13}, false),{“Column4.1.1″, “Column4.1.2″})
HiddenColumns1 = Table.RemoveColumns(SplitColumnPos,{“Column4.1.1″})

= Table.RenameColumns(HiddenColumns1,{{“Column2″, “Date”}, {“Column4.1.2″, “Number”}, {“Column4.2″, “Name”}, {“Column5″, “Location”}})

Lookup = OData.Feed(“”){0}[Data]

= Table.AddColumn(RenamedColumns, “Address”, each Lookup([Location], 1, 0))

Expand Address

= Table.ExpandTableColumn(AddressLookup, “Address”, {“Confidence”, “Latitude”, “Longitude”, “City”, “State”, “Country”}, {“Address.Confidence”, “Address.Latitude”, “Address.Longitude”, “Address.City”, “Address.State”, “Address.Country”})


= Table.SelectRows(#”Expand Address”, each [Address.Confidence] > 0)

Exploring Data Explorer

This week the Data Explorer team announced that a preview of their Excel add-in is now available! I’ve been meaning to try it out for a while, so I went ahead and downloaded the Office 2013 version then walked through some of the tutorials on the Office Help site.

Once the add-in is installed, it shows up under the DATA EXPLORER tab. The first set of buttons allow you to grab data from different sources. The options are similar to what you find under the DATA table, but there are some new additions as well.


I was recently going through the list of upcoming SQL Saturday events, and trying to decide which ones I might be able to attend. I thought this “data set” would make a good example for this post.

Clicking the From Web button in the ribbon brings up a box to enter your URL.


Once Data Explorer has loaded the page, it parses out any potential sources of data (i.e. tables) it finds on the page. The SQL Saturday Events page has two tables – one that shows upcoming events, and one that shows past events. Selecting the table shows you the data you’re working with.


You can remove columns you don’t want/need by clicking on the column header to select it, right clicking, and selecting Hide Columns.



Parsing/splitting field values is pretty straight forward as well. Here I wanted to split the name of the event up into two fields – the event number, and the event title (city, year). To do this, you right click on the field and select Split Column.


In this case, our field is consistently formatted, so we can simply split on a dash (-) character.


We now have two separate fields.


I didn’t want the “SQLSaturday #” prefix, so I used another split to give me the event number



I then removed the first split column, as it was no longer needed.



Data Explorer generates default column names (Column1, Column2), and you also have the option to use the first row as headers. My data didn’t have any headers in it, so I manually renamed the columns.


Each time you apply a transform in Data Explorer, it adds a “Step” to your process. You can view them all by expanding the Steps tab on the left hand side of the window. Clicking on various steps will show you what the data looks like in each stage. You can view the formula that is being applied as well. I’m sure advanced users will play directly within the formula bar, but most of the options you’ll need are exposed through the UI.


After renaming all the columns, my data looks like this:


When I click the Done button, a new sheet is added in Excel with the query I just built. I can provide a friendly name for the query, as well as modify it and refresh the data.


For review, my queries at each step were:


Source = Table.TransformColumns(Data0,{},Value.FromText)
HiddenColumns = Table.RemoveColumns(ChangedType,{“Column1″, “Column3″})
SplitColumnDelimiter = Table.SplitColumn(HiddenColumns,”Column4″,Splitter.SplitTextByEachDelimiter({“-“}, null, false),{“Column4.1″, “Column4.2″})
SplitColumnPos = Table.SplitColumn(SplitColumnDelimiter,”Column4.1″,Splitter.SplitTextByPositions({0, 13}, false),{“Column4.1.1″, “Column4.1.2″})
HiddenColumns1 = Table.RemoveColumns(SplitColumnPos,{“Column4.1.1″})
RenamedColumns = Table.RenameColumns(HiddenColumns1,{{“Column2″, “Date”}, {“Column4.1.2″, “Number”}, {“Column4.2″, “Name”}, {“Column5″, “Location”}})

Overall, I’m really liking the Data Explorer experience so far. I think the team has done an excellent job in providing a UI that is easy and intuitive. Looking at the formula language, I can see that there is a lot of capabilities behind it as well.

In my next post, I’ll explore more of the more advanced functionality.