Iterating over an unknown number of pages in Power Query

My previous blog post showed how to iterate over a set of web pages in Power Query using a parameterized function. The post contained two queries – the GetData function, and a query to invoke it over a set number of pages.

GetData function

Query to invoke it

This approach uses a pre-generated list of page numbers (Source = {1..7}), which works well if you know the range of pages you want to access. But what do you do if you don’t know the range upfront?

The Power Query Formula Language (M) is (partially) lazy – some steps won’t be fully evaluated until the data they reference is needed. We’ll use this capability to define a query that iterates over a large number of pages (10,000), but dynamically stops itself once the first error is hit.

Let’s break this down:

Line 2 (PageRange) defines a large range of page numbers (1 to 10,000).

Line 3 (Source) uses List.Transform to invoke a function (GetData) over each value in the list. It uses a try…otherwise statement, which will catch errors thrown by GetData. If an error occurs, the otherwise statement  returns null.

Line 4 (First) uses List.FirstN, and passes in a condition (each _ <> null) that essentially says to take all rows until the first null is reached.

Line 5 (Table) converts the list to a table, and then Line 6 (Expanded) fully expands the table to get at the date.

The key to this working is that using Table.ExpandTableColumn causes List.Transform to be lazily evaluated – the function which goes out and grabs the data from the page (GetData) won’t actually be called until the table is expanded. Since the query specifies that we only want rows up until we get our first error/null, the Power Query engine will stop making calls to GetData once it gets back a null value. In this example, we have 7 pages of data – page 8 returns an error page with no table, which causes our query to fail and return null.

Important note: if you try to paste this code into the Power Query editor, and click on any of the steps before the last one (Expanded), List.Transform will not be lazily evaluated … if you watch the requests being made (with Fiddler, for example), you’ll see Power Query trying to evaluate and access all 10,000 pages.

image

 

2 thoughts on “Iterating over an unknown number of pages in Power Query”

  1. Hello Matt,

    Thanks for the great posts. I had a question if you had any idea about how to fix.
    I want to use power query at a web page where enables to show pages only for their members (I have a membership!).
    During the initial source process, it fails to fetch data as it is probably blocked by the site as it requires passwords.

    Do you know how to workaround this situation?

    Thanks
    Nao

  2. Hello Matt,

    Thanks for this great blog (and the other ones as well).

    I have quite a challange that is somewhat more complex (to me anyway) then the scenario you describe in this post.

    My initial webservice url I call via powerquery = “https://start.exactonline.nl/api/v1/5058/crm/Accounts?”

    This returns 60 rows of customers.
    Row number 61 contains “https://start.exactonline.nl/api/v1/5058/crm/Accounts?$skiptoken=guid’84211c57-a626-4a7a-bf8d-10f626061f6d'”.

    Calling this url (via Chrome advanced rest spi plugin) returns the next 60 customers… and so on and so on…

    Is there any magic you can think of to make powerquery return all rows.

    Help is greatly appreciated.

    Kr Wouter

Leave a Reply

Your email address will not be published. Required fields are marked *