M Language File for WordPress Crayon Syntax Highlighter

@cjkoester kindly reminded me that I created an M language file for the fantastic Crayon Syntax Highlighter wordpress plugin that I use on my site. I’ve made it available on my OneDrive share… simply extract it into your plugins/crayon-syntax-highlighter/langs directory, and you should be good to go.

Like the Notepad++ language file I shared, the highlighting is far from perfect/complete, but it’s a good start. If you improve upon it, please let me know!

Parsing Binary data using Power Query formulas

The Power Query formula language (M) contains a number of library functions that allow you to parse binary data. You can use these functions to build queries which parse custom binary file formats. This post contains two samples – a simple query which parses X & Y coordinate values from a binary field, and a more advanced query which reads PNG file headers.

Parsing binary data

The following query defines some binary data (Source), a record format (PointFormat), and a parsing format definition (FileFormat).

Let’s break this down.

The #binary function lets you pass in a list of byte values. Here we have specified a total of 12 bytes.

PointFormat defines a record format with two BinaryFormat.SignedInteger16 fields (2 bytes each), x and y, for a total of 4 bytes.

The first argument to the BinaryFormat.Choice function indicates that we should read the first 4 bytes (the size of an BinaryFormat.UnsignedInteger32). This value is read and passed in as the count parameter to the BinaryFormat.List function. BinaryFormat.List ends up reading the remaining 8 bytes of the binary (count * sizeof(PointFormat)), and outputs a list of records.

let
Source = #binary(
{0x00, 0x00, 0x00, 0x02,
0x00, 0x03, 0x00, 0x04,
0x00, 0x05, 0x00, 0x06}),

PointFormat = BinaryFormat.Record([
x = BinaryFormat.SignedInteger16,
y = BinaryFormat.SignedInteger16]),

FileFormat = BinaryFormat.Choice(BinaryFormat.UnsignedInteger32,
(count) => BinaryFormat.List(PointFormat, count))
in
FileFormat(Source)

If we put this M code into Power Query, convert the list of records to a table, and then expand, we get something like this:

image

Query

Parsing PNG Headers

This sample is provided as an exercise to the reader to understand. To test it out, replace the path value in the call to File.Contents.

Iterating over an unknown number of pages in Power Query

My previous blog post showed how to iterate over a set of web pages in Power Query using a parameterized function. The post contained two queries – the GetData function, and a query to invoke it over a set number of pages.

GetData function

Query to invoke it

This approach uses a pre-generated list of page numbers (Source = {1..7}), which works well if you know the range of pages you want to access. But what do you do if you don’t know the range upfront?

The Power Query Formula Language (M) is (partially) lazy – some steps won’t be fully evaluated until the data they reference is needed. We’ll use this capability to define a query that iterates over a large number of pages (10,000), but dynamically stops itself once the first error is hit.

Let’s break this down:

Line 2 (PageRange) defines a large range of page numbers (1 to 10,000).

Line 3 (Source) uses List.Transform to invoke a function (GetData) over each value in the list. It uses a try…otherwise statement, which will catch errors thrown by GetData. If an error occurs, the otherwise statement  returns null.

Line 4 (First) uses List.FirstN, and passes in a condition (each _ <> null) that essentially says to take all rows until the first null is reached.

Line 5 (Table) converts the list to a table, and then Line 6 (Expanded) fully expands the table to get at the date.

The key to this working is that using Table.ExpandTableColumn causes List.Transform to be lazily evaluated – the function which goes out and grabs the data from the page (GetData) won’t actually be called until the table is expanded. Since the query specifies that we only want rows up until we get our first error/null, the Power Query engine will stop making calls to GetData once it gets back a null value. In this example, we have 7 pages of data – page 8 returns an error page with no table, which causes our query to fail and return null.

Important note: if you try to paste this code into the Power Query editor, and click on any of the steps before the last one (Expanded), List.Transform will not be lazily evaluated … if you watch the requests being made (with Fiddler, for example), you’ll see Power Query trying to evaluate and access all 10,000 pages.

image

 

Iterating over multiple pages of web data using Power Query

This is one of my go-to demos for showing off the Power Query Formula Language (M). This approach works well with websites that spread data over multiple pages, and have one or more query parameters to specify which page or range of values to load. It works best when all pages have the same structure. This post walks through the end to end process, which includes:

  1. Creating the initial query to access a single page of data
  2. Turning the query into a parameterized function
  3. Invoking the function for each page of data you want to retrieve

This example uses the yearly box office results provided by BoxOfficeMojo.com.

You can find a video of this demo in the recording Theresa Palmer and I did at TechEd North America earlier this year.

Creating the initial query

Access one of the pages of data on the site using Power Query’s From Web functionality.

image

From Web actually generates two separate M functions – Web.Contents to access the URL that you enter, and then another function based on the content-type of the URL. In this case we are accessing a web page, so Web.Contents gets wrapped by a call to Web.Page. This function brings up a Navigator that lets you select one of the tables found on the page. In this case, we’re interested in the second table on the page (labeled Table 1). Selecting it and clicking Edit will bring up the Query Editor.

image

From here, we can filter and shape the data as we want it. Once we’re happy with the way it looks, we will convert it to a function.

Turning a Query into a parameterized Function

Open the Advanced Editor to bring up the M code behind your query.

image

For the sake of the demo, I’ve kept the query simple – I’m just accessing the data, and I’ve removed the “Changed Type” step that Power Query automatically inserted for me. The only shaping I did was to remove the bottom 3 summary rows on the page. My code now looks this:

Note that the url value in the call to Web.Contents contains a query parameter (page) that specifies the page of data we want to access.

To turn this query into a parameterized function, we’ll add the following line before the let statement.

(page as number) as table =>

The two as statements specify the expected data types for the page parameter (number) and the return value of the function (table). They are optional, but I like specifying types whenever I can.

We’ve now turned our query into a function, and have a parameter we can use within the code. We are going to dynamically build up the query string, replacing the existing page value in the URL with the page parameter. Since we’ve indicated that page is a number, we will need to convert the value to text using the Number.ToText function. The updated code looks like this:

Clicking Done on the advanced editor brings us back to the query editor. We now have a function expecting a parameter.

image

You can click on the the Invoke button and enter a page value to test it out.

image

image

Be sure to delete the Invoked Function step, then give the function a meaningful name (like GetData). Once the function has been given a good name, click Close & Load to save the query.

Invoking the function for each page of data you want to retrieve

Now that we have a function that can get the data, we’ll want to invoke it for each page we want to retrieve. M doesn’t have any concept of Loops – to perform an action multiple times, we’ll need to generate a List (or Table) of values we want to act on.

From the Power Query ribbon, select From Other Sources –> Blank Query. This brings up an empty editor page. In the formula bar, type the following formula:

= {1..7}

This gives us a list of numbers from 1 to 7.

image

Convert this to a table by clicking the To Table button, and click OK on the prompt.

Rename the column to something more meaningful (i.e. “Page”).

Go to the Add Column menu, and click Add Custom Column.

We can invoke our function (GetData) for each page with the following formula:

GetData([Page])

Click OK to the return to the editor. We now have a new column (Custom) with Table values. Note – clicking the whitespace next to the “Table” text (and not “Table” itself) will bring up a preview window in the bottom of the editor.

image

Click the Expand Columns button to expand the table inline.

image

The full query now looks like this

Clicking Close & Load brings us back to the workbook. After the query executes, we can scroll to the bottom of the sheet to see that we’ve pulled in 7 pages of data

 

image

Converting a Query to a Function in Power Query

You can turn a query into a function by adding a single line before your M query:

() =>

That is a pair of parenthesis ( () ), followed by a goes-to symbol ( => ). Any parameters for your function would go in between the parenthesis.

Example

In the Query Editor, click on the View page, and then click Advanced Editor.

image

The Advanced Editor will display your M query. Add () => before the starting let statement.

image

Click Done to return to the Query Editor. You should now see a single Applied Step, and the option to Invoke your function.

image

Clicking Invoke simply runs your previous query, which isn’t all that interesting. Delete the Invoked step (if you clicked the button), and go back to the Advanced Editor to add a parameter to your function.

Note that when you bring up the editor, it looks a little different …

image

The code will now look something like this:

Power Query has automatically prepended some code based on the name of your query. You can remove the outer let statement, or simply ignore it for now. Add a parameter in-between the parenthesis (for example, “page”).

Click Done to return to the editor. Notice that the editor now displays the parameter your entered. It is typed any because we didn’t explicitly specify a type on the parameter.

image

Clicking Invoke will prompt us for the parameter.

image

Invoking the query with the parameter doesn’t change our results at all, since we never actually referenced the parameter within our M code… I’ll explain that process in an upcoming post.

Check out the links before for more information on the Power Query Formula Language (“M”):

Notepad++ language file for the Power Query formula language (“M”)

Power Query doesn’t have syntax highlighting in its advanced editor (yet?). In the meantime, here is a language file for Notepad++. It’s not perfect, but I find it useful when writing M queries. It lists all of the current library functions as keywords (as of the PQ October update), recognizes comments, strings, literals, and language keywords.

You can find it on my One Drive share – here. You can find instructions on how to install a user defined language file in Notepad++ on their wiki site.

Enjoy!

image

For more information on writing queries using the Power Query formula language, you can check out the deep dive session that Theresa Palmer and I gave at TechEd North America earlier this year.