Unleash the Power of Regular Expression in PowerFx!

Unleash the Power of Regular Expression in PowerFx!

Describe it, don't specify it!

Just a pre-warning, this article ends up getting a little deeper than others I've done in the past. Bear with me, this is all worth it!


Introduction

You need to find some text within a larger block of text, you may just be checking it's there, or you may also want to extract that text.

With(
{
    MyString: "Hello PowerSnacks Blog"
},
    Mid(
        MyString,
        Find("PowerSnacks", MyString),
        Len("PowerSnacks")))

The above PowerFx will look for the word "PowerSnacks" in the variable "MyString" using the Find() function and return it through the Mid() function. This works great when we know the word we want to look for. We can also use variables here, which could contain the word we're looking for.

But this defines what we're looking for in our text.

How do we describe the string we want to find, rather than defining it?


The Match(), MatchAll() and IsMatch() Function Family

The Match Family of functions is going to be crucial here. The Match functions not only accept text literals for their second parameters, but they also accept Enumerated Regular Expression as well!

  • IsMatch() tests if your match is within the input string and will return a boolean value

  • Match() will return the first instance of a match, this returns a record

  • MatchAll() will perform a Global Match on your input string, this returns a table of results that mimic the schema of the Match() function

//Schemas
Match():
{
    'FullMatch': "Text",    //The Matching Text
    'StartMatch': 0,        //The Index of the match
    'SubMatches':[ { 'Value': "" } ] //Table of SubMatches
}

MatchAll():
[
    {
        'FullMatch': "Text",    //The Matching Text
        'StartMatch': 0,        //The Index of the match
        'SubMatches':[ { 'Value': "" } ] //Table of SubMatches
    } 
]

/*
⚠️ One really important note to remember about these functions, 
the second parameter MUST be either a Text Literal, 
so you can't define patterns, or match options into a variable. ⚠️
*/

RegEx Simplified - Match Enumerators

Also known as Predefined Patterns, these options built into PowerFx allow you to specify Regular Expression patterns with easy-to-remember enumerated values.

The list of options is here and what's great is that these predefined patterns can be combined with an Ampersand (&) to form a Regular Expression Pattern:

With(
    {MyString: "The date today is 10/12/2023 and the time is 14:00:00"},
    Match(MyString, Match.Digit & Match.Digit).FullMatch
)

So by specifying Match.Digit & Match.Digit, we're specifying that we're looking for the first two digits that occur together in our string. You may have already been using Regular Expression and didn't know it!

You can also mix in regular strings into the mix as well, making our matching pattern something like: '##/##/'

With(
    {MyString: "The date today is 10/12/2023 and the time is 14:00:00"},
    Match(
        MyString,
        Match.Digit & Match.Digit & "/" & Match.Digit & Match.Digit & "/"
    ).FullMatch
)

So, because these predefined patterns are just returning Regular Expression anyway, we can add some Regular Expression into the mix: I'll show a change where we can specify we want to find 2 digits:

With(
    {MyString: "The date today is 10/12/2023 and the time is 14:00:00"},
    Match(
        MyString,
        Match.Digit & "{2}" & "/" & Match.Digit & "{2}" & "/"
    ).FullMatch
)

As you can imagine, using these patterns is great to quickly craft some Regular Expression, but when the patterns become complex, we may need to go deeper:


RegEx Complexified 🦀

So we've seen how we can match text literals, match using pre-defined patterns and we've even sprinkled them with a little bit of Regular Expression language, but what about just regular expression itself:

A Note on Regular Expression

I'm going to come right out and say it. At first sight, Regular Expression looks downright scary, looking like random characters plastered across the screen. It can be really difficult to work out what a regular expression pattern may be doing. For example, the Match Enumerator for Email address matching is:

.+\@.+\\.[^\\.]{2,}

It's unrealistic to expect anyone to remember this off-by-heart, or be able to construct complex patterns 'on-the-fly'. Thankfully, many tools exist online to help us write these patterns out and understand how they work.

I use Regex101.com to test regular expression patterns, it's a fantastic tool because it will also describe what patterns are doing, and has a handy cheat sheet at the bottom to show you different elements you can add to your pattern.

I STRONGLY recommend you try the testing tool in Regex101.com and have a good play around with different patterns, even finding some patterns via Google and seeing them broken down will help to give you more of an understanding of what you can do with Regular Expression.

Note, that other tools exist online too. Let me know if you use any other ones :)


Using Regular Expression

To solve our original problem using Regular Expression:

With(
    {MyString: "The date today is 10/12/2023 and the time is 14:00:00"},
    Concat(
        MatchAll(
        MyString,
        "(\d{2}[\W]\d{2}[\W]\d{2,4})",
        MatchOptions.Contains
    ), ThisRecord.FullMatch," - ")
)

Pattern Breakdown (\d{2}[\W]\d{2}[\W]\d{2,4}):

  • () - Matching Group

    • \d{2} - Match 2 Digits (0-9),

    • [\W] - Match Any Non-Word Character (i.e. anything that's not ^a-zA-Z0-9_) Once

    • \d{2} - Match 2 Digits (0-9)

    • [\W] - Match Any Non-Word Character (i.e. anything that's not ^a-zA-Z0-9_) Once

    • \d{2,4} - Match between 2 to 4 Digits (0-9)

We're using MatchAll() above, and concatenating the output into a string for viewing. The pattern used contains patterns that aren't available in the predefined patterns (The \W matches a non-word character btw), so it's always worth remembering that we can use full Regular Expression if we want to. We may want to do this if:

  • We want to match lots of different character types, and using the predefined patterns becomes cumbersome

  • We want to use patterns not available in the pre-defined patterns

  • We want to keep the code lean and not have too much bloat.


Handling Match Groups

Another benefit of adding Regular Expression is you now get to benefit from Match Groups as well.

The MatchAll() function's resulting table has a "SubMatches" option, and this is where you can get access to matching groups.

Explaining Matching Groups

In Regular Expression, you can add Groups into your pattern. The benefit of doing this is to be able to extrapolate text within text. We can also do other funky things like conditionally matching text as well.

I have an example here from RegEx101 on using groups:

Pattern Breakdown (\d{3})(\d{2})(\d{3})(\d{3}):

  • () - Matching Group

    • \d{3} - Match 3 Digits (0-9),
  • () - Matching Group

    • \d{2} - Match 2 Digits (0-9),
  • () - Matching Group

    • \d{3} - Match 3 Digits (0-9)
  • () - Matching Group

    • \d{3} - Match 3 Digits (0-9)

The benefit of using the groups ( The bits in the () brackets ) is that we can specifically extract this word with the MatchAll function!

Below I have a Gallery and Textbox configured. The Gallery is going to take the TextInputs' Text (which in this case is a phone number) and perform a MatchAll() with the above pattern. The Labels in the Gallery display the FullMatch and a Concatenated string of SubMatches (These are groups),

//Match Groups are encased in Brackets ()
Gallery.Items = MatchAll(TextInput1.Text, "(\d{3})(\d{2})(\d{3})(\d{3})")
Gallery.Label1.Text = $"Full Match: {ThisItem.FullMatch}"
Gallery.Label2.Text = $"Formatted Phone Number: {Concat(ThisItem.SubMatches,Value," ")}"

Using the Match Groups in this way, we can extract specific text around other text using the correctly formulated pattern.

Naming Match Groups

Match Groups can also be named to make returning a particular group easier.

//Define a Group Name using ?<Name> at the start of the matching group
MatchAll("Text and crabs", "(?<crabbymatch>crab)(?<plural>s)")
(?<crabbymatch>crab)(?<plural>s)

Pattern Breakdown (?<crabbymatch>crab)(?<plural>s):

  • () - Matching Group - Name "crabbymatch"

    • crab - match the literal word "crab",
  • () - Matching Group - Name "plural"

    • s - Match the character "s",

Creating a named match group creates a column in your Match() or MatchAll() output:

You can still use SubMatches in this instance and get each SubMatch in your Pattern.


Manipulate Matching with MatchOptions

MatchOptions allow us to manipulate the way the Match functions work. They again work as enumerated values but this time they represent Regular Expression Flags, rather than patterns. These Regular Expression Flags also exist in Regex101, so we can enable/disable them as needed. They can also be strung together with ampersands (&).

As an example, if we have a string that contains a block of numbers at the end of the string, we can extract just the last four by employing MatchOptions.EndsWith:

With(
    {MyString: "The date today is 10122023"},
    Match(
        MyString,
        Match.Digit & "{4}",
        MatchOptions.EndsWith
    ).FullMatch
)

Or by using MatchOptions.Complete, we can ensure the entire string matches our pattern, which it won't in the above example:

With(
    {MyString: "The date today is 10122023"},
    Match(
        MyString,
        Match.Digit & "{4}",
        MatchOptions.Complete
    ).FullMatch
)


The Global Flag ⛳

One match option missing from the list is the Global Flag (/g). If we jump back to our previous example of formatting phone numbers. We used the MatchAll() function and not Match(). Using MatchAll() is the equivalent of enabling the Global flag in Regular Expression, meaning the pattern will continue to be matched throughout our text.

If we continued to satisfy the pattern in our previous example, we'd see additional matches:

For instances where we only want to match the first instance (and therefore, disable the Global flag), we just use Match(). The Global flag doesn't have an equivalent MatchOption.


Conclusion

There are many ways to crack this claw, so it's important to remember that there's sometimes no right or wrong way to go about it. What is important to remember, is that should you need to do some more complex string matching, you have the power of Regular Expression behind you to make it happen, however you wish to do so!

  • You can find strings within strings using Mid() and Find() functions, when you know the exact text you're looking for

  • If you know the sort of text you're looking for, but you don't know the exact value, you can use the Match functions.

  • With Match, you can mix and match literal text, Match Enumerators and Regular Expression

  • If you need to match a more complex pattern, you can use full Regular Expression, and tools such as RegEx101.com can help write that.

  • You can manipulate how Match functions work with MatchOptions, which are equivalent to Regular Expression Flags.


Resources