Introduction to Regular Expressions in OPAL
This month’s topic covers regular expression operations found in OPAL. Regular expressions, or regexes for short, are a way to match text against a user-specified pattern. It’s a huge topic and one that many people struggle with because the syntax can be confusing at times. But when you need it, there are few better tools than regexes.
If you haven’t used regular expressions much, you might not know there isn’t one universal syntax. While the basics generally work with all implementations, each has a slightly different set of options and extensions. Since Observe is built on Snowflake, OPAL’s regular expressions use the same POSIX Extended Regular Expression syntax Snowflake does.
Check out this Snowflake docs page if you use regexes with other tools to see how POSIX ERE differs. And if you need a general refresher on how regular expressions work, Regular-Expressions.info has many beginner tutorials and examples.
Regular Expressions in OPAL
In our last OPAL blog post, we introduced regex filter expressions to filter datasets, but there are also ways to match and extract text from individual fields. We won’t cover everything you can do with regexes in OPAL, but here are a couple of functions (and one verb) that you might find useful in your scripts.
Function: match_regex()
and match_regex_all()
Use match_regex()
to determine if the contents of a field contains your desired pattern. The expression will return true
if the pattern matches and false
if it doesn’t. This function is often combined with if()
to identify data for further manipulation. For example, look for failed webhook requests and classify them by the type of failure:
On the other hand, match_regex_all()
returns an array containing all matches in the target string. Use this if you want the matching values found in a particular field. Here’s an example that extracts a list of IP addresses from the field log:
Function: replace_regex()
Unsurprisingly, replace_regex() replaces matching patterns with new text. This example removes extra whitespace in a log of SQL queries, and updates the existing QUERY_TEXT field in place:
You don’t have to replace the contents of the field, you can simply create a new field instead. If you want to see which values got updated, compare old and new:
Verb: extract_regex
Our next verb, extract_regex creates new fields from the matched data. Yes, you can combine match_regex_all()
with make_col()
, but extract_regex
allows you to use named capture groups to extract multiple values.
What are named capture groups? Well, most regular expression parsers allow you to reference multiple matches by their position in the regex (e.g. 1, 2, 3, etc.), but some allow assigning names to these matches, called capture groups. In theory, this makes them much easier to work with rather than dealing with indices.
In OPAL, extract_regex
creates new fields for each named capture group. This example creates two new fields, jobName
and buildId
, from build data found in Jenkins logs:
To better understand how named capture groups work with extract_regex
, let’s look at the complete example.
The source data (Jenkins Build Logs) contains JSON objects with several properties. However, none of those properties are the jobName
or buildId
. To get those, you have to extract them from the path.
Here’s how those values are extracted using OPAL:
Learn More
This guide has hopefully given you what you need to start using the power of regexes in OPAL. As always, we can’t cover everything in a single post so for more examples and best practices for regexes, check out the docs page for our regex functions, as well as the page for extract_regex.
If you’d like to know more about regexes in OPAL, or are stuck and need a bit of help, visit our Slack support channel (#opal
) and one of our engineers should be able to assist you.