Regular expressions

This tutorial describes the syntax of regular expressions in Praat.

Introduction

A regular expression (regex) is a text string that describes a set of strings. Regular expressions are useful as a way to search for patterns in text strings and, optionally, replace them with another pattern.
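As a rough illustration of "search, then optionally replace", here is a minimal sketch using Python's `re` module. Python's regex engine is only a stand-in here: it is not the engine Praat uses, and the example text and pattern are made up for illustration.

```python
import re

text = "The cat sat on the mat."

# Search: collect the start position of every occurrence of the pattern "at".
positions = [m.start() for m in re.finditer("at", text)]

# Replace: substitute every occurrence of "at" by "og".
replaced = re.sub("at", "og", text)

print(positions)
print(replaced)
```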

Some regular expressions match only one string, i.e., the set they describe has only one member. For example, the regex "ab" matches the string "ab" and no others. Other regular expressions match more than one string, i.e., the set they describe has more than one member. For example, the regex "a*" matches any string made up of any number (including zero) of "a"s. As you can see, some characters match themselves (such as "a" and "b"); these characters are called ordinary characters. The characters that do not match themselves, such as "*", are called special characters or metacharacters. Many characters that are special in the search regex are ordinary characters in the substitution regex.
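The distinction between ordinary and special characters can be checked with a small sketch, again assuming Python's `re` module as a stand-in for Praat's engine. The helper `matches` is hypothetical; it uses `re.fullmatch` so that a regex "matches" a string only if it describes the whole string, which mirrors the "set of strings" view above.

```python
import re

def matches(regex, s):
    """Hypothetical helper: True if regex describes the whole string s."""
    return re.fullmatch(regex, s) is not None

# "ab" consists of ordinary characters only, so it describes exactly one string.
assert matches("ab", "ab")
assert not matches("ab", "abb")

# "*" is a metacharacter: "a*" describes zero or more "a"s.
for s in ["", "a", "aaaa"]:
    assert matches("a*", s)
assert not matches("a*", "b")

# In the substitution (replacement) string, "*" is an ordinary character.
print(re.sub("b", "*", "abc"))  # prints a*c
```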


1. Special characters (\ ^ $ { } [ ] ( ) . + ? | - &)
2. Quantifiers (how often do we match)
3. Anchors (where do we match)
4. Special constructs with parentheses (grouping constructs)
5. Special control characters (difficult-to-type characters like \n)
6. Convenience escape sequences (\d \D \l \L \s \S \w \W \B)
7. Octal and hexadecimal escapes (things like \053 or \X2B)
8. Substitution special characters (\1..\9 \U \u \L \l &)
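To give a first feel for several of the topics listed above (quantifiers, anchors, grouping, convenience escapes, hexadecimal escapes and substitution backreferences), here is a short sketch in Python's `re` module. This is an assumption-laden illustration, not Praat's behavior: Python's syntax overlaps with Praat's only partly. For instance, Python writes hexadecimal escapes as `\x2B` rather than `\X2B`, and escapes such as `\l`, `\L`, `\U` and `\u` are not supported by Python's `re` at all.

```python
import re

# Quantifier plus convenience escape: \d+ means "one or more digits".
numbers = re.findall(r"\d+", "tier 12, interval 345")

# Anchor: ^ matches only at the start of the string.
at_start = re.search(r"^tier", "tier 12") is not None
not_at_start = re.search(r"^tier", "the tier") is not None

# Grouping: the parenthesized groups are reused in the replacement as \1 and \2.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

# Hexadecimal escape: \x2B is the literal character "+" (Praat writes \X2B).
plus_signs = re.findall(r"\x2B", "1+1")

print(numbers, at_start, not_at_start, swapped, plus_signs)
```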

More in-depth coverage of regular expressions can be found in Friedl (1997).


© David Weenink & Paul Boersma 20180401