Regular expressions

This tutorial describes the syntax of regular expressions in PRAAT.

Introduction

A regular expression is a text string that describes a set of strings. Regular expressions (regex) are useful as a way to search for patterns in text strings and, optionally, replace them by another pattern.

Some regex match only one string, i.e., the set they describe has only one member. For example, the regex "ab" matches the string "ab" and no others. Other regex match more than one string, i.e., the set they describe has more than one member. For example, the regex "a*" matches any string made up of any number (including zero) of "a"s. As you can see, some characters match themselves (such as "a" and "b"); these are called ordinary characters. The characters that don't match themselves, such as "*", are called special characters or meta characters. Many special characters are special only in the search regex and are ordinary characters in the substitution regex.
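The two example patterns can be tried out in a short sketch. It uses Python's re module as a stand-in, on the assumption that it treats these simple patterns the same way; PRAAT's own regex syntax differs from Python's in other respects (for instance in the substitution special characters). The helper name matches_whole is hypothetical and serves only as illustration.

```python
import re

# Assumption: Python's `re` module stands in for PRAAT's regex engine.
# For the simple patterns "ab", "a*" and "a+" the two are expected to agree.

def matches_whole(pattern: str, text: str) -> bool:
    """Return True if `pattern` describes the entire string `text`."""
    return re.fullmatch(pattern, text) is not None

# "ab" describes a set with exactly one member.
only_ab = [s for s in ["ab", "a", "abb", "ba"] if matches_whole("ab", s)]

# "a*" describes a set with many members, including the empty string.
runs_of_a = [s for s in ["", "a", "aaa", "ab"] if matches_whole("a*", s)]

# "*" is special in the search pattern but ordinary in the replacement text.
replaced = re.sub("a+", "*", "caat")

print(only_ab)
print(runs_of_a)
print(replaced)
```

The candidate strings are tested as a whole because the text talks about the set of strings a regex describes; searching for a pattern inside a longer string is a separate operation.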

You can read the rest of this tutorial sequentially with the help of the "<1" and ">1" buttons.

1. Special characters (\ ^ $ { } [ ] ( ) . + ? | - &)
2. Quantifiers (how often do we match)
3. Anchors (where do we match)
4. Special constructs with parentheses (grouping constructs)
5. Special control characters (difficult-to-type characters like \n)
6. Convenience escape sequences (\d \D \l \L \s \S \w \W \B)
7. Octal and hexadecimal escapes (things like \053 or \X2B)
8. Substitution special characters (\1..\9 \U \u \L \l &)

More in-depth coverage of regular expressions can be found in Friedl (1997).

© djmw, July 6, 2001