Regular expression basics

Expression Description Example
Any character except [\^$.|?*+() All characters except the listed special characters match a single instance of themselves. { and } are literal characters, unless they’re part of a valid regular expression token (e.g. the {n} quantifier). ‘a’ matches ‘a’
\ (backslash) followed by any of [\^$.|?*+(){} A backslash escapes a special character, so that it is matched as a literal character. ‘\+’ matches ‘+’
\Q…\E Matches the characters between \Q and \E literally, suppressing the meaning of special characters. ‘\Q+-*/\E’ matches ‘+-*/’
\xFF Matches the character with the specified ASCII/ANSI value, which depends on the code page used. Can be used in character classes. ‘\xA9’ matches ‘©’ when using the Latin-1 code page.
[] Matches any one of the characters listed inside the square brackets; ‘-’ specifies a range. [^] is the complement class of [] and matches any one character that is not listed. [abc] matches a, b or c; [0-9] matches any digit from 0 to 9
\n, \r and \t Match an LF character, CR character and a tab character respectively. Can be used in character classes. \r\n matches a DOS/Windows CRLF line break.
\d,\w,\s,\D,\W,\S \d matches any digit from 0 to 9, \w matches any word character (a letter, digit or underscore, e.g. each character of fj490fj), \s matches a whitespace character. \D, \W and \S are their complements. [\d\s] matches a character that is a digit or whitespace
. Matches any single character except line break characters \r and \n. Most regex flavors have an option to make the dot match line break characters too. ‘.’ matches ‘x’ or (almost) any other character
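^$ ^ matches at the start of the string and $ matches at the end. Both match a position rather than a character. Most regex flavors have an option to make them also match after and before line breaks. ‘^[0-9][A-Z]$’ matches ‘4D’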
\A,\Z,\z \A Matches at the start of the string the regex pattern is applied to. Matches a position rather than a character. Never matches after line breaks.
\Z Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Never matches before line breaks, except for the very last line break if the string ends with a line break.
\z Matches at the very end of the string the regex pattern is applied to. Matches a position rather than a character. Never matches before line breaks, not even a final one.
‘\A.’ matches ‘a’ in ‘abc’, ‘.\Z’ matches ‘f’ in ‘abc\ndef’
\b,\B \b Matches at the position between a word character (anything matched by \w) and a non-word character (anything matched by [^\w] or \W) as well as at the start and/or end of the string if the first and/or last characters in the string are word characters.
\B Matches at the position between two word characters (i.e. the position between \w\w) as well as at the position between two non-word characters (i.e. \W\W).
‘.\b’ matches ‘c’ in ‘abc’, ‘\B.\B’ matches ‘b’ in ‘abc’
| (pipe) Alternation: matches either the expression on its left or the expression on its right. abc(def|xyz) matches abcdef or abcxyz
? (question mark) Makes the preceding item optional. Greedy, so the optional item is included in the match if possible. abc? matches ab or abc
*, *? * Repeats the previous item zero or more times. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is not matched at all.
*? (lazy star) Repeats the previous item zero or more times. Lazy, so the engine first attempts to skip the previous item, before trying permutations with ever increasing matches of the preceding item.
“.*” matches “def” “ghi” in abc “def” “ghi” jkl, “.*?” matches “def” in abc “def” “ghi” jkl
{n} {n} Repeats the previous item exactly n times.
{n,m} Repeats the previous item between n and m times. Greedy, so repeating m times is tried before reducing the repetition to n times.
{n,m}? Repeats the previous item between n and m times. Lazy, so repeating n times is tried before increasing the repetition to m times.
{n,} Repeats the previous item at least n times. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only n times.
{n,}? Repeats the previous item n or more times. Lazy, so the engine first matches the previous item n times, before trying permutations with ever increasing matches of the preceding item.
a{3} matches aaa, a{2,4} matches aaaa, aaa or aa, a{2,4}? matches aa, aaa or aaaa, a{2,} matches aaaaa in aaaaa, a{2,}? matches aa in aaaaa
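
To try the first rows of the table (escapes, character classes, the dot, alternation, anchors and word boundaries), here is a minimal sketch in Python's re module. Python is my choice for illustration, not something the table assumes, and a few details are flavor-specific: Python's re has no \Q…\E, so re.escape() stands in for it here, and \Z is left out because in Python it only matches at the very end of the string. The variable names are made up for this example.

```python
import re

# Escaping: a backslash turns a metacharacter into a literal.
plus = re.search(r"\+", "1+2").group()

# Assumption: re.escape() is used as a stand-in for \Q...\E,
# which Python's re module does not support.
ops = re.search(re.escape("+-*/"), "a+-*/b").group()

# \xA9 in a Python str pattern is the code point U+00A9 (copyright sign).
has_copyright = re.search(r"\xA9", "\u00a9 2024") is not None

# Character classes and shorthand classes.
digits_or_space = re.findall(r"[\d\s]", "a1 b")
not_abc = re.findall(r"[^abc]", "abcd")
word = re.fullmatch(r"\w+", "fj490_fj") is not None  # \w includes "_"

# Alternation inside a group.
alts = [bool(re.fullmatch(r"abc(def|xyz)", s)) for s in ("abcdef", "abcxyz", "abc")]

# The dot skips "\n" unless the DOTALL option is set.
dot_default = re.findall(r".", "a\nb")
dot_all = re.findall(r".", "a\nb", flags=re.DOTALL)

# ^ and $ anchor the whole string by default ...
whole = re.search(r"^[0-9][A-Z]$", "4D") is not None
# ... and every line when the MULTILINE option is set.
per_line = re.findall(r"^\w+$", "abc\ndef", flags=re.MULTILINE)
# \A still matches only at the start of the string.
start_only = re.findall(r"\A.", "abc\ndef", flags=re.MULTILINE)

# Word boundaries: the two examples from the table.
last_char = re.search(r".\b", "abc").group()
inner_char = re.search(r"\B.\B", "abc").group()

print(plus, ops, digits_or_space, per_line, last_char, inner_char)
```

Raw strings (r"...") keep Python from interpreting the backslashes before the regex engine sees them, which is why every pattern above uses them.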

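The greedy and lazy quantifier rows are easiest to see side by side. The sketch below runs the table's own sample strings through Python's re module (again my choice of language, with straight quotes in place of the curly ones, and with variable names invented for the example).

```python
import re

text = 'abc "def" "ghi" jkl'

# Greedy star: .* takes as much as it can, so the match runs from the
# first quote to the last one.
greedy = re.search(r'".*"', text).group()

# Lazy star: .*? expands only as far as needed, so the match stops at
# the first closing quote.
lazy = re.search(r'".*?"', text).group()

run = "aaaaa"

# Exact and bounded repetition, first match in "aaaaa".
exactly_three = re.search(r"a{3}", run).group()
greedy_range = re.search(r"a{2,4}", run).group()   # tries 4 first
lazy_range = re.search(r"a{2,4}?", run).group()    # tries 2 first
greedy_open = re.search(r"a{2,}", run).group()     # takes all five
lazy_open = re.search(r"a{2,}?", run).group()      # stops at two

# Optional item: abc? accepts "ab" and "abc" but not "abcc".
optional = [bool(re.fullmatch(r"abc?", s)) for s in ("ab", "abc", "abcc")]

print(greedy)  # "def" "ghi"
print(lazy)    # "def"
print(greedy_range, lazy_range, greedy_open, lazy_open)
```

Note that laziness only changes which match is tried first: a{2,4}? can still match aaa or aaaa when the rest of the pattern forces it to, as in re.fullmatch(r"a{2,4}?", "aaaa").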