Regular expressions
1.1. Cheatsheet
Regex Cheatsheet (MIT)
Davechild Regular Expressions (Cheatography)
1.2. QuickStart
From http://www.rexegg.com/regex-quickstart.html
Character | Legend | Example | Sample Match |
---|---|---|---|
\d | One digit | file_\d\d | file_25 |
\w | One "word character": letter, underscore or digit | \w-\w\w\w | A-b_1 |
\s | One white space character (e.g.: a tab) | ab\s\s\sc | ab c |
\D | One character that is not a digit | \D\D\D | ABC |
\W | One character that is not a word character | \W\W\W\W\W | *-+=) |
\S | One character that is not a space | \S\S\S\S | Yoyo |
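
The rows above can be checked mechanically. The following is a minimal sketch, assuming Python's standard `re` module (the table itself is engine-agnostic); the sample for `\s` uses a space, a tab and a newline to show that any whitespace character qualifies.

```python
import re

# (pattern, sample) pairs taken from the table above.
samples = [
    (r"file_\d\d", "file_25"),
    (r"\w-\w\w\w", "A-b_1"),
    (r"ab\s\s\sc", "ab \t\nc"),   # three whitespace characters of any kind
    (r"\D\D\D", "ABC"),
    (r"\W\W\W\W\W", "*-+=)"),
    (r"\S\S\S\S", "Yoyo"),
]

# fullmatch() succeeds only if the whole sample is consumed by the pattern.
results = {pattern: re.fullmatch(pattern, text) is not None
           for pattern, text in samples}

for pattern, ok in results.items():
    print(f"{pattern!r}: {ok}")
```
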
Quantifier | Legend | Example | Sample Match |
---|---|---|---|
+ | One or more | Version \w-\w+ | Version A-b1_1 |
{3} | Exactly three times | \D{3} | ABC |
{2,4} | Two to four times | \d{2,4} | 156 |
{3,} | Three or more times | \w{3,} | regex_tutorial |
* | Zero or more times | A*B*C* | AAACC |
? | Once or none | plurals? | plural |
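
As a rough illustration of the quantifiers above, again assuming Python's `re` module, the sketch below checks each sample from the table and then shows that a bounded quantifier such as `{2,4}` rejects input that is too short or too long.

```python
import re

# (pattern, sample) pairs taken from the quantifier table above.
quantifier_samples = [
    (r"Version \w-\w+", "Version A-b1_1"),
    (r"\D{3}", "ABC"),
    (r"\d{2,4}", "156"),
    (r"\w{3,}", "regex_tutorial"),
    (r"A*B*C*", "AAACC"),
    (r"plurals?", "plural"),
]
quantifier_results = [bool(re.fullmatch(p, s)) for p, s in quantifier_samples]

# {2,4} is bounded on both sides: one digit is too few, five are too many.
too_short = re.fullmatch(r"\d{2,4}", "1")
too_long = re.fullmatch(r"\d{2,4}", "12345")

print(quantifier_results, too_short, too_long)
```
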
Character | Legend | Example | Sample Match |
---|---|---|---|
. | Any character except new line | a.c | abc |
. | Any character except new line | .* | whatever, man. |
\. | A period (special character: needs to be escaped by a \) | a\.c | a.c |
\ | Escapes a special character | \.\*\+\? \$\^\/\\ | .*+? $^/\ |
\ | Escapes a special character | \[\{\(\)\}\] | [{()}] |
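
To make the difference between `.` and `\.` concrete, here is a small sketch assuming Python's `re` module. It also uses `re.escape`, a standard-library helper that is not part of the table, to build a pattern that matches a string of special characters literally.

```python
import re

# Unescaped, the dot matches any character except a newline.
dot_matches_letter = re.fullmatch(r"a.c", "abc") is not None
dot_matches_newline = re.fullmatch(r"a.c", "a\nc") is not None

# Escaped, it matches only a literal period.
escaped_matches_period = re.fullmatch(r"a\.c", "a.c") is not None
escaped_matches_letter = re.fullmatch(r"a\.c", "abc") is not None

# re.escape() adds the backslashes for you.
literal = ".*+? $^/\\"
roundtrip = re.fullmatch(re.escape(literal), literal) is not None

print(dot_matches_letter, dot_matches_newline,
      escaped_matches_period, escaped_matches_letter, roundtrip)
```
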
Logic | Legend | Example | Sample Match |
---|---|---|---|
| | OR operand | 22|33 | 33 |
() | Capturing group | A(nt|pple) | Apple (captures "pple") |
\1 | Contents of Group 1 | r(\w)g\1x | regex |
\2 | Contents of Group 2 | (\d\d)\+(\d\d)=\2\+\1 | 12+65=65+12 |
(?: | Non-capturing group | A(?:nt|pple) | Apple |
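
Groups and backreferences are easier to follow when the captured text is printed. This is a minimal sketch assuming Python's `re` module, which uses the same `\1` and `\2` syntax inside patterns.

```python
import re

# Capturing group: the text matched inside the parentheses is kept.
m = re.fullmatch(r"A(nt|pple)", "Apple")
captured = m.group(1)

# Non-capturing group: same overall match, but no group is recorded.
nc = re.fullmatch(r"A(?:nt|pple)", "Apple")
group_count = nc.re.groups  # number of capturing groups in the pattern

# \1 must repeat exactly what group 1 captured.
backref = re.fullmatch(r"r(\w)g\1x", "regex")
repeated_char = backref.group(1)

# \2 and \1 swap the two captured numbers on the right-hand side.
swapped = re.fullmatch(r"(\d\d)\+(\d\d)=\2\+\1", "12+65=65+12")
not_swapped = re.fullmatch(r"(\d\d)\+(\d\d)=\2\+\1", "12+65=12+65")

print(captured, group_count, repeated_char, bool(swapped), bool(not_swapped))
```
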
Character | Legend | Example | Sample Match |
---|---|---|---|
\t | Tab | T\t\w{2} | T ab |
\r | Return character | see below | |
\n | New line character | see below | |
\r\n | New line in Windows | AB\r\nCD | AB CD |
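
A short sketch of the whitespace escapes, assuming Python's `re` module. The `\r?\n` pattern at the end is a common idiom that is not in the table: it accepts both Unix and Windows line endings.

```python
import re

# \t matches a literal tab character.
tab_match = re.fullmatch(r"T\t\w{2}", "T\tab")

# Windows line endings are the two-character sequence \r\n.
crlf_match = re.fullmatch(r"AB\r\nCD", "AB\r\nCD")

# An optional \r before \n splits on either kind of line ending.
lines = re.split(r"\r?\n", "one\r\ntwo\nthree")

print(bool(tab_match), bool(crlf_match), lines)
```
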
Quantifier | Legend | Example | Sample Match |
---|---|---|---|
+ | The + (one or more) is "greedy" | \d+ | 12345 |
? | Makes quantifiers "lazy" | \d+? | 1 in 12345 |
* | The * (zero or more) is "greedy" | A* | AAA |
? | Makes quantifiers "lazy" | A*? | empty in AAA |
{2,4} | Two to four times, "greedy" | \w{2,4} | abcd |
? | Makes quantifiers "lazy" | \w{2,4}? | ab in abcd |
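
The greedy and lazy rows can be reproduced directly. The sketch below assumes Python's `re` module; the final tag example is a common illustration that is not taken from the table.

```python
import re

# Greedy quantifiers take as much as they can; lazy ones take as little as they can.
greedy_plus = re.match(r"\d+", "12345").group()
lazy_plus = re.match(r"\d+?", "12345").group()

greedy_star = re.match(r"A*", "AAA").group()
lazy_star = re.match(r"A*?", "AAA").group()      # the empty string

greedy_range = re.match(r"\w{2,4}", "abcd").group()
lazy_range = re.match(r"\w{2,4}?", "abcd").group()

# Where the difference usually matters: stopping at the first closing delimiter.
html = "<b>bold</b>"
greedy_tag = re.match(r"<.+>", html).group()
lazy_tag = re.match(r"<.+?>", html).group()

print(greedy_plus, lazy_plus, repr(lazy_star), greedy_range, lazy_range)
print(greedy_tag, lazy_tag)
```
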
Character | Legend | Example | Sample Match |
---|---|---|---|
[stuff] | One of the characters in the brackets | [AEIOU] | One uppercase vowel |
- | Range indicator | [a-z] | One lowercase letter |
[stuff] | One of the characters in the brackets | [AB1-5w-z] | One of either: A,B,1,2,3,4,5,w,x,y,z |
[stuff] | One of the characters in the brackets | [A-Z]+ | GREAT |
[^x] | One character that is not x | [^a-z]{3} | A1! |
[\d\D] | One character that is a digit or a non-digit | [\d\D]+ | Any characters, including new line |
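
Character classes are written inside square brackets. The following sketch, assuming Python's `re` module, spells the brackets out explicitly for each row and shows why `[\d\D]` is useful: unlike `.`, it also matches newlines.

```python
import re

vowel = re.fullmatch(r"[AEIOU]", "E")
lower = re.fullmatch(r"[a-z]", "q")

# Literal characters and ranges can be mixed in one class.
members = [c for c in "AB12345wxyz9C" if re.fullmatch(r"[AB1-5w-z]", c)]

great = re.fullmatch(r"[A-Z]+", "GREAT")

# A leading ^ inside the brackets negates the class.
negated = re.fullmatch(r"[^a-z]{3}", "A1!")

# The dot stops at a newline; [\d\D] does not.
text = "line one\nline two"
dot_match = re.match(r".+", text).group()
any_match = re.match(r"[\d\D]+", text).group()

print(members, dot_match, repr(any_match))
```
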
Anchor | Legend | Example | Sample Match |
---|---|---|---|
^ | Beginning of line (but means "not" inside [^brackets]) | ^abc .* | abc (line start) |
$ | End of line | .*? the end$ | this is the end |
\A | Beginning of string | \Aabc[\d\D]* | abc (string... ...start) |
\Z | End of string | [\d\D]*the end\Z | this is... ...the end |
\b | Word boundary | Bob.*\bcat\b | Bob ate the cat |
\B | Not a word boundary | Bob.*\Bcat\B.* | Bobcats |
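
How `^` and `$` behave depends on the engine's multiline mode. The sketch below assumes Python's `re` module, where `^` and `$` anchor at line boundaries only when `re.MULTILINE` is set, while `\A` always means the start of the string. The sample text is made up for illustration.

```python
import re

text = "abc first\nabc second\nthis is the end"

# Without re.MULTILINE, ^ only matches at the very start of the string.
single = re.findall(r"^abc .*", text)
# With re.MULTILINE, ^ also matches right after each newline.
multi = re.findall(r"^abc .*", text, re.MULTILINE)
# \A ignores multiline mode.
string_start = re.findall(r"\Aabc .*", text, re.MULTILINE)

# $ anchors the match to the end of the string here.
ending = re.search(r".*? the end$", text).group()

# \b sits between a word character and a non-word character; \B is the opposite.
whole_word = re.fullmatch(r"Bob.*\bcat\b", "Bob ate the cat")
inside_word = re.fullmatch(r"Bob.*\Bcat\B.*", "Bobcats")
no_boundary = re.search(r"\bcat\b", "Bobcats")

print(single, multi, string_start, ending)
```
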
Character | Legend | Example | Sample Match |
---|---|---|---|
[:alpha:] | Letters | [8[:alpha:]]+ | WellDone88 |
[:alnum:] | Letters and numbers | [[:alnum:]]{10} | ABCDE12345 |
[:punct:] | Punctuation marks | [[:punct:]]+ | ?!.,:; |
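
POSIX bracket classes are supported by engines such as grep and PCRE, but Python's built-in `re` module does not implement them. The sketch below shows rough ASCII-only stand-ins; the names `ALPHA`, `ALNUM` and `PUNCT` are hypothetical helpers introduced for this example, and locale-aware or Unicode behaviour is deliberately out of scope.

```python
import re
import string

# ASCII-only approximations of the POSIX classes (assumption: no locale support needed).
ALPHA = "A-Za-z"                        # stands in for [:alpha:]
ALNUM = "A-Za-z0-9"                     # stands in for [:alnum:]
PUNCT = re.escape(string.punctuation)   # stands in for [:punct:]

# Equivalent of [8[:alpha:]]+ : letters plus the digit 8.
alpha_match = re.fullmatch(f"[8{ALPHA}]+", "WellDone88")
# Equivalent of [[:alnum:]]{10}
alnum_match = re.fullmatch(f"[{ALNUM}]{{10}}", "ABCDE12345")
# Equivalent of [[:punct:]]+
punct_match = re.fullmatch(f"[{PUNCT}]+", "?!.,:;")

print(bool(alpha_match), bool(alnum_match), bool(punct_match))
```
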
Lookaround | Legend | Example | Sample Match |
---|---|---|---|
(?= | Positive lookahead | (?=\d{10})\d{5} | 01234 in 0123456789 |
(?<= | Positive lookbehind | (?<=\d)cat | cat in 1cat |
(?! | Negative lookahead | (?!theatre)the\w+ | theme |
(?<! | Negative lookbehind | \w{3}(?<!mon)ster | Munster |
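
Lookarounds assert something about the surrounding text without consuming it. A minimal sketch of the four rows above, assuming Python's `re` module (which requires lookbehind patterns to have a fixed width, as they do here):

```python
import re

# Positive lookahead: ten digits must follow, but only five are consumed.
ahead = re.search(r"(?=\d{10})\d{5}", "0123456789").group()
ahead_fail = re.search(r"(?=\d{10})\d{5}", "012345678")   # only nine digits

# Positive lookbehind: "cat" must be preceded by a digit, which is not part of the match.
behind = re.search(r"(?<=\d)cat", "1cat")
behind_text, behind_start = behind.group(), behind.start()

# Negative lookahead: any "the..." word except one that starts with "theatre".
neg_ahead_ok = re.match(r"(?!theatre)the\w+", "theme")
neg_ahead_no = re.match(r"(?!theatre)the\w+", "theatre")

# Negative lookbehind: the three characters before "ster" must not be "mon".
neg_behind_ok = re.fullmatch(r"\w{3}(?<!mon)ster", "Munster")
neg_behind_no = re.fullmatch(r"\w{3}(?<!mon)ster", "monster")

print(ahead, behind_text, behind_start)
```
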
1.3. Styles
The Perl and Bash regexp styles are probably the most common ones. There are other styles (vim, R, ...), but I don't plan to be fully comprehensive here, since most styles are similar or can be run in Perl style with some flag, parameter or argument.
1.4. Ubuntu
Packages in Ubuntu 13.10 to help with regular expressions:
1.4.1. codeblocks-contrib: Regular expression testbed
Plugin regexp for Codeblocks editor (Regular expression testbed)
1.4.2. kiki: Tool for python regular expression testing
http://project5.freezope.org/kiki (broken)
A free environment for regular expression testing (ferret). It allows you to write regexes and test them against a sample text, providing extensive output about the results. It is useful for several purposes:
- exploring and understanding the structure of match objects generated by the re module, making Kiki a valuable tool for people new to regexes.
- testing regexes on sample text before deploying them in code.
Kiki can function on its own or as a plugin for the Spe Python editor.
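
Since the Kiki link is broken, the structure of a match object can also be explored by hand. This is a small sketch using only the standard `re` module; the pattern and the sample text are invented for illustration and have nothing to do with Kiki itself.

```python
import re

# Hypothetical pattern and text, chosen only to produce a match with two named groups.
pattern = re.compile(r"(?P<name>\w+)@(?P<host>[\w.]+)")
m = pattern.search("contact: xavi@example.org")

whole = m.group(0)        # the entire matched text
groups = m.groups()       # all captured groups, in order
named = m.groupdict()     # named groups as a dictionary
span = m.span()           # (start, end) offsets of the match in the input
name_span = m.span("name")

print(whole, groups, named, span, name_span)
```
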
1.4.3. redet: regular expression development and execution tool
http://www.billposer.org/Software/redet.html
Redet allows the user to construct regular expressions and test them against input data by executing any of a variety of search programs, editors, and programming languages that make use of regular expressions. When a suitable regular expression has been constructed it may be saved to a file.
Redet stands for Regular Expression Development and Execution Tool. For each program, a palette showing the available regular expression syntax is provided. Selections from the palette may be copied to the regular expression window with a mouse click. Users may add their own definitions to the palette via their initialization file. Redet also keeps a list of the regular expressions executed, from which entries may be copied back into the regular expression under construction. The history list is saved to a file and restored on startup, so it persists across sessions.
So long as the underlying program supports Unicode, Redet allows UTF-8 Unicode in both test data and regular expressions.
1.4.4. rgxg: command-line tool to generate regular expressions
http://rgxg.sf.net
rgxg (ReGular eXpression Generator) is a command-line tool to generate (extended) regular expressions.
It can be useful to generate (extended) regular expressions to match for instance a specific number range (e.g. 0 to 31 or 00 to FF) or all addresses of a CIDR block (e.g. 192.168.0.0/24 or 2001:db8:aaaa::/64).
    xavi@coprinus:~$ rgxg
    Usage: rgxg COMMAND [ARGS]

    The available rgxg commands are:
       alternation   Create regex that matches any of the given patterns
       cidr          Create regex that matches all addresses of the given CIDR block
       escape        Escape the given string for use in a regex
       range         Create regex that matches integers in a given range

    Type 'rgxg help COMMAND' for help information on a specific command.
    Type 'rgxg version' to see the version of rgxg.

    xavi@coprinus:~$ rgxg help escape
    Usage: rgxg escape [options] STRING
        -h    display this help message
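
To give an idea of what such generated expressions look like, here is a sketch in Python. Both patterns are hand-written for illustration and are not actual rgxg output; the names `RANGE_0_31` and `ALTERNATION` are hypothetical.

```python
import re

# Hand-written pattern for the integers 0 to 31 without leading zeros (not rgxg output).
RANGE_0_31 = re.compile(r"(?:3[01]|[12][0-9]|[0-9])")
accepted = [n for n in range(100) if RANGE_0_31.fullmatch(str(n))]

# Rough analogue of the "escape" and "alternation" commands:
# escape each literal string, then join the pieces with |.
words = ["a.b", "c+d"]
ALTERNATION = re.compile("|".join(re.escape(w) for w in words))
literal_hit = ALTERNATION.fullmatch("a.b")
wildcard_miss = ALTERNATION.fullmatch("axb")   # the dot is literal, so this fails

print(accepted[0], accepted[-1], len(accepted))
```

Checking a generated pattern against the full range of expected inputs, as done with `accepted` here, is a cheap way to catch off-by-one mistakes at the boundaries.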
1.4.5. txt2regex: A Regular Expression "wizard", all written with bash2 builtins
^txt2regex$ is a Regular Expression "wizard", all written with bash2 builtins, that converts human sentences to RegExs. With a simple interface, you just answer questions and build your own RegEx for a large variety of programs, like awk, emacs, grep, perl, php, procmail, python, sed and vim. There are more than 20 supported programs.
1.4.6. visual-regexp: Interactively debug regular expressions
http://laurent.riesterer.free.fr/regexp/
This Tcl script shows the result of running a regular expression, making debugging relatively easy. It also assists in the construction of regular expressions.
1.5. PHP online live regexp help
1.6. Online tools