Flavors and features of regular expressions
Also see this document
Feature | POSIX Basic Regular Expressions (BRE) | POSIX Extended Regular Expressions (ERE) | grep | egrep | Perl Compatible Regular Expressions (PCRE) |
---|---|---|---|---|---|
* ^ $ [ ] | Yes | Yes | Yes | Yes | Yes |
? + \| (optional, one or more, alternation) | No | Yes | Yes (GNU extension), but each must be preceded by \, e.g. \+ | Yes | Yes |
Matching/capture groups | Yes: \(...\) | Yes: (...) | Yes: \(...\) | Yes: (...) | Yes: (...) |
{ } | Yes, but need to add \, e.g. \{ \} | Yes | Yes, but need to add \, e.g. \{ \} | Yes | Yes |
\b \B (Word boundaries) | No | No | Yes (GNU extension) | Yes (GNU extension) | Yes |
\w \W (Word characters: alphanumeric characters and _) | No | No | Yes | Yes | Yes |
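
The \w and \b rows can be tried out in any Perl-style engine. The following is a minimal sketch using Python's standard re module, which is an assumption made for illustration: its syntax is Perl-like, but it is not PCRE itself, and the sample strings are made up.

```python
import re

text = "snake_case var-name x1"

# \w covers letters, digits and the underscore, so "snake_case" stays in one
# piece while "var-name" is split at the hyphen.
# re.ASCII restricts \w to [A-Za-z0-9_] (Python's default is Unicode-aware).
words = re.findall(r"\w+", text, flags=re.ASCII)
print(words)  # ['snake_case', 'var', 'name', 'x1']

# \b matches the empty string at the edge of a word, so only the
# stand-alone occurrences of "cat" are found.
whole = re.findall(r"\bcat\b", "cat concat cat's category")
print(whole)  # ['cat', 'cat']

# \B is the opposite: "cat" must NOT start at a word edge here,
# so only the one inside "concat" qualifies.
inside = re.findall(r"\Bcat", "cat concat")
print(inside)  # ['cat']
```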
Also, POSIX regular expressions and grep/egrep always return the longest match, while PCRE additionally supports shortest match through ungreedy/lazy quantifiers such as *? and +?.
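
The difference between greedy and lazy quantifiers is easiest to see side by side. This is a small sketch using Python's re module as a stand-in for a Perl-style engine (an assumption; the sample string is invented):

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as possible, so the match runs from the first "<"
# to the last ">".
greedy = re.findall(r"<.+>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Lazy: .+? stops at the first ">" that lets the pattern succeed.
lazy = re.findall(r"<.+?>", html)
print(lazy)  # ['<b>', '</b>', '<i>', '</i>']

# Perl-style engines also try alternatives left to right and keep the first
# one that works, which is not necessarily the longest one.
first_alt = re.match(r"a|ab", "ab").group()
print(first_alt)  # 'a'
```

Under the longest-match rule described above, a POSIX matcher would be expected to report "ab" for the last pattern.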
POSIX regular expressions and grep/egrep do not interpret escape sequences such as \n, \r, \t, \f and \v.
See here for the kinds of regular expressions GNU core utilities (grep, find, awk, etc) accept
Special characters
Characters | Meaning |
---|---|
. | Any single character except the newline (\n); depending on the engine's newline setting, the carriage return (\r, as used in Windows/DOS line endings) may be excluded too. To match any character including a newline, use [\s\S]. |
[ ] | A single character that is contained within the brackets. |
[^ ] | Any single character that is not contained within the brackets. |
^ | Beginning of the string (or of a line in multi-line mode). |
$ | End of the string (or of a line in multi-line mode). |
( ) | Matching/capture group. |
\n | Back-reference to the n-th matching/capture group, where n is a digit from 1 to 9 (e.g. \1). |
* | Match the preceding element 0 or more times. |
+ | Match the preceding element 1 or more times. |
? | Match the preceding element 0 or 1 time. |
{m} | Match the preceding element exactly m times. |
{m,n} | Match the preceding element at least m and no more than n times. n is optional: {m,} means at least m times. |
\| | Match either the expression before or the expression after the bar (alternation). |
\xf0 | Match the character with the given hexadecimal code (in this case f0). |
\021 | Match the character with the given octal code (in this case 21). |
\b | Match a "word" boundary. |
\B | Match a non-"word" boundary. |
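
Most of the entries in this table can be checked interactively. The sketch below uses Python's re module, which is assumed here only as a convenient Perl-style engine; other flavors may need extra backslashes as described in the first table, and all sample strings are made up.

```python
import re

# .  does not match a newline by default; [\s\S] does.
dot_default = re.findall(r"a.b", "a\nb a-b")        # ['a-b']
dot_any = re.findall(r"a[\s\S]b", "a\nb a-b")       # ['a\nb', 'a-b']

# [ ] and [^ ]: one character inside / outside the listed set.
vowels = re.findall(r"[aeiou]", "regex")            # ['e', 'e']
non_vowels = re.findall(r"[^aeiou]", "regex")       # ['r', 'g', 'x']

# ^ and $ anchor the pattern to the start and end of the string.
only_digits = re.search(r"^\d+$", "42") is not None             # True
not_only_digits = re.search(r"^\d+$", "42 apples") is not None  # False

# ( ) captures, \1 refers back to what group 1 matched (a doubled word).
doubled = re.search(r"\b(\w+) \1\b", "it is is fine").group(1)  # 'is'

# Quantifiers applied to the preceding element "b".
star = re.findall(r"ab*", "a ab abbb")        # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")        # ['ab', 'abbb']
optional = re.findall(r"ab?", "a ab abbb")    # ['a', 'ab', 'ab']
exactly = re.findall(r"ab{2}", "a ab abbb")   # ['abb']
at_least = re.findall(r"ab{2,}", "a ab abbb") # ['abbb']
between = re.findall(r"ab{1,2}", "a ab abbb") # ['ab', 'abb']

# | picks either side.
pets = re.findall(r"cat|dog", "cat, dog, bird")     # ['cat', 'dog']

# Hexadecimal and octal escapes: \x41 is "A", \021 is the character with
# octal code 21 (decimal 17).
hex_ok = re.fullmatch(r"\x41", "A") is not None       # True
octal_ok = re.fullmatch(r"\021", chr(0o21)) is not None  # True
```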
Character classes
POSIX | Perl | Meaning |
---|---|---|
[[:alnum:]] | | [A-Za-z0-9] Alphanumeric characters |
[[:alnum:]_] | \w | [A-Za-z0-9_] Alphanumeric characters plus _ |
[^[:alnum:]_] | \W | [^A-Za-z0-9_] Non-word characters |
[[:alpha:]] | | [A-Za-z] Letters |
[[:digit:]] | \d | [0-9] Digits |
[^[:digit:]] | \D | [^0-9] Non-digits |
[[:blank:]] | | [ \t] Space and tab |
[[:space:]] | \s | [ \t\r\n\v\f] Whitespace |
[^[:space:]] | \S | [^ \t\r\n\v\f] Non-whitespace |
[[:upper:]] | | [A-Z] Upper case letters |
[[:lower:]] | | [a-z] Lower case letters |
[[:punct:]] | | Punctuation characters |
[[:print:]] | | Printable characters |
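
The Perl shorthands and the explicit bracket ranges in the last column are interchangeable for ASCII text. The sketch below checks this with Python's re module, chosen here as an assumption for illustration. Python's standard re module does not understand POSIX bracket classes such as [[:digit:]], so the explicit ranges stand in for them, and the re.ASCII flag is used because Python's shorthands are Unicode-aware by default. The sample string is invented.

```python
import re

sample = "Room 42,\tBuilding_B\n"

# \d  versus  [0-9]
digits = re.findall(r"\d+", sample, flags=re.ASCII)
digits_explicit = re.findall(r"[0-9]+", sample)
print(digits)  # ['42']

# \w  versus  [A-Za-z0-9_]
words = re.findall(r"\w+", sample, flags=re.ASCII)
words_explicit = re.findall(r"[A-Za-z0-9_]+", sample)
print(words)  # ['Room', '42', 'Building_B']

# \s  versus  [ \t\r\n\v\f]
spaces = re.findall(r"\s", sample, flags=re.ASCII)
spaces_explicit = re.findall(r"[ \t\r\n\v\f]", sample)
print(spaces)  # [' ', '\t', '\n']

# No Perl shorthand exists for upper case letters, so the range is written out.
upper = re.findall(r"[A-Z]", sample)
print(upper)  # ['R', 'B', 'B']
```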