| 182 |
.rs |
.rs |
| 183 |
.sp |
.sp |
| 184 |
The backslash character has several uses. Firstly, if it is followed by a |
The backslash character has several uses. Firstly, if it is followed by a |
| 185 |
non-alphanumeric character, it takes away any special meaning that character |
character that is not a number or a letter, it takes away any special meaning |
| 186 |
may have. This use of backslash as an escape character applies both inside and |
that character may have. This use of backslash as an escape character applies |
| 187 |
outside character classes. |
both inside and outside character classes. |
| 188 |
.P |
.P |
| 189 |
For example, if you want to match a * character, you write \e* in the pattern. |
For example, if you want to match a * character, you write \e* in the pattern. |
| 190 |
This escaping action applies whether or not the following character would |
This escaping action applies whether or not the following character would |
| 192 |
non-alphanumeric with backslash to specify that it stands for itself. In |
non-alphanumeric with backslash to specify that it stands for itself. In |
| 193 |
particular, if you want to match a backslash, you write \e\e. |
particular, if you want to match a backslash, you write \e\e. |
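.P
Because a backslash before any non-alphanumeric character is always safe, a
caller can quote an arbitrary literal string before embedding it in a pattern.
The following is a minimal sketch of that idea, not part of the PCRE API: the
function name quote_literal and its buffer convention are assumptions made for
illustration. Bytes with values greater than 127 are copied unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a PCRE function): copy src into dst, placing a
   backslash before every ASCII character that is not a letter or a digit.
   Bytes above 127 are copied as they are. The caller must supply a buffer
   of at least 2 * strlen(src) + 1 bytes. Returns the length written. */
static size_t quote_literal(const char *src, char *dst, size_t dstsize)
{
    size_t n = 0;

    /* Worst case: every character needs a backslash, plus the final NUL. */
    assert(dstsize >= 2 * strlen(src) + 1);

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        if (c < 128 && !isalnum(c))
            dst[n++] = '\\';
        dst[n++] = (char)c;
    }
    dst[n] = '\0';
    return n;
}
```

Given the four-character input consisting of a, *, b and a backslash, this
sketch writes six characters: the * and the backslash each gain a preceding
backslash, while the letters are left alone.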
| 194 |
.P |
.P |
| 195 |
|
In UTF-8 mode, only ASCII numbers and letters have any special meaning after a |
| 196 |
|
backslash. All other characters (in particular, those whose codepoints are |
| 197 |
|
greater than 127) are treated as literals. |
| 198 |
|
.P |
| 199 |
If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the |
If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the |
| 200 |
pattern (other than in a character class) and characters between a # outside |
pattern (other than in a character class) and characters between a # outside |
| 201 |
a character class and the next newline are ignored. An escaping backslash can |
a character class and the next newline are ignored. An escaping backslash can |
| 229 |
one of the following escape sequences than the binary character it represents: |
one of the following escape sequences than the binary character it represents: |
| 230 |
.sp |
.sp |
| 231 |
\ea alarm, that is, the BEL character (hex 07) |
\ea alarm, that is, the BEL character (hex 07) |
| 232 |
\ecx "control-x", where x is any character |
\ecx "control-x", where x is any ASCII character |
| 233 |
\ee escape (hex 1B) |
\ee escape (hex 1B) |
| 234 |
\ef formfeed (hex 0C) |
\ef formfeed (hex 0C) |
| 235 |
\en linefeed (hex 0A) |
\en linefeed (hex 0A) |
| 241 |
.sp |
.sp |
| 242 |
The precise effect of \ecx is as follows: if x is a lower case letter, it |
The precise effect of \ecx is as follows: if x is a lower case letter, it |
| 243 |
is converted to upper case. Then bit 6 of the character (hex 40) is inverted. |
is converted to upper case. Then bit 6 of the character (hex 40) is inverted. |
| 244 |
Thus \ecz becomes hex 1A, but \ec{ becomes hex 3B, while \ec; becomes hex |
Thus \ecz becomes hex 1A (z is 7A), but \ec{ becomes hex 3B ({ is 7B), while |
| 245 |
7B. |
\ec; becomes hex 7B (; is 3B). If the byte following \ec has a value greater |
| 246 |
|
than 127, a compile-time error occurs. This locks out non-ASCII characters in |
| 247 |
|
both byte mode and UTF-8 mode. (When PCRE is compiled in EBCDIC mode, all byte |
| 248 |
|
values are valid. A lower case letter is converted to upper case, and then the |
| 249 |
|
0xc0 bits are flipped.) |
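.P
The ASCII-mode rule for \ecx can be sketched in a few lines of C. This is an
illustrative model only, not PCRE's own code: the function name control_escape
and the use of -1 to stand for the compile-time error are assumptions, and
EBCDIC mode is not modelled.

```c
#include <assert.h>

/* Hypothetical model of \cx in ASCII mode (not taken from the PCRE source).
   Returns the resulting character value, or -1 where the byte following \c
   is greater than 127 and a compile-time error would be given.
   Assumes an ASCII execution character set. */
static int control_escape(int x)
{
    int result;

    if (x < 0 || x > 127)
        return -1;

    /* A lower case letter is first converted to upper case. */
    if (x >= 'a' && x <= 'z')
        x = x - 'a' + 'A';

    /* Then bit 6 (hex 40) is inverted. */
    result = x ^ 0x40;

    /* Flipping one bit below 0x80 keeps the value in the ASCII range. */
    assert(result >= 0 && result <= 127);
    return result;
}
```

With this model, z (hex 7A) gives hex 1A, { (hex 7B) gives hex 3B, and ;
(hex 3B) gives hex 7B, matching the values stated above.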
| 250 |
.P |
.P |
| 251 |
After \ex, from zero to two hexadecimal digits are read (letters can be in |
After \ex, from zero to two hexadecimal digits are read (letters can be in |
| 252 |
upper or lower case). Any number of hexadecimal digits may appear between \ex{ |
upper or lower case). Any number of hexadecimal digits may appear between \ex{ |
| 1052 |
characters with values greater than 128 only when it is compiled with Unicode |
characters with values greater than 128 only when it is compiled with Unicode |
| 1053 |
property support. |
property support. |
| 1054 |
.P |
.P |
| 1055 |
The character types \ed, \eD, \eh, \eH, \ep, \eP, \es, \eS, \ev, \eV, \ew, and |
The character escape sequences \ed, \eD, \eh, \eH, \ep, \eP, \es, \eS, \ev, |
| 1056 |
\eW may also appear in a character class, and add the characters that they |
\eV, \ew, and \eW may appear in a character class, and add the characters that |
| 1057 |
match to the class. For example, [\edABCDEF] matches any hexadecimal digit. A |
they match to the class. For example, [\edABCDEF] matches any hexadecimal |
| 1058 |
circumflex can conveniently be used with the upper case character types to |
digit. In UTF-8 mode, the PCRE_UCP option affects the meanings of \ed, \es, \ew |
| 1059 |
|
and their upper case partners, just as it does when they appear outside a |
| 1060 |
|
character class, as described in the section entitled |
| 1061 |
|
.\" HTML <a href="#genericchartypes"> |
| 1062 |
|
.\" </a> |
| 1063 |
|
"Generic character types" |
| 1064 |
|
.\" |
| 1065 |
|
above. The escape sequence \eb has a different meaning inside a character |
| 1066 |
|
class; it matches the backspace character. The sequences \eB, \eN, \eR, and \eX |
| 1067 |
|
are not special inside a character class. Like any other unrecognized escape |
| 1068 |
|
sequences, they are treated as the literal characters "B", "N", "R", and "X" by |
| 1069 |
|
default, but cause an error if the PCRE_EXTRA option is set. |
| 1070 |
|
.P |
| 1071 |
|
A circumflex can conveniently be used with the upper case character types to |
| 1072 |
specify a more restricted set of characters than the matching lower case type. |
specify a more restricted set of characters than the matching lower case type. |
| 1073 |
For example, the class [^\eW_] matches any letter or digit, but not underscore. |
For example, the class [^\eW_] matches any letter or digit, but not underscore, |
| 1074 |
|
whereas [\ew] includes underscore. A positive character class should be read as |
| 1075 |
|
"something OR something OR ..." and a negative class as "NOT something AND NOT |
| 1076 |
|
something AND NOT ...". |
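.P
The OR and AND NOT readings can be made concrete by writing each class as a
predicate. The sketch below models the default behaviour for ASCII characters
only (no PCRE_UCP, "C" locale); the function names are hypothetical and this
is not how PCRE represents classes internally.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Model of \w for ASCII input: a letter, a digit, or underscore. */
static int is_word(int c)
{
    assert(c >= 0 && c <= 255);   /* precondition of the <ctype.h> calls */
    return c < 128 && (isalnum(c) || c == '_');
}

/* [\dABCDEF] is a positive class: \d OR A OR B OR ... OR F. */
static int in_hex_class(int c)
{
    assert(c >= 0 && c <= 255);
    return (c >= '0' && c <= '9') ||
           (c != '\0' && strchr("ABCDEF", c) != NULL);
}

/* [^\W_] is a negative class: NOT \W AND NOT underscore. */
static int in_alnum_class(int c)
{
    return !(!is_word(c)) && !(c == '_');
}
```

In this model in_alnum_class accepts letters and digits but rejects the
underscore that is_word accepts, and in_hex_class accepts C but not c,
because only the upper case letters are listed in the class.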
| 1077 |
.P |
.P |
| 1078 |
The only metacharacters that are recognized in character classes are backslash, |
The only metacharacters that are recognized in character classes are backslash, |
| 1079 |
hyphen (only where it can be interpreted as specifying a range), circumflex |
hyphen (only where it can be interpreted as specifying a range), circumflex |
| 2742 |
.rs |
.rs |
| 2743 |
.sp |
.sp |
| 2744 |
.nf |
.nf |
| 2745 |
Last updated: 17 November 2010 |
Last updated: 21 November 2010 |
| 2746 |
Copyright (c) 1997-2010 University of Cambridge. |
Copyright (c) 1997-2010 University of Cambridge. |
| 2747 |
.fi |
.fi |