| 227 |
greater than 127) are treated as literals. |
greater than 127) are treated as literals. |
| 228 |
</P> |
</P> |
| 229 |
<P> |
<P> |
| 230 |
If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the |
If a pattern is compiled with the PCRE_EXTENDED option, white space in the |
| 231 |
pattern (other than in a character class) and characters between a # outside |
pattern (other than in a character class) and characters between a # outside |
| 232 |
a character class and the next newline are ignored. An escaping backslash can |
a character class and the next newline are ignored. An escaping backslash can |
| 233 |
be used to include a whitespace or # character as part of the pattern. |
be used to include a white space or # character as part of the pattern. |
| 234 |
</P> |
</P> |
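The effect of ignoring white space and # comments can be sketched as follows. This is an illustration only: it uses Python's re module, which is not PCRE, on the assumption that its re.VERBOSE flag behaves like PCRE_EXTENDED for the cases described above. The names VERBOSE_PATTERN and split_ticket are hypothetical.

```python
import re

# Verbose pattern: layout white space and the "# ..." comments are ignored.
# A backslash keeps a literal space; a '#' inside a character class is literal.
VERBOSE_PATTERN = re.compile(
    r"""
    (\d+)   # a run of digits
    \       # an escaped space, so it is part of the pattern
    [#]     # a hash inside a character class does not start a comment
    (\w+)   # a word
    """,
    re.VERBOSE,
)


def split_ticket(text):
    """Return (number, tag) for text such as '42 #bug', or None if no match."""
    m = VERBOSE_PATTERN.fullmatch(text)
    return m.groups() if m else None
```

Here the escaped space and the bracketed # are matched literally, while the indentation and the trailing comments play no part in the match.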
| 235 |
<P> |
<P> |
| 236 |
If you want to remove the special meaning from a sequence of characters, you |
If you want to remove the special meaning from a sequence of characters, you |
| 264 |
\a alarm, that is, the BEL character (hex 07) |
\a alarm, that is, the BEL character (hex 07) |
| 265 |
\cx "control-x", where x is any ASCII character |
\cx "control-x", where x is any ASCII character |
| 266 |
\e escape (hex 1B) |
\e escape (hex 1B) |
| 267 |
\f formfeed (hex 0C) |
\f form feed (hex 0C) |
| 268 |
\n linefeed (hex 0A) |
\n linefeed (hex 0A) |
| 269 |
\r carriage return (hex 0D) |
\r carriage return (hex 0D) |
| 270 |
\t tab (hex 09) |
\t tab (hex 09) |
| 406 |
<pre> |
<pre> |
| 407 |
\d any decimal digit |
\d any decimal digit |
| 408 |
\D any character that is not a decimal digit |
\D any character that is not a decimal digit |
| 409 |
\h any horizontal whitespace character |
\h any horizontal white space character |
| 410 |
\H any character that is not a horizontal whitespace character |
\H any character that is not a horizontal white space character |
| 411 |
\s any whitespace character |
\s any white space character |
| 412 |
\S any character that is not a whitespace character |
\S any character that is not a white space character |
| 413 |
\v any vertical whitespace character |
\v any vertical white space character |
| 414 |
\V any character that is not a vertical whitespace character |
\V any character that is not a vertical white space character |
| 415 |
\w any "word" character |
\w any "word" character |
| 416 |
\W any "non-word" character |
\W any "non-word" character |
| 417 |
</pre> |
</pre> |
| 497 |
<pre> |
<pre> |
| 498 |
U+000A Linefeed |
U+000A Linefeed |
| 499 |
U+000B Vertical tab |
U+000B Vertical tab |
| 500 |
U+000C Formfeed |
U+000C Form feed |
| 501 |
U+000D Carriage return |
U+000D Carriage return |
| 502 |
U+0085 Next line |
U+0085 Next line |
| 503 |
U+2028 Line separator |
U+2028 Line separator |
| 520 |
<a href="#atomicgroup">below.</a> |
<a href="#atomicgroup">below.</a> |
| 521 |
This particular group matches either the two-character sequence CR followed by |
This particular group matches either the two-character sequence CR followed by |
| 522 |
LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab, |
LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab, |
| 523 |
U+000B), FF (formfeed, U+000C), CR (carriage return, U+000D), or NEL (next |
U+000B), FF (form feed, U+000C), CR (carriage return, U+000D), or NEL (next |
| 524 |
line, U+0085). The two-character sequence is treated as a single unit that |
line, U+0085). The two-character sequence is treated as a single unit that |
| 525 |
cannot be split. |
cannot be split. |
| 526 |
</P> |
</P> |
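The "cannot be split" behaviour can be sketched by comparing an ordinary group with an unsplittable one. This is an illustration in Python's re module, not PCRE. Because atomic groups are not available in every Python version, the sketch emulates one with a lookahead followed by a back reference, which is a swapped-in technique and not how PCRE implements the group. All names are hypothetical.

```python
import re

# The newline alternatives listed above, with CR LF first.
NEWLINE_ALTERNATIVES = r"\r\n|\n|\x0b|\f|\r|\x85"

# Ordinary group: the engine may backtrack from "\r\n" to a lone "\r".
SPLITTABLE = "(?:" + NEWLINE_ALTERNATIVES + ")"

# Atomic emulation: the lookahead captures one alternative and is not
# re-entered on backtracking; \1 then consumes exactly what was captured.
UNSPLITTABLE = "(?=(" + NEWLINE_ALTERNATIVES + r"))\1"


def matches_newline_then_lf(newline_pattern, text):
    """True if text is one newline sequence followed by a single LF."""
    return re.fullmatch(newline_pattern + r"\n", text) is not None
```

With the ordinary group, the subject CR LF can be read as CR followed by LF; with the unsplittable form, CR LF is consumed as one unit and a further LF is required.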
| 822 |
Xwd Any Perl "word" character |
Xwd Any Perl "word" character |
| 823 |
</pre> |
</pre> |
| 824 |
Xan matches characters that have either the L (letter) or the N (number) |
Xan matches characters that have either the L (letter) or the N (number) |
| 825 |
property. Xps matches the characters tab, linefeed, vertical tab, formfeed, or |
property. Xps matches the characters tab, linefeed, vertical tab, form feed, or |
| 826 |
carriage return, and any other character that has the Z (separator) property. |
carriage return, and any other character that has the Z (separator) property. |
| 827 |
Xsp is the same as Xps, except that vertical tab is excluded. Xwd matches the |
Xsp is the same as Xps, except that vertical tab is excluded. Xwd matches the |
| 828 |
same characters as Xan, plus underscore. |
same characters as Xan, plus underscore. |
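The four definitions above can be restated as predicates over Unicode general categories. This is a minimal sketch that assumes Python's unicodedata tables agree with the property tables used by PCRE, which may not hold across Unicode versions. The function names are hypothetical.

```python
import unicodedata

# Tab, linefeed, vertical tab, form feed, carriage return.
_EXPLICIT_SPACES = {"\t", "\n", "\x0b", "\f", "\r"}


def is_xan(ch):
    """L (letter) or N (number) property."""
    return unicodedata.category(ch)[0] in ("L", "N")


def is_xps(ch):
    """One of the five listed characters, or any Z (separator) character."""
    return ch in _EXPLICIT_SPACES or unicodedata.category(ch)[0] == "Z"


def is_xsp(ch):
    """Same as Xps, except that vertical tab is excluded."""
    return ch != "\x0b" and is_xps(ch)


def is_xwd(ch):
    """Same as Xan, plus underscore."""
    return ch == "_" or is_xan(ch)
```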
| 1829 |
following a backslash are taken as part of a potential back reference number. |
following a backslash are taken as part of a potential back reference number. |
| 1830 |
If the pattern continues with a digit character, some delimiter must be used to |
If the pattern continues with a digit character, some delimiter must be used to |
| 1831 |
terminate the back reference. If the PCRE_EXTENDED option is set, this can be |
terminate the back reference. If the PCRE_EXTENDED option is set, this can be |
| 1832 |
whitespace. Otherwise, the \g{ syntax or an empty comment (see |
white space. Otherwise, the \g{ syntax or an empty comment (see |
| 1833 |
<a href="#comments">"Comments"</a> |
<a href="#comments">"Comments"</a> |
| 1834 |
below) can be used. |
below) can be used. |
| 1835 |
</P> |
</P> |
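Two of these delimiters can be sketched in Python's re module, used here only as a stand-in for PCRE. The sketch assumes that re.VERBOSE treats white space as PCRE_EXTENDED does and that (?#) is accepted as an empty comment. Python has no \g{ syntax inside patterns, so that form is not shown. The names are hypothetical.

```python
import re

# Intent: group 1, a back reference to it, then the literal digit "1".
# Writing r"(a)\11" would not express this, because both digits would be
# read as part of one reference number.

# Verbose mode: a space ends the reference and is otherwise ignored.
WITH_SPACE = re.compile(r"(a)\1 1", re.VERBOSE)

# Any mode: an empty comment ends the reference.
WITH_EMPTY_COMMENT = re.compile(r"(a)\1(?#)1")
```

Both patterns match the subject "aa1": the back reference consumes the second "a" and the final digit is matched literally.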
| 2171 |
subroutines that can be referenced from elsewhere. (The use of |
subroutines that can be referenced from elsewhere. (The use of |
| 2172 |
<a href="#subpatternsassubroutines">subroutines</a> |
<a href="#subpatternsassubroutines">subroutines</a> |
| 2173 |
is described below.) For example, a pattern to match an IPv4 address such as |
is described below.) For example, a pattern to match an IPv4 address such as |
| 2174 |
"192.168.23.245" could be written like this (ignore whitespace and line |
"192.168.23.245" could be written like this (ignore white space and line |
| 2175 |
breaks): |
breaks): |
| 2176 |
<pre> |
<pre> |
| 2177 |
(?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) ) |
(?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) ) |
| 2565 |
a successful positive assertion <i>is</i> passed back when a match succeeds |
a successful positive assertion <i>is</i> passed back when a match succeeds |
| 2566 |
(compare capturing parentheses in assertions). Note that such subpatterns are |
(compare capturing parentheses in assertions). Note that such subpatterns are |
| 2567 |
processed as anchored at the point where they are tested. Note also that Perl's |
processed as anchored at the point where they are tested. Note also that Perl's |
| 2568 |
treatment of subroutines is different in some cases. |
treatment of subroutines and assertions is different in some cases. |
| 2569 |
</P> |
</P> |
| 2570 |
<P> |
<P> |
| 2571 |
The new verbs make use of what was previously invalid syntax: an opening |
The new verbs make use of what was previously invalid syntax: an opening |
| 2572 |
parenthesis followed by an asterisk. They are generally of the form |
parenthesis followed by an asterisk. They are generally of the form |
| 2573 |
(*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour, |
(*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour, |
| 2574 |
depending on whether or not an argument is present. A name is any sequence of |
depending on whether or not an argument is present. A name is any sequence of |
| 2575 |
characters that does not include a closing parenthesis. If the name is empty, |
characters that does not include a closing parenthesis. The maximum length of |
| 2576 |
that is, if the closing parenthesis immediately follows the colon, the effect |
name is 255 in the 8-bit library and 65535 in the 16-bit library. If the name |
| 2577 |
is as if the colon were not there. Any number of these verbs may occur in a |
is empty, that is, if the closing parenthesis immediately follows the colon, |
| 2578 |
pattern. |
the effect is as if the colon were not there. Any number of these verbs may |
| 2579 |
|
occur in a pattern. |
| 2580 |
<a name="nooptimize"></a></P> |
<a name="nooptimize"></a></P> |
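The verb syntax described above can be sketched as a small recogniser. This is not PCRE's own parser: the helper parse_verb, the assumption that a verb consists of upper-case letters and underscores, and the choice to measure the name in characters against the 8-bit limit are all hypothetical.

```python
import re

# Assumed shape: "(*", an upper-case verb, an optional ":NAME", then ")".
# A name is any sequence of characters with no closing parenthesis.
_VERB = re.compile(r"\(\*([A-Z_]+)(?::([^)]*))?\)")

MAX_NAME_LENGTH_8BIT = 255  # limit quoted above for the 8-bit library


def parse_verb(text):
    """Return (verb, name); name is None when absent or empty."""
    m = _VERB.fullmatch(text)
    if m is None:
        raise ValueError("not a backtracking verb: %r" % text)
    verb, name = m.groups()
    if name is not None and len(name) > MAX_NAME_LENGTH_8BIT:
        raise ValueError("verb name longer than %d" % MAX_NAME_LENGTH_8BIT)
    # An empty name is treated as if the colon were not there.
    return verb, (name or None)
```

For example, parse_verb("(*MARK:A)") yields the verb MARK with the name A, while "(*PRUNE:)" and "(*PRUNE)" give the same result.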
| 2581 |
<br><b> |
<br><b> |
| 2582 |
Optimizations that affect backtracking verbs |
Optimizations that affect backtracking verbs |
| 2594 |
<a href="pcreapi.html#execoptions">"Option bits for <b>pcre_exec()</b>"</a> |
<a href="pcreapi.html#execoptions">"Option bits for <b>pcre_exec()</b>"</a> |
| 2595 |
in the |
in the |
| 2596 |
<a href="pcreapi.html"><b>pcreapi</b></a> |
<a href="pcreapi.html"><b>pcreapi</b></a> |
| 2597 |
documentation. |
documentation. |
| 2598 |
</P> |
</P> |
| 2599 |
<P> |
<P> |
| 2600 |
Experiments with Perl suggest that it too has similar optimizations, sometimes |
Experiments with Perl suggest that it too has similar optimizations, sometimes |
| 2688 |
</P> |
</P> |
| 2689 |
<P> |
<P> |
| 2690 |
If you are interested in (*MARK) values after failed matches, you should |
If you are interested in (*MARK) values after failed matches, you should |
| 2691 |
probably set the PCRE_NO_START_OPTIMIZE option |
probably set the PCRE_NO_START_OPTIMIZE option |
| 2692 |
<a href="#nooptimize">(see above)</a> |
<a href="#nooptimize">(see above)</a> |
| 2693 |
to ensure that the match is always attempted. |
to ensure that the match is always attempted. |
| 2694 |
</P> |
</P> |
| 2869 |
</P> |
</P> |
| 2870 |
<br><a name="SEC28" href="#TOC1">REVISION</a><br> |
<br><a name="SEC28" href="#TOC1">REVISION</a><br> |
| 2871 |
<P> |
<P> |
| 2872 |
Last updated: 14 April 2012 |
Last updated: 01 June 2012 |
| 2873 |
<br> |
<br> |
| 2874 |
Copyright © 1997-2012 University of Cambridge. |
Copyright © 1997-2012 University of Cambridge. |
| 2875 |
<br> |
<br> |