| 105 |
changes the convention to CR. That pattern matches "a\nb" because LF is no |
changes the convention to CR. That pattern matches "a\nb" because LF is no |
| 106 |
longer a newline. Note that these special settings, which are not |
longer a newline. Note that these special settings, which are not |
| 107 |
Perl-compatible, are recognized only at the very start of a pattern, and that |
Perl-compatible, are recognized only at the very start of a pattern, and that |
| 108 |
they must be in upper case. |
they must be in upper case. If more than one of them is present, the last one |
| 109 |
|
is used. |
| 110 |
|
</P> |
| 111 |
|
<P> |
| 112 |
|
The newline convention does not affect what the \R escape sequence matches. By |
| 113 |
|
default, this is any Unicode newline sequence, for Perl compatibility. However, |
| 114 |
|
this can be changed; see the description of \R in the section entitled |
| 115 |
|
<a href="#newlineseq">"Newline sequences"</a> |
| 116 |
|
below. |
| 117 |
</P> |
</P> |
| 118 |
<br><a name="SEC3" href="#TOC1">CHARACTERS AND METACHARACTERS</a><br> |
<br><a name="SEC3" href="#TOC1">CHARACTERS AND METACHARACTERS</a><br> |
| 119 |
<P> |
<P> |
| 399 |
or "french" in Windows, some character codes greater than 128 are used for |
or "french" in Windows, some character codes greater than 128 are used for |
| 400 |
accented letters, and these are matched by \w. The use of locales with Unicode |
accented letters, and these are matched by \w. The use of locales with Unicode |
| 401 |
is discouraged. |
is discouraged. |
| 402 |
</P> |
<a name="newlineseq"></a></P> |
| 403 |
<br><b> |
<br><b> |
| 404 |
Newline sequences |
Newline sequences |
| 405 |
</b><br> |
</b><br> |
| 406 |
<P> |
<P> |
| 407 |
Outside a character class, the escape sequence \R matches any Unicode newline |
Outside a character class, by default, the escape sequence \R matches any |
| 408 |
sequence. This is a Perl 5.10 feature. In non-UTF-8 mode \R is equivalent to |
Unicode newline sequence. This is a Perl 5.10 feature. In non-UTF-8 mode \R is |
| 409 |
the following: |
equivalent to the following: |
| 410 |
<pre> |
<pre> |
| 411 |
(?>\r\n|\n|\x0b|\f|\r|\x85) |
(?>\r\n|\n|\x0b|\f|\r|\x85) |
| 412 |
</pre> |
</pre> |
| 425 |
recognized. |
recognized. |
| 426 |
</P> |
</P> |
| 427 |
<P> |
<P> |
| 428 |
|
It is possible to restrict \R to match only CR, LF, or CRLF (instead of the |
| 429 |
|
complete set of Unicode line endings) by setting the option PCRE_BSR_ANYCRLF |
| 430 |
|
either at compile time or when the pattern is matched. This can be made the |
| 431 |
|
default when PCRE is built; if this is the case, the other behaviour can be |
| 432 |
|
requested via the PCRE_BSR_UNICODE option. It is also possible to specify these |
| 433 |
|
settings by starting a pattern string with one of the following sequences: |
| 434 |
|
<pre> |
| 435 |
|
(*BSR_ANYCRLF) CR, LF, or CRLF only |
| 436 |
|
(*BSR_UNICODE) any Unicode newline sequence |
| 437 |
|
</pre> |
| 438 |
|
These override the default and the options given to <b>pcre_compile()</b>, but |
| 439 |
|
they can be overridden by options given to <b>pcre_exec()</b>. Note that these |
| 440 |
|
special settings, which are not Perl-compatible, are recognized only at the |
| 441 |
|
very start of a pattern, and that they must be in upper case. If more than one |
| 442 |
|
of them is present, the last one is used. |
| 443 |
|
</P> |
| 444 |
|
<P> |
| 445 |
Inside a character class, \R matches the letter "R". |
Inside a character class, \R matches the letter "R". |
| 446 |
<a name="uniextseq"></a></P> |
<a name="uniextseq"></a></P> |
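<P>
The difference between the two \R settings described above can be sketched
outside PCRE. The following is only an approximation, written in Python, whose
<b>re</b> module does not provide \R: the two pattern strings and the helper
<b>split_lines()</b> are illustrative names, not part of PCRE, and a plain
non-capturing group stands in for the atomic group shown earlier (which is
adequate here because each pattern is used on its own for splitting).
</P>

```python
import re

# Assumed stand-ins for PCRE's \R; these are NOT PCRE APIs.
# 8-bit (non-UTF-8) equivalent of the default "any Unicode newline" behaviour.
BSR_UNICODE_8BIT = r"(?:\r\n|\n|\x0b|\f|\r|\x85)"
# Restricted behaviour, as with (*BSR_ANYCRLF): CR, LF, or CRLF only.
BSR_ANYCRLF = r"(?:\r\n|\n|\r)"


def split_lines(text, newline_pattern):
    """Split text wherever the given newline pattern matches."""
    return re.split(newline_pattern, text)


# CRLF, then a vertical tab (\x0b), then LF.
sample = "a\r\nb\x0bc\nd"

unicode_parts = split_lines(sample, BSR_UNICODE_8BIT)
anycrlf_parts = split_lines(sample, BSR_ANYCRLF)

print(unicode_parts)  # the vertical tab counts as a newline here
print(anycrlf_parts)  # the vertical tab is left inside the second piece
```

<P>
Under the restricted setting the vertical tab is no longer treated as a line
ending, so "b" and "c" stay together in one piece.
</P>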
| 447 |
<br><b> |
<br><b> |
| 2184 |
</P> |
</P> |
| 2185 |
<br><a name="SEC27" href="#TOC1">REVISION</a><br> |
<br><a name="SEC27" href="#TOC1">REVISION</a><br> |
| 2186 |
<P> |
<P> |
| 2187 |
Last updated: 21 August 2007 |
Last updated: 11 September 2007 |
| 2188 |
<br> |
<br> |
| 2189 |
Copyright © 1997-2007 University of Cambridge. |
Copyright © 1997-2007 University of Cambridge. |
| 2190 |
<br> |
<br> |