| 245 |
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz |
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz |
| 246 |
</pre> |
</pre> |
| 247 |
The \Q...\E sequence is recognized both inside and outside character classes. |
The \Q...\E sequence is recognized both inside and outside character classes. |
| 248 |
An isolated \E that is not preceded by \Q is ignored. |
An isolated \E that is not preceded by \Q is ignored. If \Q is not followed |
| 249 |
|
by \E later in the pattern, the literal interpretation continues to the end of |
| 250 |
|
the pattern (that is, \E is assumed at the end). If the isolated \Q is inside |
| 251 |
|
a character class, this causes an error, because the character class is not |
| 252 |
|
terminated. |
| 253 |
<a name="digitsafterbackslash"></a></P> |
<a name="digitsafterbackslash"></a></P> |
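The \Q...\E rules described above (an isolated \E is ignored, and an unterminated \Q runs to the end of the pattern) can be sketched outside PCRE. The following is a minimal illustration in Python, whose `re` module does not itself support \Q...\E. The helper name `expand_quoted` is hypothetical, and the sketch assumes the sequence occurs outside a character class; it does not model the character-class error case.

```python
import re


def expand_quoted(pattern: str) -> str:
    """Rewrite PCRE-style \\Q...\\E spans as escaped literals (hypothetical helper).

    Assumption: the pattern contains no \\Q inside a character class.
    """
    out = []
    i = 0
    n = len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            nxt = pattern[i + 1]
            if nxt == "Q":
                # Literal text runs to the next \E, or to the end of the
                # pattern if no \E follows (\E is assumed at the end).
                end = pattern.find("\\E", i + 2)
                if end == -1:
                    literal = pattern[i + 2:]
                    i = n
                else:
                    literal = pattern[i + 2:end]
                    i = end + 2
                out.append(re.escape(literal))
                continue
            if nxt == "E":
                # An isolated \E that is not preceded by \Q is ignored.
                i += 2
                continue
            # Any other escape is copied through unchanged.
            out.append(ch + nxt)
            i += 2
            continue
        out.append(ch)
        i += 1
    return "".join(out)


converted = expand_quoted(r"\Qabc\E\$\Qxyz\E")
matches_example = re.fullmatch(converted, "abc$xyz") is not None
```

Here `converted` is an ordinary Python pattern that matches the literal string abc$xyz, mirroring the table row above.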
| 254 |
<br><b> |
<br><b> |
| 255 |
Non-printing characters |
Non-printing characters |
| 756 |
non-UTF-8 mode \X matches any one character. |
non-UTF-8 mode \X matches any one character. |
| 757 |
</P> |
</P> |
| 758 |
<P> |
<P> |
| 759 |
|
Note that recent versions of Perl have changed \X to match what Unicode calls |
| 760 |
|
an "extended grapheme cluster", which has a more complicated definition. |
| 761 |
|
</P> |
| 762 |
|
<P> |
| 763 |
Matching characters by Unicode property is not fast, because PCRE has to search |
Matching characters by Unicode property is not fast, because PCRE has to search |
| 764 |
a structure that contains data for over fifteen thousand characters. That is |
a structure that contains data for over fifteen thousand characters. That is |
| 765 |
why the traditional escape sequences such as \d and \w do not use Unicode |
why the traditional escape sequences such as \d and \w do not use Unicode |
| 1413 |
an escape such as \d or \pL that matches a single character |
an escape such as \d or \pL that matches a single character |
| 1414 |
a character class |
a character class |
| 1415 |
a back reference (see next section) |
a back reference (see next section) |
| 1416 |
a parenthesized subpattern (unless it is an assertion) |
a parenthesized subpattern (including assertions) |
| 1417 |
a recursive or "subroutine" call to a subpattern |
a recursive or "subroutine" call to a subpattern |
| 1418 |
</pre> |
</pre> |
| 1419 |
The general repetition quantifier specifies a minimum and maximum number of |
The general repetition quantifier specifies a minimum and maximum number of |
| 1804 |
except that it does not cause the current matching position to be changed. |
except that it does not cause the current matching position to be changed. |
| 1805 |
</P> |
</P> |
| 1806 |
<P> |
<P> |
| 1807 |
Assertion subpatterns are not capturing subpatterns, and may not be repeated, |
Assertion subpatterns are not capturing subpatterns. If such an assertion |
| 1808 |
because it makes no sense to assert the same thing several times. If any kind |
contains capturing subpatterns within it, these are counted for the purposes of |
| 1809 |
of assertion contains capturing subpatterns within it, these are counted for |
numbering the capturing subpatterns in the whole pattern. However, substring |
| 1810 |
the purposes of numbering the capturing subpatterns in the whole pattern. |
capturing is carried out only for positive assertions, because it does not make |
| 1811 |
However, substring capturing is carried out only for positive assertions, |
sense for negative assertions. |
| 1812 |
because it does not make sense for negative assertions. |
</P> |
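The numbering and capturing rules for groups inside assertions can be observed with a short experiment. The sketch below uses Python's `re` module rather than PCRE; it is offered only as an analogous illustration, on the assumption that the two engines agree on this point.

```python
import re

# A capturing group inside a positive lookahead is numbered as group 1
# and is set when the assertion succeeds.
positive = re.match(r"(?=(a))a(b)", "ab")
positive_groups = positive.groups()

# A capturing group inside a negative lookahead still counts for numbering,
# but it is left unset after a successful match.
negative_pattern = re.compile(r"(?!(x))a(b)")
negative = negative_pattern.match("ab")
negative_groups = negative.groups()
group_count = negative_pattern.groups
```

In both patterns the group around b is group 2, because the group inside the assertion takes number 1 whether or not it ever captures anything.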
| 1813 |
|
<P> |
| 1814 |
|
For compatibility with Perl, assertion subpatterns may be repeated; though |
| 1815 |
|
it makes no sense to assert the same thing several times, the side effect of |
| 1816 |
|
capturing parentheses may occasionally be useful. In practice, there are only three |
| 1817 |
|
cases: |
| 1818 |
|
<br> |
| 1819 |
|
<br> |
| 1820 |
|
(1) If the quantifier is {0}, the assertion is never obeyed during matching. |
| 1821 |
|
However, it may contain internal capturing parenthesized groups that are called |
| 1822 |
|
from elsewhere via the |
| 1823 |
|
<a href="#subpatternsassubroutines">subroutine mechanism.</a> |
| 1824 |
|
<br> |
| 1825 |
|
<br> |
| 1826 |
|
(2) If the quantifier is {0,n} where n is greater than zero, it is treated as if it |
| 1827 |
|
were {0,1}. At run time, the rest of the pattern match is tried with and |
| 1828 |
|
without the assertion, the order depending on the greediness of the quantifier. |
| 1829 |
|
<br> |
| 1830 |
|
<br> |
| 1831 |
|
(3) If the minimum repetition is greater than zero, the quantifier is ignored. |
| 1832 |
|
The assertion is obeyed just once when encountered during matching. |
| 1833 |
</P> |
</P> |
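The three cases listed above amount to a small normalization rule for quantifiers applied to assertions. The sketch below restates that rule in Python; it is a model of the documented behaviour, not PCRE's actual implementation. The function name `normalize_assertion_quantifier` is hypothetical, and treating an unbounded maximum (passed as `None`) the same as {0,n} or {m,n} is an assumption that the text does not state explicitly.

```python
from typing import Optional, Tuple


def normalize_assertion_quantifier(minimum: int,
                                   maximum: Optional[int]) -> Tuple[int, int]:
    """Return the effective (min, max) for a quantified assertion.

    maximum=None stands for an unbounded quantifier (assumption: handled
    like a finite maximum greater than zero).
    """
    if minimum < 0 or (maximum is not None and maximum < minimum):
        raise ValueError("invalid quantifier bounds")
    if maximum == 0:
        # Case (1): {0} means the assertion is never obeyed during matching.
        return (0, 0)
    if minimum == 0:
        # Case (2): {0,n} with n > 0 is treated as {0,1}.
        return (0, 1)
    # Case (3): a minimum greater than zero means the quantifier is ignored
    # and the assertion is obeyed just once.
    return (1, 1)
```

For example, `normalize_assertion_quantifier(0, 5)` yields `(0, 1)` and `normalize_assertion_quantifier(3, 7)` yields `(1, 1)`.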
| 1834 |
<br><b> |
<br><b> |
| 1835 |
Lookahead assertions |
Lookahead assertions |
| 2473 |
<P> |
<P> |
| 2474 |
If any of these verbs are used in an assertion or subroutine subpattern |
If any of these verbs are used in an assertion or subroutine subpattern |
| 2475 |
(including recursive subpatterns), their effect is confined to that subpattern; |
(including recursive subpatterns), their effect is confined to that subpattern; |
| 2476 |
it does not extend to the surrounding pattern. Note that such subpatterns are |
it does not extend to the surrounding pattern, with one exception: a *MARK that |
| 2477 |
processed as anchored at the point where they are tested. |
is encountered in a positive assertion <i>is</i> passed back (compare capturing |
| 2478 |
|
parentheses in assertions). Note that such subpatterns are processed as |
| 2479 |
|
anchored at the point where they are tested. |
| 2480 |
</P> |
</P> |
| 2481 |
<P> |
<P> |
| 2482 |
The new verbs make use of what was previously invalid syntax: an opening |
The new verbs make use of what was previously invalid syntax: an opening |
| 2566 |
capturing parentheses. |
capturing parentheses. |
| 2567 |
</P> |
</P> |
| 2568 |
<P> |
<P> |
| 2569 |
|
If (*MARK) is encountered in a positive assertion, its name is recorded and |
| 2570 |
|
passed back if it is the last-encountered. This does not happen for negative |
| 2571 |
|
assertions. |
| 2572 |
|
</P> |
| 2573 |
|
<P> |
| 2574 |
A name may also be returned after a failed match if the final path through the |
A name may also be returned after a failed match if the final path through the |
| 2575 |
pattern involves (*MARK). However, unless (*MARK) is used in conjunction with |
pattern involves (*MARK). However, unless (*MARK) is used in conjunction with |
| 2576 |
(*COMMIT), this is unlikely to happen for an unanchored pattern because, as the |
(*COMMIT), this is unlikely to happen for an unanchored pattern because, as the |
| 2740 |
</P> |
</P> |
| 2741 |
<br><a name="SEC28" href="#TOC1">REVISION</a><br> |
<br><a name="SEC28" href="#TOC1">REVISION</a><br> |
| 2742 |
<P> |
<P> |
| 2743 |
Last updated: 21 November 2010 |
Last updated: 24 July 2011 |
| 2744 |
<br> |
<br> |
| 2745 |
Copyright © 1997-2010 University of Cambridge. |
Copyright © 1997-2011 University of Cambridge. |
| 2746 |
<br> |
<br> |
| 2747 |
<p> |
<p> |
| 2748 |
Return to the <a href="index.html">PCRE index page</a>. |
Return to the <a href="index.html">PCRE index page</a>. |