| 35 |
<li><a name="TOC20" href="#SEC20">COMMENTS</a> |
<li><a name="TOC20" href="#SEC20">COMMENTS</a> |
| 36 |
<li><a name="TOC21" href="#SEC21">RECURSIVE PATTERNS</a> |
<li><a name="TOC21" href="#SEC21">RECURSIVE PATTERNS</a> |
| 37 |
<li><a name="TOC22" href="#SEC22">SUBPATTERNS AS SUBROUTINES</a> |
<li><a name="TOC22" href="#SEC22">SUBPATTERNS AS SUBROUTINES</a> |
| 38 |
<li><a name="TOC23" href="#SEC23">CALLOUTS</a> |
<li><a name="TOC23" href="#SEC23">ONIGURUMA SUBROUTINE SYNTAX</a> |
| 39 |
<li><a name="TOC24" href="#SEC24">BACKTRACKING CONTROL</a> |
<li><a name="TOC24" href="#SEC24">CALLOUTS</a> |
| 40 |
<li><a name="TOC25" href="#SEC25">SEE ALSO</a> |
<li><a name="TOC25" href="#SEC25">BACKTRACKING CONTROL</a> |
| 41 |
<li><a name="TOC26" href="#SEC26">AUTHOR</a> |
<li><a name="TOC26" href="#SEC26">SEE ALSO</a> |
| 42 |
<li><a name="TOC27" href="#SEC27">REVISION</a> |
<li><a name="TOC27" href="#SEC27">AUTHOR</a> |
| 43 |
|
<li><a name="TOC28" href="#SEC28">REVISION</a> |
| 44 |
</ul> |
</ul> |
| 45 |
<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION DETAILS</a><br> |
<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION DETAILS</a><br> |
| 46 |
<P> |
<P> |
| 47 |
The syntax and semantics of the regular expressions that are supported by PCRE |
The syntax and semantics of the regular expressions that are supported by PCRE |
| 48 |
are described in detail below. There is a quick-reference syntax summary in the |
are described in detail below. There is a quick-reference syntax summary in the |
| 49 |
<a href="pcresyntax.html"><b>pcresyntax</b></a> |
<a href="pcresyntax.html"><b>pcresyntax</b></a> |
| 50 |
page. Perl's regular expressions are described in its own documentation, and |
page. PCRE tries to match Perl syntax and semantics as closely as it can. PCRE |
| 51 |
|
also supports some alternative regular expression syntax (which does not |
| 52 |
|
conflict with the Perl syntax) in order to provide some compatibility with |
| 53 |
|
regular expressions in Python, .NET, and Oniguruma. |
| 54 |
|
</P> |
| 55 |
|
<P> |
| 56 |
|
Perl's regular expressions are described in its own documentation, and |
| 57 |
regular expressions in general are covered in a number of books, some of which |
regular expressions in general are covered in a number of books, some of which |
| 58 |
have copious examples. Jeffrey Friedl's "Mastering Regular Expressions", |
have copious examples. Jeffrey Friedl's "Mastering Regular Expressions", |
| 59 |
published by O'Reilly, covers regular expressions in great detail. This |
published by O'Reilly, covers regular expressions in great detail. This |
| 319 |
<a href="#subpattern">parenthesized subpatterns.</a> |
<a href="#subpattern">parenthesized subpatterns.</a> |
| 320 |
</P> |
</P> |
| 321 |
<br><b> |
<br><b> |
| 322 |
|
Absolute and relative subroutine calls |
| 323 |
|
</b><br> |
| 324 |
|
<P> |
| 325 |
|
For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or |
| 326 |
|
a number enclosed either in angle brackets or single quotes is an alternative |
| 327 |
|
syntax for referencing a subpattern as a "subroutine". Details are discussed |
| 328 |
|
<a href="#onigurumasubroutines">later.</a> |
| 329 |
|
Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are <i>not</i> |
| 330 |
|
synonymous. The former is a back reference; the latter is a subroutine call. |
| 331 |
|
</P> |
| 332 |
|
<br><b> |
| 333 |
Generic character types |
Generic character types |
| 334 |
</b><br> |
</b><br> |
| 335 |
<P> |
<P> |
| 1060 |
pattern can contain special leading sequences to override what the application |
pattern can contain special leading sequences to override what the application |
| 1061 |
has set or what has been defaulted. Details are given in the section entitled |
has set or what has been defaulted. Details are given in the section entitled |
| 1062 |
<a href="#newlineseq">"Newline sequences"</a> |
<a href="#newlineseq">"Newline sequences"</a> |
| 1063 |
above. |
above. |
| 1064 |
<a name="subpattern"></a></P> |
<a name="subpattern"></a></P> |
| 1065 |
<br><a name="SEC12" href="#TOC1">SUBPATTERNS</a><br> |
<br><a name="SEC12" href="#TOC1">SUBPATTERNS</a><br> |
| 1066 |
<P> |
<P> |
| 1202 |
<a href="pcreapi.html"><b>pcreapi</b></a> |
<a href="pcreapi.html"><b>pcreapi</b></a> |
| 1203 |
documentation. |
documentation. |
| 1204 |
</P> |
</P> |
| 1205 |
|
<P> |
| 1206 |
|
<b>Warning:</b> You cannot use different names to distinguish between two |
| 1207 |
|
subpatterns with the same number (see the previous section) because PCRE uses |
| 1208 |
|
only the numbers when matching. |
| 1209 |
|
</P> |
| 1210 |
<br><a name="SEC15" href="#TOC1">REPETITION</a><br> |
<br><a name="SEC15" href="#TOC1">REPETITION</a><br> |
| 1211 |
<P> |
<P> |
| 1212 |
Repetition is specified by quantifiers, which can follow any of the following |
Repetition is specified by quantifiers, which can follow any of the following |
| 1254 |
</P> |
</P> |
| 1255 |
<P> |
<P> |
| 1256 |
The quantifier {0} is permitted, causing the expression to behave as if the |
The quantifier {0} is permitted, causing the expression to behave as if the |
| 1257 |
previous item and the quantifier were not present. |
previous item and the quantifier were not present. This may be useful for |
| 1258 |
|
subpatterns that are referenced as |
| 1259 |
|
<a href="#subpatternsassubroutines">subroutines</a> |
| 1260 |
|
from elsewhere in the pattern. Items other than subpatterns that have a {0} |
| 1261 |
|
quantifier are omitted from the compiled pattern. |
| 1262 |
</P> |
</P> |
| 1263 |
<P> |
<P> |
| 1264 |
For convenience, the three most common quantifiers have single-character |
For convenience, the three most common quantifiers have single-character |
| 2058 |
</pre> |
</pre> |
| 2059 |
It matches "abcabc". It does not match "abcABC" because the change of |
It matches "abcabc". It does not match "abcABC" because the change of |
| 2060 |
processing option does not affect the called subpattern. |
processing option does not affect the called subpattern. |
| 2061 |
|
<a name="onigurumasubroutines"></a></P> |
| 2062 |
|
<br><a name="SEC23" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a><br> |
| 2063 |
|
<P> |
| 2064 |
|
For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or |
| 2065 |
|
a number enclosed either in angle brackets or single quotes is an alternative |
| 2066 |
|
syntax for referencing a subpattern as a subroutine, possibly recursively. Here |
| 2067 |
|
are two of the examples used above, rewritten using this syntax: |
| 2068 |
|
<pre> |
| 2069 |
|
(?<pn> \( ( (?>[^()]+) | \g<pn> )* \) ) |
| 2070 |
|
(sens|respons)e and \g'1'ibility |
| 2071 |
|
</pre> |
| 2072 |
|
PCRE supports an extension to Oniguruma: if a number is preceded by a |
| 2073 |
|
plus or a minus sign it is taken as a relative reference. For example: |
| 2074 |
|
<pre> |
| 2075 |
|
(abc)(?i:\g<-1>) |
| 2076 |
|
</pre> |
| 2077 |
|
Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are <i>not</i> |
| 2078 |
|
synonymous. The former is a back reference; the latter is a subroutine call. |
| 2079 |
</P> |
</P> |
| 2080 |
<br><a name="SEC23" href="#TOC1">CALLOUTS</a><br> |
<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br> |
| 2081 |
<P> |
<P> |
| 2082 |
Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl |
Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl |
| 2083 |
code to be obeyed in the middle of matching a regular expression. This makes it |
code to be obeyed in the middle of matching a regular expression. This makes it |
| 2112 |
<a href="pcrecallout.html"><b>pcrecallout</b></a> |
<a href="pcrecallout.html"><b>pcrecallout</b></a> |
| 2113 |
documentation. |
documentation. |
| 2114 |
</P> |
</P> |
| 2115 |
<br><a name="SEC24" href="#TOC1">BACKTRACKING CONTROL</a><br> |
<br><a name="SEC25" href="#TOC1">BACKTRACKING CONTROL</a><br> |
| 2116 |
<P> |
<P> |
| 2117 |
Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which |
Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which |
| 2118 |
are described in the Perl documentation as "experimental and subject to change |
are described in the Perl documentation as "experimental and subject to change |
| 2121 |
remarks apply to the PCRE features described in this section. |
remarks apply to the PCRE features described in this section. |
| 2122 |
</P> |
</P> |
| 2123 |
<P> |
<P> |
| 2124 |
Since these verbs are specifically related to backtracking, they can be used |
Since these verbs are specifically related to backtracking, most of them can be |
| 2125 |
only when the pattern is to be matched using <b>pcre_exec()</b>, which uses a |
used only when the pattern is to be matched using <b>pcre_exec()</b>, which uses |
| 2126 |
backtracking algorithm. They cause an error if encountered by |
a backtracking algorithm. With the exception of (*FAIL), which behaves like a |
| 2127 |
|
failing negative assertion, they cause an error if encountered by |
| 2128 |
<b>pcre_dfa_exec()</b>. |
<b>pcre_dfa_exec()</b>. |
| 2129 |
</P> |
</P> |
| 2130 |
<P> |
<P> |
| 2228 |
second alternative and tries COND2, without backtracking into COND1. If (*THEN) |
second alternative and tries COND2, without backtracking into COND1. If (*THEN) |
| 2229 |
is used outside of any alternation, it acts exactly like (*PRUNE). |
is used outside of any alternation, it acts exactly like (*PRUNE). |
| 2230 |
</P> |
</P> |
| 2231 |
<br><a name="SEC25" href="#TOC1">SEE ALSO</a><br> |
<br><a name="SEC26" href="#TOC1">SEE ALSO</a><br> |
| 2232 |
<P> |
<P> |
| 2233 |
<b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3), <b>pcre</b>(3). |
<b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3), <b>pcre</b>(3). |
| 2234 |
</P> |
</P> |
| 2235 |
<br><a name="SEC26" href="#TOC1">AUTHOR</a><br> |
<br><a name="SEC27" href="#TOC1">AUTHOR</a><br> |
| 2236 |
<P> |
<P> |
| 2237 |
Philip Hazel |
Philip Hazel |
| 2238 |
<br> |
<br> |
| 2241 |
Cambridge CB2 3QH, England. |
Cambridge CB2 3QH, England. |
| 2242 |
<br> |
<br> |
| 2243 |
</P> |
</P> |
| 2244 |
<br><a name="SEC27" href="#TOC1">REVISION</a><br> |
<br><a name="SEC28" href="#TOC1">REVISION</a><br> |
| 2245 |
<P> |
<P> |
| 2246 |
Last updated: 17 September 2007 |
Last updated: 08 March 2009 |
| 2247 |
<br> |
<br> |
| 2248 |
Copyright © 1997-2007 University of Cambridge. |
Copyright © 1997-2009 University of Cambridge. |
| 2249 |
<br> |
<br> |
| 2250 |
<p> |
<p> |
| 2251 |
Return to the <a href="index.html">PCRE index page</a>. |
Return to the <a href="index.html">PCRE index page</a>. |