| 9 |
.\" HREF |
.\" HREF |
| 10 |
\fBpcresyntax\fP |
\fBpcresyntax\fP |
| 11 |
.\" |
.\" |
| 12 |
page. Perl's regular expressions are described in its own documentation, and |
page. PCRE tries to match Perl syntax and semantics as closely as it can. PCRE |
| 13 |
|
also supports some alternative regular expression syntax (which does not |
| 14 |
|
conflict with the Perl syntax) in order to provide some compatibility with |
| 15 |
|
regular expressions in Python, .NET, and Oniguruma. |
| 16 |
|
.P |
| 17 |
|
Perl's regular expressions are described in its own documentation, and |
| 18 |
regular expressions in general are covered in a number of books, some of which |
regular expressions in general are covered in a number of books, some of which |
| 19 |
have copious examples. Jeffrey Friedl's "Mastering Regular Expressions", |
have copious examples. Jeffrey Friedl's "Mastering Regular Expressions", |
| 20 |
published by O'Reilly, covers regular expressions in great detail. This |
published by O'Reilly, covers regular expressions in great detail. This |
| 315 |
.\" |
.\" |
| 316 |
. |
. |
| 317 |
. |
. |
| 318 |
|
.SS "Absolute and relative subroutine calls" |
| 319 |
|
.rs |
| 320 |
|
.sp |
| 321 |
|
For compatibility with Oniguruma, the non-Perl syntax \eg followed by a name or |
| 322 |
|
a number enclosed either in angle brackets or single quotes is an alternative |
| 323 |
|
syntax for referencing a subpattern as a "subroutine". Details are discussed |
| 324 |
|
.\" HTML <a href="#onigurumasubroutines"> |
| 325 |
|
.\" </a> |
| 326 |
|
later. |
| 327 |
|
.\" |
| 328 |
|
Note that \eg{...} (Perl syntax) and \eg<...> (Oniguruma syntax) are \fInot\fP |
| 329 |
|
synonymous. The former is a back reference; the latter is a subroutine call. |
| 330 |
|
. |
| 331 |
|
. |
| 332 |
.SS "Generic character types" |
.SS "Generic character types" |
| 333 |
.rs |
.rs |
| 334 |
.sp |
.sp |
| 364 |
\ew, and always match \eD, \eS, and \eW. This is true even when Unicode |
\ew, and always match \eD, \eS, and \eW. This is true even when Unicode |
| 365 |
character property support is available. These sequences retain their original |
character property support is available. These sequences retain their original |
| 366 |
meanings from before UTF-8 support was available, mainly for efficiency |
meanings from before UTF-8 support was available, mainly for efficiency |
| 367 |
reasons. |
reasons. Note that this also affects \eb, because it is defined in terms of \ew |
| 368 |
|
and \eW. |
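The knock-on effect on `\b` can be sketched outside PCRE. The snippet below is only an analogy: it uses Python's standard `re` module, not PCRE, and its `re.ASCII` flag is assumed here as a stand-in for the behaviour described above, so that a non-ASCII letter counts as `\W`. The sample text is made up for illustration.

```python
import re

text = "café au lait"

# With re.ASCII, \w covers only [a-zA-Z0-9_], so "é" counts as \W
# and \b reports a word boundary between "f" and "é".
ascii_words = re.findall(r"\b\w+\b", text, flags=re.ASCII)

# Without the flag, Python treats "é" as a word character,
# so no boundary falls inside "café".
unicode_words = re.findall(r"\b\w+\b", text)

print(ascii_words)
print(unicode_words)
```

Because `\b` is defined purely by the `\w`/`\W` classification of its neighbours, narrowing `\w` moves the boundaries as well.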
| 369 |
.P |
.P |
| 370 |
The sequences \eh, \eH, \ev, and \eV are Perl 5.10 features. In contrast to the |
The sequences \eh, \eH, \ev, and \eV are Perl 5.10 features. In contrast to the |
| 371 |
other sequences, these do match certain high-valued codepoints in UTF-8 mode. |
other sequences, these do match certain high-valued codepoints in UTF-8 mode. |
| 1212 |
\fBpcreapi\fP |
\fBpcreapi\fP |
| 1213 |
.\" |
.\" |
| 1214 |
documentation. |
documentation. |
| 1215 |
|
.P |
| 1216 |
|
\fBWarning:\fP You cannot use different names to distinguish between two |
| 1217 |
|
subpatterns with the same number (see the previous section) because PCRE uses |
| 1218 |
|
only the numbers when matching. |
| 1219 |
. |
. |
| 1220 |
. |
. |
| 1221 |
.SH REPETITION |
.SH REPETITION |
| 1264 |
which may be several bytes long (and they may be of different lengths). |
which may be several bytes long (and they may be of different lengths). |
| 1265 |
.P |
.P |
| 1266 |
The quantifier {0} is permitted, causing the expression to behave as if the |
The quantifier {0} is permitted, causing the expression to behave as if the |
| 1267 |
previous item and the quantifier were not present. |
previous item and the quantifier were not present. This may be useful for |
| 1268 |
|
subpatterns that are referenced as |
| 1269 |
|
.\" HTML <a href="#subpatternsassubroutines"> |
| 1270 |
|
.\" </a> |
| 1271 |
|
subroutines |
| 1272 |
|
.\" |
| 1273 |
|
from elsewhere in the pattern. Items other than subpatterns that have a {0} |
| 1274 |
|
quantifier are omitted from the compiled pattern. |
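The basic effect of {0} on an ordinary item can be checked with a small sketch. It uses Python's standard `re` module rather than PCRE, on the assumption that the two agree on this point. Python's `re` has no subroutine calls, so the subroutine use case mentioned above is not shown.

```python
import re

# "b{0}" should behave as if neither "b" nor the quantifier were present.
with_zero = re.compile(r"ab{0}c")
without_item = re.compile(r"ac")

subjects = ["ac", "abc", "abbc", ""]
agreement = [
    bool(with_zero.fullmatch(s)) == bool(without_item.fullmatch(s))
    for s in subjects
]

print(all(agreement))
print(with_zero.fullmatch("ac") is not None)
print(with_zero.fullmatch("abc") is not None)
```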
| 1275 |
.P |
.P |
| 1276 |
For convenience, the three most common quantifiers have single-character |
For convenience, the three most common quantifiers have single-character |
| 1277 |
abbreviations: |
abbreviations: |
| 2054 |
processing option does not affect the called subpattern. |
processing option does not affect the called subpattern. |
| 2055 |
. |
. |
| 2056 |
. |
. |
| 2057 |
|
.\" HTML <a name="onigurumasubroutines"></a> |
| 2058 |
|
.SH "ONIGURUMA SUBROUTINE SYNTAX" |
| 2059 |
|
.rs |
| 2060 |
|
.sp |
| 2061 |
|
For compatibility with Oniguruma, the non-Perl syntax \eg followed by a name or |
| 2062 |
|
a number enclosed either in angle brackets or single quotes is an alternative |
| 2063 |
|
syntax for referencing a subpattern as a subroutine, possibly recursively. Here |
| 2064 |
|
are two of the examples used above, rewritten using this syntax: |
| 2065 |
|
.sp |
| 2066 |
|
(?<pn> \e( ( (?>[^()]+) | \eg<pn> )* \e) ) |
| 2067 |
|
(sens|respons)e and \eg'1'ibility |
| 2068 |
|
.sp |
| 2069 |
|
PCRE supports an extension to Oniguruma: if a number is preceded by a |
| 2070 |
|
plus or a minus sign, it is taken as a relative reference. For example: |
| 2071 |
|
.sp |
| 2072 |
|
(abc)(?i:\eg<-1>) |
| 2073 |
|
.sp |
| 2074 |
|
Note that \eg{...} (Perl syntax) and \eg<...> (Oniguruma syntax) are \fInot\fP |
| 2075 |
|
synonymous. The former is a back reference; the latter is a subroutine call. |
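The difference can be illustrated with the second example pattern above. This is a rough sketch in Python's standard `re` module, which supports back references but not subroutine calls. As an assumption made for illustration, the subroutine call is approximated by writing the subpattern out a second time. This ignores details of how PCRE treats a real call, but it is enough to show the contrast for this non-recursive case.

```python
import re

# Back reference: \1 must repeat the text that group 1 actually matched.
backref = re.compile(r"(sens|respons)e and \1ibility")

# Stand-in for the subroutine call \g'1': the subpattern is written out again,
# so it is matched afresh rather than compared with the earlier text.
inlined = re.compile(r"(sens|respons)e and (?:sens|respons)ibility")

subjects = [
    "sense and sensibility",
    "response and responsibility",
    "sense and responsibility",
]

# Map each subject to (matches with back reference, matches with inlined subpattern).
results = {
    s: (bool(backref.fullmatch(s)), bool(inlined.fullmatch(s)))
    for s in subjects
}

for subject, outcome in results.items():
    print(subject, outcome)
```

Only the mixed subject, "sense and responsibility", separates the two forms: the back reference rejects it, while the re-executed subpattern accepts it.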
| 2076 |
|
. |
| 2077 |
|
. |
| 2078 |
.SH CALLOUTS |
.SH CALLOUTS |
| 2079 |
.rs |
.rs |
| 2080 |
.sp |
.sp |
| 2120 |
production code should be noted to avoid problems during upgrades." The same |
production code should be noted to avoid problems during upgrades." The same |
| 2121 |
remarks apply to the PCRE features described in this section. |
remarks apply to the PCRE features described in this section. |
| 2122 |
.P |
.P |
| 2123 |
Since these verbs are specifically related to backtracking, they can be used |
Since these verbs are specifically related to backtracking, most of them can be |
| 2124 |
only when the pattern is to be matched using \fBpcre_exec()\fP, which uses a |
used only when the pattern is to be matched using \fBpcre_exec()\fP, which uses |
| 2125 |
backtracking algorithm. They cause an error if encountered by |
a backtracking algorithm. With the exception of (*FAIL), which behaves like a |
| 2126 |
|
failing negative assertion, they cause an error if encountered by |
| 2127 |
\fBpcre_dfa_exec()\fP. |
\fBpcre_dfa_exec()\fP. |
| 2128 |
.P |
.P |
| 2129 |
The new verbs make use of what was previously invalid syntax: an opening |
The new verbs make use of what was previously invalid syntax: an opening |
| 2245 |
.rs |
.rs |
| 2246 |
.sp |
.sp |
| 2247 |
.nf |
.nf |
| 2248 |
Last updated: 17 September 2007 |
Last updated: 18 March 2009 |
| 2249 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2009 University of Cambridge. |
| 2250 |
.fi |
.fi |