| 333 |
later. |
later. |
| 334 |
.\" |
.\" |
| 335 |
Note that \eg{...} (Perl syntax) and \eg<...> (Oniguruma syntax) are \fInot\fP |
Note that \eg{...} (Perl syntax) and \eg<...> (Oniguruma syntax) are \fInot\fP |
| 336 |
synonymous. The former is a back reference; the latter is a subroutine call. |
synonymous. The former is a back reference; the latter is a |
| 337 |
|
.\" HTML <a href="#subpatternsassubroutines"> |
| 338 |
|
.\" </a> |
| 339 |
|
subroutine |
| 340 |
|
.\" |
| 341 |
|
call. |
| 342 |
. |
. |
| 343 |
. |
. |
| 344 |
.SS "Generic character types" |
.SS "Generic character types" |
| 1674 |
.sp |
.sp |
| 1675 |
causes an error at compile time. Branches that match different length strings |
causes an error at compile time. Branches that match different length strings |
| 1676 |
are permitted only at the top level of a lookbehind assertion. This is an |
are permitted only at the top level of a lookbehind assertion. This is an |
| 1677 |
extension compared with Perl (at least for 5.8), which requires all branches to |
extension compared with Perl (5.8 and 5.10), which requires all branches to |
| 1678 |
match the same length of string. An assertion such as |
match the same length of string. An assertion such as |
| 1679 |
.sp |
.sp |
| 1680 |
(?<=ab(c|de)) |
(?<=ab(c|de)) |
| 1681 |
.sp |
.sp |
| 1682 |
is not permitted, because its single top-level branch can match two different |
is not permitted, because its single top-level branch can match two different |
| 1683 |
lengths, but it is acceptable if rewritten to use two top-level branches: |
lengths, but it is acceptable to PCRE if rewritten to use two top-level |
| 1684 |
|
branches: |
| 1685 |
.sp |
.sp |
| 1686 |
(?<=abc|abde) |
(?<=abc|abde) |
| 1687 |
.sp |
.sp |
| 1690 |
.\" </a> |
.\" </a> |
| 1691 |
(see above) |
(see above) |
| 1692 |
.\" |
.\" |
| 1693 |
can be used instead of a lookbehind assertion; this is not restricted to a |
can be used instead of a lookbehind assertion to get round the fixed-length |
| 1694 |
fixed-length. |
restriction. |
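
The fixed-length rule for lookbehind branches described above can be tried out with a small sketch. It rests on an assumption that is not part of this page: it uses Python's `re` module as a stand-in engine rather than PCRE. Python's `re` behaves like the stricter Perl rule (every branch must have the same length), so it rejects both forms shown above; the helper name `compiles` and the trailing `x` in each pattern are made up for illustration. The last pattern shows one possible rewrite for such stricter engines, using one fixed-length lookbehind per alternative.

```python
import re


def compiles(pattern):
    """Return True if Python's re accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# One top-level branch that can match two different lengths.
single_branch = r"(?<=ab(c|de))x"

# Two top-level branches of different lengths. The text above says PCRE
# accepts this form; Python's re (the stand-in used here) does not, because
# it requires the whole lookbehind to have a single fixed width.
two_branches = r"(?<=abc|abde)x"

# Rewrite for stricter engines: one fixed-length lookbehind per alternative.
split_lookbehinds = r"(?:(?<=abc)|(?<=abde))x"

results = {
    p: compiles(p) for p in (single_branch, two_branches, split_lookbehinds)
}

matcher = re.compile(split_lookbehinds)
```

With this sketch, `results` records which of the three patterns the stand-in engine accepts, and `matcher.search("abcx")` or `matcher.search("abdex")` can be used to check that the split form still matches after either prefix. None of this says anything about PCRE's own behaviour beyond what the text above already states.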
| 1695 |
.P |
.P |
| 1696 |
The implementation of lookbehind assertions is, for each alternative, to |
The implementation of lookbehind assertions is, for each alternative, to |
| 1697 |
temporarily move the current position back by the fixed length and then try to |
temporarily move the current position back by the fixed length and then try to |
| 1703 |
the length of the lookbehind. The \eX and \eR escapes, which can match |
the length of the lookbehind. The \eX and \eR escapes, which can match |
| 1704 |
different numbers of bytes, are also not permitted. |
different numbers of bytes, are also not permitted. |
| 1705 |
.P |
.P |
| 1706 |
|
.\" HTML <a href="#subpatternsassubroutines"> |
| 1707 |
|
.\" </a> |
| 1708 |
|
"Subroutine" |
| 1709 |
|
.\" |
| 1710 |
|
calls (see below) such as (?2) or (?&X) are permitted in lookbehinds, as long |
| 1711 |
|
as the subpattern matches a fixed-length string. |
| 1712 |
|
.\" HTML <a href="#recursion"> |
| 1713 |
|
.\" </a> |
| 1714 |
|
Recursion, |
| 1715 |
|
.\" |
| 1716 |
|
however, is not supported. |
| 1717 |
|
.P |
| 1718 |
Possessive quantifiers can be used in conjunction with lookbehind assertions to |
Possessive quantifiers can be used in conjunction with lookbehind assertions to |
| 1719 |
specify efficient matching at the end of the subject string. Consider a simple |
specify efficient matching at the end of the subject string. Consider a simple |
| 1720 |
pattern such as |
pattern such as |
| 1859 |
number or name is given. This condition does not check the entire recursion |
number or name is given. This condition does not check the entire recursion |
| 1860 |
stack. |
stack. |
| 1861 |
.P |
.P |
| 1862 |
At "top level", all these recursion test conditions are false. Recursive |
At "top level", all these recursion test conditions are false. |
| 1863 |
patterns are described below. |
.\" HTML <a href="#recursion"> |
| 1864 |
|
.\" </a> |
| 1865 |
|
Recursive patterns |
| 1866 |
|
.\" |
| 1867 |
|
are described below. |
| 1868 |
. |
. |
| 1869 |
.SS "Defining subpatterns for use by reference only" |
.SS "Defining subpatterns for use by reference only" |
| 1870 |
.rs |
.rs |
| 1873 |
name DEFINE, the condition is always false. In this case, there may be only one |
name DEFINE, the condition is always false. In this case, there may be only one |
| 1874 |
alternative in the subpattern. It is always skipped if control reaches this |
alternative in the subpattern. It is always skipped if control reaches this |
| 1875 |
point in the pattern; the idea of DEFINE is that it can be used to define |
point in the pattern; the idea of DEFINE is that it can be used to define |
| 1876 |
"subroutines" that can be referenced from elsewhere. (The use of "subroutines" |
"subroutines" that can be referenced from elsewhere. (The use of |
| 1877 |
|
.\" HTML <a href="#subpatternsassubroutines"> |
| 1878 |
|
.\" </a> |
| 1879 |
|
"subroutines" |
| 1880 |
|
.\" |
| 1881 |
is described below.) For example, a pattern to match an IPv4 address could be |
is described below.) For example, a pattern to match an IPv4 address could be |
| 1882 |
written like this (ignore whitespace and line breaks): |
written like this (ignore whitespace and line breaks): |
| 1883 |
.sp |
.sp |
| 1952 |
.P |
.P |
| 1953 |
A special item that consists of (? followed by a number greater than zero and a |
A special item that consists of (? followed by a number greater than zero and a |
| 1954 |
closing parenthesis is a recursive call of the subpattern of the given number, |
closing parenthesis is a recursive call of the subpattern of the given number, |
| 1955 |
provided that it occurs inside that subpattern. (If not, it is a "subroutine" |
provided that it occurs inside that subpattern. (If not, it is a |
| 1956 |
|
.\" HTML <a href="#subpatternsassubroutines"> |
| 1957 |
|
.\" </a> |
| 1958 |
|
"subroutine" |
| 1959 |
|
.\" |
| 1960 |
call, which is described in the next section.) The special item (?R) or (?0) is |
call, which is described in the next section.) The special item (?R) or (?0) is |
| 1961 |
a recursive call of the entire regular expression. |
a recursive call of the entire regular expression. |
| 1962 |
.P |
.P |
| 1988 |
It is also possible to refer to subsequently opened parentheses, by writing |
It is also possible to refer to subsequently opened parentheses, by writing |
| 1989 |
references such as (?+2). However, these cannot be recursive because the |
references such as (?+2). However, these cannot be recursive because the |
| 1990 |
reference is not inside the parentheses that are referenced. They are always |
reference is not inside the parentheses that are referenced. They are always |
| 1991 |
"subroutine" calls, as described in the next section. |
.\" HTML <a href="#subpatternsassubroutines"> |
| 1992 |
|
.\" </a> |
| 1993 |
|
"subroutine" |
| 1994 |
|
.\" |
| 1995 |
|
calls, as described in the next section. |
| 1996 |
.P |
.P |
| 1997 |
An alternative approach is to use named parentheses instead. The Perl syntax |
An alternative approach is to use named parentheses instead. The Perl syntax |
| 1998 |
for this is (?&name); PCRE's earlier syntax (?P>name) is also supported. We |
for this is (?&name); PCRE's earlier syntax (?P>name) is also supported. We |
| 2351 |
.rs |
.rs |
| 2352 |
.sp |
.sp |
| 2353 |
.nf |
.nf |
| 2354 |
Last updated: 18 September 2009 |
Last updated: 22 September 2009 |
| 2355 |
Copyright (c) 1997-2009 University of Cambridge. |
Copyright (c) 1997-2009 University of Cambridge. |
| 2356 |
.fi |
.fi |