| 334 |
syntax for referencing a subpattern as a "subroutine". Details are discussed |
syntax for referencing a subpattern as a "subroutine". Details are discussed |
| 335 |
<a href="#onigurumasubroutines">later.</a> |
<a href="#onigurumasubroutines">later.</a> |
| 336 |
Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are <i>not</i> |
Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are <i>not</i> |
| 337 |
synonymous. The former is a back reference; the latter is a subroutine call. |
synonymous. The former is a back reference; the latter is a |
| 338 |
|
<a href="#subpatternsassubroutines">subroutine</a> |
| 339 |
|
call. |
| 340 |
</P> |
</P> |
| 341 |
<br><b> |
<br><b> |
| 342 |
Generic character types |
Generic character types |
| 1664 |
</pre> |
</pre> |
| 1665 |
causes an error at compile time. Branches that match different length strings |
causes an error at compile time. Branches that match different length strings |
| 1666 |
are permitted only at the top level of a lookbehind assertion. This is an |
are permitted only at the top level of a lookbehind assertion. This is an |
| 1667 |
extension compared with Perl (at least for 5.8), which requires all branches to |
extension compared with Perl (5.8 and 5.10), which requires all branches to |
| 1668 |
match the same length of string. An assertion such as |
match the same length of string. An assertion such as |
| 1669 |
<pre> |
<pre> |
| 1670 |
(?<=ab(c|de)) |
(?<=ab(c|de)) |
| 1671 |
</pre> |
</pre> |
| 1672 |
is not permitted, because its single top-level branch can match two different |
is not permitted, because its single top-level branch can match two different |
| 1673 |
lengths, but it is acceptable if rewritten to use two top-level branches: |
lengths, but it is acceptable to PCRE if rewritten to use two top-level |
| 1674 |
|
branches: |
| 1675 |
<pre> |
<pre> |
| 1676 |
(?<=abc|abde) |
(?<=abc|abde) |
| 1677 |
</pre> |
</pre> |
| 1678 |
In some cases, the Perl 5.10 escape sequence \K |
In some cases, the Perl 5.10 escape sequence \K |
| 1679 |
<a href="#resetmatchstart">(see above)</a> |
<a href="#resetmatchstart">(see above)</a> |
| 1680 |
can be used instead of a lookbehind assertion; this is not restricted to a |
can be used instead of a lookbehind assertion to get round the fixed-length |
| 1681 |
fixed-length. |
restriction. |
| 1682 |
</P> |
</P> |
| 1683 |
<P> |
<P> |
| 1684 |
The implementation of lookbehind assertions is, for each alternative, to |
The implementation of lookbehind assertions is, for each alternative, to |
| 1693 |
different numbers of bytes, are also not permitted. |
different numbers of bytes, are also not permitted. |
| 1694 |
</P> |
</P> |
| 1695 |
<P> |
<P> |
| 1696 |
|
<a href="#subpatternsassubroutines">"Subroutine"</a> |
| 1697 |
|
calls (see below) such as (?2) or (?&X) are permitted in lookbehinds, as long |
| 1698 |
|
as the subpattern matches a fixed-length string. |
| 1699 |
|
<a href="#recursion">Recursion,</a> |
| 1700 |
|
however, is not supported. |
| 1701 |
|
</P> |
| 1702 |
|
<P> |
| 1703 |
Possessive quantifiers can be used in conjunction with lookbehind assertions to |
Possessive quantifiers can be used in conjunction with lookbehind assertions to |
| 1704 |
specify efficient matching at the end of the subject string. Consider a simple |
specify efficient matching at the end of the subject string. Consider a simple |
| 1705 |
pattern such as |
pattern such as |
| 1851 |
stack. |
stack. |
| 1852 |
</P> |
</P> |
| 1853 |
<P> |
<P> |
| 1854 |
At "top level", all these recursion test conditions are false. Recursive |
At "top level", all these recursion test conditions are false. |
| 1855 |
patterns are described below. |
<a href="#recursion">Recursive patterns</a> |
| 1856 |
|
are described below. |
| 1857 |
</P> |
</P> |
| 1858 |
<br><b> |
<br><b> |
| 1859 |
Defining subpatterns for use by reference only |
Defining subpatterns for use by reference only |
| 1863 |
name DEFINE, the condition is always false. In this case, there may be only one |
name DEFINE, the condition is always false. In this case, there may be only one |
| 1864 |
alternative in the subpattern. It is always skipped if control reaches this |
alternative in the subpattern. It is always skipped if control reaches this |
| 1865 |
point in the pattern; the idea of DEFINE is that it can be used to define |
point in the pattern; the idea of DEFINE is that it can be used to define |
| 1866 |
"subroutines" that can be referenced from elsewhere. (The use of "subroutines" |
"subroutines" that can be referenced from elsewhere. (The use of |
| 1867 |
|
<a href="#subpatternsassubroutines">"subroutines"</a> |
| 1868 |
is described below.) For example, a pattern to match an IPv4 address could be |
is described below.) For example, a pattern to match an IPv4 address could be |
| 1869 |
written like this (ignore whitespace and line breaks): |
written like this (ignore whitespace and line breaks): |
| 1870 |
<pre> |
<pre> |
| 1939 |
<P> |
<P> |
| 1940 |
A special item that consists of (? followed by a number greater than zero and a |
A special item that consists of (? followed by a number greater than zero and a |
| 1941 |
closing parenthesis is a recursive call of the subpattern of the given number, |
closing parenthesis is a recursive call of the subpattern of the given number, |
| 1942 |
provided that it occurs inside that subpattern. (If not, it is a "subroutine" |
provided that it occurs inside that subpattern. (If not, it is a |
| 1943 |
|
<a href="#subpatternsassubroutines">"subroutine"</a> |
| 1944 |
call, which is described in the next section.) The special item (?R) or (?0) is |
call, which is described in the next section.) The special item (?R) or (?0) is |
| 1945 |
a recursive call of the entire regular expression. |
a recursive call of the entire regular expression. |
| 1946 |
</P> |
</P> |
| 1976 |
It is also possible to refer to subsequently opened parentheses, by writing |
It is also possible to refer to subsequently opened parentheses, by writing |
| 1977 |
references such as (?+2). However, these cannot be recursive because the |
references such as (?+2). However, these cannot be recursive because the |
| 1978 |
reference is not inside the parentheses that are referenced. They are always |
reference is not inside the parentheses that are referenced. They are always |
| 1979 |
"subroutine" calls, as described in the next section. |
<a href="#subpatternsassubroutines">"subroutine"</a> |
| 1980 |
|
calls, as described in the next section. |
| 1981 |
</P> |
</P> |
| 1982 |
<P> |
<P> |
| 1983 |
An alternative approach is to use named parentheses instead. The Perl syntax |
An alternative approach is to use named parentheses instead. The Perl syntax |
| 2332 |
</P> |
</P> |
| 2333 |
<br><a name="SEC28" href="#TOC1">REVISION</a><br> |
<br><a name="SEC28" href="#TOC1">REVISION</a><br> |
| 2334 |
<P> |
<P> |
| 2335 |
Last updated: 18 September 2009 |
Last updated: 22 September 2009 |
| 2336 |
<br> |
<br> |
| 2337 |
Copyright © 1997-2009 University of Cambridge. |
Copyright © 1997-2009 University of Cambridge. |
| 2338 |
<br> |
<br> |