Diff of /code/trunk/doc/html/pcrepattern.html

revision 111 by ph10, Thu Mar 8 16:53:09 2007 UTC revision 182 by ph10, Wed Jun 13 15:09:54 2007 UTC
# Line 24  man page, in case the conversion went wr Line 24  man page, in case the conversion went wr
24  <li><a name="TOC9" href="#SEC9">VERTICAL BAR</a>  <li><a name="TOC9" href="#SEC9">VERTICAL BAR</a>
25  <li><a name="TOC10" href="#SEC10">INTERNAL OPTION SETTING</a>  <li><a name="TOC10" href="#SEC10">INTERNAL OPTION SETTING</a>
26  <li><a name="TOC11" href="#SEC11">SUBPATTERNS</a>  <li><a name="TOC11" href="#SEC11">SUBPATTERNS</a>
27  <li><a name="TOC12" href="#SEC12">NAMED SUBPATTERNS</a>  <li><a name="TOC12" href="#SEC12">DUPLICATE SUBPATTERN NUMBERS</a>
28  <li><a name="TOC13" href="#SEC13">REPETITION</a>  <li><a name="TOC13" href="#SEC13">NAMED SUBPATTERNS</a>
29  <li><a name="TOC14" href="#SEC14">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a>  <li><a name="TOC14" href="#SEC14">REPETITION</a>
30  <li><a name="TOC15" href="#SEC15">BACK REFERENCES</a>  <li><a name="TOC15" href="#SEC15">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a>
31  <li><a name="TOC16" href="#SEC16">ASSERTIONS</a>  <li><a name="TOC16" href="#SEC16">BACK REFERENCES</a>
32  <li><a name="TOC17" href="#SEC17">CONDITIONAL SUBPATTERNS</a>  <li><a name="TOC17" href="#SEC17">ASSERTIONS</a>
33  <li><a name="TOC18" href="#SEC18">COMMENTS</a>  <li><a name="TOC18" href="#SEC18">CONDITIONAL SUBPATTERNS</a>
34  <li><a name="TOC19" href="#SEC19">RECURSIVE PATTERNS</a>  <li><a name="TOC19" href="#SEC19">COMMENTS</a>
35  <li><a name="TOC20" href="#SEC20">SUBPATTERNS AS SUBROUTINES</a>  <li><a name="TOC20" href="#SEC20">RECURSIVE PATTERNS</a>
36  <li><a name="TOC21" href="#SEC21">CALLOUTS</a>  <li><a name="TOC21" href="#SEC21">SUBPATTERNS AS SUBROUTINES</a>
37  <li><a name="TOC22" href="#SEC22">SEE ALSO</a>  <li><a name="TOC22" href="#SEC22">CALLOUTS</a>
38  <li><a name="TOC23" href="#SEC23">AUTHOR</a>  <li><a name="TOC23" href="#SEC23">SEE ALSO</a>
39  <li><a name="TOC24" href="#SEC24">REVISION</a>  <li><a name="TOC24" href="#SEC24">AUTHOR</a>
40    <li><a name="TOC25" href="#SEC25">REVISION</a>
41  </ul>  </ul>
42  <br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION DETAILS</a><br>  <br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION DETAILS</a><br>
43  <P>  <P>
# Line 63  The remainder of this document discusses Line 64  The remainder of this document discusses
64  PCRE when its main matching function, <b>pcre_exec()</b>, is used.  PCRE when its main matching function, <b>pcre_exec()</b>, is used.
65  From release 6.0, PCRE offers a second matching function,  From release 6.0, PCRE offers a second matching function,
66  <b>pcre_dfa_exec()</b>, which matches using a different algorithm that is not  <b>pcre_dfa_exec()</b>, which matches using a different algorithm that is not
67  Perl-compatible. The advantages and disadvantages of the alternative function,  Perl-compatible. Some of the features discussed below are not available when
68  and how it differs from the normal function, are discussed in the  <b>pcre_dfa_exec()</b> is used. The advantages and disadvantages of the
69    alternative function, and how it differs from the normal function, are
70    discussed in the
71  <a href="pcrematching.html"><b>pcrematching</b></a>  <a href="pcrematching.html"><b>pcrematching</b></a>
72  page.  page.
73  </P>  </P>
# Line 253  Absolute and relative back references Line 256  Absolute and relative back references
256  </b><br>  </b><br>
257  <P>  <P>
258  The sequence \g followed by a positive or negative number, optionally enclosed  The sequence \g followed by a positive or negative number, optionally enclosed
259  in braces, is an absolute or relative back reference. Back references are  in braces, is an absolute or relative back reference. A named back reference
260  discussed  can be coded as \g{name}. Back references are discussed
261  <a href="#backreferences">later,</a>  <a href="#backreferences">later,</a>
262  following the discussion of  following the discussion of
263  <a href="#subpattern">parenthesized subpatterns.</a>  <a href="#subpattern">parenthesized subpatterns.</a>
# Line 268  following are always recognized: Line 271  following are always recognized:
271  <pre>  <pre>
272    \d     any decimal digit    \d     any decimal digit
273    \D     any character that is not a decimal digit    \D     any character that is not a decimal digit
274      \h     any horizontal whitespace character
275      \H     any character that is not a horizontal whitespace character
276    \s     any whitespace character    \s     any whitespace character
277    \S     any character that is not a whitespace character    \S     any character that is not a whitespace character
278      \v     any vertical whitespace character
279      \V     any character that is not a vertical whitespace character
280    \w     any "word" character    \w     any "word" character
281    \W     any "non-word" character    \W     any "non-word" character
282  </pre>  </pre>
# Line 285  there is no character to match. Line 292  there is no character to match.
292  <P>  <P>
293  For compatibility with Perl, \s does not match the VT character (code 11).  For compatibility with Perl, \s does not match the VT character (code 11).
294  This makes it different from the POSIX "space" class. The \s characters  This makes it different from the POSIX "space" class. The \s characters
295  are HT (9), LF (10), FF (12), CR (13), and space (32). (If "use locale;" is  are HT (9), LF (10), FF (12), CR (13), and space (32). If "use locale;" is
296  included in a Perl script, \s may match the VT character. In PCRE, it never  included in a Perl script, \s may match the VT character. In PCRE, it never
297  does.)  does.
298    </P>
299    <P>
300    In UTF-8 mode, characters with values greater than 128 never match \d, \s, or
301    \w, and always match \D, \S, and \W. This is true even when Unicode
302    character property support is available. These sequences retain their original
303    meanings from before UTF-8 support was available, mainly for efficiency
304    reasons.
305    </P>
306    <P>
307    The sequences \h, \H, \v, and \V are Perl 5.10 features. In contrast to the
308    other sequences, these do match certain high-valued codepoints in UTF-8 mode.
309    The horizontal space characters are:
310    <pre>
311      U+0009     Horizontal tab
312      U+0020     Space
313      U+00A0     Non-break space
314      U+1680     Ogham space mark
315      U+180E     Mongolian vowel separator
316      U+2000     En quad
317      U+2001     Em quad
318      U+2002     En space
319      U+2003     Em space
320      U+2004     Three-per-em space
321      U+2005     Four-per-em space
322      U+2006     Six-per-em space
323      U+2007     Figure space
324      U+2008     Punctuation space
325      U+2009     Thin space
326      U+200A     Hair space
327      U+202F     Narrow no-break space
328      U+205F     Medium mathematical space
329      U+3000     Ideographic space
330    </pre>
331    The vertical space characters are:
332    <pre>
333      U+000A     Linefeed
334      U+000B     Vertical tab
335      U+000C     Formfeed
336      U+000D     Carriage return
337      U+0085     Next line
338      U+2028     Line separator
339      U+2029     Paragraph separator
340    </PRE>
341  </P>  </P>
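The two tables above can be turned into explicit character classes for an engine that has no \h or \v. The following is a minimal sketch using Python's standard re module, which does not recognize these escapes; the names HORIZONTAL, VERTICAL and char_class are illustrative and not part of PCRE.

```python
import re

# Code points taken from the two lists above.
HORIZONTAL = ([0x0009, 0x0020, 0x00A0, 0x1680, 0x180E]
              + list(range(0x2000, 0x200B))      # U+2000 .. U+200A
              + [0x202F, 0x205F, 0x3000])
VERTICAL = [0x000A, 0x000B, 0x000C, 0x000D, 0x0085, 0x2028, 0x2029]


def char_class(code_points, negate=False):
    """Build a character class such as [\\u0009\\u0020...] from code points."""
    body = "".join("\\u%04x" % cp for cp in code_points)
    return "[" + ("^" if negate else "") + body + "]"


H = re.compile(char_class(HORIZONTAL))                   # stands in for \h
NOT_H = re.compile(char_class(HORIZONTAL, negate=True))  # stands in for \H
V = re.compile(char_class(VERTICAL))                     # stands in for \v
NOT_V = re.compile(char_class(VERTICAL, negate=True))    # stands in for \V
```

For instance, H matches a non-break space (U+00A0) but not a linefeed, while V matches U+2028 but not an ordinary space.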
342  <P>  <P>
343  A "word" character is an underscore or any character less than 256 that is a  A "word" character is an underscore or any character less than 256 that is a
# Line 297  place (see Line 347  place (see
347  <a href="pcreapi.html#localesupport">"Locale support"</a>  <a href="pcreapi.html#localesupport">"Locale support"</a>
348  in the  in the
349  <a href="pcreapi.html"><b>pcreapi</b></a>  <a href="pcreapi.html"><b>pcreapi</b></a>
350  page). For example, in the "fr_FR" (French) locale, some character codes  page). For example, in a French locale such as "fr_FR" in Unix-like systems,
351  greater than 128 are used for accented letters, and these are matched by \w.  or "french" in Windows, some character codes greater than 128 are used for
352  </P>  accented letters, and these are matched by \w. The use of locales with Unicode
353  <P>  is discouraged.
 In UTF-8 mode, characters with values greater than 128 never match \d, \s, or  
 \w, and always match \D, \S, and \W. This is true even when Unicode  
 character property support is available. The use of locales with Unicode is  
 discouraged.  
354  </P>  </P>
355  <br><b>  <br><b>
356  Newline sequences  Newline sequences
357  </b><br>  </b><br>
358  <P>  <P>
359  Outside a character class, the escape sequence \R matches any Unicode newline  Outside a character class, the escape sequence \R matches any Unicode newline
360  sequence. This is an extension to Perl. In non-UTF-8 mode \R is equivalent to  sequence. This is a Perl 5.10 feature. In non-UTF-8 mode \R is equivalent to
361  the following:  the following:
362  <pre>  <pre>
363    (?&#62;\r\n|\n|\x0b|\f|\r|\x85)    (?&#62;\r\n|\n|\x0b|\f|\r|\x85)
# Line 527  Matching characters by Unicode property Line 573  Matching characters by Unicode property
573  a structure that contains data for over fifteen thousand characters. That is  a structure that contains data for over fifteen thousand characters. That is
574  why the traditional escape sequences such as \d and \w do not use Unicode  why the traditional escape sequences such as \d and \w do not use Unicode
575  properties in PCRE.  properties in PCRE.
576    <a name="resetmatchstart"></a></P>
577    <br><b>
578    Resetting the match start
579    </b><br>
580    <P>
581    The escape sequence \K, which is a Perl 5.10 feature, causes any previously
582    matched characters not to be included in the final matched sequence. For
583    example, the pattern:
584    <pre>
585      foo\Kbar
586    </pre>
587    matches "foobar", but reports that it has matched "bar". This feature is
588    similar to a lookbehind assertion
589    <a href="#lookbehind">(described below).</a>
590    However, in this case, the part of the subject before the real match does not
591    have to be of fixed length, as lookbehind assertions do. The use of \K does
592    not interfere with the setting of
593    <a href="#subpattern">captured substrings.</a>
594    For example, when the pattern
595    <pre>
596      (foo)\Kbar
597    </pre>
598    matches "foobar", the first substring is still set to "foo".
599  <a name="smallassertions"></a></P>  <a name="smallassertions"></a></P>
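Python's standard re module does not implement \K, so the sketch below only approximates the behaviour described above. It assumes a hypothetical pattern (fo+)\Kbar, whose prefix is of variable length, and puts the part after \K in a named group so that the reported text and the first captured substring can be compared. The names PATTERN, kept and match_with_reset are illustrative.

```python
import re

# Stand-in for the PCRE pattern (fo+)\Kbar: the group "kept" plays the
# part of everything that follows \K.
PATTERN = re.compile(r"(fo+)(?P<kept>bar)")


def match_with_reset(subject):
    """Return (reported text, reported start offset, capture 1), or None."""
    m = PATTERN.search(subject)
    if m is None:
        return None
    return m.group("kept"), m.start("kept"), m.group(1)
```

With the subject "xfooobar", the reported text is "bar" starting at offset 5, and the first captured substring is still "fooo".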
600  <br><b>  <br><b>
601  Simple assertions  Simple assertions
# Line 756  example [\x{100}-\x{2ff}]. Line 825  example [\x{100}-\x{2ff}].
825  If a range that includes letters is used when caseless matching is set, it  If a range that includes letters is used when caseless matching is set, it
826  matches the letters in either case. For example, [W-c] is equivalent to  matches the letters in either case. For example, [W-c] is equivalent to
827  [][\\^_`wxyzabc], matched caselessly, and in non-UTF-8 mode, if character  [][\\^_`wxyzabc], matched caselessly, and in non-UTF-8 mode, if character
828  tables for the "fr_FR" locale are in use, [\xc8-\xcb] matches accented E  tables for a French locale are in use, [\xc8-\xcb] matches accented E
829  characters in both cases. In UTF-8 mode, PCRE supports the concept of case for  characters in both cases. In UTF-8 mode, PCRE supports the concept of case for
830  characters with values greater than 128 only when it is compiled with Unicode  characters with values greater than 128 only when it is compiled with Unicode
831  property support.  property support.
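The equivalence stated for [W-c] can be checked mechanically. This is a sketch in Python's standard re module rather than PCRE, on the assumption that its handling of caseless ranges agrees with PCRE for ASCII characters; the explicit class is written with backslash escapes for the brackets, which is the safer spelling in Python.

```python
import re

RANGE = re.compile(r"[W-c]", re.IGNORECASE)
# The explicit class from the text, with the brackets escaped for Python.
EXPLICIT = re.compile(r"[\]\[\\^_`wxyzabc]", re.IGNORECASE)


def same_ascii_behaviour():
    """True if both classes accept exactly the same ASCII characters."""
    return all(
        (RANGE.fullmatch(chr(c)) is None) == (EXPLICIT.fullmatch(chr(c)) is None)
        for c in range(128)
    )
```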
# Line 940  from left to right, and options are not Line 1009  from left to right, and options are not
1009  is reached, an option setting in one branch does affect subsequent branches, so  is reached, an option setting in one branch does affect subsequent branches, so
1010  the above patterns match "SUNDAY" as well as "Saturday".  the above patterns match "SUNDAY" as well as "Saturday".
1011  </P>  </P>
1012  <br><a name="SEC12" href="#TOC1">NAMED SUBPATTERNS</a><br>  <br><a name="SEC12" href="#TOC1">DUPLICATE SUBPATTERN NUMBERS</a><br>
1013    <P>
1014    Perl 5.10 introduced a feature whereby each alternative in a subpattern uses
1015    the same numbers for its capturing parentheses. Such a subpattern starts with
1016    (?| and is itself a non-capturing subpattern. For example, consider this
1017    pattern:
1018    <pre>
1019      (?|(Sat)ur|(Sun))day
1020    </pre>
1021    Because the two alternatives are inside a (?| group, both sets of capturing
1022    parentheses are numbered one. Thus, when the pattern matches, you can look
1023    at captured substring number one, whichever alternative matched. This construct
1024    is useful when you want to capture part, but not all, of one of a number of
1025    alternatives. Inside a (?| group, parentheses are numbered as usual, but the
1026    number is reset at the start of each branch. The numbers of any capturing
1027    buffers that follow the subpattern start after the highest number used in any
1028    branch. The following example is taken from the Perl documentation.
1029    The numbers underneath show in which buffer the captured content will be
1030    stored.
1031    <pre>
1032      # before  ---------------branch-reset----------- after
1033      / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
1034      # 1            2         2  3        2     3     4
1035    </pre>
1036    A backreference or a recursive call to a numbered subpattern always refers to
1037    the first one in the pattern with the given number.
1038    </P>
1039    <P>
1040    An alternative approach to using this "branch reset" feature is to use
1041    duplicate named subpatterns, as described in the next section.
1042    </P>
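Python's standard re module has no (?| construct, so the sketch below illustrates only the situation that branch reset addresses. With an ordinary non-capturing group, the two sets of parentheses are numbered one and two, and the caller has to find out which of them was set. The names PLAIN and day_prefix are illustrative.

```python
import re

# Same alternatives as the example above, but inside (?: rather than (?|,
# so (Sat) is group 1 and (Sun) is group 2.
PLAIN = re.compile(r"(?:(Sat)ur|(Sun))day")


def day_prefix(subject):
    """Return the captured prefix, whichever alternative matched."""
    m = PLAIN.fullmatch(subject)
    if m is None:
        return None
    # Exactly one group took part in the match; lastindex is its number.
    return m.group(m.lastindex)
```

With (?| in PCRE, the lookup through lastindex would be unnecessary, because captured substring number one would be set in either case.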
1043    <br><a name="SEC13" href="#TOC1">NAMED SUBPATTERNS</a><br>
1044  <P>  <P>
1045  Identifying capturing parentheses by number is simple, but it can be very hard  Identifying capturing parentheses by number is simple, but it can be very hard
1046  to keep track of the numbers in complicated regular expressions. Furthermore,  to keep track of the numbers in complicated regular expressions. Furthermore,
# Line 982  abbreviation. This pattern (ignoring the Line 1082  abbreviation. This pattern (ignoring the
1082    (?&#60;DN&#62;Sat)(?:urday)?    (?&#60;DN&#62;Sat)(?:urday)?
1083  </pre>  </pre>
1084  There are five capturing substrings, but only one is ever set after a match.  There are five capturing substrings, but only one is ever set after a match.
1085    (An alternative way of solving this problem is to use a "branch reset"
1086    subpattern, as described in the previous section.)
1087    </P>
1088    <P>
1089  The convenience function for extracting the data by name returns the substring  The convenience function for extracting the data by name returns the substring
1090  for the first (and in this example, the only) subpattern of that name that  for the first (and in this example, the only) subpattern of that name that
1091  matched. This saves searching to find which numbered subpattern it was. If you  matched. This saves searching to find which numbered subpattern it was. If you
# Line 991  details of the interfaces for handling n Line 1095  details of the interfaces for handling n
1095  <a href="pcreapi.html"><b>pcreapi</b></a>  <a href="pcreapi.html"><b>pcreapi</b></a>
1096  documentation.  documentation.
1097  </P>  </P>
1098  <br><a name="SEC13" href="#TOC1">REPETITION</a><br>  <br><a name="SEC14" href="#TOC1">REPETITION</a><br>
1099  <P>  <P>
1100  Repetition is specified by quantifiers, which can follow any of the following  Repetition is specified by quantifiers, which can follow any of the following
1101  items:  items:
# Line 1142  example, after Line 1246  example, after
1246  </pre>  </pre>
1247  matches "aba" the value of the second captured substring is "b".  matches "aba" the value of the second captured substring is "b".
1248  <a name="atomicgroup"></a></P>  <a name="atomicgroup"></a></P>
1249  <br><a name="SEC14" href="#TOC1">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a><br>  <br><a name="SEC15" href="#TOC1">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a><br>
1250  <P>  <P>
1251  With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy")  With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy")
1252  repetition, failure of what follows normally causes the repeated item to be  repetition, failure of what follows normally causes the repeated item to be
# Line 1241  an atomic group, like this: Line 1345  an atomic group, like this:
1345  </pre>  </pre>
1346  sequences of non-digits cannot be broken, and failure happens quickly.  sequences of non-digits cannot be broken, and failure happens quickly.
1347  <a name="backreferences"></a></P>  <a name="backreferences"></a></P>
1348  <br><a name="SEC15" href="#TOC1">BACK REFERENCES</a><br>  <br><a name="SEC16" href="#TOC1">BACK REFERENCES</a><br>
1349  <P>  <P>
1350  Outside a character class, a backslash followed by a digit greater than 0 (and  Outside a character class, a backslash followed by a digit greater than 0 (and
1351  possibly further digits) is a back reference to a capturing subpattern earlier  possibly further digits) is a back reference to a capturing subpattern earlier
# Line 1308  matches "rah rah" and "RAH RAH", but not Line 1412  matches "rah rah" and "RAH RAH", but not
1412  capturing subpattern is matched caselessly.  capturing subpattern is matched caselessly.
1413  </P>  </P>
1414  <P>  <P>
1415  Back references to named subpatterns use the Perl syntax \k&#60;name&#62; or \k'name'  There are several different ways of writing back references to named
1416  or the Python syntax (?P=name). We could rewrite the above example in either of  subpatterns. The .NET syntax \k{name} and the Perl syntax \k&#60;name&#62; or
1417    \k'name' are supported, as is the Python syntax (?P=name). Perl 5.10's unified
1418    back reference syntax, in which \g can be used for both numeric and named
1419    references, is also supported. We could rewrite the above example in any of
1420  the following ways:  the following ways:
1421  <pre>  <pre>
1422    (?&#60;p1&#62;(?i)rah)\s+\k&#60;p1&#62;    (?&#60;p1&#62;(?i)rah)\s+\k&#60;p1&#62;
1423      (?'p1'(?i)rah)\s+\k{p1}
1424    (?P&#60;p1&#62;(?i)rah)\s+(?P=p1)    (?P&#60;p1&#62;(?i)rah)\s+(?P=p1)
1425      (?&#60;p1&#62;(?i)rah)\s+\g{p1}
1426  </pre>  </pre>
1427  A subpattern that is referenced by name may appear in the pattern before or  A subpattern that is referenced by name may appear in the pattern before or
1428  after the reference.  after the reference.
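Of the syntaxes listed above, Python's standard re module accepts only the (?P=name) form, so the following sketch uses that one. Recent Python versions reject an inline (?i) in the middle of a pattern, so the scoped form (?i:rah) is used here instead; this is an adaptation, not the PCRE spelling. The names PATTERN and doubled are illustrative.

```python
import re

# The group p1 is matched caselessly; the back reference outside it is not.
PATTERN = re.compile(r"(?P<p1>(?i:rah))\s+(?P=p1)")


def doubled(subject):
    """True if the subject is the same word twice, in the same case."""
    return PATTERN.fullmatch(subject) is not None
```

Both "rah rah" and "RAH RAH" are accepted, whereas "RAH rah" is rejected, because the back reference must repeat the captured text exactly.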
# Line 1349  that the first iteration does not need t Line 1458  that the first iteration does not need t
1458  done using alternation, as in the example above, or by a quantifier with a  done using alternation, as in the example above, or by a quantifier with a
1459  minimum of zero.  minimum of zero.
1460  <a name="bigassertions"></a></P>  <a name="bigassertions"></a></P>
1461  <br><a name="SEC16" href="#TOC1">ASSERTIONS</a><br>  <br><a name="SEC17" href="#TOC1">ASSERTIONS</a><br>
1462  <P>  <P>
1463  An assertion is a test on the characters following or preceding the current  An assertion is a test on the characters following or preceding the current
1464  matching point that does not actually consume any characters. The simple  matching point that does not actually consume any characters. The simple
# Line 1431  lengths, but it is acceptable if rewritt Line 1540  lengths, but it is acceptable if rewritt
1540  <pre>  <pre>
1541    (?&#60;=abc|abde)    (?&#60;=abc|abde)
1542  </pre>  </pre>
1543    In some cases, the Perl 5.10 escape sequence \K
1544    <a href="#resetmatchstart">(see above)</a>
1545    can be used instead of a lookbehind assertion; this is not restricted to a
1546    fixed-length.

1547    </P>
1548    <P>
1549  The implementation of lookbehind assertions is, for each alternative, to  The implementation of lookbehind assertions is, for each alternative, to
1550  temporarily move the current position back by the fixed length and then try to  temporarily move the current position back by the fixed length and then try to
1551  match. If there are insufficient characters before the current position, the  match. If there are insufficient characters before the current position, the
# Line 1503  preceded by "foo", while Line 1618  preceded by "foo", while
1618  is another pattern that matches "foo" preceded by three digits and any three  is another pattern that matches "foo" preceded by three digits and any three
1619  characters that are not "999".  characters that are not "999".
1620  <a name="conditions"></a></P>  <a name="conditions"></a></P>
1621  <br><a name="SEC17" href="#TOC1">CONDITIONAL SUBPATTERNS</a><br>  <br><a name="SEC18" href="#TOC1">CONDITIONAL SUBPATTERNS</a><br>
1622  <P>  <P>
1623  It is possible to cause the matching process to obey a subpattern  It is possible to cause the matching process to obey a subpattern
1624  conditionally or to choose between two alternative subpatterns, depending on  conditionally or to choose between two alternative subpatterns, depending on
# Line 1527  Checking for a used subpattern by number Line 1642  Checking for a used subpattern by number
1642  <P>  <P>
1643  If the text between the parentheses consists of a sequence of digits, the  If the text between the parentheses consists of a sequence of digits, the
1644  condition is true if the capturing subpattern of that number has previously  condition is true if the capturing subpattern of that number has previously
1645  matched.  matched. An alternative notation is to precede the digits with a plus or minus
1646    sign. In this case, the subpattern number is relative rather than absolute.
1647    The most recently opened parentheses can be referenced by (?(-1), the next most
1648    recent by (?(-2), and so on. In looping constructs it can also make sense to
1649    refer to subsequent groups with constructs such as (?(+2).
1650  </P>  </P>
1651  <P>  <P>
1652  Consider the following pattern, which contains non-significant white space to  Consider the following pattern, which contains non-significant white space to
# Line 1546  parenthesis is required. Otherwise, sinc Line 1665  parenthesis is required. Otherwise, sinc
1665  subpattern matches nothing. In other words, this pattern matches a sequence of  subpattern matches nothing. In other words, this pattern matches a sequence of
1666  non-parentheses, optionally enclosed in parentheses.  non-parentheses, optionally enclosed in parentheses.
1667  </P>  </P>
1668    <P>
1669    If you were embedding this pattern in a larger one, you could use a relative
1670    reference:
1671    <pre>
1672      ...other stuff... ( \( )?    [^()]+    (?(-1) \) ) ...
1673    </pre>
1674    This makes the fragment independent of the parentheses in the larger pattern.
1675    </P>
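The conditional pattern discussed above can be tried in Python's standard re module, which supports the absolute form (?(1)...) but not the relative form (?(-1)...). The sketch therefore uses the absolute reference, with re.VERBOSE so that the white space is not significant. The names PATTERN and balanced_once are illustrative.

```python
import re

# An optional opening parenthesis, one or more non-parentheses, and a
# closing parenthesis that is required only if group 1 was set.
PATTERN = re.compile(r"( \( )?  [^()]+  (?(1) \) )", re.VERBOSE)


def balanced_once(subject):
    """True for a run of non-parentheses, optionally enclosed in parentheses."""
    return PATTERN.fullmatch(subject) is not None
```

The strings "abc" and "(abc)" are accepted; "(abc" and "abc)" are not.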
1676  <br><b>  <br><b>
1677  Checking for a used subpattern by name  Checking for a used subpattern by name
1678  </b><br>  </b><br>
# Line 1629  subject is matched against the first alt Line 1756  subject is matched against the first alt
1756  against the second. This pattern matches strings in one of the two forms  against the second. This pattern matches strings in one of the two forms
1757  dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits.  dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits.
1758  <a name="comments"></a></P>  <a name="comments"></a></P>
1759  <br><a name="SEC18" href="#TOC1">COMMENTS</a><br>  <br><a name="SEC19" href="#TOC1">COMMENTS</a><br>
1760  <P>  <P>
1761  The sequence (?# marks the start of a comment that continues up to the next  The sequence (?# marks the start of a comment that continues up to the next
1762  closing parenthesis. Nested parentheses are not permitted. The characters  closing parenthesis. Nested parentheses are not permitted. The characters
# Line 1640  If the PCRE_EXTENDED option is set, an u Line 1767  If the PCRE_EXTENDED option is set, an u
1767  character class introduces a comment that continues to immediately after the  character class introduces a comment that continues to immediately after the
1768  next newline in the pattern.  next newline in the pattern.
1769  <a name="recursion"></a></P>  <a name="recursion"></a></P>
1770  <br><a name="SEC19" href="#TOC1">RECURSIVE PATTERNS</a><br>  <br><a name="SEC20" href="#TOC1">RECURSIVE PATTERNS</a><br>
1771  <P>  <P>
1772  Consider the problem of matching a string in parentheses, allowing for  Consider the problem of matching a string in parentheses, allowing for
1773  unlimited nested parentheses. Without the use of recursion, the best that can  unlimited nested parentheses. Without the use of recursion, the best that can
# Line 1696  pattern, so instead you could use this: Line 1823  pattern, so instead you could use this:
1823    ( \( ( (?&#62;[^()]+) | (?1) )* \) )    ( \( ( (?&#62;[^()]+) | (?1) )* \) )
1824  </pre>  </pre>
1825  We have put the pattern into parentheses, and caused the recursion to refer to  We have put the pattern into parentheses, and caused the recursion to refer to
1826  them instead of the whole pattern. In a larger pattern, keeping track of  them instead of the whole pattern.
1827  parenthesis numbers can be tricky. It may be more convenient to use named  </P>
1828  parentheses instead. The Perl syntax for this is (?&name); PCRE's earlier  <P>
1829  syntax (?P&#62;name) is also supported. We could rewrite the above example as  In a larger pattern, keeping track of parenthesis numbers can be tricky. This
1830  follows:  is made easier by the use of relative references (a Perl 5.10 feature).
1831    Instead of (?1) in the pattern above you can write (?-2) to refer to the second
1832    most recently opened parentheses preceding the recursion. In other words, a
1833    negative number counts capturing parentheses leftwards from the point at which
1834    it is encountered.
1835    </P>
1836    <P>
1837    It is also possible to refer to subsequently opened parentheses, by writing
1838    references such as (?+2). However, these cannot be recursive because the
1839    reference is not inside the parentheses that are referenced. They are always
1840    "subroutine" calls, as described in the next section.
1841    </P>
1842    <P>
1843    An alternative approach is to use named parentheses instead. The Perl syntax
1844    for this is (?&name); PCRE's earlier syntax (?P&#62;name) is also supported. We
1845    could rewrite the above example as follows:
1846  <pre>  <pre>
1847    (?&#60;pn&#62; \( ( (?&#62;[^()]+) | (?&pn) )* \) )    (?&#60;pn&#62; \( ( (?&#62;[^()]+) | (?&pn) )* \) )
1848  </pre>  </pre>
1849  If there is more than one subpattern with the same name, the earliest one is  If there is more than one subpattern with the same name, the earliest one is
1850  used. This particular example pattern contains nested unlimited repeats, and so  used.
1851  the use of atomic grouping for matching strings of non-parentheses is important  </P>
1852  when applying the pattern to strings that do not match. For example, when this  <P>
1853  pattern is applied to  This particular example pattern that we have been looking at contains nested
1854    unlimited repeats, and so the use of atomic grouping for matching strings of
1855    non-parentheses is important when applying the pattern to strings that do not
1856    match. For example, when this pattern is applied to
1857  <pre>  <pre>
1858    (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()    (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
1859  </pre>  </pre>
# Line 1752  In this pattern, (?(R) is the start of a Line 1897  In this pattern, (?(R) is the start of a
1897  different alternatives for the recursive and non-recursive cases. The (?R) item  different alternatives for the recursive and non-recursive cases. The (?R) item
1898  is the actual recursive call.  is the actual recursive call.
1899  <a name="subpatternsassubroutines"></a></P>  <a name="subpatternsassubroutines"></a></P>
1900  <br><a name="SEC20" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>  <br><a name="SEC21" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>
1901  <P>  <P>
1902  If the syntax for a recursive subpattern reference (either by number or by  If the syntax for a recursive subpattern reference (either by number or by
1903  name) is used outside the parentheses to which it refers, it operates like a  name) is used outside the parentheses to which it refers, it operates like a
1904  subroutine in a programming language. The "called" subpattern may be defined  subroutine in a programming language. The "called" subpattern may be defined
1905  before or after the reference. An earlier example pointed out that the pattern  before or after the reference. A numbered reference can be absolute or
1906    relative, as in these examples:
1907    <pre>
1908      (...(absolute)...)...(?2)...
1909      (...(relative)...)...(?-1)...
1910      (...(?+1)...(relative)...
1911    </pre>
1912    An earlier example pointed out that the pattern
1913  <pre>  <pre>
1914    (sens|respons)e and \1ibility    (sens|respons)e and \1ibility
1915  </pre>  </pre>
# Line 1780  When a subpattern is used as a subroutin Line 1932  When a subpattern is used as a subroutin
1932  case-independence are fixed when the subpattern is defined. They cannot be  case-independence are fixed when the subpattern is defined. They cannot be
1933  changed for different calls. For example, consider this pattern:  changed for different calls. For example, consider this pattern:
1934  <pre>  <pre>
1935    (abc)(?i:(?1))    (abc)(?i:(?-1))
1936  </pre>  </pre>
1937  It matches "abcabc". It does not match "abcABC" because the change of  It matches "abcabc". It does not match "abcABC" because the change of
1938  processing option does not affect the called subpattern.  processing option does not affect the called subpattern.
1939  </P>  </P>
1940  <br><a name="SEC21" href="#TOC1">CALLOUTS</a><br>  <br><a name="SEC22" href="#TOC1">CALLOUTS</a><br>
1941  <P>  <P>
1942  Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl  Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl
1943  code to be obeyed in the middle of matching a regular expression. This makes it  code to be obeyed in the middle of matching a regular expression. This makes it
# Line 1804  function is to be called. If you want to Line 1956  function is to be called. If you want to
1956  can put a number less than 256 after the letter C. The default value is zero.  can put a number less than 256 after the letter C. The default value is zero.
1957  For example, this pattern has two callout points:  For example, this pattern has two callout points:
1958  <pre>  <pre>
1959    (?C1)\dabc(?C2)def    (?C1)abc(?C2)def
1960  </pre>  </pre>
1961  If the PCRE_AUTO_CALLOUT flag is passed to <b>pcre_compile()</b>, callouts are  If the PCRE_AUTO_CALLOUT flag is passed to <b>pcre_compile()</b>, callouts are
1962  automatically installed before each item in the pattern. They are all numbered  automatically installed before each item in the pattern. They are all numbered
# Line 1820  description of the interface to the call Line 1972  description of the interface to the call
1972  <a href="pcrecallout.html"><b>pcrecallout</b></a>  <a href="pcrecallout.html"><b>pcrecallout</b></a>
1973  documentation.  documentation.
1974  </P>  </P>
1975  <br><a name="SEC22" href="#TOC1">SEE ALSO</a><br>  <br><a name="SEC23" href="#TOC1">SEE ALSO</a><br>
1976  <P>  <P>
1977  <b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3), <b>pcre</b>(3).  <b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3), <b>pcre</b>(3).
1978  </P>  </P>
1979  <br><a name="SEC23" href="#TOC1">AUTHOR</a><br>  <br><a name="SEC24" href="#TOC1">AUTHOR</a><br>
1980  <P>  <P>
1981  Philip Hazel  Philip Hazel
1982  <br>  <br>
# Line 1833  University Computing Service Line 1985  University Computing Service
1985  Cambridge CB2 3QH, England.  Cambridge CB2 3QH, England.
1986  <br>  <br>
1987  </P>  </P>
1988  <br><a name="SEC24" href="#TOC1">REVISION</a><br>  <br><a name="SEC25" href="#TOC1">REVISION</a><br>
1989  <P>  <P>
1990  Last updated: 06 March 2007  Last updated: 13 June 2007
1991  <br>  <br>
1992  Copyright &copy; 1997-2007 University of Cambridge.  Copyright &copy; 1997-2007 University of Cambridge.
1993  <br>  <br>
