
Diff of /code/trunk/doc/html/pcrepattern.html

revision 87 by nigel, Sat Feb 24 21:41:21 2007 UTC revision 91 by nigel, Sat Feb 24 21:41:34 2007 UTC
# Line 123  The following sections describe the use Line 123  The following sections describe the use
123  <br><a name="SEC2" href="#TOC1">BACKSLASH</a><br>  <br><a name="SEC2" href="#TOC1">BACKSLASH</a><br>
124  <P>  <P>
125  The backslash character has several uses. Firstly, if it is followed by a  The backslash character has several uses. Firstly, if it is followed by a
126  non-alphanumeric character, it takes away any special meaning that character may  non-alphanumeric character, it takes away any special meaning that character
127  have. This use of backslash as an escape character applies both inside and  may have. This use of backslash as an escape character applies both inside and
128  outside character classes.  outside character classes.
129  </P>  </P>
130  <P>  <P>
# Line 137  particular, if you want to match a backs Line 137  particular, if you want to match a backs
137  <P>  <P>
138  If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the  If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the
139  pattern (other than in a character class) and characters between a # outside  pattern (other than in a character class) and characters between a # outside
140  a character class and the next newline character are ignored. An escaping  a character class and the next newline are ignored. An escaping backslash can
141  backslash can be used to include a whitespace or # character as part of the  be used to include a whitespace or # character as part of the pattern.
 pattern.  
142  </P>  </P>
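Reviewer's sketch of the extended-mode paragraph above. It uses Python's `re.VERBOSE` flag as a stand-in for PCRE_EXTENDED, which is an assumption: the two engines are separate implementations, and only the behaviour described in the paragraph (ignored whitespace, `#` comments running to the end of the line, backslash-escaped space and `#`) is illustrated. The name `pattern` is made up for the example.

```python
import re

# Whitespace and "# ..." comments are ignored in verbose (extended) mode.
# A backslash keeps a space or a "#" as part of the pattern.
pattern = re.compile(
    r"""
    \d+      # one or more digits
    \ \#\    # an escaped space, an escaped "#", another escaped space
    \w+      # a word
    """,
    re.VERBOSE,
)

print(pattern.fullmatch("42 # answer") is not None)  # the escaped characters must be present
print(pattern.fullmatch("42#answer") is not None)    # without the spaces there is no match
```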
143  <P>  <P>
144  If you want to remove the special meaning from a sequence of characters, you  If you want to remove the special meaning from a sequence of characters, you
# Line 198  syntaxes for \x. There is no difference Line 197  syntaxes for \x. There is no difference
197  example, \xdc is exactly the same as \x{dc}.  example, \xdc is exactly the same as \x{dc}.
198  </P>  </P>
199  <P>  <P>
200  After \0 up to two further octal digits are read. In both cases, if there  After \0 up to two further octal digits are read. If there are fewer than two
201  are fewer than two digits, just those that are present are used. Thus the  digits, just those that are present are used. Thus the sequence \0\x\07
202  sequence \0\x\07 specifies two binary zeros followed by a BEL character  specifies two binary zeros followed by a BEL character (code value 7). Make
203  (code value 7). Make sure you supply two digits after the initial zero if the  sure you supply two digits after the initial zero if the pattern character that
204  pattern character that follows is itself an octal digit.  follows is itself an octal digit.
205  </P>  </P>
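Reviewer's sketch of the "supply two digits after the initial zero" advice above. It runs on Python's `re` module, which, as far as this point is concerned, reads `\0` plus up to two further octal digits in the same way; that equivalence is an assumption about Python, not a statement about PCRE internals. The variable names are illustrative.

```python
import re

# "\07" followed by "1" is read as the single escape \071 (octal 71, the digit "9").
short_form = re.fullmatch(r"\071", "9")

# Padding to "\007" ends the escape after BEL, so the "1" stays a literal.
padded_form = re.fullmatch(r"\0071", "\x07" + "1")

print(short_form is not None)
print(padded_form is not None)
```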
206  <P>  <P>
207  The handling of a backslash followed by a digit other than 0 is complicated.  The handling of a backslash followed by a digit other than 0 is complicated.
# Line 217  following the discussion of Line 216  following the discussion of
216  <P>  <P>
217  Inside a character class, or if the decimal number is greater than 9 and there  Inside a character class, or if the decimal number is greater than 9 and there
218  have not been that many capturing subpatterns, PCRE re-reads up to three octal  have not been that many capturing subpatterns, PCRE re-reads up to three octal
219  digits following the backslash, and generates a single byte from the least  digits following the backslash, and uses them to generate a data character. Any
220  significant 8 bits of the value. Any subsequent digits stand for themselves.  subsequent digits stand for themselves. In non-UTF-8 mode, the value of a
221  For example:  character specified in octal must be less than \400. In UTF-8 mode, values up
222    to \777 are permitted. For example:
223  <pre>  <pre>
224    \040   is another way of writing a space    \040   is another way of writing a space
225    \40    is the same, provided there are fewer than 40 previous capturing subpatterns    \40    is the same, provided there are fewer than 40 previous capturing subpatterns
# Line 235  Note that octal values of 100 or greater Line 235  Note that octal values of 100 or greater
235  zero, because no more than three octal digits are ever read.  zero, because no more than three octal digits are ever read.
236  </P>  </P>
237  <P>  <P>
238  All the sequences that define a single byte value or a single UTF-8 character  All the sequences that define a single character value can be used both inside
239  (in UTF-8 mode) can be used both inside and outside character classes. In  and outside character classes. In addition, inside a character class, the
240  addition, inside a character class, the sequence \b is interpreted as the  sequence \b is interpreted as the backspace character (hex 08), and the
241  backspace character (hex 08), and the sequence \X is interpreted as the  sequence \X is interpreted as the character "X". Outside a character class,
242  character "X". Outside a character class, these sequences have different  these sequences have different meanings
 meanings  
243  <a href="#uniextseq">(see below).</a>  <a href="#uniextseq">(see below).</a>
244  </P>  </P>
245  <br><b>  <br><b>
# Line 269  there is no character to match. Line 268  there is no character to match.
268  <P>  <P>
269  For compatibility with Perl, \s does not match the VT character (code 11).  For compatibility with Perl, \s does not match the VT character (code 11).
270  This makes it different from the POSIX "space" class. The \s characters  This makes it different from the POSIX "space" class. The \s characters
271  are HT (9), LF (10), FF (12), CR (13), and space (32).  are HT (9), LF (10), FF (12), CR (13), and space (32). (If "use locale;" is
272    included in a Perl script, \s may match the VT character. In PCRE, it never
273    does.)
274  </P>  </P>
275  <P>  <P>
276  A "word" character is an underscore or any character less than 256 that is a  A "word" character is an underscore or any character less than 256 that is a
# Line 447  a modifier or "other". Line 448  a modifier or "other".
448  </P>  </P>
449  <P>  <P>
450  The long synonyms for these properties that Perl supports (such as \p{Letter})  The long synonyms for these properties that Perl supports (such as \p{Letter})
451  are not supported by PCRE. Nor is is permitted to prefix any of these  are not supported by PCRE, nor is it permitted to prefix any of these
452  properties with "Is".  properties with "Is".
453  </P>  </P>
454  <P>  <P>
# Line 487  specifies a condition that has to be met Line 488  specifies a condition that has to be met
488  without consuming any characters from the subject string. The use of  without consuming any characters from the subject string. The use of
489  subpatterns for more complicated assertions is described  subpatterns for more complicated assertions is described
490  <a href="#bigassertions">below.</a>  <a href="#bigassertions">below.</a>
491  The backslashed  The backslashed assertions are:
 assertions are:  
492  <pre>  <pre>
493    \b     matches at a word boundary    \b     matches at a word boundary
494    \B     matches when not at a word boundary    \B     matches when not at a word boundary
# Line 515  PCRE_NOTBOL or PCRE_NOTEOL options, whic Line 515  PCRE_NOTBOL or PCRE_NOTEOL options, whic
515  circumflex and dollar metacharacters. However, if the <i>startoffset</i>  circumflex and dollar metacharacters. However, if the <i>startoffset</i>
516  argument of <b>pcre_exec()</b> is non-zero, indicating that matching is to start  argument of <b>pcre_exec()</b> is non-zero, indicating that matching is to start
517  at a point other than the beginning of the subject, \A can never match. The  at a point other than the beginning of the subject, \A can never match. The
518  difference between \Z and \z is that \Z matches before a newline that is the  difference between \Z and \z is that \Z matches before a newline at the end
519  last character of the string as well as at the end of the string, whereas \z  of the string as well as at the very end, whereas \z matches only at the end.
 matches only at the end.  
520  </P>  </P>
521  <P>  <P>
522  The \G assertion is true only when the current matching position is at the  The \G assertion is true only when the current matching position is at the
# Line 561  to be anchored.) Line 560  to be anchored.)
560  <P>  <P>
561  A dollar character is an assertion that is true only if the current matching  A dollar character is an assertion that is true only if the current matching
562  point is at the end of the subject string, or immediately before a newline  point is at the end of the subject string, or immediately before a newline
563  character that is the last character in the string (by default). Dollar need  at the end of the string (by default). Dollar need not be the last character of
564  not be the last character of the pattern if a number of alternatives are  the pattern if a number of alternatives are involved, but it should be the last
565  involved, but it should be the last item in any branch in which it appears.  item in any branch in which it appears. Dollar has no special meaning in a
566  Dollar has no special meaning in a character class.  character class.
567  </P>  </P>
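Reviewer's sketch of the default dollar behaviour described above: it matches at the very end of the subject and also immediately before a newline that ends the subject. Python's `re` module is used as a stand-in (an assumption; its default `$` follows the same rule with LF as the newline). The variable names are illustrative.

```python
import re

# Dollar matches before a newline that is the last character of the subject.
before_final_newline = re.search(r"abc$", "abc\n")

# An internal newline does not count when multiline mode is off.
before_internal_newline = re.search(r"abc$", "abc\ndef")

print(before_final_newline is not None)
print(before_internal_newline is not None)
```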
568  <P>  <P>
569  The meaning of dollar can be changed so that it matches only at the very end of  The meaning of dollar can be changed so that it matches only at the very end of
# Line 573  does not affect the \Z assertion. Line 572  does not affect the \Z assertion.
572  </P>  </P>
573  <P>  <P>
574  The meanings of the circumflex and dollar characters are changed if the  The meanings of the circumflex and dollar characters are changed if the
575  PCRE_MULTILINE option is set. When this is the case, they match immediately  PCRE_MULTILINE option is set. When this is the case, a circumflex matches
576  after and immediately before an internal newline character, respectively, in  immediately after internal newlines as well as at the start of the subject
577  addition to matching at the start and end of the subject string. For example,  string. It does not match after a newline that ends the string. A dollar
578  the pattern /^abc$/ matches the subject string "def\nabc" (where \n  matches before any newlines in the string, as well as at the very end, when
579  represents a newline character) in multiline mode, but not otherwise.  PCRE_MULTILINE is set. When newline is specified as the two-character
580  Consequently, patterns that are anchored in single line mode because all  sequence CRLF, isolated CR and LF characters do not indicate newlines.
581  branches start with ^ are not anchored in multiline mode, and a match for  </P>
582  circumflex is possible when the <i>startoffset</i> argument of <b>pcre_exec()</b>  <P>
583  is non-zero. The PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is  For example, the pattern /^abc$/ matches the subject string "def\nabc" (where
584  set.  \n represents a newline) in multiline mode, but not otherwise. Consequently,
585    patterns that are anchored in single line mode because all branches start with
586    ^ are not anchored in multiline mode, and a match for circumflex is possible
587    when the <i>startoffset</i> argument of <b>pcre_exec()</b> is non-zero. The
588    PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is set.
589  </P>  </P>
590  <P>  <P>
591  Note that the sequences \A, \Z, and \z can be used to match the start and  Note that the sequences \A, \Z, and \z can be used to match the start and
592  end of the subject in both modes, and if all branches of a pattern start with  end of the subject in both modes, and if all branches of a pattern start with
593  \A it is always anchored, whether PCRE_MULTILINE is set or not.  \A it is always anchored, whether or not PCRE_MULTILINE is set.
594  </P>  </P>
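Reviewer's sketch of the /^abc$/ example and the \A note above, using Python's `re.MULTILINE` in place of PCRE_MULTILINE. This is an assumption made for convenience: the two engines agree on the cases shown, but Python's handling of a circumflex after a trailing newline differs from the new text in this revision, so that case is deliberately left out.

```python
import re

subject = "def\nabc"

# Without multiline mode, ^ only matches at the start of the subject.
single_line = re.search(r"^abc$", subject)

# With multiline mode, ^ also matches just after the internal newline.
multi_line = re.search(r"^abc$", subject, re.MULTILINE)

# \A stays anchored to the start of the subject in both modes.
anchored = re.search(r"\Aabc", subject, re.MULTILINE)

print(single_line is not None)
print(multi_line.start())
print(anchored is not None)
```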
595  <br><a name="SEC4" href="#TOC1">FULL STOP (PERIOD, DOT)</a><br>  <br><a name="SEC4" href="#TOC1">FULL STOP (PERIOD, DOT)</a><br>
596  <P>  <P>
597  Outside a character class, a dot in the pattern matches any one character in  Outside a character class, a dot in the pattern matches any one character in
598  the subject, including a non-printing character, but not (by default) newline.  the subject string except (by default) a character that signifies the end of a
599  In UTF-8 mode, a dot matches any UTF-8 character, which might be more than one  line. In UTF-8 mode, the matched character may be more than one byte long. When
600  byte long, except (by default) newline. If the PCRE_DOTALL option is set,  a line ending is defined as a single character (CR or LF), dot never matches
601  dots match newlines as well. The handling of dot is entirely independent of the  that character; when the two-character sequence CRLF is used, dot does not
602  handling of circumflex and dollar, the only relationship being that they both  match CR if it is immediately followed by LF, but otherwise it matches all
603  involve newline characters. Dot has no special meaning in a character class.  characters (including isolated CRs and LFs).
604    </P>
605    <P>
606    The behaviour of dot with regard to newlines can be changed. If the PCRE_DOTALL
607    option is set, a dot matches any one character, without exception. If newline
608    is defined as the two-character sequence CRLF, it takes two dots to match it.
609    </P>
610    <P>
611    The handling of dot is entirely independent of the handling of circumflex and
612    dollar, the only relationship being that they both involve newlines. Dot has no
613    special meaning in a character class.
614  </P>  </P>
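Reviewer's sketch of the dot paragraphs above, with Python's `re.DOTALL` standing in for PCRE_DOTALL. Python always treats LF as the only line ending for this purpose, so the example corresponds to the single-character LF case; the CRLF rules added in this revision have no counterpart in Python's `re` and are not shown.

```python
import re

# By default a dot does not match the newline character.
default_dot = re.fullmatch(r"a.b", "a\nb")

# With the dot-all option a dot matches any one character.
dotall_dot = re.fullmatch(r"a.b", "a\nb", re.DOTALL)

# A CR is an ordinary character when LF is the line ending.
carriage_return = re.fullmatch(r"a.b", "a\rb")

print(default_dot is not None)
print(dotall_dot is not None)
print(carriage_return is not None)
```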
615  <br><a name="SEC5" href="#TOC1">MATCHING A SINGLE BYTE</a><br>  <br><a name="SEC5" href="#TOC1">MATCHING A SINGLE BYTE</a><br>
616  <P>  <P>
617  Outside a character class, the escape sequence \C matches any one byte, both  Outside a character class, the escape sequence \C matches any one byte, both
618  in and out of UTF-8 mode. Unlike a dot, it can match a newline. The feature is  in and out of UTF-8 mode. Unlike a dot, it always matches CR and LF. The
619  provided in Perl in order to match individual bytes in UTF-8 mode. Because it  feature is provided in Perl in order to match individual bytes in UTF-8 mode.
620  breaks up UTF-8 characters into individual bytes, what remains in the string  Because it breaks up UTF-8 characters into individual bytes, what remains in
621  may be a malformed UTF-8 string. For this reason, the \C escape sequence is  the string may be a malformed UTF-8 string. For this reason, the \C escape
622  best avoided.  sequence is best avoided.
623  </P>  </P>
624  <P>  <P>
625  PCRE does not allow \C to appear in lookbehind assertions  PCRE does not allow \C to appear in lookbehind assertions
# Line 657  ensure that PCRE is compiled with Unicod Line 670  ensure that PCRE is compiled with Unicod
670  UTF-8 support.  UTF-8 support.
671  </P>  </P>
672  <P>  <P>
673  The newline character is never treated in any special way in character classes,  Characters that might indicate line breaks (CR and LF) are never treated in any
674  whatever the setting of the PCRE_DOTALL or PCRE_MULTILINE options is. A class  special way when matching character classes, whatever line-ending sequence is
675  such as [^a] will always match a newline.  in use, and whatever setting of the PCRE_DOTALL and PCRE_MULTILINE options is
676    used. A class such as [^a] always matches one of these characters.
677  </P>  </P>
678  <P>  <P>
679  The minus (hyphen) character can be used to specify a range of characters in a  The minus (hyphen) character can be used to specify a range of characters in a
# Line 762  the pattern Line 776  the pattern
776    gilbert|sullivan    gilbert|sullivan
777  </pre>  </pre>
778  matches either "gilbert" or "sullivan". Any number of alternatives may appear,  matches either "gilbert" or "sullivan". Any number of alternatives may appear,
779  and an empty alternative is permitted (matching the empty string).  and an empty alternative is permitted (matching the empty string). The matching
780  The matching process tries each alternative in turn, from left to right,  process tries each alternative in turn, from left to right, and the first one
781  and the first one that succeeds is used. If the alternatives are within a  that succeeds is used. If the alternatives are within a subpattern
 subpattern  
782  <a href="#subpattern">(defined below),</a>  <a href="#subpattern">(defined below),</a>
783  "succeeds" means matching the rest of the main pattern as well as the  "succeeds" means matching the rest of the main pattern as well as the
784  alternative in the subpattern.  alternative in the subpattern.
# Line 814  option settings happen at compile time. Line 827  option settings happen at compile time.
827  behaviour otherwise.  behaviour otherwise.
828  </P>  </P>
829  <P>  <P>
830  The PCRE-specific options PCRE_UNGREEDY and PCRE_EXTRA can be changed in the  The PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and PCRE_EXTRA can be
831  same way as the Perl-compatible options by using the characters U and X  changed in the same way as the Perl-compatible options by using the characters
832  respectively. The (?X) flag setting is special in that it must always occur  J, U and X respectively.
 earlier in the pattern than any of the additional features it turns on, even  
 when it is at top level. It is best to put it at the start.  
833  <a name="subpattern"></a></P>  <a name="subpattern"></a></P>
834  <br><a name="SEC10" href="#TOC1">SUBPATTERNS</a><br>  <br><a name="SEC10" href="#TOC1">SUBPATTERNS</a><br>
835  <P>  <P>
# Line 881  Identifying capturing parentheses by num Line 892  Identifying capturing parentheses by num
892  to keep track of the numbers in complicated regular expressions. Furthermore,  to keep track of the numbers in complicated regular expressions. Furthermore,
893  if an expression is modified, the numbers may change. To help with this  if an expression is modified, the numbers may change. To help with this
894  difficulty, PCRE supports the naming of subpatterns, something that Perl does  difficulty, PCRE supports the naming of subpatterns, something that Perl does
895  not provide. The Python syntax (?P&#60;name&#62;...) is used. Names consist of  not provide. The Python syntax (?P&#60;name&#62;...) is used. References to capturing
896  alphanumeric characters and underscores, and must be unique within a pattern.  parentheses from other parts of the pattern, such as
897  </P>  <a href="#backreferences">backreferences,</a>
898  <P>  <a href="#recursion">recursion,</a>
899  Named capturing parentheses are still allocated numbers as well as names. The  and
900  PCRE API provides function calls for extracting the name-to-number translation  <a href="#conditions">conditions,</a>
901  table from a compiled pattern. There is also a convenience function for  can be made by name as well as by number.
902  extracting a captured substring by name. For further details see the  </P>
903    <P>
904    Names consist of up to 32 alphanumeric characters and underscores. Named
905    capturing parentheses are still allocated numbers as well as names. The PCRE
906    API provides function calls for extracting the name-to-number translation table
907    from a compiled pattern. There is also a convenience function for extracting a
908    captured substring by name.
909    </P>
910    <P>
911    By default, a name must be unique within a pattern, but it is possible to relax
912    this constraint by setting the PCRE_DUPNAMES option at compile time. This can
913    be useful for patterns where only one instance of the named parentheses can
914    match. Suppose you want to match the name of a weekday, either as a 3-letter
915    abbreviation or as the full name, and in both cases you want to extract the
916    abbreviation. This pattern (ignoring the line breaks) does the job:
917    <pre>
918      (?P&#60;DN&#62;Mon|Fri|Sun)(?:day)?|
919      (?P&#60;DN&#62;Tue)(?:sday)?|
920      (?P&#60;DN&#62;Wed)(?:nesday)?|
921      (?P&#60;DN&#62;Thu)(?:rsday)?|
922      (?P&#60;DN&#62;Sat)(?:urday)?
923    </pre>
924    There are five capturing substrings, but only one is ever set after a match.
925    The convenience function for extracting the data by name returns the substring
926    for the first, and in this example, the only, subpattern of that name that
927    matched. This saves searching to find which numbered subpattern it was. If you
928    make a reference to a non-unique named subpattern from elsewhere in the
929    pattern, the one that corresponds to the lowest number is used. For further
930    details of the interfaces for handling named subpatterns, see the
931  <a href="pcreapi.html"><b>pcreapi</b></a>  <a href="pcreapi.html"><b>pcreapi</b></a>
932  documentation.  documentation.
933  </P>  </P>
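Reviewer's sketch of named capturing parentheses and the name-to-number table mentioned above. Python's `re` module shares the (?P&#60;name&#62;...) syntax, but it rejects duplicate group names, so PCRE_DUPNAMES cannot be shown this way; only the first branch of the weekday pattern is used. `groupindex` is Python's own name-to-number mapping and is not the PCRE API described in pcreapi.

```python
import re

# One branch of the weekday pattern: capture the abbreviation by name.
pattern = re.compile(r"(?P<DN>Mon|Fri|Sun)(?:day)?")

match = pattern.fullmatch("Monday")
abbreviation = match.group("DN")

# Named groups are still numbered; this maps each name to its number.
name_to_number = dict(pattern.groupindex)

print(abbreviation)
print(name_to_number)
```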
# Line 1102  atomic group. However, there is no diffe Line 1141  atomic group. However, there is no diffe
1141  possessive quantifier and the equivalent atomic group.  possessive quantifier and the equivalent atomic group.
1142  </P>  </P>
1143  <P>  <P>
1144  The possessive quantifier syntax is an extension to the Perl syntax. It  The possessive quantifier syntax is an extension to the Perl syntax. Jeffrey
1145  originates in Sun's Java package.  Friedl originated the idea (and the name) in the first edition of his book.
1146    Mike McCloskey liked it, so implemented it when he built Sun's Java package,
1147    and PCRE copied it from there.
1148  </P>  </P>
1149  <P>  <P>
1150  When a pattern contains an unlimited repeat inside a subpattern that can itself  When a pattern contains an unlimited repeat inside a subpattern that can itself
# Line 1144  However, if the decimal number following Line 1185  However, if the decimal number following
1185  always taken as a back reference, and causes an error only if there are not  always taken as a back reference, and causes an error only if there are not
1186  that many capturing left parentheses in the entire pattern. In other words, the  that many capturing left parentheses in the entire pattern. In other words, the
1187  parentheses that are referenced need not be to the left of the reference for  parentheses that are referenced need not be to the left of the reference for
1188  numbers less than 10. See the subsection entitled "Non-printing characters"  numbers less than 10. A "forward back reference" of this type can make sense
1189    when a repetition is involved and the subpattern to the right has participated
1190    in an earlier iteration.
1191    </P>
1192    <P>
1193    It is not possible to have a numerical "forward back reference" to subpattern
1194    whose number is 10 or more. However, a back reference to any subpattern is
1195    possible using named parentheses (see below). See also the subsection entitled
1196    "Non-printing characters"
1197  <a href="#digitsafterbackslash">above</a>  <a href="#digitsafterbackslash">above</a>
1198  for further details of the handling of digits following a backslash.  for further details of the handling of digits following a backslash.
1199  </P>  </P>
# Line 1170  capturing subpattern is matched caseless Line 1219  capturing subpattern is matched caseless
1219  Back references to named subpatterns use the Python syntax (?P=name). We could  Back references to named subpatterns use the Python syntax (?P=name). We could
1220  rewrite the above example as follows:  rewrite the above example as follows:
1221  <pre>  <pre>
1222    (?&#60;p1&#62;(?i)rah)\s+(?P=p1)    (?P&#60;p1&#62;(?i)rah)\s+(?P=p1)
1223  </pre>  </pre>
1224    A subpattern that is referenced by name may appear in the pattern before or
1225    after the reference.
1226    </P>
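Reviewer's sketch of the corrected (?P&#60;p1&#62;...) example above, run on Python's `re` module as a stand-in (an assumption). Recent Python versions reject an inline `(?i)` that is not at the start of the pattern, so the scoped form `(?i:rah)` is substituted here; the point illustrated is unchanged: the named back reference has to repeat the captured text exactly, including its case.

```python
import re

# The group matches "rah" caselessly; the back reference is case-sensitive.
pattern = re.compile(r"(?P<p1>(?i:rah))\s+(?P=p1)")

same_case = pattern.fullmatch("RAH RAH")
mixed_case = pattern.fullmatch("RAH rah")

print(same_case is not None)
print(mixed_case is not None)
```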
1227    <P>
1228  There may be more than one back reference to the same subpattern. If a  There may be more than one back reference to the same subpattern. If a
1229  subpattern has not actually been used in a particular match, any back  subpattern has not actually been used in a particular match, any back
1230  references to it always fail. For example, the pattern  references to it always fail. For example, the pattern
# Line 1227  because it does not make sense for negat Line 1280  because it does not make sense for negat
1280  Lookahead assertions  Lookahead assertions
1281  </b><br>  </b><br>
1282  <P>  <P>
1283  Lookahead assertions start  Lookahead assertions start with (?= for positive assertions and (?! for
1284  with (?= for positive assertions and (?! for negative assertions. For example,  negative assertions. For example,
1285  <pre>  <pre>
1286    \w+(?=;)    \w+(?=;)
1287  </pre>  </pre>
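Reviewer's sketch of the lookahead example above, using Python's `re` as a stand-in (an assumption; the (?= syntax is shared). It shows that the asserted semicolon has to be present but is not part of the match. The subject string is made up for the example.

```python
import re

subject = "alpha beta; gamma"

# Find a word that is followed by a semicolon, without consuming the semicolon.
match = re.search(r"\w+(?=;)", subject)
word = match.group()

print(word)
print(subject[match.end()])  # the character just after the match
```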
# Line 1263  negative assertions. For example, Line 1316  negative assertions. For example,
1316  </pre>  </pre>
1317  does find an occurrence of "bar" that is not preceded by "foo". The contents of  does find an occurrence of "bar" that is not preceded by "foo". The contents of
1318  a lookbehind assertion are restricted such that all the strings it matches must  a lookbehind assertion are restricted such that all the strings it matches must
1319  have a fixed length. However, if there are several alternatives, they do not  have a fixed length. However, if there are several top-level alternatives, they
1320  all have to have the same fixed length. Thus  do not all have to have the same fixed length. Thus
1321  <pre>  <pre>
1322    (?&#60;=bullock|donkey)    (?&#60;=bullock|donkey)
1323  </pre>  </pre>
# Line 1359  preceded by "foo", while Line 1412  preceded by "foo", while
1412  </pre>  </pre>
1413  is another pattern that matches "foo" preceded by three digits and any three  is another pattern that matches "foo" preceded by three digits and any three
1414  characters that are not "999".  characters that are not "999".
1415  </P>  <a name="conditions"></a></P>
1416  <br><a name="SEC16" href="#TOC1">CONDITIONAL SUBPATTERNS</a><br>  <br><a name="SEC16" href="#TOC1">CONDITIONAL SUBPATTERNS</a><br>
1417  <P>  <P>
1418  It is possible to cause the matching process to obey a subpattern  It is possible to cause the matching process to obey a subpattern
# Line 1376  subpattern, a compile-time error occurs. Line 1429  subpattern, a compile-time error occurs.
1429  </P>  </P>
1430  <P>  <P>
1431  There are three kinds of condition. If the text between the parentheses  There are three kinds of condition. If the text between the parentheses
1432  consists of a sequence of digits, the condition is satisfied if the capturing  consists of a sequence of digits, or a sequence of alphanumeric characters and
1433  subpattern of that number has previously matched. The number must be greater  underscores, the condition is satisfied if the capturing subpattern of that
1434  than zero. Consider the following pattern, which contains non-significant white  number or name has previously matched. There is a possible ambiguity here,
1435  space to make it more readable (assume the PCRE_EXTENDED option) and to divide  because subpattern names may consist entirely of digits. PCRE looks first for a
1436  it into three parts for ease of discussion:  named subpattern; if it cannot find one and the text consists entirely of
1437    digits, it looks for a subpattern of that number, which must be greater than
1438    zero. Using subpattern names that consist entirely of digits is not
1439    recommended.
1440    </P>
1441    <P>
1442    Consider the following pattern, which contains non-significant white space to
1443    make it more readable (assume the PCRE_EXTENDED option) and to divide it into
1444    three parts for ease of discussion:
1445  <pre>  <pre>
1446    ( \( )?    [^()]+    (?(1) \) )    ( \( )?    [^()]+    (?(1) \) )
1447  </pre>  </pre>
# Line 1392  or not. If they did, that is, if subject Line 1453  or not. If they did, that is, if subject
1453  the condition is true, and so the yes-pattern is executed and a closing  the condition is true, and so the yes-pattern is executed and a closing
1454  parenthesis is required. Otherwise, since no-pattern is not present, the  parenthesis is required. Otherwise, since no-pattern is not present, the
1455  subpattern matches nothing. In other words, this pattern matches a sequence of  subpattern matches nothing. In other words, this pattern matches a sequence of
1456  non-parentheses, optionally enclosed in parentheses.  non-parentheses, optionally enclosed in parentheses. Rewriting it to use a
1457  </P>  named subpattern gives this:
1458  <P>  <pre>
1459  If the condition is the string (R), it is satisfied if a recursive call to the    (?P&#60;OPEN&#62; \( )?    [^()]+    (?(OPEN) \) )
1460  pattern or subpattern has been made. At "top level", the condition is false.  </pre>
1461  This is a PCRE extension. Recursive patterns are described in the next section.  If the condition is the string (R), and there is no subpattern with the name R,
1462    the condition is satisfied if a recursive call to the pattern or subpattern has
1463    been made. At "top level", the condition is false. This is a PCRE extension.
1464    Recursive patterns are described in the next section.
1465  </P>  </P>
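Reviewer's sketch of the numbered and named conditions above, using Python's `re.VERBOSE` in place of PCRE_EXTENDED (an assumption; Python accepts both `(?(1)...)` and `(?(OPEN)...)`). The (R) recursion condition has no equivalent in Python's `re` and is not shown. The variable names and test strings are illustrative.

```python
import re

# Condition by number: require ")" only if group 1 matched "(".
numbered = re.compile(r"( \( )?    [^()]+    (?(1) \) )", re.VERBOSE)

# The same pattern with the condition given by name.
named = re.compile(r"(?P<OPEN> \( )?    [^()]+    (?(OPEN) \) )", re.VERBOSE)

for candidate in ["(abc)", "abc", "(abc", "abc)"]:
    print(
        candidate,
        numbered.fullmatch(candidate) is not None,
        named.fullmatch(candidate) is not None,
    )
```

Only the first two candidates are sequences of non-parentheses that are either fully enclosed or not enclosed at all, so they are the ones expected to match as a whole.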
1466  <P>  <P>
1467  If the condition is not a sequence of digits or (R), it must be an assertion.  If the condition is not a sequence of digits or (R), it must be an assertion.
# Line 1423  that make up a comment play no part in t Line 1487  that make up a comment play no part in t
1487  </P>  </P>
1488  <P>  <P>
1489  If the PCRE_EXTENDED option is set, an unescaped # character outside a  If the PCRE_EXTENDED option is set, an unescaped # character outside a
1490  character class introduces a comment that continues up to the next newline  character class introduces a comment that continues to immediately after the
1491  character in the pattern.  next newline in the pattern.
1492  </P>  <a name="recursion"></a></P>
1493  <br><a name="SEC18" href="#TOC1">RECURSIVE PATTERNS</a><br>  <br><a name="SEC18" href="#TOC1">RECURSIVE PATTERNS</a><br>
1494  <P>  <P>
1495  Consider the problem of matching a string in parentheses, allowing for  Consider the problem of matching a string in parentheses, allowing for
# Line 1544  matches "sense and sensibility" and "res Line 1608  matches "sense and sensibility" and "res
1608    (sens|respons)e and (?1)ibility    (sens|respons)e and (?1)ibility
1609  </pre>  </pre>
1610  is used, it does match "sense and responsibility" as well as the other two  is used, it does match "sense and responsibility" as well as the other two
1611  strings. Such references must, however, follow the subpattern to which they  strings. Such references, if given numerically, must follow the subpattern to
1612  refer.  which they refer. However, named references can refer to later subpatterns.
1613  </P>  </P>
1614  <P>  <P>
1615  Like recursive subpatterns, a "subroutine" call is always treated as an atomic  Like recursive subpatterns, a "subroutine" call is always treated as an atomic
# Line 1589  description of the interface to the call Line 1653  description of the interface to the call
1653  documentation.  documentation.
1654  </P>  </P>
1655  <P>  <P>
1656  Last updated: 24 January 2006  Last updated: 06 June 2006
1657  <br>  <br>
1658  Copyright &copy; 1997-2006 University of Cambridge.  Copyright &copy; 1997-2006 University of Cambridge.
1659  <p>  <p>

