
Diff of /code/tags/pcre-6.7/doc/pcre.3


Revision 43 by nigel, Sat Feb 24 21:39:21 2007 UTC, compared with revision 47 by nigel, Sat Feb 24 21:39:29 2007 UTC
# Line 323 (v.43) / Line 323 (v.47): no back references.

  Return information about the first character of any matched string, for a
  non-anchored pattern. If there is a fixed first character, e.g. from a pattern
- such as (cat|cow|coyote), then it is returned in the integer pointed to by
+ such as (cat|cow|coyote), it is returned in the integer pointed to by
  \fIwhere\fR. Otherwise, if either

  (a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
# Line 332 (v.43) / Line 332 (v.47): starts with "^", or
  (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set
  (if it were set, the pattern would be anchored),

- then -1 is returned, indicating that the pattern matches only at the
- start of a subject string or after any "\\n" within the string. Otherwise -2 is
- returned. For anchored patterns, -2 is returned.
+ -1 is returned, indicating that the pattern matches only at the start of a
+ subject string or after any "\\n" within the string. Otherwise -2 is returned.
+ For anchored patterns, -2 is returned.

    PCRE_INFO_FIRSTTABLE

# Line 550 (v.43) / Line 550 (v.47): is a pointer to the vector of integer of
  were captured by the match, including the substring that matched the entire
  regular expression. This is the value returned by \fBpcre_exec\fR if it
  is greater than zero. If \fBpcre_exec()\fR returned zero, indicating that it
- ran out of space in \fIovector\fR, then the value passed as
- \fIstringcount\fR should be the size of the vector divided by three.
+ ran out of space in \fIovector\fR, the value passed as \fIstringcount\fR should
+ be the size of the vector divided by three.

  The functions \fBpcre_copy_substring()\fR and \fBpcre_get_substring()\fR
  extract a single substring, whose number is given as \fIstringnumber\fR. A
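The stringcount rule in the hunk above can be modelled in a few lines of Python. This is not the PCRE C API: the helper substrings_from_ovector is hypothetical, and the model assumes only the layout the man page describes (start/end offset pairs in the first two-thirds of a vector whose size is a multiple of three), plus the convention that a subpattern which took no part in the match has offsets of -1.

```python
def substrings_from_ovector(subject, ovector, rc):
    """Hypothetical model of substring extraction from a PCRE-style ovector.

    subject: the string that was matched.
    ovector: ints whose count is a multiple of three; (start, end) offset
             pairs occupy the first two-thirds, the rest is workspace.
    rc:      the value returned by the match call (> 0: number of pairs set,
             0: the ovector was too small to hold every pair).
    """
    if len(ovector) % 3 != 0:
        raise ValueError("ovector size must be a multiple of three")
    # The rule from the man page: use rc when it is positive, otherwise
    # use the size of the vector divided by three.
    stringcount = rc if rc > 0 else len(ovector) // 3
    substrings = []
    for i in range(stringcount):
        start, end = ovector[2 * i], ovector[2 * i + 1]
        # Assumed convention: -1 offsets mark a subpattern that was not set.
        substrings.append(subject[start:end] if start >= 0 else "")
    return substrings


# "ab" matched by a pattern such as a(x)?(b): group 1 took no part.
print(substrings_from_ovector("ab", [0, 2, -1, -1, 1, 2, 0, 0, 0], 3))  # ['ab', '', 'b']

# rc == 0: the vector filled up, so 6 // 3 == 2 pairs are used.
print(substrings_from_ovector("abc123", [0, 6, 0, 3, 0, 0], 0))  # ['abc123', 'abc']
```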
# Line 650 (v.43) / Line 650 (v.47): patterns using the non-Perl item (?R).
  with the settings of captured strings when part of a pattern is repeated. For
  example, matching "aba" against the pattern /^(a(b)?)+$/ sets $2 to the value
  "b", but matching "aabbaa" against /^(aa(bb)?)+$/ leaves $2 unset. However, if
- the pattern is changed to /^(aa(b(b))?)+$/ then $2 (and $3) get set.
+ the pattern is changed to /^(aa(b(b))?)+$/ then $2 (and $3) are set.

  In Perl 5.004 $2 is set in both cases, and that is also true of PCRE. If in the
  future Perl changes to a consistent state that is different, PCRE may change to
# Line 920 (v.43) / Line 920 (v.47): end of the subject in both modes, and if
  .SH FULL STOP (PERIOD, DOT)
  Outside a character class, a dot in the pattern matches any one character in
  the subject, including a non-printing character, but not (by default) newline.
- If the PCRE_DOTALL option is set, then dots match newlines as well. The
- handling of dot is entirely independent of the handling of circumflex and
- dollar, the only relationship being that they both involve newline characters.
- Dot has no special meaning in a character class.
+ If the PCRE_DOTALL option is set, dots match newlines as well. The handling of
+ dot is entirely independent of the handling of circumflex and dollar, the only
+ relationship being that they both involve newline characters. Dot has no
+ special meaning in a character class.


  .SH SQUARE BRACKETS
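A quick check of the dot rules in the FULL STOP hunk, using Python's re module as a stand-in. Python is not PCRE, but re.DOTALL behaves like PCRE_DOTALL (Perl's /s) for this case.

```python
import re

subject = "a\nb"

# By default "." does not match a newline, so "a.b" cannot span the "\n".
print(re.search(r"a.b", subject))  # None

# re.DOTALL lets the dot match the newline as well.
print(re.search(r"a.b", subject, re.DOTALL) is not None)  # True

# Inside a character class a dot is just a literal ".".
print(re.findall(r"[.]", "a.b\nc"))  # ['.']
```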
# Line 1213 (v.43) / Line 1213 (v.47): to the string
  fails, because it matches the entire string due to the greediness of the .*
  item.

- However, if a quantifier is followed by a question mark, then it ceases to be
+ However, if a quantifier is followed by a question mark, it ceases to be
  greedy, and instead matches the minimum number of times possible, so the
  pattern

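Greedy versus minimal matching, sketched with Python's re (not PCRE, but it treats a trailing ? on a quantifier the same way) and a C-comment-style subject made up for this illustration:

```python
import re

subject = "/* first comment */ not comment /* second comment */"

# Greedy: .* runs to the end and backs off only as far as the last "*/".
print(re.search(r"/\*.*\*/", subject).group())
# /* first comment */ not comment /* second comment */

# Minimal: .*? stops at the first "*/".
print(re.search(r"/\*.*?\*/", subject).group())
# /* first comment */
```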
# Line 1229 (v.43) / Line 1229 (v.47): own right. Because it has two uses, it c
  which matches one digit by preference, but can match two if that is the only
  way the rest of the pattern matches.

- If the PCRE_UNGREEDY option is set (an option which is not available in Perl)
- then the quantifiers are not greedy by default, but individual ones can be made
+ If the PCRE_UNGREEDY option is set (an option which is not available in Perl),
+ the quantifiers are not greedy by default, but individual ones can be made
  greedy by following them with a question mark. In other words, it inverts the
  default behaviour.

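A doubled question mark, as in \d??\d (assumed here to be the pattern the truncated context line introduces), makes an optional item prefer zero repetitions. Python's re has no equivalent of PCRE_UNGREEDY, so this sketch only shows the per-quantifier form:

```python
import re

# \d?? prefers to match nothing, so a single digit is enough here.
print(re.match(r"\d??\d", "12").group())   # 1

# When the rest of the pattern needs it (an end anchor), it takes the digit.
print(re.match(r"\d??\d$", "12").group())  # 12
```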
# Line 1239 (v.43) / Line 1239 (v.47): is greater than 1 or with a limited maxi
  compiled pattern, in proportion to the size of the minimum or maximum.

  If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equivalent
- to Perl's /s) is set, thus allowing the . to match newlines, then the pattern
- is implicitly anchored, because whatever follows will be tried against every
+ to Perl's /s) is set, thus allowing the . to match newlines, the pattern is
+ implicitly anchored, because whatever follows will be tried against every
  character position in the subject string, so there is no point in retrying the
  overall match at any position after the first. PCRE treats such a pattern as
  though it were preceded by \\A. In cases where it is known that the subject
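Why the PCRE_DOTALL condition matters for implicit anchoring, sketched with Python's re. Python may or may not apply the same internal optimization; this only compares match results.

```python
import re

subject = "a\nbx"

# Without DOTALL, .* cannot cross the newline, so retrying at later start
# positions matters: the match is found starting at offset 2.
print(re.search(r".*x", subject).span())  # (2, 4)

# With DOTALL, .* starting at offset 0 can reach any later position, so if
# the attempt at offset 0 failed, no later start could succeed either. The
# result equals that of the explicitly anchored pattern \A.*x.
print(re.search(r".*x", subject, re.DOTALL).span())    # (0, 4)
print(re.search(r"\A.*x", subject, re.DOTALL).span())  # (0, 4)
```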
# Line 1284 (v.43) / Line 1284 (v.47): itself. So the pattern

  matches "sense and sensibility" and "response and responsibility", but not
  "sense and responsibility". If caseful matching is in force at the time of the
- back reference, then the case of letters is relevant. For example,
+ back reference, the case of letters is relevant. For example,

    ((?i)rah)\\s+\\1

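The back reference case rule, checked with Python's re. Python 3.11 and later reject a (?i) setting in the middle of a pattern, so this sketch uses the scoped form (?i:rah), which likewise confines caseless matching to the group:

```python
import re

# (?i:rah) matches caselessly; the \1 outside it is matched with case
# sensitivity, because caseless matching is not in force there.
pattern = re.compile(r"((?i:rah))\s+\1")

for s in ["rah rah", "RAH RAH", "RAH rah"]:
    print(s, pattern.fullmatch(s) is not None)
# rah rah True
# RAH RAH True
# RAH rah False
```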
# Line 1292 (v.43) / Line 1292 (v.47): matches "rah rah" and "RAH RAH", but not
  capturing subpattern is matched caselessly.

  There may be more than one back reference to the same subpattern. If a
- subpattern has not actually been used in a particular match, then any back
+ subpattern has not actually been used in a particular match, any back
  references to it always fail. For example, the pattern

    (a|(bc))\\2
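Python's re also fails a back reference to a group that took no part in the match, so the example pattern behaves as the man page describes (a sketch in Python, not PCRE itself):

```python
import re

pattern = re.compile(r"(a|(bc))\2")

# When the "bc" alternative is taken, group 2 is set and \2 can match again.
print(pattern.fullmatch("bcbc") is not None)  # True

# When "a" is taken, group 2 is unset, so \2 fails; the "bc" alternative
# does not fit either, so there is no match.
print(pattern.match("aa"))  # None
```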
# Line 1300 (v.43) / Line 1300 (v.47): references to it always fail. For exampl
  always fails if it starts to match "a" rather than "bc". Because there may be
  up to 99 back references, all digits following the backslash are taken
  as part of a potential back reference number. If the pattern continues with a
- digit character, then some delimiter must be used to terminate the back
- reference. If the PCRE_EXTENDED option is set, this can be whitespace.
- Otherwise an empty comment can be used.
+ digit character, some delimiter must be used to terminate the back reference.
+ If the PCRE_EXTENDED option is set, this can be whitespace. Otherwise an empty
+ comment can be used.

  A back reference that occurs inside the parentheses to which it refers fails
  when the subpattern is first used, so, for example, (a\\1) never matches.
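Terminating a back reference that is followed by a digit, sketched in Python: re.VERBOSE stands in for PCRE_EXTENDED, and (?#) is an empty comment. The goal pattern (group 1, a reference to it, then a literal 0) is a hypothetical example.

```python
import re

# Hypothetical goal: capture "a", refer back to it, then match a literal "0".
# Written as (a)\10 the digits are read as a reference to group 10,
# which Python rejects because that group does not exist.
try:
    re.compile(r"(a)\10")
except re.error as exc:
    print("rejected:", exc)

# Whitespace ends the reference number when extended syntax is on.
extended = re.compile(r"(a)\1 0", re.VERBOSE)
# Without extended syntax, an empty comment does the same job.
commented = re.compile(r"(a)\1(?#)0")

print(extended.fullmatch("aa0") is not None, commented.fullmatch("aa0") is not None)  # True True
```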
# Line 1390 (v.43) / Line 1390 (v.47): Several assertions (of any sort) may occ
  matches "foo" preceded by three digits that are not "999". Notice that each of
  the assertions is applied independently at the same point in the subject
  string. First there is a check that the previous three characters are all
- digits, then there is a check that the same three characters are not "999".
+ digits, and then there is a check that the same three characters are not "999".
  This pattern does \fInot\fR match "foo" preceded by six characters, the first
  of which are digits and the last three of which are not "999". For example, it
  doesn't match "123abcfoo". A pattern to do that is
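Both behaviours from this hunk, tested with Python's re as a stand-in for PCRE. The second pattern, a single six-character lookbehind, is this sketch's own way of accepting "123abcfoo", not necessarily the pattern the man page goes on to give:

```python
import re

# Two assertions applied at the same point: the three characters before
# "foo" are digits, and those same three characters are not "999".
same_point = re.compile(r"(?<=\d{3})(?<!999)foo")
for s in ["123foo", "999foo", "123abcfoo"]:
    print(s, same_point.search(s) is not None)
# 123foo True
# 999foo False
# 123abcfoo False

# Illustrative alternative: look six characters back, the first three digits.
six_back = re.compile(r"(?<=\d{3}...)(?<!999)foo")
print(six_back.search("123abcfoo") is not None)  # True
```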
# Line 1475 (v.43) / Line 1475 (v.47): what follows matches the rest of the pat

    ^.*abcd$

- then the initial .* matches the entire string at first, but when this fails
- (because there is no following "a"), it backtracks to match all but the last
- character, then all but the last two characters, and so on. Once again the
- search for "a" covers the entire string, from right to left, so we are no
- better off. However, if the pattern is written as
+ the initial .* matches the entire string at first, but when this fails (because
+ there is no following "a"), it backtracks to match all but the last character,
+ then all but the last two characters, and so on. Once again the search for "a"
+ covers the entire string, from right to left, so we are no better off. However,
+ if the pattern is written as

    ^(?>.*)(?<=abcd)

- then there can be no backtracking for the .* item; it can match only the entire
+ there can be no backtracking for the .* item; it can match only the entire
  string. The subsequent lookbehind assertion does a single test on the last four
  characters. If it fails, the match fails immediately. For long strings, this
  approach makes a significant difference to the processing time.
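Python's re only accepts (?>...) from version 3.11, so this sketch swaps in an older emulation of an atomic group: a lookahead that captures .*, followed by a back reference to that capture. Python does not backtrack into a lookahead once it has succeeded, so the back reference consumes the rest of the line as one unit, which gives the no-backtracking behaviour described above.

```python
import re

# ^(?=(.*))\1 stands in for ^(?>.*): the lookahead captures the rest of the
# line once, and \1 consumes exactly that text, leaving nothing to backtrack.
# (?<=abcd) then makes a single check of the last four characters.
tail_check = re.compile(r"^(?=(.*))\1(?<=abcd)")

print(tail_check.search("xxxxabcd") is not None)   # True
print(tail_check.search("xxxxabcdz") is not None)  # False

# Same answers as the backtracking form ^.*abcd$ on these inputs.
backtracking = re.compile(r"^.*abcd$")
print(backtracking.search("xxxxabcd") is not None)   # True
print(backtracking.search("xxxxabcdz") is not None)  # False
```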
# Line 1528 (v.43) / Line 1528 (v.47): no-pattern (if present) is used. If ther
  subpattern, a compile-time error occurs.

  There are two kinds of condition. If the text between the parentheses consists
- of a sequence of digits, then the condition is satisfied if the capturing
- subpattern of that number has previously matched. Consider the following
- pattern, which contains non-significant white space to make it more readable
- (assume the PCRE_EXTENDED option) and to divide it into three parts for ease
- of discussion:
+ of a sequence of digits, the condition is satisfied if the capturing subpattern
+ of that number has previously matched. Consider the following pattern, which
+ contains non-significant white space to make it more readable (assume the
+ PCRE_EXTENDED option) and to divide it into three parts for ease of discussion:

    ( \\( )?    [^()]+    (?(1) \\) )

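The conditional pattern from this hunk, run through Python's re with re.VERBOSE in place of PCRE_EXTENDED (Python is not PCRE, but supports this (?(1)...) form). The doubled backslashes in the man page source are troff escapes for a single backslash, so the pattern is written here with \( and \):

```python
import re

# re.VERBOSE ignores the layout spaces, as PCRE_EXTENDED does.
optional_parens = re.compile(r"( \( )?    [^()]+    (?(1) \) )", re.VERBOSE)

for s in ["abc", "(abc)", "(abc", "abc)"]:
    print(repr(s), optional_parens.fullmatch(s) is not None)
# 'abc' True
# '(abc)' True
# '(abc' False
# 'abc)' False
```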
# Line 1622 (v.43) / Line 1621 (v.47): on at the top level. If additional paren
    \\( ( ( (?>[^()]+) | (?R) )* ) \\)
       ^                        ^
       ^                        ^
- then the string they capture is "ab(cd)ef", the contents of the top level
+ the string they capture is "ab(cd)ef", the contents of the top level
  parentheses. If there are more than 15 capturing parentheses in a pattern, PCRE
  has to obtain extra memory to store data during a recursion, which it does by
  using \fBpcre_malloc\fR, freeing it via \fBpcre_free\fR afterwards. If no
