Diff of /code/trunk/pcre.3

revision 34 by nigel, Sat Feb 24 21:39:01 2007 UTC revision 35 by nigel, Sat Feb 24 21:39:05 2007 UTC
# Line 20  pcre - Perl-compatible regular expressio Line 20  pcre - Perl-compatible regular expressio
20  .br  .br
21  .B int pcre_exec(const pcre *\fIcode\fR, "const pcre_extra *\fIextra\fR,"  .B int pcre_exec(const pcre *\fIcode\fR, "const pcre_extra *\fIextra\fR,"
22  .ti +5n  .ti +5n
23  .B "const char *\fIsubject\fR," int \fIlength\fR, int \fIoptions\fR,  .B "const char *\fIsubject\fR," int \fIlength\fR, int \fIstartoffset\fR,
24  .ti +5n  .ti +5n
25  .B int *\fIovector\fR, int \fIovecsize\fR);  .B int \fIoptions\fR, int *\fIovector\fR, int \fIovecsize\fR);
26  .PP  .PP
27  .br  .br
28  .B int pcre_copy_substring(const char *\fIsubject\fR, int *\fIovector\fR,  .B int pcre_copy_substring(const char *\fIsubject\fR, int *\fIovector\fR,
# Line 249  treated as letters), the following code Line 249  treated as letters), the following code
249  The tables are built in memory that is obtained via \fBpcre_malloc\fR. The  The tables are built in memory that is obtained via \fBpcre_malloc\fR. The
250  pointer that is passed to \fBpcre_compile\fR is saved with the compiled  pointer that is passed to \fBpcre_compile\fR is saved with the compiled
251  pattern, and the same tables are used via this pointer by \fBpcre_study()\fR  pattern, and the same tables are used via this pointer by \fBpcre_study()\fR
252  and \fBpcre_match()\fR. Thus for any single pattern, compilation, studying and  and \fBpcre_exec()\fR. Thus for any single pattern, compilation, studying and
253  matching all happen in the same locale, but different patterns can be compiled  matching all happen in the same locale, but different patterns can be compiled
254  in different locales. It is the caller's responsibility to ensure that the  in different locales. It is the caller's responsibility to ensure that the
255  memory containing the tables remains available for as long as it is needed.  memory containing the tables remains available for as long as it is needed.
# Line 293  pre-compiled pattern, which is passed in Line 293  pre-compiled pattern, which is passed in
293  pattern has been studied, the result of the study should be passed in the  pattern has been studied, the result of the study should be passed in the
294  \fIextra\fR argument. Otherwise this must be NULL.  \fIextra\fR argument. Otherwise this must be NULL.
295    
 The subject string is passed as a pointer in \fIsubject\fR and a length in  
 \fIlength\fR. Unlike the pattern string, it may contain binary zero characters.  
   
296  The PCRE_ANCHORED option can be passed in the \fIoptions\fR argument, whose  The PCRE_ANCHORED option can be passed in the \fIoptions\fR argument, whose
297  unused bits must be zero. However, if a pattern was compiled with  unused bits must be zero. However, if a pattern was compiled with
298  PCRE_ANCHORED, or turned out to be anchored by virtue of its contents, it  PCRE_ANCHORED, or turned out to be anchored by virtue of its contents, it
# Line 316  should not match it nor (except in multi Line 313  should not match it nor (except in multi
313  it. Setting this without PCRE_MULTILINE (at compile time) causes dollar never  it. Setting this without PCRE_MULTILINE (at compile time) causes dollar never
314  to match.  to match.
315    
316    The subject string is passed as a pointer in \fIsubject\fR, a length in
317    \fIlength\fR, and a starting offset in \fIstartoffset\fR. Unlike the pattern
318    string, it may contain binary zero characters. When the starting offset is
319    zero, the search for a match starts at the beginning of the subject, and this
320    is by far the most common case.
321    
322    A non-zero starting offset is useful when searching for another match in the
323    same subject by calling \fBpcre_exec()\fR again after a previous success.
324    Setting \fIstartoffset\fR differs from just passing over a shortened string and
325    setting PCRE_NOTBOL in the case of a pattern that begins with any kind of
326    lookbehind. For example, consider the pattern
327    
328      \\Biss\\B
329    
330    which finds occurrences of "iss" in the middle of words. (\\B matches only if
331    the current position in the subject is not a word boundary.) When applied to
332    the string "Mississippi" the first call to \fBpcre_exec()\fR finds the first
333    occurrence. If \fBpcre_exec()\fR is called again with just the remainder of the
334    subject, namely "issippi", it does not match, because \\B is always false at the
335    start of the subject, which is deemed to be a word boundary. However, if
336    \fBpcre_exec()\fR is passed the entire string again, but with \fIstartoffset\fR
337    set to 4, it finds the second occurrence of "iss" because it is able to look
338    behind the starting point to discover that it is preceded by a letter.
339    
340    If a non-zero starting offset is passed when the pattern is anchored, one
341    attempt to match at the given offset is tried. This can only succeed if the
342    pattern does not require the match to be at the start of the subject.
343    
344  In general, a pattern matches a certain portion of the subject, and in  In general, a pattern matches a certain portion of the subject, and in
345  addition, further substrings from the subject may be picked out by parts of the  addition, further substrings from the subject may be picked out by parts of the
346  pattern. Following the usage in Jeffrey Friedl's book, this is called  pattern. Following the usage in Jeffrey Friedl's book, this is called
# Line 730  first or last character matches \\w, res Line 755  first or last character matches \\w, res
755  The \\A, \\Z, and \\z assertions differ from the traditional circumflex and  The \\A, \\Z, and \\z assertions differ from the traditional circumflex and
756  dollar (described below) in that they only ever match at the very start and end  dollar (described below) in that they only ever match at the very start and end
757  of the subject string, whatever options are set. They are not affected by the  of the subject string, whatever options are set. They are not affected by the
758  PCRE_NOTBOL or PCRE_NOTEOL options. The difference between \\Z and \\z is that  PCRE_NOTBOL or PCRE_NOTEOL options. If the \fIstartoffset\fR argument of
759  \\Z matches before a newline that is the last character of the string as well  \fBpcre_exec()\fR is non-zero, \\A can never match. The difference between \\Z
760  as at the end of the string, whereas \\z matches only at the end.  and \\z is that \\Z matches before a newline that is the last character of the
761    string as well as at the end of the string, whereas \\z matches only at the
762    end.
763    
764    
765  .SH CIRCUMFLEX AND DOLLAR  .SH CIRCUMFLEX AND DOLLAR
766  Outside a character class, in the default matching mode, the circumflex  Outside a character class, in the default matching mode, the circumflex
767  character is an assertion which is true only if the current matching point is  character is an assertion which is true only if the current matching point is
768  at the start of the subject string. Inside a character class, circumflex has an  at the start of the subject string. If the \fIstartoffset\fR argument of
769  entirely different meaning (see below).  \fBpcre_exec()\fR is non-zero, circumflex can never match. Inside a character
770    class, circumflex has an entirely different meaning (see below).
771    
772  Circumflex need not be the first character of the pattern if a number of  Circumflex need not be the first character of the pattern if a number of
773  alternatives are involved, but it should be the first thing in each alternative  alternatives are involved, but it should be the first thing in each alternative
# Line 766  after and immediately before an internal Line 794  after and immediately before an internal
794  addition to matching at the start and end of the subject string. For example,  addition to matching at the start and end of the subject string. For example,
795  the pattern /^abc$/ matches the subject string "def\\nabc" in multiline mode,  the pattern /^abc$/ matches the subject string "def\\nabc" in multiline mode,
796  but not otherwise. Consequently, patterns that are anchored in single line mode  but not otherwise. Consequently, patterns that are anchored in single line mode
797  because all branches start with "^" are not anchored in multiline mode. The  because all branches start with "^" are not anchored in multiline mode, and a
798  PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is set.  match for circumflex is possible when the \fIstartoffset\fR argument of
799    \fBpcre_exec()\fR is non-zero. The PCRE_DOLLAR_ENDONLY option is ignored if
800    PCRE_MULTILINE is set.
801    
802  Note that the sequences \\A, \\Z, and \\z can be used to match the start and  Note that the sequences \\A, \\Z, and \\z can be used to match the start and
803  end of the subject in both modes, and if all branches of a pattern start with  end of the subject in both modes, and if all branches of a pattern start with
# Line 1219  matches an occurrence of "baz" that is p Line 1249  matches an occurrence of "baz" that is p
1249  preceded by "foo".  preceded by "foo".
1250    
1251  Assertion subpatterns are not capturing subpatterns, and may not be repeated,  Assertion subpatterns are not capturing subpatterns, and may not be repeated,
1252  because it makes no sense to assert the same thing several times. If an  because it makes no sense to assert the same thing several times. If any kind
1253  assertion contains capturing subpatterns within it, these are always counted  of assertion contains capturing subpatterns within it, these are counted for
1254  for the purposes of numbering the capturing subpatterns in the whole pattern.  the purposes of numbering the capturing subpatterns in the whole pattern.
1255  Substring capturing is carried out for positive assertions, but it does not  However, substring capturing is carried out only for positive assertions,
1256  make sense for negative assertions.  because it does not make sense for negative assertions.
1257    
1258  Assertions count towards the maximum of 200 parenthesized subpatterns.  Assertions count towards the maximum of 200 parenthesized subpatterns.
1259    
# Line 1390  Cambridge CB2 3QG, England. Line 1420  Cambridge CB2 3QG, England.
1420  .br  .br
1421  Phone: +44 1223 334714  Phone: +44 1223 334714
1422    
1423    Last updated: 10 June 1999
1424    .br
1425  Copyright (c) 1997-1999 University of Cambridge.  Copyright (c) 1997-1999 University of Cambridge.

