
Diff of /code/trunk/pcre.3


--- revision 28 by nigel, Sat Feb 24 21:38:49 2007 UTC
+++ revision 29 by nigel, Sat Feb 24 21:38:53 2007 UTC
@@ -13,9 +13,6 @@ pcre - Perl-compatible regular expressio
 .B const unsigned char *\fItableptr\fR);
 .PP
 .br
-.B const unsigned char *pcre_maketables(void);
-.PP
-.br
 .B pcre_extra *pcre_study(const pcre *\fIcode\fR, int \fIoptions\fR,
 .ti +5n
 .B const char **\fIerrptr\fR);
@@ -28,6 +25,28 @@ pcre - Perl-compatible regular expressio
 .B int *\fIovector\fR, int \fIovecsize\fR);
 .PP
 .br
+.B int pcre_copy_substring(const char *\fIsubject\fR, int *\fIovector\fR,
+.ti +5n
+.B int \fIstringcount\fR, int \fIstringnumber\fR, char *\fIbuffer\fR,
+.ti +5n
+.B int \fIbuffersize\fR);
+.PP
+.br
+.B int pcre_get_substring(const char *\fIsubject\fR, int *\fIovector\fR,
+.ti +5n
+.B int \fIstringcount\fR, int \fIstringnumber\fR,
+.ti +5n
+.B const char **\fIstringptr\fR);
+.PP
+.br
+.B int pcre_get_substring_list(const char *\fIsubject\fR,
+.ti +5n
+.B int *\fIovector\fR, int \fIstringcount\fR, "const char ***\fIlistptr\fR);"
+.PP
+.br
+.B const unsigned char *pcre_maketables(void);
+.PP
+.br
 .B int pcre_info(const pcre *\fIcode\fR, int *\fIoptptr\fR, int
 .B *\fIfirstcharptr\fR);
 .PP
@@ -51,10 +70,13 @@ PCRE has its own native API, which is de
 a set of wrapper functions that correspond to the POSIX API. See
 \fBpcreposix (3)\fR.
 
-The three functions \fBpcre_compile()\fR, \fBpcre_study()\fR, and
-\fBpcre_exec()\fR are used for compiling and matching regular expressions. The
-function \fBpcre_maketables()\fR is used (optionally) to build a set of
-character tables in the current locale for passing to \fBpcre_compile()\fR.
+The functions \fBpcre_compile()\fR, \fBpcre_study()\fR, and \fBpcre_exec()\fR
+are used for compiling and matching regular expressions, while
+\fBpcre_copy_substring()\fR, \fBpcre_get_substring()\fR, and
+\fBpcre_get_substring_list()\fR are convenience functions for extracting
+captured substrings from a matched subject string. The function
+\fBpcre_maketables()\fR is used (optionally) to build a set of character tables
+in the current locale for passing to \fBpcre_compile()\fR.
 
 The function \fBpcre_info()\fR is used to find out information about a compiled
 pattern, while the function \fBpcre_version()\fR returns a pointer to a string
@@ -233,6 +255,27 @@ in different locales. It is the caller's
 memory containing the tables remains available for as long as it is needed.
 
+
+.SH INFORMATION ABOUT A PATTERN
+The \fBpcre_info()\fR function returns information about a compiled pattern.
+Its yield is the number of capturing subpatterns, or one of the following
+negative numbers:
+
+  PCRE_ERROR_NULL       the argument \fIcode\fR was NULL
+  PCRE_ERROR_BADMAGIC   the "magic number" was not found
+
+If the \fIoptptr\fR argument is not NULL, a copy of the options with which the
+pattern was compiled is placed in the integer it points to.
+
+If the \fIfirstcharptr\fR argument is not NULL, is is used to pass back
+information about the first character of any matched string. If there is a
+fixed first character, e.g. from a pattern such as (cat|cow|coyote), then it is
+returned in the integer pointed to by \fIfirstcharptr\fR. Otherwise, if the
+pattern was compiled with the PCRE_MULTILINE option, and every branch started
+with "^", then -1 is returned, indicating that the pattern will match at the
+start of a subject string or after any "\\n" within the string. Otherwise -2 is
+returned.
+
 
 .SH MATCHING A PATTERN
 The function \fBpcre_exec()\fR is called to match a subject string against a
 pre-compiled pattern, which is passed in the \fIcode\fR argument. If the
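The pcre_info() description added in this revision packs several outcomes into plain integers. The C sketch below decodes them without calling the library. classify_firstchar() and ovector_ints_for() are hypothetical helpers, not part of PCRE, and sizing ovector at three ints per substring is an assumption consistent with the "size of the vector divided by three" rule in the substring-extraction section.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not part of the PCRE API. */

/* The three cases of the value passed back via firstcharptr. */
enum firstchar_kind {
    FIRSTCHAR_FIXED,     /* >= 0: every match starts with this character    */
    FIRSTCHAR_LINESTART, /* -1: PCRE_MULTILINE and every branch begins "^"   */
    FIRSTCHAR_NONE       /* -2: no first-character information is available */
};

static enum firstchar_kind classify_firstchar(int firstchar)
{
    assert(firstchar >= -2);   /* the text defines no other negative values */
    if (firstchar >= 0)
        return FIRSTCHAR_FIXED;
    return firstchar == -1 ? FIRSTCHAR_LINESTART : FIRSTCHAR_NONE;
}

/* Ints to allocate for ovector given pcre_info()'s yield, assuming three
   ints per substring (the whole match plus each capturing subpattern).
   A negative yield is an error code such as PCRE_ERROR_NULL, so 0 is
   returned to signal that no vector should be allocated. */
static size_t ovector_ints_for(int info_yield)
{
    if (info_yield < 0)
        return 0;
    return ((size_t)info_yield + 1) * 3;
}
```

For (cat|cow|coyote) the text says the fixed first character is passed back, so classify_firstchar('c') gives FIRSTCHAR_FIXED; that pattern has one capturing subpattern, so ovector_ints_for(1) is 6.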
@@ -290,6 +333,9 @@ is the number of pairs that have been se
 subpatterns, the return value from a successful match is 1, indicating that
 just the first pair of offsets has been set.
 
+Some convenience functions are provided for extracting the captured substrings
+as separate strings. These are described in the following section.
+
 It is possible for an capturing subpattern number \fIn+1\fR to match some
 part of the subject when subpattern \fIn\fR has not been used at all. For
 example, if the string "abc" is matched against the pattern (a|(z))(bc)
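The lines above lead into the (a|(z))(bc) example, in which subpattern 3 matches "bc" while subpattern 2 is never used. Reading the offsets straight from ovector, rather than through the convenience functions, might look like the C sketch below. substring_span() is a hypothetical helper, not a PCRE function, and it assumes that a negative start offset marks an unset pair, as the extraction section states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE API: reads substring n straight
   from an ovector filled in by a successful pcre_exec() call. Offsets come
   in (start, end) pairs; a negative start marks an unset subpattern.
   Returns the substring length and points *start into subject, or returns
   -1 and sets *start to NULL when the subpattern is unset. */
static int substring_span(const char *subject, const int *ovector, int n,
                          const char **start)
{
    int from, to;

    assert(n >= 0);   /* caller must also keep n below pcre_exec()'s yield */
    from = ovector[2 * n];
    to = ovector[2 * n + 1];
    if (from < 0) {
        *start = NULL;
        return -1;
    }
    *start = subject + from;
    return to - from;
}
```

For subject "abc", a hand-filled ovector of {0, 3, 0, 1, -1, -1, 1, 3} (derived from the example, assuming unset pairs hold -1, not captured from a real pcre_exec() run) gives length 2 and a pointer to "bc" for substring 3, and -1 for substring 2.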
@@ -350,25 +396,62 @@ call via \fBpcre_malloc()\fR fails, this
 the end of matching.
 
 
-.SH INFORMATION ABOUT A PATTERN
-The \fBpcre_info()\fR function returns information about a compiled pattern.
-Its yield is the number of capturing subpatterns, or one of the following
-negative numbers:
+.SH EXTRACTING CAPTURED SUBSTRINGS
+Captured substrings can be accessed directly by using the offsets returned by
+\fBpcre_exec()\fR in \fIovector\fR. For convenience, the functions
+\fBpcre_copy_substring()\fR, \fBpcre_get_substring()\fR, and
+\fBpcre_get_substring_list()\fR are provided for extracting captured substrings
+as new, separate, zero-terminated strings. A substring that contains a binary
+zero is correctly extracted and has a further zero added on the end, but the
+result does not, of course, function as a C string.
 
+The first three arguments are the same for all three functions: \fIsubject\fR
+is the subject string which has just been successfully matched, \fIovector\fR
+is a pointer to the vector of integer offsets that was passed to
+\fBpcre_exec()\fR, and \fIstringcount\fR is the number of substrings that
+were captured by the match, including the substring that matched the entire
+regular expression. This is the value returned by \fBpcre_exec\fR if it
+is greater than zero. If \fBpcre_exec()\fR returned zero, indicating that it
+ran out of space in \fIovector\fR, then the value passed as
+\fIstringcount\fR should be the size of the vector divided by three.
+
+The functions \fBpcre_copy_substring()\fR and \fBpcre_get_substring()\fR
+extract a single substring, whose number is given as \fIstringnumber\fR. A
+value of zero extracts the substring that matched the entire pattern, while
+higher values extract the captured substrings. For \fBpcre_copy_substring()\fR,
+the string is placed in \fIbuffer\fR, whose length is given by
+\fIbuffersize\fR, while for \fBpcre_get_substring()\fR a new block of store is
+obtained via \fBpcre_malloc\fR, and its address is returned via
+\fIstringptr\fR. The yield of the function is the length of the string, not
+including the terminating zero, or one of
+
-  PCRE_ERROR_NULL       the argument \fIcode\fR was NULL
-  PCRE_ERROR_BADMAGIC   the "magic number" was not found
+  PCRE_ERROR_NOMEMORY       (-6)
 
-If the \fIoptptr\fR argument is not NULL, a copy of the options with which the
-pattern was compiled is placed in the integer it points to.
+The buffer was too small for \fBpcre_copy_substring()\fR, or the attempt to get
+memory failed for \fBpcre_get_substring()\fR.
 
-If the \fIfirstcharptr\fR argument is not NULL, is is used to pass back
-information about the first character of any matched string. If there is a
-fixed first character, e.g. from a pattern such as (cat|cow|coyote), then it is
-returned in the integer pointed to by \fIfirstcharptr\fR. Otherwise, if the
-pattern was compiled with the PCRE_MULTILINE option, and every branch started
-with "^", then -1 is returned, indicating that the pattern will match at the
-start of a subject string or after any "\\n" within the string. Otherwise -2 is
-returned.
+  PCRE_ERROR_NOSUBSTRING    (-7)
+
+There is no substring whose number is \fIstringnumber\fR.
+
+The \fBpcre_get_substring_list()\fR function extracts all available substrings
+and builds a list of pointers to them. All this is done in a single block of
+memory which is obtained via \fBpcre_malloc\fR. The address of the memory block
+is returned via \fIlistptr\fR, which is also the start of the list of string
+pointers. The end of the list is marked by a NULL pointer. The yield of the
+function is zero if all went well, or
+
+  PCRE_ERROR_NOMEMORY       (-6)
+
+if the attempt to get the memory block failed.
+
+When any of these functions encounter a substring that is unset, which can
+happen when capturing subpattern number \fIn+1\fR matches some part of the
+subject, but subpattern \fIn\fR has not been used at all, they return an empty
+string. This can be distinguished from a genuine zero-length substring by
+inspecting the appropriate offset in \fIovector\fR, which is negative for unset
+substrings.
+
 
 
 .SH LIMITATIONS
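The extraction rules introduced in this revision can be mirrored without the library to see what each function is expected to return. The C sketch below is a re-implementation under stated assumptions, not PCRE's code: the sketch_* names and SKETCH_ERROR_* macros are hypothetical (the macros take the -6 and -7 values listed in the text), plain malloc() stands in for pcre_malloc, and a stringnumber outside 0..stringcount-1 is assumed to count as "no substring".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_ERROR_NOMEMORY    (-6)
#define SKETCH_ERROR_NOSUBSTRING (-7)

/* stringcount rule: use pcre_exec()'s yield when positive; a yield of
   zero means ovector ran out of space, so use ovecsize / 3 instead. */
static int sketch_stringcount(int exec_yield, int ovecsize)
{
    assert(exec_yield >= 0);   /* a negative yield means nothing matched */
    return exec_yield > 0 ? exec_yield : ovecsize / 3;
}

/* Length of substring n; an unset pair (negative start) counts as empty. */
static int sketch_substring_length(const int *ovector, int n)
{
    int start = ovector[2 * n];
    return start < 0 ? 0 : ovector[2 * n + 1] - start;
}

/* Mirrors the documented behaviour of pcre_copy_substring(). */
static int sketch_copy_substring(const char *subject, const int *ovector,
                                 int stringcount, int stringnumber,
                                 char *buffer, int buffersize)
{
    int len;

    if (stringnumber < 0 || stringnumber >= stringcount)
        return SKETCH_ERROR_NOSUBSTRING;
    len = sketch_substring_length(ovector, stringnumber);
    if (len >= buffersize)               /* no room for the added zero */
        return SKETCH_ERROR_NOMEMORY;
    if (len > 0)                         /* memcpy also copies binary zeros */
        memcpy(buffer, subject + ovector[2 * stringnumber], (size_t)len);
    buffer[len] = '\0';
    return len;
}

/* Mirrors pcre_get_substring_list(): one block holding a NULL-terminated
   array of pointers followed by the zero-terminated strings themselves. */
static int sketch_get_substring_list(const char *subject, const int *ovector,
                                     int stringcount, const char ***listptr)
{
    size_t size = (size_t)(stringcount + 1) * sizeof(char *);
    char **list;
    char *p;
    int i;

    for (i = 0; i < stringcount; i++)
        size += (size_t)sketch_substring_length(ovector, i) + 1;
    list = malloc(size);
    if (list == NULL)
        return SKETCH_ERROR_NOMEMORY;
    p = (char *)(list + stringcount + 1);    /* strings follow the pointers */
    for (i = 0; i < stringcount; i++) {
        int len = sketch_substring_length(ovector, i);
        if (len > 0)
            memcpy(p, subject + ovector[2 * i], (size_t)len);
        p[len] = '\0';
        list[i] = p;
        p += len + 1;
    }
    list[stringcount] = NULL;                /* end-of-list marker */
    *listptr = (const char **)list;
    return 0;
}
```

With subject "abc" and a hand-filled ovector of {0, 3, 0, 1, -1, -1, 1, 3} for (a|(z))(bc), copying substring 3 yields "bc" (length 2), substring 2 yields an empty string because it is unset, and substring 4 yields -7. Because the pointer array and the strings share one block, a single free() releases the whole list in this sketch.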
@@ -723,11 +806,15 @@ The minus (hyphen) character can be used
 character class. For example, [d-m] matches any letter between d and m,
 inclusive. If a minus character is required in a class, it must be escaped with
 a backslash or appear in a position where it cannot be interpreted as
-indicating a range, typically as the first or last character in the class. It
-is not possible to have the character "]" as the end character of a range,
-since a sequence such as [w-] is interpreted as a class of two characters. The
-octal or hexadecimal representation of "]" can, however, be used to end a
-range.
+indicating a range, typically as the first or last character in the class.
+
+It is not possible to have the literal character "]" as the end character of a
+range. A pattern such as [W-]46] is interpreted as a class of two characters
+("W" and "-") followed by a literal string "46]", so it would match "W46]" or
+"-46]". However, if the "]" is escaped with a backslash it is interpreted as
+the end of range, so [W-\\]46] is interpreted as a single class containing a
+range followed by two separate characters. The octal or hexadecimal
+representation of "]" can also be used to end a range.
 
 Ranges operate in ASCII collating sequence. They can also be used for
 characters specified numerically, for example [\\000-\\037]. If a range that
@@ -1156,7 +1243,7 @@ of characters that an identical standalo
 the current point in the subject string.
 
 Once-only subpatterns are not capturing subpatterns. Simple cases such as the
-above example can be though of as a maximizing repeat that must swallow
+above example can be thought of as a maximizing repeat that must swallow
 everything it can. So, while both \\d+ and \\d+? are prepared to adjust the
 number of digits they match in order to make the rest of the pattern match,
 (?>\\d+) can only match an entire sequence of digits.
