Diff of /code/trunk/doc/pcreapi.3

revision 90 by nigel, Sat Feb 24 21:41:21 2007 UTC revision 91 by nigel, Sat Feb 24 21:41:34 2007 UTC
# Line 75  PCRE - Perl-compatible regular expressio Line 75  PCRE - Perl-compatible regular expressio
75  .B const char *\fIname\fP);  .B const char *\fIname\fP);
76  .PP  .PP
77  .br  .br
78    .B int pcre_get_stringtable_entries(const pcre *\fIcode\fP,
79    .ti +5n
80    .B const char *\fIname\fP, char **\fIfirst\fP, char **\fIlast\fP);
81    .PP
82    .br
83  .B int pcre_get_substring(const char *\fIsubject\fP, int *\fIovector\fP,  .B int pcre_get_substring(const char *\fIsubject\fP, int *\fIovector\fP,
84  .ti +5n  .ti +5n
85  .B int \fIstringcount\fP, int \fIstringnumber\fP,  .B int \fIstringcount\fP, int \fIstringnumber\fP,
# Line 164  documentation describes how to run it. Line 169  documentation describes how to run it.
169  .P  .P
170  A second matching function, \fBpcre_dfa_exec()\fP, which is not  A second matching function, \fBpcre_dfa_exec()\fP, which is not
171  Perl-compatible, is also provided. This uses a different algorithm for the  Perl-compatible, is also provided. This uses a different algorithm for the
172  matching. This allows it to find all possible matches (at a given point in the  matching. The alternative algorithm finds all possible matches (at a given
173  subject), not just one. However, this algorithm does not return captured  point in the subject). However, this algorithm does not return captured
174  substrings. A description of the two matching algorithms and their advantages  substrings. A description of the two matching algorithms and their advantages
175  and disadvantages is given in the  and disadvantages is given in the
176  .\" HREF  .\" HREF
# Line 183  matched by \fBpcre_exec()\fP. They are: Line 188  matched by \fBpcre_exec()\fP. They are:
188    \fBpcre_get_named_substring()\fP    \fBpcre_get_named_substring()\fP
189    \fBpcre_get_substring_list()\fP    \fBpcre_get_substring_list()\fP
190    \fBpcre_get_stringnumber()\fP    \fBpcre_get_stringnumber()\fP
191      \fBpcre_get_stringtable_entries()\fP
192  .sp  .sp
193  \fBpcre_free_substring()\fP and \fBpcre_free_substring_list()\fP are also  \fBpcre_free_substring()\fP and \fBpcre_free_substring_list()\fP are also
194  provided, to free the memory used for extracted strings.  provided, to free the memory used for extracted strings.
# Line 212  should be done before calling any PCRE f Line 218  should be done before calling any PCRE f
218  The global variables \fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP are also  The global variables \fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP are also
219  indirections to memory management functions. These special functions are used  indirections to memory management functions. These special functions are used
220  only when PCRE is compiled to use the heap for remembering data, instead of  only when PCRE is compiled to use the heap for remembering data, instead of
221  recursive function calls, when running the \fBpcre_exec()\fP function. This is  recursive function calls, when running the \fBpcre_exec()\fP function. See the
222  a non-standard way of building PCRE, for use in environments that have limited  .\" HREF
223  stacks. Because of the greater use of memory management, it runs more slowly.  \fBpcrebuild\fP
224  Separate functions are provided so that special-purpose external code can be  .\"
225  used for this case. When used, these functions are always called in a  documentation for details of how to do this. It is a non-standard way of
226  stack-like manner (last obtained, first freed), and always for memory blocks of  building PCRE, for use in environments that have limited stacks. Because of the
227  the same size.  greater use of memory management, it runs more slowly. Separate functions are
228    provided so that special-purpose external code can be used for this case. When
229    used, these functions are always called in a stack-like manner (last obtained,
230    first freed), and always for memory blocks of the same size. There is a
231    discussion about PCRE's stack usage in the
232    .\" HREF
233    \fBpcrestack\fP
234    .\"
235    documentation.
236  .P  .P
237  The global variable \fBpcre_callout\fP initially contains NULL. It can be set  The global variable \fBpcre_callout\fP initially contains NULL. It can be set
238  by the caller to a "callout" function, which PCRE will then call at specified  by the caller to a "callout" function, which PCRE will then call at specified
# Line 229  points during a matching operation. Deta Line 243  points during a matching operation. Deta
243  documentation.  documentation.
244  .  .
245  .  .
246    .SH NEWLINES
247    PCRE supports three different conventions for indicating line breaks in
248    strings: a single CR character, a single LF character, or the two-character
249    sequence CRLF. All three are used as "standard" by different operating systems.
250    When PCRE is built, a default can be specified. The default default is LF,
251    which is the Unix standard. When PCRE is run, the default can be overridden,
252    either when a pattern is compiled, or when it is matched.
253    .sp
254    In the PCRE documentation the word "newline" is used to mean "the character or
255    pair of characters that indicate a line break".
256    .
257    .
258  .SH MULTITHREADING  .SH MULTITHREADING
259  .rs  .rs
260  .sp  .sp
# Line 281  properties is available; otherwise it is Line 307  properties is available; otherwise it is
307  .sp  .sp
308    PCRE_CONFIG_NEWLINE    PCRE_CONFIG_NEWLINE
309  .sp  .sp
310  The output is an integer that is set to the value of the code that is used for  The output is an integer whose value specifies the default character sequence
311  the newline character. It is either linefeed (10) or carriage return (13), and  that is recognized as meaning "newline". The three values that are supported
312  should normally be the standard character for your operating system.  are: 10 for LF, 13 for CR, and 3338 for CRLF. The default should normally be
313    the standard sequence for your operating system.
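
The three values listed in the new text can be decoded with a small helper. This is only a sketch: the function name newline_name() is hypothetical and not part of PCRE. It assumes the integer was obtained beforehand with pcre_config(PCRE_CONFIG_NEWLINE, &value).

```c
/* Hypothetical helper (not part of PCRE): map the integer reported for
   PCRE_CONFIG_NEWLINE to a readable name. The three values are the ones
   given in the text: 10 for LF, 13 for CR, and 3338 for CRLF. Note that
   3338 equals 13 * 256 + 10, i.e. the CR code followed by the LF code. */
const char *newline_name(int code)
{
    switch (code) {
    case 10:   return "LF";
    case 13:   return "CR";
    case 3338: return "CRLF";
    default:   return "unknown";
    }
}
```

A caller would pass the value written by pcre_config() straight to newline_name(); any other value is reported as "unknown" rather than guessed at.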
314  .sp  .sp
315    PCRE_CONFIG_LINK_SIZE    PCRE_CONFIG_LINK_SIZE
316  .sp  .sp
# Line 353  The pattern is a C string terminated by Line 380  The pattern is a C string terminated by
380  via \fBpcre_malloc\fP is returned. This contains the compiled code and related  via \fBpcre_malloc\fP is returned. This contains the compiled code and related
381  data. The \fBpcre\fP type is defined for the returned block; this is a typedef  data. The \fBpcre\fP type is defined for the returned block; this is a typedef
382  for a structure whose contents are not externally defined. It is up to the  for a structure whose contents are not externally defined. It is up to the
383  caller to free the memory when it is no longer required.  caller to free the memory (via \fBpcre_free\fP) when it is no longer required.
384  .P  .P
385  Although the compiled code of a PCRE regex is relocatable, that is, it does not  Although the compiled code of a PCRE regex is relocatable, that is, it does not
386  depend on memory location, the complete \fBpcre\fP data block is not  depend on memory location, the complete \fBpcre\fP data block is not
# Line 370  the detailed description in the Line 397  the detailed description in the
397  .\"  .\"
398  documentation). For these options, the contents of the \fIoptions\fP argument  documentation). For these options, the contents of the \fIoptions\fP argument
399  specifies their initial settings at the start of compilation and execution. The  specifies their initial settings at the start of compilation and execution. The
400  PCRE_ANCHORED option can be set at the time of matching as well as at compile  PCRE_ANCHORED and PCRE_NEWLINE_\fIxxx\fP options can be set at the time of
401  time.  matching as well as at compile time.
402  .P  .P
403  If \fIerrptr\fP is NULL, \fBpcre_compile()\fP returns NULL immediately.  If \fIerrptr\fP is NULL, \fBpcre_compile()\fP returns NULL immediately.
404  Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fP returns  Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fP returns
# Line 442  with UTF-8 support. Line 469  with UTF-8 support.
469  .sp  .sp
470  If this bit is set, a dollar metacharacter in the pattern matches only at the  If this bit is set, a dollar metacharacter in the pattern matches only at the
471  end of the subject string. Without this option, a dollar also matches  end of the subject string. Without this option, a dollar also matches
472  immediately before the final character if it is a newline (but not before any  immediately before a newline at the end of the string (but not before any other
473  other newlines). The PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is  newlines). The PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is set.
474  set. There is no equivalent to this option in Perl, and no way to set it within  There is no equivalent to this option in Perl, and no way to set it within a
475  a pattern.  pattern.
476  .sp  .sp
477    PCRE_DOTALL    PCRE_DOTALL
478  .sp  .sp
479  If this bit is set, a dot metacharater in the pattern matches all characters,  If this bit is set, a dot metacharater in the pattern matches all characters,
480  including newlines. Without it, newlines are excluded. This option is  including those that indicate newline. Without it, a dot does not match when
481  equivalent to Perl's /s option, and it can be changed within a pattern by a  the current position is at a newline. This option is equivalent to Perl's /s
482  (?s) option setting. A negative class such as [^a] always matches a newline  option, and it can be changed within a pattern by a (?s) option setting. A
483  character, independent of the setting of this option.  negative class such as [^a] always matches newlines, independent of the setting
484    of this option.
485    .sp
486      PCRE_DUPNAMES
487    .sp
488    If this bit is set, names used to identify capturing subpatterns need not be
489    unique. This can be helpful for certain types of pattern when it is known that
490    only one instance of the named subpattern can ever be matched. There are more
491    details of named subpatterns below; see also the
492    .\" HREF
493    \fBpcrepattern\fP
494    .\"
495    documentation.
496  .sp  .sp
497    PCRE_EXTENDED    PCRE_EXTENDED
498  .sp  .sp
499  If this bit is set, whitespace data characters in the pattern are totally  If this bit is set, whitespace data characters in the pattern are totally
500  ignored except when escaped or inside a character class. Whitespace does not  ignored except when escaped or inside a character class. Whitespace does not
501  include the VT character (code 11). In addition, characters between an  include the VT character (code 11). In addition, characters between an
502  unescaped # outside a character class and the next newline character,  unescaped # outside a character class and the next newline, inclusive, are also
503  inclusive, are also ignored. This is equivalent to Perl's /x option, and it can  ignored. This is equivalent to Perl's /x option, and it can be changed within a
504  be changed within a pattern by a (?x) option setting.  pattern by a (?x) option setting.
505  .P  .P
506  This option makes it possible to include comments inside complicated patterns.  This option makes it possible to include comments inside complicated patterns.
507  Note, however, that this applies only to data characters. Whitespace characters  Note, however, that this applies only to data characters. Whitespace characters
# Line 476  that is incompatible with Perl, but it i Line 515  that is incompatible with Perl, but it i
515  set, any backslash in a pattern that is followed by a letter that has no  set, any backslash in a pattern that is followed by a letter that has no
516  special meaning causes an error, thus reserving these combinations for future  special meaning causes an error, thus reserving these combinations for future
517  expansion. By default, as in Perl, a backslash followed by a letter with no  expansion. By default, as in Perl, a backslash followed by a letter with no
518  special meaning is treated as a literal. There are at present no other features  special meaning is treated as a literal. (Perl can, however, be persuaded to
519  controlled by this option. It can also be set by a (?X) option setting within a  give a warning for this.) There are at present no other features controlled by
520  pattern.  this option. It can also be set by a (?X) option setting within a pattern.
521  .sp  .sp
522    PCRE_FIRSTLINE    PCRE_FIRSTLINE
523  .sp  .sp
524  If this option is set, an unanchored pattern is required to match before or at  If this option is set, an unanchored pattern is required to match before or at
525  the first newline character in the subject string, though the matched text may  the first newline in the subject string, though the matched text may continue
526  continue over the newline.  over the newline.
527  .sp  .sp
528    PCRE_MULTILINE    PCRE_MULTILINE
529  .sp  .sp
# Line 496  terminating newline (unless PCRE_DOLLAR_ Line 535  terminating newline (unless PCRE_DOLLAR_
535  Perl.  Perl.
536  .P  .P
537  When PCRE_MULTILINE it is set, the "start of line" and "end of line" constructs  When PCRE_MULTILINE it is set, the "start of line" and "end of line" constructs
538  match immediately following or immediately before any newline in the subject  match immediately following or immediately before internal newlines in the
539  string, respectively, as well as at the very start and end. This is equivalent  subject string, respectively, as well as at the very start and end. This is
540  to Perl's /m option, and it can be changed within a pattern by a (?m) option  equivalent to Perl's /m option, and it can be changed within a pattern by a
541  setting. If there are no "\en" characters in a subject string, or no  (?m) option setting. If there are no newlines in a subject string, or no
542  occurrences of ^ or $ in a pattern, setting PCRE_MULTILINE has no effect.  occurrences of ^ or $ in a pattern, setting PCRE_MULTILINE has no effect.
543  .sp  .sp
544      PCRE_NEWLINE_CR
545      PCRE_NEWLINE_LF
546      PCRE_NEWLINE_CRLF
547    .sp
548    These options override the default newline definition that was chosen when PCRE
549    was built. Setting the first or the second specifies that a newline is
550    indicated by a single character (CR or LF, respectively). Setting both of them
551    specifies that a newline is indicated by the two-character CRLF sequence. For
552    convenience, PCRE_NEWLINE_CRLF is defined to contain both bits. The only time
553    that a line break is relevant when compiling a pattern is if PCRE_EXTENDED is
554    set, and an unescaped # outside a character class is encountered. This
555    indicates a comment that lasts until after the next newline.
556    .P
557    The newline option set at compile time becomes the default that is used for
558    \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, but it can be overridden.
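
The bit arrangement described above (CR and LF as separate bits, CRLF as both together) can be illustrated as follows. This is a sketch under stated assumptions: the DEMO_ constants use made-up bit values chosen for illustration, not the values in pcre.h, and newline_from_options() is a hypothetical name.

```c
/* Illustrative bit values only; real code should use the PCRE_NEWLINE_xxx
   macros from pcre.h. As in the text, the CRLF value contains both bits. */
enum {
    DEMO_NEWLINE_CR   = 0x1,
    DEMO_NEWLINE_LF   = 0x2,
    DEMO_NEWLINE_CRLF = DEMO_NEWLINE_CR | DEMO_NEWLINE_LF
};

/* Hypothetical helper: report which newline convention an options word
   selects, or that no override was requested. */
const char *newline_from_options(int options)
{
    switch (options & DEMO_NEWLINE_CRLF) {
    case DEMO_NEWLINE_CR:   return "CR";
    case DEMO_NEWLINE_LF:   return "LF";
    case DEMO_NEWLINE_CRLF: return "CRLF";
    default:                return "default";
    }
}
```

Setting the CR and LF bits separately yields the same result as setting the CRLF value, which is the convenience the text refers to.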
559    .sp
560    PCRE_NO_AUTO_CAPTURE    PCRE_NO_AUTO_CAPTURE
561  .sp  .sp
562  If this option is set, it disables the use of numbered capturing parentheses in  If this option is set, it disables the use of numbered capturing parentheses in
# Line 579  both compiling functions. Line 634  both compiling functions.
634    23  internal error: code overflow    23  internal error: code overflow
635    24  unrecognized character after (?<    24  unrecognized character after (?<
636    25  lookbehind assertion is not fixed length    25  lookbehind assertion is not fixed length
637    26  malformed number after (?(    26  malformed number or name after (?(
638    27  conditional group contains more than two branches    27  conditional group contains more than two branches
639    28  assertion expected after (?(    28  assertion expected after (?(
640    29  (?R or (?digits must be followed by )    29  (?R or (?digits must be followed by )
# Line 596  both compiling functions. Line 651  both compiling functions.
651    40  recursive call could loop indefinitely    40  recursive call could loop indefinitely
652    41  unrecognized character after (?P    41  unrecognized character after (?P
653    42  syntax error after (?P    42  syntax error after (?P
654    43  two named groups have the same name    43  two named subpatterns have the same name
655    44  invalid UTF-8 string    44  invalid UTF-8 string
656    45  support for \eP, \ep, and \eX has not been compiled    45  support for \eP, \ep, and \eX has not been compiled
657    46  malformed \eP or \ep sequence    46  malformed \eP or \ep sequence
658    47  unknown property name after \eP or \ep    47  unknown property name after \eP or \ep
659      48  subpattern name is too long (maximum 32 characters)
660      49  too many named subpatterns (maximum 10,000)
661      50  repeated subpattern is too long
662      51  octal value is greater than \e377 (not in UTF-8 mode)
663  .  .
664  .  .
665  .SH "STUDYING A PATTERN"  .SH "STUDYING A PATTERN"
# Line 731  check against passing an arbitrary memor Line 790  check against passing an arbitrary memor
790  \fBpcre_fullinfo()\fP, to obtain the length of the compiled pattern:  \fBpcre_fullinfo()\fP, to obtain the length of the compiled pattern:
791  .sp  .sp
792    int rc;    int rc;
793    unsigned long int length;    size_t length;
794    rc = pcre_fullinfo(    rc = pcre_fullinfo(
795      re,               /* result of pcre_compile() */      re,               /* result of pcre_compile() */
796      pe,               /* result of pcre_study(), or NULL */      pe,               /* result of pcre_study(), or NULL */
# Line 763  a NULL table pointer. Line 822  a NULL table pointer.
822    PCRE_INFO_FIRSTBYTE    PCRE_INFO_FIRSTBYTE
823  .sp  .sp
824  Return information about the first byte of any matched string, for a  Return information about the first byte of any matched string, for a
825  non-anchored pattern. (This option used to be called PCRE_INFO_FIRSTCHAR; the  non-anchored pattern. The fourth argument should point to an \fBint\fP
826  old name is still recognized for backwards compatibility.)  variable. (This option used to be called PCRE_INFO_FIRSTCHAR; the old name is
827    still recognized for backwards compatibility.)
828  .P  .P
829  If there is a fixed first byte, for example, from a pattern such as  If there is a fixed first byte, for example, from a pattern such as
830  (cat|cow|coyote), it is returned in the integer pointed to by \fIwhere\fP.  (cat|cow|coyote). Otherwise, if either
 Otherwise, if either  
831  .sp  .sp
832  (a) the pattern was compiled with the PCRE_MULTILINE option, and every branch  (a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
833  starts with "^", or  starts with "^", or
# Line 803  is -1. Line 862  is -1.
862  .sp  .sp
863  PCRE supports the use of named as well as numbered capturing parentheses. The  PCRE supports the use of named as well as numbered capturing parentheses. The
864  names are just an additional way of identifying the parentheses, which still  names are just an additional way of identifying the parentheses, which still
865  acquire numbers. A convenience function called \fBpcre_get_named_substring()\fP  acquire numbers. Several convenience functions such as
866  is provided for extracting an individual captured substring by name. It is also  \fBpcre_get_named_substring()\fP are provided for extracting captured
867  possible to extract the data directly, by first converting the name to a number  substrings by name. It is also possible to extract the data directly, by first
868  in order to access the correct pointers in the output vector (described with  converting the name to a number in order to access the correct pointers in the
869  \fBpcre_exec()\fP below). To do the conversion, you need to use the  output vector (described with \fBpcre_exec()\fP below). To do the conversion,
870  name-to-number map, which is described by these three values.  you need to use the name-to-number map, which is described by these three
871    values.
872  .P  .P
873  The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT gives  The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT gives
874  the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size of each  the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size of each
# Line 817  length of the longest name. PCRE_INFO_NA Line 877  length of the longest name. PCRE_INFO_NA
877  entry of the table (a pointer to \fBchar\fP). The first two bytes of each entry  entry of the table (a pointer to \fBchar\fP). The first two bytes of each entry
878  are the number of the capturing parenthesis, most significant byte first. The  are the number of the capturing parenthesis, most significant byte first. The
879  rest of the entry is the corresponding name, zero terminated. The names are in  rest of the entry is the corresponding name, zero terminated. The names are in
880  alphabetical order. For example, consider the following pattern (assume  alphabetical order. When PCRE_DUPNAMES is set, duplicate names are in order of
881    their parentheses numbers. For example, consider the following pattern (assume
882  PCRE_EXTENDED is set, so white space - including newlines - is ignored):  PCRE_EXTENDED is set, so white space - including newlines - is ignored):
883  .sp  .sp
884  .\" JOIN  .\" JOIN
# Line 834  bytes shows in hexadecimal, and undefine Line 895  bytes shows in hexadecimal, and undefine
895    00 02 y  e  a  r  00 ??    00 02 y  e  a  r  00 ??
896  .sp  .sp
897  When writing code to extract data from named subpatterns using the  When writing code to extract data from named subpatterns using the
898  name-to-number map, remember that the length of each entry is likely to be  name-to-number map, remember that the length of the entries is likely to be
899  different for each compiled pattern.  different for each compiled pattern.
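
The entry layout described above (two bytes of group number, most significant first, followed by a zero-terminated name, in fixed-size entries) could be walked as in the sketch below. The function name name_to_number() is hypothetical. The three table parameters are assumed to come from pcre_fullinfo() with PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT and PCRE_INFO_NAMEENTRYSIZE.

```c
#include <string.h>

/* Hypothetical helper: search a name table for `name` and return the
   capturing group number stored in the first matching entry, or -1 if
   the name is absent. `entry_size` must be the size reported for this
   particular compiled pattern, since it differs between patterns. */
int name_to_number(const unsigned char *table, int count, int entry_size,
                   const char *name)
{
    int i;
    for (i = 0; i < count; i++) {
        const unsigned char *entry = table + i * entry_size;
        /* Bytes 0-1: group number, most significant byte first.
           Bytes 2..: the name, zero terminated. */
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}
```

A linear scan is used for clarity. Because the names are stored in alphabetical order, a binary search would also be possible.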
900  .sp  .sp
901    PCRE_INFO_OPTIONS    PCRE_INFO_OPTIONS
# Line 1057  documentation for a discussion of saving Line 1118  documentation for a discussion of saving
1118  .rs  .rs
1119  .sp  .sp
1120  The unused bits of the \fIoptions\fP argument for \fBpcre_exec()\fP must be  The unused bits of the \fIoptions\fP argument for \fBpcre_exec()\fP must be
1121  zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NOTBOL,  zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_\fIxxx\fP,
1122  PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK and PCRE_PARTIAL.  PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK and PCRE_PARTIAL.
1123  .sp  .sp
1124    PCRE_ANCHORED    PCRE_ANCHORED
1125  .sp  .sp
# Line 1067  matching position. If a pattern was comp Line 1128  matching position. If a pattern was comp
1128  to be anchored by virtue of its contents, it cannot be made unachored at  to be anchored by virtue of its contents, it cannot be made unachored at
1129  matching time.  matching time.
1130  .sp  .sp
1131      PCRE_NEWLINE_CR
1132      PCRE_NEWLINE_LF
1133      PCRE_NEWLINE_CRLF
1134    .sp
1135    These options override the newline definition that was chosen or defaulted when
1136    the pattern was compiled. For details, see the description \fBpcre_compile()\fP
1137    above. During matching, the newline choice affects the behaviour of the dot,
1138    circumflex, and dollar metacharacters.
1139    .sp
1140    PCRE_NOTBOL    PCRE_NOTBOL
1141  .sp  .sp
1142  This option specifies that first character of the subject string is not the  This option specifies that first character of the subject string is not the
# Line 1198  is set to the offset of the first charac Line 1268  is set to the offset of the first charac
1268  first pair, \fIovector[0]\fP and \fIovector[1]\fP, identify the portion of the  first pair, \fIovector[0]\fP and \fIovector[1]\fP, identify the portion of the
1269  subject string matched by the entire pattern. The next pair is used for the  subject string matched by the entire pattern. The next pair is used for the
1270  first capturing subpattern, and so on. The value returned by \fBpcre_exec()\fP  first capturing subpattern, and so on. The value returned by \fBpcre_exec()\fP
1271  is the number of pairs that have been set. If there are no capturing  is one more than the highest numbered pair that has been set. For example, if
1272  subpatterns, the return value from a successful match is 1, indicating that  two substrings have been captured, the returned value is 3. If there are no
1273  just the first pair of offsets has been set.  capturing subpatterns, the return value from a successful match is 1,
1274  .P  indicating that just the first pair of offsets has been set.
 Some convenience functions are provided for extracting the captured substrings  
 as separate strings. These are described in the following section.  
 .P  
 It is possible for an capturing subpattern number \fIn+1\fP to match some  
 part of the subject when subpattern \fIn\fP has not been used at all. For  
 example, if the string "abc" is matched against the pattern (a|(z))(bc)  
 subpatterns 1 and 3 are matched, but 2 is not. When this happens, both offset  
 values corresponding to the unused subpattern are set to -1.  
1275  .P  .P
1276  If a capturing subpattern is matched repeatedly, it is the last portion of the  If a capturing subpattern is matched repeatedly, it is the last portion of the
1277  string that it matched that is returned.  string that it matched that is returned.
# Line 1223  the \fIovector\fP is not big enough to r Line 1285  the \fIovector\fP is not big enough to r
1285  has to get additional memory for use during matching. Thus it is usually  has to get additional memory for use during matching. Thus it is usually
1286  advisable to supply an \fIovector\fP.  advisable to supply an \fIovector\fP.
1287  .P  .P
1288  Note that \fBpcre_info()\fP can be used to find out how many capturing  The \fBpcre_info()\fP function can be used to find out how many capturing
1289  subpatterns there are in a compiled pattern. The smallest size for  subpatterns there are in a compiled pattern. The smallest size for
1290  \fIovector\fP that will allow for \fIn\fP captured substrings, in addition to  \fIovector\fP that will allow for \fIn\fP captured substrings, in addition to
1291  the offsets of the substring matched by the whole pattern, is (\fIn\fP+1)*3.  the offsets of the substring matched by the whole pattern, is (\fIn\fP+1)*3.
1292    .P
1293    It is possible for capturing subpattern number \fIn+1\fP to match some part of
1294    the subject when subpattern \fIn\fP has not been used at all. For example, if
1295    the string "abc" is matched against the pattern (a|(z))(bc) the return from the
1296    function is 4, and subpatterns 1 and 3 are matched, but 2 is not. When this
1297    happens, both values in the offset pairs corresponding to unused subpatterns
1298    are set to -1.
1299    .P
1300    Offset values that correspond to unused subpatterns at the end of the
1301    expression are also set to -1. For example, if the string "abc" is matched
1302    against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not matched. The
1303    return from the function is 2, because the highest used capturing subpattern
1304    number is 1. However, you can refer to the offsets for the second and third
1305    capturing subpatterns if you wish (assuming the vector is large enough, of
1306    course).
1307    .P
1308    Some convenience functions are provided for extracting the captured substrings
1309    as separate strings. These are described below.
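
The rules above for the return value and for unset pairs can be sketched in plain C. The helper names are hypothetical. The offsets in the usage note are the ones implied by the "abc" against (a|(z))(bc) example in the text.

```c
/* Smallest ovector size that allows for n captured substrings in
   addition to the whole match, using the (n+1)*3 rule from the text. */
int min_ovector_size(int n)
{
    return (n + 1) * 3;
}

/* Hypothetical helper: length of captured substring n, or -1 if it was
   not set. `rc` is the positive value returned by pcre_exec(), that is,
   one more than the highest numbered pair that has been set. */
int capture_length(const int *ovector, int rc, int n)
{
    if (n < 0 || n >= rc)
        return -1;                 /* beyond the highest pair that was set */
    if (ovector[2 * n] < 0)
        return -1;                 /* unused subpattern: both offsets are -1 */
    return ovector[2 * n + 1] - ovector[2 * n];
}
```

For the example in the text, the return value is 4 and the offset pairs are (0,3), (0,1), (-1,-1), (1,3), so capture_length() gives 1 for subpattern 1, -1 for subpattern 2, and 2 for subpattern 3.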
1310  .  .
1311  .\" HTML <a name="errorlist"></a>  .\" HTML <a name="errorlist"></a>
1312  .SS "Return values from \fBpcre_exec()\fP"  .SS "Error return values from \fBpcre_exec()\fP"
1313  .rs  .rs
1314  .sp  .sp
1315  If \fBpcre_exec()\fP fails, it returns a negative number. The following are  If \fBpcre_exec()\fP fails, it returns a negative number. The following are
# Line 1360  Captured substrings can be accessed dire Line 1440  Captured substrings can be accessed dire
1440  \fBpcre_get_substring_list()\fP are provided for extracting captured substrings  \fBpcre_get_substring_list()\fP are provided for extracting captured substrings
1441  as new, separate, zero-terminated strings. These functions identify substrings  as new, separate, zero-terminated strings. These functions identify substrings
1442  by number. The next section describes functions for extracting named  by number. The next section describes functions for extracting named
1443  substrings. A substring that contains a binary zero is correctly extracted and  substrings.
1444  has a further zero added on the end, but the result is not, of course,  .P
1445  a C string.  A substring that contains a binary zero is correctly extracted and has a
1446    further zero added on the end, but the result is not, of course, a C string.
1447    However, you can process such a string by referring to the length that is
1448    returned by \fBpcre_copy_substring()\fP and \fBpcre_get_substring()\fP.
1449    Unfortunately, the interface to \fBpcre_get_substring_list()\fP is not adequate
1450    for handling strings containing binary zeros, because the end of the final
1451    string is not independently indicated.
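
The point about binary zeros can be shown without calling PCRE at all. The sketch below is not PCRE's implementation: copy_capture() and its error values are hypothetical, and it only mimics the idea of copying by offsets and returning the length.

```c
#include <string.h>

/* Hypothetical helper: copy captured substring n into `buffer`, add a
   terminating zero, and return the substring length. Returns -1 if n is
   out of range or unset, and -2 if the buffer is too small. The length,
   not the terminating zero, tells the caller where the data ends. */
int copy_capture(const char *subject, const int *ovector, int stringcount,
                 int n, char *buffer, int size)
{
    int start, len;
    if (n < 0 || n >= stringcount || ovector[2 * n] < 0)
        return -1;
    start = ovector[2 * n];
    len = ovector[2 * n + 1] - start;
    if (size < len + 1)
        return -2;
    memcpy(buffer, subject + start, (size_t)len);
    buffer[len] = 0;
    return len;
}
```

If the subject is the three bytes 'a', 0, 'b' and the whole of it matched, the returned length is 3 even though strlen() on the buffer reports 1. A list of such strings with no lengths attached has no equivalent way of marking where the final string ends.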
1452  .P  .P
1453  The first three arguments are the same for all three of these functions:  The first three arguments are the same for all three of these functions:
1454  \fIsubject\fP is the subject string that has just been successfully matched,  \fIsubject\fP is the subject string that has just been successfully matched,
# Line 1417  a previous call of \fBpcre_get_substring Line 1503  a previous call of \fBpcre_get_substring
1503  \fBpcre_get_substring_list()\fP, respectively. They do nothing more than call  \fBpcre_get_substring_list()\fP, respectively. They do nothing more than call
1504  the function pointed to by \fBpcre_free\fP, which of course could be called  the function pointed to by \fBpcre_free\fP, which of course could be called
1505  directly from a C program. However, PCRE is used in some situations where it is  directly from a C program. However, PCRE is used in some situations where it is
1506  linked via a special interface to another programming language which cannot use  linked via a special interface to another programming language that cannot use
1507  \fBpcre_free\fP directly; it is for these cases that the functions are  \fBpcre_free\fP directly; it is for these cases that the functions are
1508  provided.  provided.
1509  .  .
# Line 1452  For example, for this pattern Line 1538  For example, for this pattern
1538  .sp  .sp
1539    (a+)b(?P<xxx>\ed+)...    (a+)b(?P<xxx>\ed+)...
1540  .sp  .sp
1541  the number of the subpattern called "xxx" is 2. You can find the number from  the number of the subpattern called "xxx" is 2. If the name is known to be
1542  the name by calling \fBpcre_get_stringnumber()\fP. The first argument is the  unique (PCRE_DUPNAMES was not set), you can find the number from the name by
1543  compiled pattern, and the second is the name. The yield of the function is the  calling \fBpcre_get_stringnumber()\fP. The first argument is the compiled
1544    pattern, and the second is the name. The yield of the function is the
1545  subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if there is no subpattern of  subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if there is no subpattern of
1546  that name.  that name.
1547  .P  .P
# Line 1462  Given the number, you can extract the su Line 1549  Given the number, you can extract the su
1549  functions described in the previous section. For convenience, there are also  functions described in the previous section. For convenience, there are also
1550  two functions that do the whole job.  two functions that do the whole job.
1551  .P  .P
1552  Most of the arguments of \fIpcre_copy_named_substring()\fP and  Most of the arguments of \fBpcre_copy_named_substring()\fP and
1553  \fIpcre_get_named_substring()\fP are the same as those for the similarly named  \fBpcre_get_named_substring()\fP are the same as those for the similarly named
1554  functions that extract by number. As these are described in the previous  functions that extract by number. As these are described in the previous
1555  section, they are not re-described here. There are just two differences:  section, they are not re-described here. There are just two differences:
1556  .P  .P
# Line 1477  then call \fIpcre_copy_substring()\fP or Line 1564  then call \fIpcre_copy_substring()\fP or
1564  appropriate.  appropriate.
1565  .  .
1566  .  .
1567    .SH "DUPLICATE SUBPATTERN NAMES"
1568    .rs
1569    .sp
1570    .B int pcre_get_stringtable_entries(const pcre *\fIcode\fP,
1571    .ti +5n
1572    .B const char *\fIname\fP, char **\fIfirst\fP, char **\fIlast\fP);
1573    .PP
1574    When a pattern is compiled with the PCRE_DUPNAMES option, names for subpatterns
1575    are not required to be unique. Normally, patterns with duplicate names are such
1576    that in any one match, only one of the named subpatterns participates. An
1577    example is shown in the
1578    .\" HREF
1579    \fBpcrepattern\fP
1580    .\"
1581    documentation. When duplicates are present, \fBpcre_copy_named_substring()\fP
1582    and \fBpcre_get_named_substring()\fP return the first substring corresponding
1583    to the given name that is set. If none are set, an empty string is returned.
1584    The \fBpcre_get_stringnumber()\fP function returns one of the numbers that are
1585    associated with the name, but it is not defined which it is.
1586    .sp
1587    If you want to get full details of all captured substrings for a given name,
1588    you must use the \fBpcre_get_stringtable_entries()\fP function. The first
1589    argument is the compiled pattern, and the second is the name. The third and
1590    fourth are pointers to variables which are updated by the function. After it
1591    has run, they point to the first and last entries in the name-to-number table
1592    for the given name. The function itself returns the length of each entry, or
1593    PCRE_ERROR_NOSUBSTRING if there are none. The format of the table is described
1594    above in the section entitled \fIInformation about a pattern\fP. Given all the
1595    relevant entries for the name, you can extract each of their numbers, and hence
1596    the captured data, if any.
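The last step above, extracting the numbers from the returned entries, can be sketched as follows. This is a minimal illustration and not part of PCRE itself: the helper names entry_group_number() and collect_group_numbers() are hypothetical, and the code assumes the entry layout given in the section on information about a pattern, namely that each entry starts with the group number in two bytes, most significant first, followed by the zero-terminated name. The pointers and entry size are those produced by \fBpcre_get_stringtable_entries()\fP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decode the group number held in the first two
   bytes of one name-table entry (most significant byte first). */
static int entry_group_number(const char *entry)
{
    const unsigned char *p = (const unsigned char *)entry;
    return (p[0] << 8) | p[1];
}

/* Hypothetical helper: walk the entries from first to last inclusive.
   entry_size is the value returned by pcre_get_stringtable_entries().
   Stores at most max_out numbers in out; returns the number of entries
   visited, which may exceed max_out. */
static int collect_group_numbers(const char *first, const char *last,
                                 int entry_size, int *out, int max_out)
{
    int count = 0;
    const char *entry;

    /* Two bytes of number plus at least a terminating zero for the name. */
    assert(entry_size > 2);

    for (entry = first; entry <= last; entry += entry_size) {
        if (count < max_out)
            out[count] = entry_group_number(entry);
        count++;
    }
    return count;
}
```

For each number n obtained this way, the offsets of the captured data are in elements 2*n and 2*n+1 of the \fIovector\fP that was passed to \fBpcre_exec()\fP. A value of -1 there means that the group did not take part in the match.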
1597    .
1598    .
1599  .SH "FINDING ALL POSSIBLE MATCHES"  .SH "FINDING ALL POSSIBLE MATCHES"
1600  .rs  .rs
1601  .sp  .sp
# Line 1531  here. Line 1650  here.
1650  The two additional arguments provide workspace for the function. The workspace  The two additional arguments provide workspace for the function. The workspace
1651  vector should contain at least 20 elements. It is used for keeping track of  vector should contain at least 20 elements. It is used for keeping track of
1652  multiple paths through the pattern tree. More workspace will be needed for  multiple paths through the pattern tree. More workspace will be needed for
1653  patterns and subjects where there are a lot of possible matches.  patterns and subjects where there are a lot of potential matches.
1654  .P  .P
1655  Here is an example of a simple call to \fBpcre_dfa_exec()\fP:  Here is an example of a simple call to \fBpcre_dfa_exec()\fP:
1656  .sp  .sp
# Line 1554  Here is an example of a simple call to \ Line 1673  Here is an example of a simple call to \
1673  .rs  .rs
1674  .sp  .sp
1675  The unused bits of the \fIoptions\fP argument for \fBpcre_dfa_exec()\fP must be  The unused bits of the \fIoptions\fP argument for \fBpcre_dfa_exec()\fP must be
1676  zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NOTBOL,  zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_\fIxxx\fP,
1677  PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL,  PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL,
1678  PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last three of these are  PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last three of these are
1679  the same as for \fBpcre_exec()\fP, so their description is not repeated here.  the same as for \fBpcre_exec()\fP, so their description is not repeated here.
1680  .sp  .sp
# Line 1665  error is given if the output vector is n Line 1784  error is given if the output vector is n
1784  extremely rare, as a vector of size 1000 is used.  extremely rare, as a vector of size 1000 is used.
1785  .P  .P
1786  .in 0  .in 0
1787  Last updated: 18 January 2006  Last updated: 08 June 2006
1788  .br  .br
1789  Copyright (c) 1997-2006 University of Cambridge.  Copyright (c) 1997-2006 University of Cambridge.
