Diff of /code/trunk/doc/pcreapi.3

revision 76 by nigel, Sat Feb 24 21:40:37 2007 UTC revision 77 by nigel, Sat Feb 24 21:40:45 2007 UTC
# Line 15  PCRE - Perl-compatible regular expressio Line 15  PCRE - Perl-compatible regular expressio
15  .B const unsigned char *\fItableptr\fP);  .B const unsigned char *\fItableptr\fP);
16  .PP  .PP
17  .br  .br
18    .B pcre *pcre_compile2(const char *\fIpattern\fP, int \fIoptions\fP,
19    .ti +5n
20    .B int *\fIerrorcodeptr\fP,
21    .ti +5n
22    .B const char **\fIerrptr\fP, int *\fIerroffset\fP,
23    .ti +5n
24    .B const unsigned char *\fItableptr\fP);
25    .PP
26    .br
27  .B pcre_extra *pcre_study(const pcre *\fIcode\fP, int \fIoptions\fP,  .B pcre_extra *pcre_study(const pcre *\fIcode\fP, int \fIoptions\fP,
28  .ti +5n  .ti +5n
29  .B const char **\fIerrptr\fP);  .B const char **\fIerrptr\fP);
# Line 27  PCRE - Perl-compatible regular expressio Line 36  PCRE - Perl-compatible regular expressio
36  .B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);  .B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);
37  .PP  .PP
38  .br  .br
39    .B int pcre_dfa_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
40    .ti +5n
41    .B "const char *\fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
42    .ti +5n
43    .B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,
44    .ti +5n
45    .B int *\fIworkspace\fP, int \fIwscount\fP);
46    .PP
47    .br
48  .B int pcre_copy_named_substring(const pcre *\fIcode\fP,  .B int pcre_copy_named_substring(const pcre *\fIcode\fP,
49  .ti +5n  .ti +5n
50  .B const char *\fIsubject\fP, int *\fIovector\fP,  .B const char *\fIsubject\fP, int *\fIovector\fP,
# Line 87  PCRE - Perl-compatible regular expressio Line 105  PCRE - Perl-compatible regular expressio
105  .B *\fIfirstcharptr\fP);  .B *\fIfirstcharptr\fP);
106  .PP  .PP
107  .br  .br
108    .B int pcre_refcount(pcre *\fIcode\fP, int \fIadjust\fP);
109    .PP
110    .br
111  .B int pcre_config(int \fIwhat\fP, void *\fIwhere\fP);  .B int pcre_config(int \fIwhat\fP, void *\fIwhere\fP);
112  .PP  .PP
113  .br  .br
# Line 111  PCRE - Perl-compatible regular expressio Line 132  PCRE - Perl-compatible regular expressio
132  .SH "PCRE API OVERVIEW"  .SH "PCRE API OVERVIEW"
133  .rs  .rs
134  .sp  .sp
135  PCRE has its own native API, which is described in this document. There is also  PCRE has its own native API, which is described in this document. There is
136  a set of wrapper functions that correspond to the POSIX regular expression API.  also a set of wrapper functions that correspond to the POSIX regular expression
137  These are described in the  API. These are described in the
138  .\" HREF  .\" HREF
139  \fBpcreposix\fP  \fBpcreposix\fP
140  .\"  .\"
141  documentation.  documentation. Both of these APIs define a set of C function calls. A C++
142    wrapper is distributed with PCRE. It is documented in the
143    .\" HREF
144    \fBpcrecpp\fP
145    .\"
146    page.
147  .P  .P
148  The native API function prototypes are defined in the header file \fBpcre.h\fP,  The native API C function prototypes are defined in the header file
149  and on Unix systems the library itself is called \fBlibpcre\fP. It can  \fBpcre.h\fP, and on Unix systems the library itself is called \fBlibpcre\fP.
150  normally be accessed by adding \fB-lpcre\fP to the command for linking an  It can normally be accessed by adding \fB-lpcre\fP to the command for linking
151  application that uses PCRE. The header file defines the macros PCRE_MAJOR and  an application that uses PCRE. The header file defines the macros PCRE_MAJOR
152  PCRE_MINOR to contain the major and minor release numbers for the library.  and PCRE_MINOR to contain the major and minor release numbers for the library.
153  Applications can use these to include support for different releases of PCRE.  Applications can use these to include support for different releases of PCRE.
154  .P  .P
155  The functions \fBpcre_compile()\fP, \fBpcre_study()\fP, and \fBpcre_exec()\fP  The functions \fBpcre_compile()\fP, \fBpcre_compile2()\fP, \fBpcre_study()\fP,
156  are used for compiling and matching regular expressions. A sample program that  and \fBpcre_exec()\fP are used for compiling and matching regular expressions
157  demonstrates the simplest way of using them is provided in the file called  in a Perl-compatible manner. A sample program that demonstrates the simplest
158  \fIpcredemo.c\fP in the source distribution. The  way of using them is provided in the file called \fIpcredemo.c\fP in the source
159    distribution. The
160  .\" HREF  .\" HREF
161  \fBpcresample\fP  \fBpcresample\fP
162  .\"  .\"
163  documentation describes how to run it.  documentation describes how to run it.
164  .P  .P
165    A second matching function, \fBpcre_dfa_exec()\fP, which is not
166    Perl-compatible, is also provided. This uses a different algorithm for the
167    matching. This allows it to find all possible matches (at a given point in the
168    subject), not just one. However, this algorithm does not return captured
169    substrings. A description of the two matching algorithms and their advantages
170    and disadvantages is given in the
171    .\" HREF
172    \fBpcrematching\fP
173    .\"
174    documentation.
175    .P
176  In addition to the main compiling and matching functions, there are convenience  In addition to the main compiling and matching functions, there are convenience
177  functions for extracting captured substrings from a matched subject string.  functions for extracting captured substrings from a subject string that is
178  They are:  matched by \fBpcre_exec()\fP. They are:
179  .sp  .sp
180    \fBpcre_copy_substring()\fP    \fBpcre_copy_substring()\fP
181    \fBpcre_copy_named_substring()\fP    \fBpcre_copy_named_substring()\fP
# Line 150  They are: Line 188  They are:
188  provided, to free the memory used for extracted strings.  provided, to free the memory used for extracted strings.
189  .P  .P
190  The function \fBpcre_maketables()\fP is used to build a set of character tables  The function \fBpcre_maketables()\fP is used to build a set of character tables
191  in the current locale for passing to \fBpcre_compile()\fP or \fBpcre_exec()\fP.  in the current locale for passing to \fBpcre_compile()\fP, \fBpcre_exec()\fP,
192  This is an optional facility that is provided for specialist use. Most  or \fBpcre_dfa_exec()\fP. This is an optional facility that is provided for
193  commonly, no special tables are passed, in which case internal tables that are  specialist use. Most commonly, no special tables are passed, in which case
194  generated when PCRE is built are used.  internal tables that are generated when PCRE is built are used.
195  .P  .P
196  The function \fBpcre_fullinfo()\fP is used to find out information about a  The function \fBpcre_fullinfo()\fP is used to find out information about a
197  compiled pattern; \fBpcre_info()\fP is an obsolete version that returns only  compiled pattern; \fBpcre_info()\fP is an obsolete version that returns only
# Line 161  some of the available information, but i Line 199  some of the available information, but i
199  The function \fBpcre_version()\fP returns a pointer to a string containing the  The function \fBpcre_version()\fP returns a pointer to a string containing the
200  version of PCRE and its date of release.  version of PCRE and its date of release.
201  .P  .P
202    The function \fBpcre_refcount()\fP maintains a reference count in a data block
203    containing a compiled pattern. This is provided for the benefit of
204    object-oriented applications.
205    .P
206  The global variables \fBpcre_malloc\fP and \fBpcre_free\fP initially contain  The global variables \fBpcre_malloc\fP and \fBpcre_free\fP initially contain
207  the entry points of the standard \fBmalloc()\fP and \fBfree()\fP functions,  the entry points of the standard \fBmalloc()\fP and \fBfree()\fP functions,
208  respectively. PCRE calls the memory management functions via these variables,  respectively. PCRE calls the memory management functions via these variables,
# Line 170  should be done before calling any PCRE f Line 212  should be done before calling any PCRE f
212  The global variables \fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP are also  The global variables \fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP are also
213  indirections to memory management functions. These special functions are used  indirections to memory management functions. These special functions are used
214  only when PCRE is compiled to use the heap for remembering data, instead of  only when PCRE is compiled to use the heap for remembering data, instead of
215  recursive function calls. This is a non-standard way of building PCRE, for use  recursive function calls, when running the \fBpcre_exec()\fP function. This is
216  in environments that have limited stacks. Because of the greater use of memory  a non-standard way of building PCRE, for use in environments that have limited
217  management, it runs more slowly. Separate functions are provided so that  stacks. Because of the greater use of memory management, it runs more slowly.
218  special-purpose external code can be used for this case. When used, these  Separate functions are provided so that special-purpose external code can be
219  functions are always called in a stack-like manner (last obtained, first  used for this case. When used, these functions are always called in a
220  freed), and always for memory blocks of the same size.  stack-like manner (last obtained, first freed), and always for memory blocks of
221    the same size.
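The two guarantees above (last obtained, first freed, and a single block size) are what make a very simple external allocator possible. The following is only a sketch under those assumptions: the names frame_alloc and frame_free are hypothetical, and the code keeps released blocks on a free list so that the next request can reuse one without calling malloc() again. It is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical block cache; none of these names come from PCRE. */
static void  *free_list  = NULL;  /* released blocks, most recent first */
static size_t block_size = 0;     /* size seen on the first request */

/* Same shape as a malloc()-style function: returns a block of "size" bytes. */
static void *frame_alloc(size_t size)
{
  if (block_size == 0) block_size = size;
  assert(size == block_size);     /* relies on the same-size guarantee */

  if (free_list != NULL) {
    void *block = free_list;
    free_list = *(void **)block;  /* pop the most recently released block */
    return block;
  }

  /* A cached block stores a link in its first bytes, so leave room for it. */
  return malloc(size < sizeof(void *) ? sizeof(void *) : size);
}

/* Same shape as a free()-style function: keeps the block for reuse. */
static void frame_free(void *block)
{
  if (block == NULL) return;
  *(void **)block = free_list;    /* push onto the free list */
  free_list = block;
}
```

In a program linked against a PCRE library built to use the heap, functions of this shape would be stored in \fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP before any matching is done.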
222  .P  .P
223  The global variable \fBpcre_callout\fP initially contains NULL. It can be set  The global variable \fBpcre_callout\fP initially contains NULL. It can be set
224  by the caller to a "callout" function, which PCRE will then call at specified  by the caller to a "callout" function, which PCRE will then call at specified
# Line 268  details are given with \fBpcre_exec()\fP Line 311  details are given with \fBpcre_exec()\fP
311  .sp  .sp
312    PCRE_CONFIG_STACKRECURSE    PCRE_CONFIG_STACKRECURSE
313  .sp  .sp
314  The output is an integer that is set to one if internal recursion is  The output is an integer that is set to one if internal recursion when running
315  implemented by recursive function calls that use the stack to remember their  \fBpcre_exec()\fP is implemented by recursive function calls that use the stack
316  state. This is the usual way that PCRE is compiled. The output is zero if PCRE  to remember their state. This is the usual way that PCRE is compiled. The
317  was compiled to use blocks of data on the heap instead of recursive function  output is zero if PCRE was compiled to use blocks of data on the heap instead
318  calls. In this case, \fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP are  of recursive function calls. In this case, \fBpcre_stack_malloc\fP and
319  called to manage memory blocks on the heap, thus avoiding the use of the stack.  \fBpcre_stack_free\fP are called to manage memory blocks on the heap, thus
320    avoiding the use of the stack.
321  .  .
322  .  .
323  .SH "COMPILING A PATTERN"  .SH "COMPILING A PATTERN"
# Line 284  called to manage memory blocks on the he Line 328  called to manage memory blocks on the he
328  .B const char **\fIerrptr\fP, int *\fIerroffset\fP,  .B const char **\fIerrptr\fP, int *\fIerroffset\fP,
329  .ti +5n  .ti +5n
330  .B const unsigned char *\fItableptr\fP);  .B const unsigned char *\fItableptr\fP);
331    .sp
332    .B pcre *pcre_compile2(const char *\fIpattern\fP, int \fIoptions\fP,
333    .ti +5n
334    .B int *\fIerrorcodeptr\fP,
335    .ti +5n
336    .B const char **\fIerrptr\fP, int *\fIerroffset\fP,
337    .ti +5n
338    .B const unsigned char *\fItableptr\fP);
339  .P  .P
340  The function \fBpcre_compile()\fP is called to compile a pattern into an  Either of the functions \fBpcre_compile()\fP or \fBpcre_compile2()\fP can be
341  internal form. The pattern is a C string terminated by a binary zero, and  called to compile a pattern into an internal form. The only difference between
342  is passed in the \fIpattern\fP argument. A pointer to a single block of memory  the two interfaces is that \fBpcre_compile2()\fP has an additional argument,
343  that is obtained via \fBpcre_malloc\fP is returned. This contains the compiled  \fIerrorcodeptr\fP, via which a numerical error code can be returned.
344  code and related data. The \fBpcre\fP type is defined for the returned block;  .P
345  this is a typedef for a structure whose contents are not externally defined. It  The pattern is a C string terminated by a binary zero, and is passed in the
346  is up to the caller to free the memory when it is no longer required.  \fIpattern\fP argument. A pointer to a single block of memory that is obtained
347    via \fBpcre_malloc\fP is returned. This contains the compiled code and related
348    data. The \fBpcre\fP type is defined for the returned block; this is a typedef
349    for a structure whose contents are not externally defined. It is up to the
350    caller to free the memory when it is no longer required.
351  .P  .P
352  Although the compiled code of a PCRE regex is relocatable, that is, it does not  Although the compiled code of a PCRE regex is relocatable, that is, it does not
353  depend on memory location, the complete \fBpcre\fP data block is not  depend on memory location, the complete \fBpcre\fP data block is not
# Line 318  error message. The offset from the start Line 374  error message. The offset from the start
374  the error was discovered is placed in the variable pointed to by  the error was discovered is placed in the variable pointed to by
375  \fIerroffset\fP, which must not be NULL. If it is, an immediate error is given.  \fIerroffset\fP, which must not be NULL. If it is, an immediate error is given.
376  .P  .P
377    If \fBpcre_compile2()\fP is used instead of \fBpcre_compile()\fP, and the
378    \fIerrorcodeptr\fP argument is not NULL, a non-zero error code number is
379    returned via this argument in the event of an error. This is in addition to the
380    textual error message. Error codes and messages are listed below.
381    .P
382  If the final argument, \fItableptr\fP, is NULL, PCRE uses a default set of  If the final argument, \fItableptr\fP, is NULL, PCRE uses a default set of
383  character tables that are built when PCRE is compiled, using the default C  character tables that are built when PCRE is compiled, using the default C
384  locale. Otherwise, \fItableptr\fP must be an address that is the result of a  locale. Otherwise, \fItableptr\fP must be an address that is the result of a
# Line 362  documentation. Line 423  documentation.
423  .sp  .sp
424  If this bit is set, letters in the pattern match both upper and lower case  If this bit is set, letters in the pattern match both upper and lower case
425  letters. It is equivalent to Perl's /i option, and it can be changed within a  letters. It is equivalent to Perl's /i option, and it can be changed within a
426  pattern by a (?i) option setting. When running in UTF-8 mode, case support for  pattern by a (?i) option setting. In UTF-8 mode, PCRE always understands the
427  high-valued characters is available only when PCRE is built with Unicode  concept of case for characters whose values are less than 128, so caseless
428  character property support.  matching is always possible. For characters with higher values, the concept of
429    case is supported if PCRE is compiled with Unicode property support, but not
430    otherwise. If you want to use caseless matching for characters 128 and above,
431    you must ensure that PCRE is compiled with Unicode property support as well as
432    with UTF-8 support.
433  .sp  .sp
434    PCRE_DOLLAR_ENDONLY    PCRE_DOLLAR_ENDONLY
435  .sp  .sp
# Line 408  special meaning is treated as a literal. Line 473  special meaning is treated as a literal.
473  controlled by this option. It can also be set by a (?X) option setting within a  controlled by this option. It can also be set by a (?X) option setting within a
474  pattern.  pattern.
475  .sp  .sp
476      PCRE_FIRSTLINE
477    .sp
478    If this option is set, an unanchored pattern is required to match before or at
479    the first newline character in the subject string, though the matched text may
480    continue over the newline.
481    .sp
482    PCRE_MULTILINE    PCRE_MULTILINE
483  .sp  .sp
484  By default, PCRE treats the subject string as consisting of a single line of  By default, PCRE treats the subject string as consisting of a single line of
# Line 463  automatically checked. If an invalid UTF Line 534  automatically checked. If an invalid UTF
534  valid, and you want to skip this check for performance reasons, you can set the  valid, and you want to skip this check for performance reasons, you can set the
535  PCRE_NO_UTF8_CHECK option. When it is set, the effect of passing an invalid  PCRE_NO_UTF8_CHECK option. When it is set, the effect of passing an invalid
536  UTF-8 string as a pattern is undefined. It may cause your program to crash.  UTF-8 string as a pattern is undefined. It may cause your program to crash.
537  Note that this option can also be passed to \fBpcre_exec()\fP, to suppress the  Note that this option can also be passed to \fBpcre_exec()\fP and
538  UTF-8 validity checking of subject strings.  \fBpcre_dfa_exec()\fP, to suppress the UTF-8 validity checking of subject
539    strings.
540    .
541    .
542    .SH "COMPILATION ERROR CODES"
543    .rs
544    .sp
545    The following table lists the error codes that may be returned by
546    \fBpcre_compile2()\fP, along with the error messages that may be returned by
547    both compiling functions.
548    .sp
549       0  no error
550       1  \e at end of pattern
551       2  \ec at end of pattern
552       3  unrecognized character follows \e
553       4  numbers out of order in {} quantifier
554       5  number too big in {} quantifier
555       6  missing terminating ] for character class
556       7  invalid escape sequence in character class
557       8  range out of order in character class
558       9  nothing to repeat
559      10  operand of unlimited repeat could match the empty string
560      11  internal error: unexpected repeat
561      12  unrecognized character after (?
562      13  POSIX named classes are supported only within a class
563      14  missing )
564      15  reference to non-existent subpattern
565      16  erroffset passed as NULL
566      17  unknown option bit(s) set
567      18  missing ) after comment
568      19  parentheses nested too deeply
569      20  regular expression too large
570      21  failed to get memory
571      22  unmatched parentheses
572      23  internal error: code overflow
573      24  unrecognized character after (?<
574      25  lookbehind assertion is not fixed length
575      26  malformed number after (?(
576      27  conditional group contains more than two branches
577      28  assertion expected after (?(
578      29  (?R or (?digits must be followed by )
579      30  unknown POSIX class name
580      31  POSIX collating elements are not supported
581      32  this version of PCRE is not compiled with PCRE_UTF8 support
582      33  spare error
583      34  character value in \ex{...} sequence is too large
584      35  invalid condition (?(0)
585      36  \eC not allowed in lookbehind assertion
586      37  PCRE does not support \eL, \el, \eN, \eU, or \eu
587      38  number after (?C is > 255
588      39  closing ) for (?C expected
589      40  recursive call could loop indefinitely
590      41  unrecognized character after (?P
591      42  syntax error after (?P
592      43  two named groups have the same name
593      44  invalid UTF-8 string
594      45  support for \eP, \ep, and \eX has not been compiled
595      46  malformed \eP or \ep sequence
596      47  unknown property name after \eP or \ep
597  .  .
598  .  .
599  .SH "STUDYING A PATTERN"  .SH "STUDYING A PATTERN"
600  .rs  .rs
601  .sp  .sp
602  .B pcre_extra *pcre_study(const pcre *\fIcode\fP, int \fIoptions\fP,  .B pcre_extra *pcre_study(const pcre *\fIcode\fP, int \fIoptions\fP,
603  .ti +5n  .ti +5n
604  .B const char **\fIerrptr\fP);  .B const char **\fIerrptr\fP);
605  .PP  .PP
# Line 492  below Line 621  below
621  .\"  .\"
622  in the section on matching a pattern.  in the section on matching a pattern.
623  .P  .P
624  If studying the pattern does not produce any additional information,  If studying the pattern does not produce any additional information,
625  \fBpcre_study()\fP returns NULL. In that circumstance, if the calling program  \fBpcre_study()\fP returns NULL. In that circumstance, if the calling program
626  wants to pass any of the other fields to \fBpcre_exec()\fP, it must set up its  wants to pass any of the other fields to \fBpcre_exec()\fP, it must set up its
627  own \fBpcre_extra\fP block.  own \fBpcre_extra\fP block.
# Line 523  bytes is created. Line 652  bytes is created.
652  .SH "LOCALE SUPPORT"  .SH "LOCALE SUPPORT"
653  .rs  .rs
654  .sp  .sp
655  PCRE handles caseless matching, and determines whether characters are letters,  PCRE handles caseless matching, and determines whether characters are letters,
656  digits, or whatever, by reference to a set of tables, indexed by character  digits, or whatever, by reference to a set of tables, indexed by character
657  value. (When running in UTF-8 mode, this applies only to characters with codes  value. When running in UTF-8 mode, this applies only to characters with codes
658  less than 128. Higher-valued codes never match escapes such as \ew or \ed, but  less than 128. Higher-valued codes never match escapes such as \ew or \ed, but
659  can be tested with \ep if PCRE is built with Unicode character property  can be tested with \ep if PCRE is built with Unicode character property
660  support.)  support.
661  .P  .P
662  An internal set of tables is created in the default C locale when PCRE is  An internal set of tables is created in the default C locale when PCRE is
663  built. This is used when the final argument of \fBpcre_compile()\fP is NULL,  built. This is used when the final argument of \fBpcre_compile()\fP is NULL,
# Line 615  no back references. Line 744  no back references.
744  Return the number of capturing subpatterns in the pattern. The fourth argument  Return the number of capturing subpatterns in the pattern. The fourth argument
745  should point to an \fBint\fP variable.  should point to an \fBint\fP variable.
746  .sp  .sp
747    PCRE_INFO_DEFAULTTABLES    PCRE_INFO_DEFAULT_TABLES
748  .sp  .sp
749  Return a pointer to the internal default character tables within PCRE. The  Return a pointer to the internal default character tables within PCRE. The
750  fourth argument should point to an \fBunsigned char *\fP variable. This  fourth argument should point to an \fBunsigned char *\fP variable. This
# Line 760  it is used to pass back information abou Line 889  it is used to pass back information abou
889  string (see PCRE_INFO_FIRSTBYTE above).  string (see PCRE_INFO_FIRSTBYTE above).
890  .  .
891  .  .
892  .SH "MATCHING A PATTERN"  .SH "REFERENCE COUNTS"
893    .rs
894    .sp
895    .B int pcre_refcount(pcre *\fIcode\fP, int \fIadjust\fP);
896    .PP
897    The \fBpcre_refcount()\fP function is used to maintain a reference count in the
898    data block that contains a compiled pattern. It is provided for the benefit of
899    applications that operate in an object-oriented manner, where different parts
900    of the application may be using the same compiled pattern, but you want to free
901    the block when they are all done.
902    .P
903    When a pattern is compiled, the reference count field is initialized to zero.
904    It is changed only by calling this function, whose action is to add the
905    \fIadjust\fP value (which may be positive or negative) to it. The yield of the
906    function is the new value. However, the value of the count is constrained to
907    lie between 0 and 65535, inclusive. If the new value is outside these limits,
908    it is forced to the appropriate limit value.
909    .P
910    Except when it is zero, the reference count is not correctly preserved if a
911    pattern is compiled on one host and then transferred to a host whose byte-order
912    is different. (This seems a highly unlikely scenario.)
913    .
914    .
915    .SH "MATCHING A PATTERN: THE TRADITIONAL FUNCTION"
916  .rs  .rs
917  .sp  .sp
918  .B int pcre_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"  .B int pcre_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
# Line 772  string (see PCRE_INFO_FIRSTBYTE above). Line 924  string (see PCRE_INFO_FIRSTBYTE above).
924  The function \fBpcre_exec()\fP is called to match a subject string against a  The function \fBpcre_exec()\fP is called to match a subject string against a
925  compiled pattern, which is passed in the \fIcode\fP argument. If the  compiled pattern, which is passed in the \fIcode\fP argument. If the
926  pattern has been studied, the result of the study should be passed in the  pattern has been studied, the result of the study should be passed in the
927  \fIextra\fP argument.  \fIextra\fP argument. This function is the main matching facility of the
928    library, and it operates in a Perl-like manner. For specialist use there is
929    also an alternative matching function, which is described
930    .\" HTML <a href="#dfamatch">
931    .\" </a>
932    below
933    .\"
934    in the section about the \fBpcre_dfa_exec()\fP function.
935  .P  .P
936  In most applications, the pattern will have been compiled (and optionally  In most applications, the pattern will have been compiled (and optionally
937  studied) in the same process that calls \fBpcre_exec()\fP. However, it is  studied) in the same process that calls \fBpcre_exec()\fP. However, it is
# Line 796  Here is an example of a simple call to \ Line 955  Here is an example of a simple call to \
955      0,              /* start at offset 0 in the subject */      0,              /* start at offset 0 in the subject */
956      0,              /* default options */      0,              /* default options */
957      ovector,        /* vector of integers for substring information */      ovector,        /* vector of integers for substring information */
958      30);            /* number of elements in the vector (NOT size in bytes) */      30);            /* number of elements (NOT size in bytes) */
959  .  .
960  .\" HTML <a name="extradata"></a>  .\" HTML <a name="extradata"></a>
961  .SS "Extra data for \fBpcre_exec()\fR"  .SS "Extra data for \fBpcre_exec()\fR"
# Line 1041  subpatterns there are in a compiled patt Line 1200  subpatterns there are in a compiled patt
1200  \fIovector\fP that will allow for \fIn\fP captured substrings, in addition to  \fIovector\fP that will allow for \fIn\fP captured substrings, in addition to
1201  the offsets of the substring matched by the whole pattern, is (\fIn\fP+1)*3.  the offsets of the substring matched by the whole pattern, is (\fIn\fP+1)*3.
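As a small worked example of the (\fIn\fP+1)*3 rule, the helpers below compute the element count for a given number of capturing subpatterns and allocate a vector of that size. The function names are hypothetical and the sketch does not call PCRE; in a real program the count would typically be obtained from \fBpcre_fullinfo()\fP.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of int elements needed for capture_count captured substrings
   plus the substring matched by the whole pattern. */
static int ovector_elements(int capture_count)
{
  assert(capture_count >= 0);
  return (capture_count + 1) * 3;
}

/* Allocates a vector of the right size and reports the element count,
   which is the value to pass as ovecsize. Returns NULL on failure. */
static int *alloc_ovector(int capture_count, int *ovecsize)
{
  *ovecsize = ovector_elements(capture_count);
  return malloc((size_t)*ovecsize * sizeof(int));
}
```

For a pattern with nine capturing subpatterns this gives 30 elements, and for a pattern with none it gives 3.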
1202  .  .
1203    .\" HTML <a name="errorlist"></a>
1204  .SS "Return values from \fBpcre_exec()\fP"  .SS "Return values from \fBpcre_exec()\fP"
1205  .rs  .rs
1206  .sp  .sp
# Line 1112  A string that contains an invalid UTF-8 Line 1272  A string that contains an invalid UTF-8
1272  The UTF-8 byte sequence that was passed as a subject was valid, but the value  The UTF-8 byte sequence that was passed as a subject was valid, but the value
1273  of \fIstartoffset\fP did not point to the beginning of a UTF-8 character.  of \fIstartoffset\fP did not point to the beginning of a UTF-8 character.
1274  .sp  .sp
1275    PCRE_ERROR_PARTIAL (-12)    PCRE_ERROR_PARTIAL        (-12)
1276  .sp  .sp
1277  The subject string did not match, but it did match partially. See the  The subject string did not match, but it did match partially. See the
1278  .\" HREF  .\" HREF
# Line 1120  The subject string did not match, but it Line 1280  The subject string did not match, but it
1280  .\"  .\"
1281  documentation for details of partial matching.  documentation for details of partial matching.
1282  .sp  .sp
1283    PCRE_ERROR_BAD_PARTIAL (-13)    PCRE_ERROR_BADPARTIAL     (-13)
1284  .sp  .sp
1285  The PCRE_PARTIAL option was used with a compiled pattern containing items that  The PCRE_PARTIAL option was used with a compiled pattern containing items that
1286  are not supported for partial matching. See the  are not supported for partial matching. See the
# Line 1129  are not supported for partial matching. Line 1289  are not supported for partial matching.
1289  .\"  .\"
1290  documentation for details of partial matching.  documentation for details of partial matching.
1291  .sp  .sp
1292    PCRE_ERROR_INTERNAL (-14)    PCRE_ERROR_INTERNAL       (-14)
1293  .sp  .sp
1294  An unexpected internal error has occurred. This error could be caused by a bug  An unexpected internal error has occurred. This error could be caused by a bug
1295  in PCRE or by overwriting of the compiled pattern.  in PCRE or by overwriting of the compiled pattern.
1296  .sp  .sp
1297    PCRE_ERROR_BADCOUNT (-15)    PCRE_ERROR_BADCOUNT       (-15)
1298  .sp  .sp
1299  This error is given if the value of the \fIovecsize\fP argument is negative.  This error is given if the value of the \fIovecsize\fP argument is negative.
1300  .  .
# Line 1281  translation table. Line 1441  translation table.
1441  These functions call \fBpcre_get_stringnumber()\fP, and if it succeeds, they  These functions call \fBpcre_get_stringnumber()\fP, and if it succeeds, they
1442  then call \fIpcre_copy_substring()\fP or \fIpcre_get_substring()\fP, as  then call \fIpcre_copy_substring()\fP or \fIpcre_get_substring()\fP, as
1443  appropriate.  appropriate.
1444    .
1445    .
1446    .SH "FINDING ALL POSSIBLE MATCHES"
1447    .rs
1448    .sp
1449    The traditional matching function uses a similar algorithm to Perl, which stops
1450    when it finds the first match, starting at a given point in the subject. If you
1451    want to find all possible matches, or the longest possible match, consider
1452    using the alternative matching function (see below) instead. If you cannot use
1453    the alternative function, but still need to find all possible matches, you
1454    can kludge it up by making use of the callout facility, which is described in
1455    the
1456    .\" HREF
1457    \fBpcrecallout\fP
1458    .\"
1459    documentation.
1460    .P
1461    What you have to do is to insert a callout right at the end of the pattern.
1462    When your callout function is called, extract and save the current matched
1463    substring. Then return 1, which forces \fBpcre_exec()\fP to backtrack and try
1464    other alternatives. Ultimately, when it runs out of matches, \fBpcre_exec()\fP
1465    will yield PCRE_ERROR_NOMATCH.
1466    .
1467    .
1468    .\" HTML <a name="dfamatch"></a>
1469    .SH "MATCHING A PATTERN: THE ALTERNATIVE FUNCTION"
1470    .rs
1471    .sp
1472    .B int pcre_dfa_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
1473    .ti +5n
1474    .B "const char *\fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
1475    .ti +5n
1476    .B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,
1477    .ti +5n
1478    .B int *\fIworkspace\fP, int \fIwscount\fP);
1479    .P
1480    The function \fBpcre_dfa_exec()\fP is called to match a subject string against
1481    a compiled pattern, using a "DFA" matching algorithm. This has different
1482    characteristics to the normal algorithm, and is not compatible with Perl. Some
1483    of the features of PCRE patterns are not supported. Nevertheless, there are
1484    times when this kind of matching can be useful. For a discussion of the two
1485    matching algorithms, see the
1486    .\" HREF
1487    \fBpcrematching\fP
1488    .\"
1489    documentation.
1490    .P
1491    The arguments for the \fBpcre_dfa_exec()\fP function are the same as for
1492    \fBpcre_exec()\fP, plus two extras. The \fIovector\fP argument is used in a
1493    different way, and this is described below. The other common arguments are used
1494    in the same way as for \fBpcre_exec()\fP, so their description is not repeated
1495    here.
1496    .P
1497    The two additional arguments provide workspace for the function. The workspace
1498    vector should contain at least 20 elements. It is used for keeping track of
1499    multiple paths through the pattern tree. More workspace will be needed for
1500    patterns and subjects where there are a lot of possible matches.
1501    .P
1502    Here is an example of a simple call to \fBpcre_dfa_exec()\fP:
1503    .sp
1504      int rc;
1505      int ovector[10];
1506      int wspace[20];
1507      rc = pcre_dfa_exec(
1508        re,             /* result of pcre_compile() */
1509        NULL,           /* we didn't study the pattern */
1510        "some string",  /* the subject string */
1511        11,             /* the length of the subject string */
1512        0,              /* start at offset 0 in the subject */
1513        0,              /* default options */
1514        ovector,        /* vector of integers for substring information */
1515        10,             /* number of elements (NOT size in bytes) */
1516        wspace,         /* working space vector */
1517        20);            /* number of elements (NOT size in bytes) */
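.P
Because the amount of workspace needed depends on the pattern and the subject,
a caller may want to retry with a larger vector when PCRE_ERROR_DFA_WSSIZE
(-19) is returned. The following is only a sketch of that idea, not part of the
PCRE API: the names \fIrun_with_growing_workspace\fP, \fIdfa_attempt_fn\fP and
\fIdemo_attempt\fP are hypothetical, and the callback is assumed to wrap a
single call to \fBpcre_dfa_exec()\fP. The stand-in callback here merely
pretends that a given number of elements is required.

```c
#include <assert.h>
#include <stdlib.h>

#define ERR_NOMEMORY   (-6)   /* value of PCRE_ERROR_NOMEMORY */
#define ERR_DFA_WSSIZE (-19)  /* value of PCRE_ERROR_DFA_WSSIZE */

/* Hypothetical callback: performs one matching attempt (in real code, one
   call to pcre_dfa_exec()) using the workspace it is given. */
typedef int (*dfa_attempt_fn)(int *workspace, int wscount, void *ctx);

/* Retry the attempt, doubling the workspace each time it proves too small,
   until it succeeds, fails for another reason, or would exceed max_count. */
int run_with_growing_workspace(dfa_attempt_fn attempt, void *ctx,
                               int initial_count, int max_count)
{
    int wscount = initial_count < 20 ? 20 : initial_count;  /* documented minimum */
    assert(attempt != NULL);
    for (;;) {
        int rc;
        int *ws = malloc((size_t)wscount * sizeof *ws);
        if (ws == NULL) return ERR_NOMEMORY;
        rc = attempt(ws, wscount, ctx);
        free(ws);
        /* Stop on any result other than "workspace too small", or when
           doubling again would go past the caller's limit. */
        if (rc != ERR_DFA_WSSIZE || wscount > max_count / 2) return rc;
        wscount *= 2;
    }
}

/* Stand-in for a real matcher: "needs" *(int *)ctx workspace elements. */
int demo_attempt(int *workspace, int wscount, void *ctx)
{
    (void)workspace;
    return wscount >= *(int *)ctx ? 1 : ERR_DFA_WSSIZE;
}
```

This approach discards the workspace between attempts, so it is not suitable
for calls that use PCRE_DFA_RESTART, which relies on data left in the same
vector by the previous call.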
1518    .
1519    .SS "Option bits for \fBpcre_dfa_exec()\fP"
1520    .rs
1521    .sp
1522    The unused bits of the \fIoptions\fP argument for \fBpcre_dfa_exec()\fP must be
1523    zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NOTBOL,
1524    PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL,
1525    PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last three of these are
1526    the same as for \fBpcre_exec()\fP, so their description is not repeated here.
1527    .sp
1528      PCRE_PARTIAL
1529    .sp
1530    This has the same general effect as it does for \fBpcre_exec()\fP, but the
1531    details are slightly different. When PCRE_PARTIAL is set for
1532    \fBpcre_dfa_exec()\fP, the return code PCRE_ERROR_NOMATCH is converted into
1533    PCRE_ERROR_PARTIAL if the end of the subject is reached, there have been no
1534    complete matches, but there is still at least one matching possibility. The
1535    portion of the string that provided the partial match is set as the first
1536    matching string.
1537    .sp
1538      PCRE_DFA_SHORTEST
1539    .sp
1540    Setting the PCRE_DFA_SHORTEST option causes the matching algorithm to stop as
1541    soon as it has found one match. Because of the way the DFA algorithm works,
1542    this is necessarily the shortest possible match at the first possible matching
1543    point in the subject string.
1544    .sp
1545      PCRE_DFA_RESTART
1546    .sp
1547    When \fBpcre_dfa_exec()\fP is called with the PCRE_PARTIAL option, and returns
1548    a partial match, it is possible to call it again, with additional subject
1549    characters, and have it continue with the same match. The PCRE_DFA_RESTART
1550    option requests this action; when it is set, the \fIworkspace\fP and
1551    \fIwscount\fP arguments must reference the same vector as before because data
1552    about the match so far is left in them after a partial match. There is more
1553    discussion of this facility in the
1554    .\" HREF
1555    \fBpcrepartial\fP
1556    .\"
1557    documentation.
1558    .
1559    .SS "Successful returns from \fBpcre_dfa_exec()\fP"
1560    .rs
1561    .sp
1562    When \fBpcre_dfa_exec()\fP succeeds, it may have matched more than one
1563    substring in the subject. Note, however, that all the matches from one run of
1564    the function start at the same point in the subject. The shorter matches are
1565    all initial substrings of the longer matches. For example, if the pattern
1566    .sp
1567      <.*>
1568    .sp
1569    is matched against the string
1570    .sp
1571      This is <something> <something else> <something further> no more
1572    .sp
1573    the three matched strings are
1574    .sp
1575      <something>
1576      <something> <something else>
1577      <something> <something else> <something further>
1578    .sp
1579    On success, the yield of the function is a number greater than zero, which is
1580    the number of matched substrings. The substrings themselves are returned in
1581    \fIovector\fP. Each string uses two elements; the first is the offset to the
1582    start, and the second is the offset to the end. All the strings have the same
1583    start offset. (Space could have been saved by giving this only once, but it was
1584    decided to retain some compatibility with the way \fBpcre_exec()\fP returns
1585    data, even though the meaning of the strings is different.)
1586    .P
1587    The strings are returned in reverse order of length; that is, the longest
1588    matching string is given first. If there were too many matches to fit into
1589    \fIovector\fP, the yield of the function is zero, and the vector is filled with
1590    the longest matches.
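.P
The layout described above can be decoded with a few lines of C. This is a
minimal sketch based only on the description in this section; the helper names
(\fIdfa_match_count\fP, \fIdfa_match_length\fP, \fIdfa_print_matches\fP) are
hypothetical and are not PCRE functions. It assumes that a zero return means
every complete pair in \fIovector\fP holds one of the longest matches.

```c
#include <assert.h>
#include <stdio.h>

/* Number of (start, end) pairs holding valid data, given the return code
   rc and the number of elements in ovector. */
int dfa_match_count(int rc, int ovecsize)
{
    if (rc > 0) return rc;             /* rc is the number of matches */
    if (rc == 0) return ovecsize / 2;  /* vector full: longest matches only */
    return 0;                          /* negative: error or no match */
}

/* Length of the i-th match; index 0 is the longest. */
int dfa_match_length(const int *ovector, int i)
{
    assert(ovector != NULL && i >= 0);
    return ovector[2 * i + 1] - ovector[2 * i];
}

/* Print every returned match, longest first. */
void dfa_print_matches(const char *subject, const int *ovector,
                       int rc, int ovecsize)
{
    int i, n = dfa_match_count(rc, ovecsize);
    for (i = 0; i < n; i++)
        printf("%2d: %.*s\n", i, dfa_match_length(ovector, i),
               subject + ovector[2 * i]);
}
```

For the example subject above, the three matches all start at offset 8 and end
at offsets 56, 36 and 19, in that order.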
1591    .
1592    .SS "Error returns from \fBpcre_dfa_exec()\fP"
1593    .rs
1594    .sp
1595    The \fBpcre_dfa_exec()\fP function returns a negative number when it fails.
1596    Many of the errors are the same as for \fBpcre_exec()\fP, and these are
1597    described
1598    .\" HTML <a href="#errorlist">
1599    .\" </a>
1600    above.
1601    .\"
1602    There are in addition the following errors that are specific to
1603    \fBpcre_dfa_exec()\fP:
1604    .sp
1605      PCRE_ERROR_DFA_UITEM      (-16)
1606    .sp
1607    This return is given if \fBpcre_dfa_exec()\fP encounters an item in the pattern
1608    that it does not support, for instance, the use of \eC or a back reference.
1609    .sp
1610      PCRE_ERROR_DFA_UCOND      (-17)
1611    .sp
1612    This return is given if \fBpcre_dfa_exec()\fP encounters a condition item in a
1613    pattern that uses a back reference for the condition. This is not supported.
1614    .sp
1615      PCRE_ERROR_DFA_UMLIMIT    (-18)
1616    .sp
1617    This return is given if \fBpcre_dfa_exec()\fP is called with an \fIextra\fP
1618    block that contains a setting of the \fImatch_limit\fP field. This is not
1619    supported (it is meaningless).
1620    .sp
1621      PCRE_ERROR_DFA_WSSIZE     (-19)
1622    .sp
1623    This return is given if \fBpcre_dfa_exec()\fP runs out of space in the
1624    \fIworkspace\fP vector.
1625    .sp
1626      PCRE_ERROR_DFA_RECURSE    (-20)
1627    .sp
1628    When a recursive subpattern is processed, the matching function calls itself
1629    recursively, using private vectors for \fIovector\fP and \fIworkspace\fP. This
1630    error is given if the output vector is not large enough. This should be
1631    extremely rare, as a vector of size 1000 is used.
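.P
An application may want to report these errors in readable form. The sketch
below maps the five codes listed above to short descriptions. The function
names are hypothetical, and the numeric values are copied from this section
rather than taken from \fBpcre.h\fP, so real code should use the PCRE_ERROR_DFA_
macros instead.

```c
#include <assert.h>

/* Return 1 if rc is one of the errors specific to pcre_dfa_exec(). */
int is_dfa_specific_error(int rc)
{
    return rc <= -16 && rc >= -20;
}

/* Short description of a negative return from pcre_dfa_exec(). */
const char *dfa_error_text(int rc)
{
    assert(rc < 0);  /* non-negative returns are not errors */
    switch (rc) {
    case -16: return "unsupported item in pattern";           /* DFA_UITEM */
    case -17: return "unsupported back reference condition";  /* DFA_UCOND */
    case -18: return "match_limit set in extra block";        /* DFA_UMLIMIT */
    case -19: return "workspace vector too small";            /* DFA_WSSIZE */
    case -20: return "recursion output vector too small";     /* DFA_RECURSE */
    default:  return "error shared with pcre_exec()";
    }
}
```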
1632  .P  .P
1633  .in 0  .in 0
1634  Last updated: 09 September 2004  Last updated: 16 May 2005
1635  .br  .br
1636  Copyright (c) 1997-2004 University of Cambridge.  Copyright (c) 1997-2005 University of Cambridge.
