Diff of /code/trunk/doc/pcreapi.3

revision 72 by nigel, Sat Feb 24 21:40:24 2007 UTC revision 73 by nigel, Sat Feb 24 21:40:30 2007 UTC
# Line 99  PCRE - Perl-compatible regular expressio Line 99  PCRE - Perl-compatible regular expressio
99  .B void (*pcre_free)(void *);  .B void (*pcre_free)(void *);
100  .PP  .PP
101  .br  .br
102    .B void *(*pcre_stack_malloc)(size_t);
103    .PP
104    .br
105    .B void (*pcre_stack_free)(void *);
106    .PP
107    .br
108  .B int (*pcre_callout)(pcre_callout_block *);  .B int (*pcre_callout)(pcre_callout_block *);
109    
110  .SH PCRE API  .SH PCRE API
# Line 147  respectively. PCRE calls the memory mana Line 153  respectively. PCRE calls the memory mana
153  so a calling program can replace them if it wishes to intercept the calls. This  so a calling program can replace them if it wishes to intercept the calls. This
154  should be done before calling any PCRE functions.  should be done before calling any PCRE functions.
155    
156    The global variables \fBpcre_stack_malloc\fR and \fBpcre_stack_free\fR are also
157    indirections to memory management functions. These special functions are used
158    only when PCRE is compiled to use the heap for remembering data, instead of
159    recursive function calls. This is a non-standard way of building PCRE, for use
160    in environments that have limited stacks. Because of the greater use of memory
161    management, it runs more slowly. Separate functions are provided so that
162    special-purpose external code can be used for this case. When used, these
163    functions are always called in a stack-like manner (last obtained, first
164    freed), and always for memory blocks of the same size.
165    
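The calling pattern described above (last obtained, first freed, always the same block size) lends itself to a small block cache. The following is a minimal sketch in C, not PCRE's own code. The two function pointers are declared locally as stand-ins for the globals that pcre.h provides, purely so that the sketch compiles without PCRE. The names frame_alloc, frame_free, install_frame_allocator and CACHE_MAX are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins (assumption): a real program gets these two globals from pcre.h. */
static void *(*pcre_stack_malloc)(size_t) = malloc;
static void (*pcre_stack_free)(void *) = free;

#define CACHE_MAX 64                 /* hypothetical limit on cached blocks */

static void *cached[CACHE_MAX];      /* freed blocks kept for reuse, LIFO */
static size_t cached_count = 0;
static size_t frame_size = 0;        /* size of the first request seen */

/* Hands out a cached block when one is available, otherwise calls malloc(). */
static void *frame_alloc(size_t size)
{
    if (frame_size == 0)
        frame_size = size;
    /* Relies on the documented property that all blocks have the same size. */
    assert(size == frame_size);
    if (cached_count > 0)
        return cached[--cached_count];
    return malloc(size);
}

/* Keeps the block for reuse; falls back to free() when the cache is full. */
static void frame_free(void *block)
{
    if (block == NULL)
        return;
    if (cached_count < CACHE_MAX)
        cached[cached_count++] = block;
    else
        free(block);
}

/* Points the two indirections at the caching functions. */
static void install_frame_allocator(void)
{
    pcre_stack_malloc = frame_alloc;
    pcre_stack_free = frame_free;
}
```

In a real program the two stand-in declarations would be dropped in favour of pcre.h, and install_frame_allocator() would be called before any other PCRE function. It only has an effect when PCRE was built to use the heap instead of recursive function calls. The cache in this sketch is not protected by a lock, so it is unsuitable as written for multi-threaded use, where these pointers are shared by all threads.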
166  The global variable \fBpcre_callout\fR initially contains NULL. It can be set  The global variable \fBpcre_callout\fR initially contains NULL. It can be set
167  by the caller to a "callout" function, which PCRE will then call at specified  by the caller to a "callout" function, which PCRE will then call at specified
168  points during a matching operation. Details are given in the \fBpcrecallout\fR  points during a matching operation. Details are given in the \fBpcrecallout\fR
# Line 156  documentation. Line 172  documentation.
172  .rs  .rs
173  .sp  .sp
174  The PCRE functions can be used in multi-threading applications, with the  The PCRE functions can be used in multi-threading applications, with the
Removed from v.72:
     proviso that the memory management functions pointed to by \fBpcre_malloc\fR
     and \fBpcre_free\fR, and the callout function pointed to by \fBpcre_callout\fR,
     are shared by all threads.
Changed in v.73:
175  proviso that the memory management functions pointed to by \fBpcre_malloc\fR,
176  \fBpcre_free\fR, \fBpcre_stack_malloc\fR, and \fBpcre_stack_free\fR, and the
177  callout function pointed to by \fBpcre_callout\fR, are shared by all threads.
178    
179  The compiled form of a regular expression is not altered during matching, so  The compiled form of a regular expression is not altered during matching, so
180  the same compiled pattern can safely be used by several threads at once.  the same compiled pattern can safely be used by several threads at once.
# Line 210  The output is an integer that gives the Line 226  The output is an integer that gives the
226  internal matching function calls in a \fBpcre_exec()\fR execution. Further  internal matching function calls in a \fBpcre_exec()\fR execution. Further
227  details are given with \fBpcre_exec()\fR below.  details are given with \fBpcre_exec()\fR below.
228    
229      PCRE_CONFIG_STACKRECURSE
230    
231    The output is an integer that is set to one if internal recursion is
232    implemented by recursive function calls that use the stack to remember their
233    state. This is the usual way that PCRE is compiled. The output is zero if PCRE
234    was compiled to use blocks of data on the heap instead of recursive function
235    calls. In this case, \fBpcre_stack_malloc\fR and \fBpcre_stack_free\fR are
236    called to manage memory blocks on the heap, thus avoiding the use of the stack.
237    
238  .SH COMPILING A PATTERN  .SH COMPILING A PATTERN
239  .rs  .rs
240  .sp  .sp
# Line 711  or turned out to be anchored by virtue o Line 736  or turned out to be anchored by virtue o
736  unanchored at matching time.  unanchored at matching time.
737    
738  When PCRE_UTF8 was set at compile time, the validity of the subject as a UTF-8  When PCRE_UTF8 was set at compile time, the validity of the subject as a UTF-8
Removed from v.72:
     string is automatically checked. If an invalid UTF-8 sequence of bytes is
     found, \fBpcre_exec()\fR returns the error PCRE_ERROR_BADUTF8. If you already
     know that your subject is valid, and you want to skip this check for
     performance reasons, you can set the PCRE_NO_UTF8_CHECK option when calling
     \fBpcre_exec()\fR. When this option is set, the effect of passing an invalid
     UTF-8 string as a subject is undefined. It may cause your program to crash.
Changed or added in v.73:
739  string is automatically checked, and the value of \fIstartoffset\fR is also
740  checked to ensure that it points to the start of a UTF-8 character. If an
741  invalid UTF-8 sequence of bytes is found, \fBpcre_exec()\fR returns the error
742  PCRE_ERROR_BADUTF8. If \fIstartoffset\fR contains an invalid value,
743  PCRE_ERROR_BADUTF8_OFFSET is returned.
744
745  If you already know that your subject is valid, and you want to skip these
746  checks for performance reasons, you can set the PCRE_NO_UTF8_CHECK option when
747  calling \fBpcre_exec()\fR. You might want to do this for the second and
748  subsequent calls to \fBpcre_exec()\fR if you are making repeated calls to find
749  all the matches in a single subject string. However, you should be sure that
750  the value of \fIstartoffset\fR points to the start of a UTF-8 character. When
751  PCRE_NO_UTF8_CHECK is set, the effect of passing an invalid UTF-8 string as a
752  subject, or a value of \fIstartoffset\fR that does not point to the start of a
753  UTF-8 character, is undefined. Your program may crash.
754    
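When PCRE_NO_UTF8_CHECK is set, the caller becomes responsible for making sure that \fIstartoffset\fR lands on a character boundary. The following is a minimal sketch in C of such a check. It relies only on the UTF-8 rule that continuation bytes have the bit pattern 10xxxxxx, and it assumes the subject has already been validated as UTF-8. The helper names are hypothetical, and this is not necessarily how PCRE performs its own check.

```c
#include <stddef.h>

/* Returns 1 if offset is a character boundary in a subject that is ASSUMED
 * to be valid UTF-8, and 0 otherwise. The end of the subject counts as a
 * boundary; offsets beyond the end do not. */
static int is_utf8_char_start(const char *subject, size_t length, size_t offset)
{
    if (offset >= length)
        return offset == length;
    /* Continuation bytes look like 10xxxxxx; any other byte starts a character. */
    return ((unsigned char)subject[offset] & 0xC0) != 0x80;
}

/* Returns the offset of the first character start after offset, or length
 * if there is none. Useful when stepping forward by one whole character. */
static size_t next_utf8_char_start(const char *subject, size_t length, size_t offset)
{
    if (offset >= length)
        return length;
    offset++;
    while (offset < length && ((unsigned char)subject[offset] & 0xC0) == 0x80)
        offset++;
    return offset;
}
```

A caller that computes its own starting offsets could apply is_utf8_char_start() before each call made with PCRE_NO_UTF8_CHECK, and use next_utf8_char_start() rather than adding one when it needs to move past a single character.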
755  There are also three further options that can be set only at matching time:  There are also three further options that can be set only at matching time:
756    
# Line 753  PCRE_NOTEMPTY set, and then if that fail Line 787  PCRE_NOTEMPTY set, and then if that fail
787  below) and trying an ordinary match again.  below) and trying an ordinary match again.
788    
789  The subject string is passed to \fBpcre_exec()\fR as a pointer in  The subject string is passed to \fBpcre_exec()\fR as a pointer in
790  \fIsubject\fR, a length in \fIlength\fR, and a starting offset in  \fIsubject\fR, a length in \fIlength\fR, and a starting byte offset in
791  \fIstartoffset\fR. Unlike the pattern string, the subject may contain binary  \fIstartoffset\fR. Unlike the pattern string, the subject may contain binary
792  zero bytes. When the starting offset is zero, the search for a match starts at  zero bytes. When the starting offset is zero, the search for a match starts at
793  the beginning of the subject, and this is by far the most common case.  the beginning of the subject, and this is by far the most common case.
794    
795  If the pattern was compiled with the PCRE_UTF8 option, the subject must be a  If the pattern was compiled with the PCRE_UTF8 option, the subject must be a
Removed from v.72:
     sequence of bytes that is a valid UTF-8 string. If an invalid UTF-8 string is
     passed, PCRE's behaviour is not defined.
Changed or added in v.73:
796  sequence of bytes that is a valid UTF-8 string, and the starting offset must
797  point to the beginning of a UTF-8 character. If an invalid UTF-8 string or
798  offset is passed, an error (either PCRE_ERROR_BADUTF8 or
799  PCRE_ERROR_BADUTF8_OFFSET) is returned, unless the option PCRE_NO_UTF8_CHECK is
800  set, in which case PCRE's behaviour is not defined.
801    
802  A non-zero starting offset is useful when searching for another match in the  A non-zero starting offset is useful when searching for another match in the
803  same subject by calling \fBpcre_exec()\fR again after a previous success.  same subject by calling \fBpcre_exec()\fR again after a previous success.
# Line 892  This error is never generated by \fBpcre Line 929  This error is never generated by \fBpcre
929  use by callout functions that want to yield a distinctive error code. See the  use by callout functions that want to yield a distinctive error code. See the
930  \fBpcrecallout\fR documentation for details.  \fBpcrecallout\fR documentation for details.
931    
932    PCRE_ERROR_BADUTF8       (-10)    PCRE_ERROR_BADUTF8        (-10)
933    
934  A string that contains an invalid UTF-8 byte sequence was passed as a subject.  A string that contains an invalid UTF-8 byte sequence was passed as a subject.
935    
936      PCRE_ERROR_BADUTF8_OFFSET (-11)
937    
938    The UTF-8 byte sequence that was passed as a subject was valid, but the value
939    of \fIstartoffset\fR did not point to the beginning of a UTF-8 character.
940    
941  .SH EXTRACTING CAPTURED SUBSTRINGS BY NUMBER  .SH EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
942  .rs  .rs
943  .sp  .sp
# Line 1035  then call \fIpcre_copy_substring()\fR or Line 1077  then call \fIpcre_copy_substring()\fR or
1077  appropriate.  appropriate.
1078    
1079  .in 0  .in 0
1080  Last updated: 20 August 2003  Last updated: 09 December 2003
1081  .br  .br
1082  Copyright (c) 1997-2003 University of Cambridge.  Copyright (c) 1997-2003 University of Cambridge.
