.B void (*pcre_free)(void *);
.PP
.br
.B void *(*pcre_stack_malloc)(size_t);
.PP
.br
.B void (*pcre_stack_free)(void *);
.PP
.br
.B int (*pcre_callout)(pcre_callout_block *);

.SH PCRE API
so a calling program can replace them if it wishes to intercept the calls. This
should be done before calling any PCRE functions.
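
As an illustration of intercepting the calls, the sketch below defines two
functions with the same signatures as \fBpcre_malloc\fR and \fBpcre_free\fR that
keep a running total of the bytes in use. The names \fBtracking_malloc\fR,
\fBtracking_free\fR, \fIbytes_in_use\fR and \fIpeak_bytes\fR are hypothetical.
A program would install them with \fBpcre_malloc = tracking_malloc;\fR and
\fBpcre_free = tracking_free;\fR before its first PCRE call.
.sp
.nf
```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical replacements for pcre_malloc and pcre_free that keep a
   running total of bytes in use. The free hook receives only a pointer,
   so each block carries a small header recording its size. The union
   with max_align_t keeps the returned pointer suitably aligned.
   Not thread-safe: the totals are plain globals. */
typedef union {
  size_t size;
  max_align_t align;
} block_header;

static size_t bytes_in_use = 0;
static size_t peak_bytes = 0;

void *tracking_malloc(size_t size)
{
  block_header *h;
  if (size > SIZE_MAX - sizeof(block_header)) return NULL;  /* overflow */
  h = malloc(sizeof(block_header) + size);
  if (h == NULL) return NULL;
  h->size = size;
  bytes_in_use += size;
  if (bytes_in_use > peak_bytes) peak_bytes = bytes_in_use;
  return h + 1;                 /* caller's memory starts after the header */
}

void tracking_free(void *p)
{
  block_header *h;
  if (p == NULL) return;        /* free(NULL) is a no-op; mirror that */
  h = (block_header *)p - 1;    /* step back to the header */
  bytes_in_use -= h->size;
  free(h);
}
```
.fi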

The global variables \fBpcre_stack_malloc\fR and \fBpcre_stack_free\fR are also
indirections to memory management functions. These special functions are used
only when PCRE is compiled to use the heap for remembering data, instead of
recursive function calls. This is a non-standard way of building PCRE, for use
in environments that have limited stacks. Because of the greater use of memory
management, a PCRE built this way runs more slowly. Separate functions are
provided so that special-purpose external code can be used for this case. When
used, these functions are always called in a stack-like manner (last obtained,
first freed), and always for memory blocks of the same size.
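
Because the documented usage pattern is so restricted (blocks of one size,
released in last-obtained, first-freed order), replacement stack functions can
hand out slices of a single arena instead of calling \fBmalloc()\fR each time.
This is a sketch under that assumption only: the names, the 64-block capacity
and the choice to return NULL when the arena is full are illustrative and are
not part of PCRE. It would be installed with
\fBpcre_stack_malloc = lifo_stack_malloc;\fR and
\fBpcre_stack_free = lifo_stack_free;\fR.
.sp
.nf
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical replacements for pcre_stack_malloc and pcre_stack_free.
   They rely on the documented usage pattern: every block has the same
   size, and blocks are freed in last-obtained, first-freed order, so a
   single arena managed like a stack is enough. */
#define STACK_BLOCKS 64          /* arbitrary capacity for this sketch */

static unsigned char *arena = NULL;   /* allocated on first use */
static size_t block_size = 0;         /* fixed by the first request */
static size_t blocks_used = 0;        /* blocks currently handed out */

void *lifo_stack_malloc(size_t size)
{
  if (arena == NULL) {
    /* Round up to a multiple of sizeof(max_align_t) so that every block
       in the arena stays suitably aligned. */
    size_t unit = sizeof(max_align_t);
    block_size = (size + unit - 1) / unit * unit;
    arena = malloc(block_size * STACK_BLOCKS);
    if (arena == NULL) return NULL;
  }
  assert(size <= block_size);             /* documented: same size each time */
  if (blocks_used == STACK_BLOCKS) return NULL;   /* arena exhausted */
  return arena + block_size * blocks_used++;
}

void lifo_stack_free(void *p)
{
  /* Only the most recently obtained block may be released. */
  assert(blocks_used > 0 && p == arena + block_size * (blocks_used - 1));
  (void)p;                                /* unused when NDEBUG is set */
  blocks_used--;
}
```
.fi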

The global variable \fBpcre_callout\fR initially contains NULL. It can be set
by the caller to a "callout" function, which PCRE will then call at specified
points during a matching operation. Details are given in the \fBpcrecallout\fR
.rs
.sp
The PCRE functions can be used in multi-threading applications, with the
proviso that the memory management functions pointed to by \fBpcre_malloc\fR,
\fBpcre_free\fR, \fBpcre_stack_malloc\fR, and \fBpcre_stack_free\fR, and the
callout function pointed to by \fBpcre_callout\fR, are shared by all threads.
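
Because these pointers are shared, any state that a replacement memory function
keeps must itself be safe for concurrent use. The sketch below counts live
allocations with C11 atomics. It assumes a compiler that provides
<stdatomic.h>, and the names \fBcounting_malloc\fR, \fBcounting_free\fR and
\fBcounting_live_blocks\fR are hypothetical. The hooks would be installed once,
before any thread calls a PCRE function.
.sp
.nf
```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical pcre_malloc/pcre_free replacements that count live
   allocations. Every thread goes through the same hooks, so the counter
   is updated atomically. C11 requires malloc and free themselves to be
   safe to call from multiple threads. */
static atomic_long live_blocks;   /* static storage: starts at zero */

void *counting_malloc(size_t size)
{
  void *p = malloc(size);
  if (p != NULL) atomic_fetch_add(&live_blocks, 1);
  return p;
}

void counting_free(void *p)
{
  if (p != NULL) atomic_fetch_sub(&live_blocks, 1);
  free(p);
}

long counting_live_blocks(void)
{
  return atomic_load(&live_blocks);
}
```
.fi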

The compiled form of a regular expression is not altered during matching, so
the same compiled pattern can safely be used by several threads at once.
internal matching function calls in a \fBpcre_exec()\fR execution. Further
details are given with \fBpcre_exec()\fR below.

PCRE_CONFIG_STACKRECURSE

The output is an integer that is set to one if internal recursion is
implemented by recursive function calls that use the stack to remember their
state. This is the usual way that PCRE is compiled. The output is zero if PCRE
was compiled to use blocks of data on the heap instead of recursive function
calls. In this case, \fBpcre_stack_malloc\fR and \fBpcre_stack_free\fR are
called to manage memory blocks on the heap, thus avoiding the use of the stack.

.SH COMPILING A PATTERN
.rs
.sp
unanchored at matching time.

When PCRE_UTF8 was set at compile time, the validity of the subject as a UTF-8
string is automatically checked, and the value of \fIstartoffset\fR is also
checked to ensure that it points to the start of a UTF-8 character. If an
invalid UTF-8 sequence of bytes is found, \fBpcre_exec()\fR returns the error
PCRE_ERROR_BADUTF8. If \fIstartoffset\fR contains an invalid value,
PCRE_ERROR_BADUTF8_OFFSET is returned.
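
The subject check can be pictured as a simple scan over the bytes. The
function below is a simplified, hypothetical validator of this kind: where it
reports an invalid position, \fBpcre_exec()\fR would instead return
PCRE_ERROR_BADUTF8. The byte ranges it accepts follow RFC 3629 and are an
assumption of this sketch; PCRE's own check may accept or reject slightly
different input.
.sp
.nf
```c
#include <stddef.h>

/* Simplified UTF-8 well-formedness check in the spirit of the one
   pcre_exec() performs. Returns the byte offset of the first invalid
   sequence, or len if the whole subject is valid. It uses RFC 3629
   lead-byte ranges but, to stay short, does not reject every overlong
   or surrogate encoding. */
size_t utf8_first_invalid(const unsigned char *s, size_t len)
{
  size_t i = 0;
  while (i < len) {
    unsigned char c = s[i];
    size_t n, k;
    if (c < 0x80) n = 1;                          /* ASCII */
    else if (c >= 0xC2 && c <= 0xDF) n = 2;
    else if (c >= 0xE0 && c <= 0xEF) n = 3;
    else if (c >= 0xF0 && c <= 0xF4) n = 4;
    else return i;            /* stray continuation byte or bad lead byte */
    if (n > len - i) return i;        /* sequence cut off by end of subject */
    for (k = 1; k < n; k++)
      if ((s[i + k] & 0xC0) != 0x80) return i;    /* not a 10xxxxxx byte */
    i += n;
  }
  return len;
}
```
.fi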

If you already know that your subject is valid, and you want to skip these
checks for performance reasons, you can set the PCRE_NO_UTF8_CHECK option when
calling \fBpcre_exec()\fR. You might want to do this for the second and
subsequent calls to \fBpcre_exec()\fR if you are making repeated calls to find
all the matches in a single subject string. However, you should be sure that
the value of \fIstartoffset\fR points to the start of a UTF-8 character. When
PCRE_NO_UTF8_CHECK is set, the effect of passing an invalid UTF-8 string as a
subject, or a value of \fIstartoffset\fR that does not point to the start of a
UTF-8 character, is undefined. Your program may crash.
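
With PCRE_NO_UTF8_CHECK set, making sure that \fIstartoffset\fR points to the
start of a character becomes the caller's job. For a subject already known to
be valid this is cheap: an offset is a character boundary unless it lands on a
continuation byte (one of the form 10xxxxxx). A minimal sketch follows; the
function name is hypothetical, and accepting an offset equal to the subject
length is an assumption made here.
.sp
.nf
```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if offset is a valid UTF-8 starting point in
   an already-validated subject of len bytes. An offset equal to len (the
   end of the subject) is accepted; any other offset must not land on a
   continuation byte of the form 10xxxxxx. */
bool utf8_offset_ok(const unsigned char *subject, size_t len, size_t offset)
{
  if (offset > len) return false;
  if (offset == len) return true;
  return (subject[offset] & 0xC0) != 0x80;
}
```
.fi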

There are also three further options that can be set only at matching time:

below) and trying an ordinary match again.

The subject string is passed to \fBpcre_exec()\fR as a pointer in
\fIsubject\fR, a length in bytes in \fIlength\fR, and a starting byte offset in
\fIstartoffset\fR. Unlike the pattern string, the subject may contain binary
zero bytes. When the starting offset is zero, the search for a match starts at
the beginning of the subject, and this is by far the most common case.

If the pattern was compiled with the PCRE_UTF8 option, the subject must be a
sequence of bytes that is a valid UTF-8 string, and the starting offset must
point to the beginning of a UTF-8 character. If an invalid UTF-8 string or
offset is passed, an error (either PCRE_ERROR_BADUTF8 or
PCRE_ERROR_BADUTF8_OFFSET) is returned, unless the option PCRE_NO_UTF8_CHECK is
set, in which case PCRE's behaviour is not defined.

A non-zero starting offset is useful when searching for another match in the
same subject by calling \fBpcre_exec()\fR again after a previous success.
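
When \fBpcre_exec()\fR is called in a loop to find every match, the next
starting offset is usually the end of the previous match. If the previous match
was empty, a loop that chooses to step past it must, in UTF-8 mode, advance by
a whole character rather than a single byte; otherwise the new
\fIstartoffset\fR may point into the middle of a character and be rejected with
PCRE_ERROR_BADUTF8_OFFSET. The hypothetical helper below sketches that step for
a subject already known to be valid.
.sp
.nf
```c
#include <stddef.h>

/* Hypothetical helper: returns the offset of the character that follows
   the one starting at offset, in a valid UTF-8 subject of len bytes.
   Offsets at or beyond the end of the subject are clamped to len. */
size_t utf8_next_offset(const unsigned char *subject, size_t len,
                        size_t offset)
{
  if (offset >= len) return len;
  offset++;                                   /* step past the lead byte */
  while (offset < len && (subject[offset] & 0xC0) == 0x80)
    offset++;                                 /* skip continuation bytes */
  return offset;
}
```
.fi
.sp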
use by callout functions that want to yield a distinctive error code. See the
\fBpcrecallout\fR documentation for details.

PCRE_ERROR_BADUTF8 (-10)

A string that contains an invalid UTF-8 byte sequence was passed as a subject.

PCRE_ERROR_BADUTF8_OFFSET (-11)

The UTF-8 byte sequence that was passed as a subject was valid, but the value
of \fIstartoffset\fR did not point to the beginning of a UTF-8 character.

.SH EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
.rs
.sp
appropriate.

.in 0
Last updated: 09 December 2003
.br
Copyright (c) 1997-2003 University of Cambridge.