| 98 |
<b>void (*pcre_free)(void *);</b> |
<b>void (*pcre_free)(void *);</b> |
| 99 |
</P> |
</P> |
| 100 |
<P> |
<P> |
| 101 |
|
<b>void *(*pcre_stack_malloc)(size_t);</b> |
| 102 |
|
</P> |
| 103 |
|
<P> |
| 104 |
|
<b>void (*pcre_stack_free)(void *);</b> |
| 105 |
|
</P> |
| 106 |
|
<P> |
| 107 |
<b>int (*pcre_callout)(pcre_callout_block *);</b> |
<b>int (*pcre_callout)(pcre_callout_block *);</b> |
| 108 |
</P> |
</P> |
| 109 |
<br><a name="SEC2" href="#TOC1">PCRE API</a><br> |
<br><a name="SEC2" href="#TOC1">PCRE API</a><br> |
| 162 |
should be done before calling any PCRE functions. |
should be done before calling any PCRE functions. |
| 163 |
</P> |
</P> |
| 164 |
<P> |
<P> |
| 165 |
|
The global variables <b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> are also |
| 166 |
|
indirections to memory management functions. These special functions are used |
| 167 |
|
only when PCRE is compiled to use the heap for remembering data, instead of |
| 168 |
|
recursive function calls. This is a non-standard way of building PCRE, for use |
| 169 |
|
in environments that have limited stacks. Because of the greater use of memory |
| 170 |
|
management, it runs more slowly. Separate functions are provided so that |
| 171 |
|
special-purpose external code can be used for this case. When used, these |
| 172 |
|
functions are always called in a stack-like manner (last obtained, first |
| 173 |
|
freed), and always for memory blocks of the same size. |
| 174 |
|
</P> |
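The stack-like, fixed-size calling pattern described above means a very simple allocator can serve these two hooks. The following is a minimal sketch under stated assumptions, not code from PCRE: the names my_stack_malloc and my_stack_free are hypothetical, the cache is not thread-safe, and cached blocks are never returned to the system. It relies on the documented guarantee that every request has the same size, and asserts that this holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helpers (not part of PCRE): a cache of released blocks.
   A released block is reused for the next request instead of going
   back to malloc()/free(). */
typedef struct free_block {
    struct free_block *next;
} free_block;

static free_block *free_list = NULL;  /* released blocks, most recent first */
static size_t block_size = 0;         /* the one block size seen so far */

void *my_stack_malloc(size_t size)
{
    free_block *b;

    /* A block must be able to hold the free-list link while cached. */
    if (size < sizeof(free_block)) size = sizeof(free_block);

    /* Assumption taken from the text: all blocks have the same size. */
    assert(block_size == 0 || size == block_size);
    block_size = size;

    if (free_list != NULL) {
        b = free_list;            /* reuse the most recently released block */
        free_list = b->next;
        return b;
    }
    return malloc(size);          /* cache empty: fall back to the heap */
}

void my_stack_free(void *block)
{
    free_block *b = block;

    if (b == NULL) return;
    b->next = free_list;          /* push onto the cache; nothing is freed */
    free_list = b;
}
```

Functions of this shape have the same signatures as <b>pcre_stack_malloc</b> and <b>pcre_stack_free</b>, so they could be assigned to those variables before calling any PCRE functions.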
| 175 |
|
<P> |
| 176 |
The global variable <b>pcre_callout</b> initially contains NULL. It can be set |
The global variable <b>pcre_callout</b> initially contains NULL. It can be set |
| 177 |
by the caller to a "callout" function, which PCRE will then call at specified |
by the caller to a "callout" function, which PCRE will then call at specified |
| 178 |
points during a matching operation. Details are given in the <b>pcrecallout</b> |
points during a matching operation. Details are given in the <b>pcrecallout</b> |
| 181 |
<br><a name="SEC3" href="#TOC1">MULTITHREADING</a><br> |
<br><a name="SEC3" href="#TOC1">MULTITHREADING</a><br> |
| 182 |
<P> |
<P> |
| 183 |
The PCRE functions can be used in multi-threading applications, with the |
The PCRE functions can be used in multi-threading applications, with the |
| 184 |
proviso that the memory management functions pointed to by <b>pcre_malloc</b> |
proviso that the memory management functions pointed to by <b>pcre_malloc</b>, |
| 185 |
and <b>pcre_free</b>, and the callout function pointed to by <b>pcre_callout</b>, |
<b>pcre_free</b>, <b>pcre_stack_malloc</b>, and <b>pcre_stack_free</b>, and the |
| 186 |
are shared by all threads. |
callout function pointed to by <b>pcre_callout</b>, are shared by all threads. |
| 187 |
</P> |
</P> |
| 188 |
<P> |
<P> |
| 189 |
The compiled form of a regular expression is not altered during matching, so |
The compiled form of a regular expression is not altered during matching, so |
| 255 |
internal matching function calls in a <b>pcre_exec()</b> execution. Further |
internal matching function calls in a <b>pcre_exec()</b> execution. Further |
| 256 |
details are given with <b>pcre_exec()</b> below. |
details are given with <b>pcre_exec()</b> below. |
| 257 |
</P> |
</P> |
| 258 |
|
<P> |
| 259 |
|
<pre> |
| 260 |
|
PCRE_CONFIG_STACKRECURSE |
| 261 |
|
</PRE> |
| 262 |
|
</P> |
| 263 |
|
<P> |
| 264 |
|
The output is an integer that is set to one if internal recursion is |
| 265 |
|
implemented by recursive function calls that use the stack to remember their |
| 266 |
|
state. This is the usual way that PCRE is compiled. The output is zero if PCRE |
| 267 |
|
was compiled to use blocks of data on the heap instead of recursive function |
| 268 |
|
calls. In this case, <b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> are |
| 269 |
|
called to manage memory blocks on the heap, thus avoiding the use of the stack. |
| 270 |
|
</P> |
| 271 |
<br><a name="SEC5" href="#TOC1">COMPILING A PATTERN</a><br> |
<br><a name="SEC5" href="#TOC1">COMPILING A PATTERN</a><br> |
| 272 |
<P> |
<P> |
| 273 |
<b>pcre *pcre_compile(const char *<i>pattern</i>, int <i>options</i>,</b> |
<b>pcre *pcre_compile(const char *<i>pattern</i>, int <i>options</i>,</b> |
| 908 |
</P> |
</P> |
| 909 |
<P> |
<P> |
| 910 |
When PCRE_UTF8 was set at compile time, the validity of the subject as a UTF-8 |
When PCRE_UTF8 was set at compile time, the validity of the subject as a UTF-8 |
| 911 |
string is automatically checked. If an invalid UTF-8 sequence of bytes is |
string is automatically checked, and the value of <i>startoffset</i> is also |
| 912 |
found, <b>pcre_exec()</b> returns the error PCRE_ERROR_BADUTF8. If you already |
checked to ensure that it points to the start of a UTF-8 character. If an |
| 913 |
know that your subject is valid, and you want to skip this check for |
invalid UTF-8 sequence of bytes is found, <b>pcre_exec()</b> returns the error |
| 914 |
performance reasons, you can set the PCRE_NO_UTF8_CHECK option when calling |
PCRE_ERROR_BADUTF8. If <i>startoffset</i> contains an invalid value, |
| 915 |
<b>pcre_exec()</b>. When this option is set, the effect of passing an invalid |
PCRE_ERROR_BADUTF8_OFFSET is returned. |
| 916 |
UTF-8 string as a subject is undefined. It may cause your program to crash. |
</P> |
| 917 |
|
<P> |
| 918 |
|
If you already know that your subject is valid, and you want to skip these |
| 919 |
|
checks for performance reasons, you can set the PCRE_NO_UTF8_CHECK option when |
| 920 |
|
calling <b>pcre_exec()</b>. You might want to do this for the second and |
| 921 |
|
subsequent calls to <b>pcre_exec()</b> if you are making repeated calls to find |
| 922 |
|
all the matches in a single subject string. However, you should be sure that |
| 923 |
|
the value of <i>startoffset</i> points to the start of a UTF-8 character. When |
| 924 |
|
PCRE_NO_UTF8_CHECK is set, the effect of passing an invalid UTF-8 string as a |
| 925 |
|
subject, or a value of <i>startoffset</i> that does not point to the start of a |
| 926 |
|
UTF-8 character, is undefined. Your program may crash. |
| 927 |
</P> |
</P> |
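When PCRE_NO_UTF8_CHECK is set, the caller takes over responsibility for the offset. As a minimal sketch, assuming the subject is already known to be valid UTF-8, the check that <i>startoffset</i> points to the start of a character reduces to testing that the byte there is not a continuation byte (bit pattern 10xxxxxx). The function name is_utf8_char_start is hypothetical and is not part of PCRE; treating an offset equal to the length as acceptable is also an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical helper (not part of PCRE).
   Returns 1 if startoffset lies on a UTF-8 character boundary of a
   subject that is assumed to be valid UTF-8, and 0 otherwise. */
int is_utf8_char_start(const char *subject, int length, int startoffset)
{
    unsigned char c;

    if (startoffset < 0 || startoffset > length) return 0;
    if (startoffset == length) return 1;   /* end of subject: no byte to test */

    c = (unsigned char)subject[startoffset];
    return (c & 0xC0) != 0x80;             /* 10xxxxxx marks a continuation byte */
}
```

For the four-byte subject "a\xC3\xA9z", offsets 0, 1 and 3 are character starts, while offset 2 falls in the middle of the two-byte character.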
| 928 |
<P> |
<P> |
| 929 |
There are also three further options that can be set only at matching time: |
There are also three further options that can be set only at matching time: |
| 979 |
</P> |
</P> |
| 980 |
<P> |
<P> |
| 981 |
The subject string is passed to <b>pcre_exec()</b> as a pointer in |
The subject string is passed to <b>pcre_exec()</b> as a pointer in |
| 982 |
<i>subject</i>, a length in <i>length</i>, and a starting offset in |
<i>subject</i>, a length in <i>length</i>, and a starting byte offset in |
| 983 |
<i>startoffset</i>. Unlike the pattern string, the subject may contain binary |
<i>startoffset</i>. Unlike the pattern string, the subject may contain binary |
| 984 |
zero bytes. When the starting offset is zero, the search for a match starts at |
zero bytes. When the starting offset is zero, the search for a match starts at |
| 985 |
the beginning of the subject, and this is by far the most common case. |
the beginning of the subject, and this is by far the most common case. |
| 986 |
</P> |
</P> |
| 987 |
<P> |
<P> |
| 988 |
If the pattern was compiled with the PCRE_UTF8 option, the subject must be a |
If the pattern was compiled with the PCRE_UTF8 option, the subject must be a |
| 989 |
sequence of bytes that is a valid UTF-8 string. If an invalid UTF-8 string is |
sequence of bytes that is a valid UTF-8 string, and the starting offset must |
| 990 |
passed, PCRE's behaviour is not defined. |
point to the beginning of a UTF-8 character. If an invalid UTF-8 string or |
| 991 |
|
offset is passed, an error (either PCRE_ERROR_BADUTF8 or |
| 992 |
|
PCRE_ERROR_BADUTF8_OFFSET) is returned, unless the option PCRE_NO_UTF8_CHECK is |
| 993 |
|
set, in which case PCRE's behaviour is not defined. |
| 994 |
</P> |
</P> |
| 995 |
<P> |
<P> |
| 996 |
A non-zero starting offset is useful when searching for another match in the |
A non-zero starting offset is useful when searching for another match in the |
| 1175 |
</P> |
</P> |
| 1176 |
<P> |
<P> |
| 1177 |
<pre> |
<pre> |
| 1178 |
PCRE_ERROR_BADUTF8 (-10) |
PCRE_ERROR_BADUTF8 (-10) |
| 1179 |
</PRE> |
</PRE> |
| 1180 |
</P> |
</P> |
| 1181 |
<P> |
<P> |
| 1182 |
A string that contains an invalid UTF-8 byte sequence was passed as a subject. |
A string that contains an invalid UTF-8 byte sequence was passed as a subject. |
| 1183 |
</P> |
</P> |
| 1184 |
|
<P> |
| 1185 |
|
<pre> |
| 1186 |
|
PCRE_ERROR_BADUTF8_OFFSET (-11) |
| 1187 |
|
</PRE> |
| 1188 |
|
</P> |
| 1189 |
|
<P> |
| 1190 |
|
The UTF-8 byte sequence that was passed as a subject was valid, but the value |
| 1191 |
|
of <i>startoffset</i> did not point to the beginning of a UTF-8 character. |
| 1192 |
|
</P> |
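A caller may want to tell the two UTF-8 errors apart, since one concerns the subject itself and the other only the offset. The sketch below maps the two codes to messages. The function name describe_utf8_error is hypothetical; the numeric values are the ones listed in this document and are defined locally only so that the fragment stands alone (when pcre.h is included first, its definitions are used instead).

```c
#include <stddef.h>

/* Values as listed in this document; skipped if pcre.h already defined them. */
#ifndef PCRE_ERROR_BADUTF8
#define PCRE_ERROR_BADUTF8        (-10)
#endif
#ifndef PCRE_ERROR_BADUTF8_OFFSET
#define PCRE_ERROR_BADUTF8_OFFSET (-11)
#endif

/* Hypothetical helper (not part of PCRE).
   Returns a message for the two UTF-8 errors, or NULL for any other code. */
const char *describe_utf8_error(int rc)
{
    switch (rc) {
    case PCRE_ERROR_BADUTF8:
        return "subject contains an invalid UTF-8 byte sequence";
    case PCRE_ERROR_BADUTF8_OFFSET:
        return "startoffset does not point to the start of a UTF-8 character";
    default:
        return NULL;
    }
}
```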
| 1193 |
<br><a name="SEC11" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br> |
<br><a name="SEC11" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br> |
| 1194 |
<P> |
<P> |
| 1195 |
<b>int pcre_copy_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b> |
<b>int pcre_copy_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b> |
| 1341 |
appropriate. |
appropriate. |
| 1342 |
</P> |
</P> |
| 1343 |
<P> |
<P> |
| 1344 |
Last updated: 20 August 2003 |
Last updated: 09 December 2003 |
| 1345 |
<br> |
<br> |
| 1346 |
Copyright © 1997-2003 University of Cambridge. |
Copyright © 1997-2003 University of Cambridge. |