| 1 |
.TH PCREAPI 3 |
.TH PCREAPI 3 |
| 2 |
.SH NAME |
.SH NAME |
| 3 |
PCRE - Perl-compatible regular expressions |
PCRE - Perl-compatible regular expressions |
| 4 |
.SH "PCRE NATIVE API" |
.SH "PCRE NATIVE API BASIC FUNCTIONS" |
| 5 |
.rs |
.rs |
| 6 |
.sp |
.sp |
| 7 |
.B #include <pcre.h> |
.B #include <pcre.h> |
| 25 |
.ti +5n |
.ti +5n |
| 26 |
.B const char **\fIerrptr\fP); |
.B const char **\fIerrptr\fP); |
| 27 |
.PP |
.PP |
| 28 |
|
.B void pcre_free_study(pcre_extra *\fIextra\fP); |
| 29 |
|
.PP |
| 30 |
.B int pcre_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP," |
.B int pcre_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP," |
| 31 |
.ti +5n |
.ti +5n |
| 32 |
.B "const char *\fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP, |
.B "const char *\fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP, |
| 33 |
.ti +5n |
.ti +5n |
| 34 |
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP); |
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP); |
| 35 |
|
. |
| 36 |
|
. |
| 37 |
|
.SH "PCRE NATIVE API AUXILIARY FUNCTIONS" |
| 38 |
|
.rs |
| 39 |
|
.sp |
| 40 |
|
.B pcre_jit_stack *pcre_jit_stack_alloc(int \fIstartsize\fP, int \fImaxsize\fP); |
| 41 |
|
.PP |
| 42 |
|
.B void pcre_jit_stack_free(pcre_jit_stack *\fIstack\fP); |
| 43 |
|
.PP |
| 44 |
|
.B void pcre_assign_jit_stack(pcre_extra *\fIextra\fP, |
| 45 |
|
.ti +5n |
| 46 |
|
.B pcre_jit_callback \fIcallback\fP, void *\fIdata\fP); |
| 47 |
.PP |
.PP |
| 48 |
.B int pcre_dfa_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP," |
.B int pcre_dfa_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP," |
| 49 |
.ti +5n |
.ti +5n |
| 111 |
.B int pcre_config(int \fIwhat\fP, void *\fIwhere\fP); |
.B int pcre_config(int \fIwhat\fP, void *\fIwhere\fP); |
| 112 |
.PP |
.PP |
| 113 |
.B char *pcre_version(void); |
.B char *pcre_version(void); |
| 114 |
.PP |
. |
| 115 |
|
. |
| 116 |
|
.SH "PCRE NATIVE API INDIRECTED FUNCTIONS" |
| 117 |
|
.rs |
| 118 |
|
.sp |
| 119 |
.B void *(*pcre_malloc)(size_t); |
.B void *(*pcre_malloc)(size_t); |
| 120 |
.PP |
.PP |
| 121 |
.B void (*pcre_free)(void *); |
.B void (*pcre_free)(void *); |
| 132 |
.sp |
.sp |
| 133 |
PCRE has its own native API, which is described in this document. There are |
PCRE has its own native API, which is described in this document. There are |
| 134 |
also some wrapper functions that correspond to the POSIX regular expression |
also some wrapper functions that correspond to the POSIX regular expression |
| 135 |
API. These are described in the |
API, but they do not give access to all the functionality. They are described |
| 136 |
|
in the |
| 137 |
.\" HREF |
.\" HREF |
| 138 |
\fBpcreposix\fP |
\fBpcreposix\fP |
| 139 |
.\" |
.\" |
| 140 |
documentation. Both of these APIs define a set of C function calls. A C++ |
documentation. Both of these APIs define a set of C function calls. A C++ |
| 141 |
wrapper is distributed with PCRE. It is documented in the |
wrapper is also distributed with PCRE. It is documented in the |
| 142 |
.\" HREF |
.\" HREF |
| 143 |
\fBpcrecpp\fP |
\fBpcrecpp\fP |
| 144 |
.\" |
.\" |
| 171 |
.\" |
.\" |
| 172 |
documentation describes how to compile and run it. |
documentation describes how to compile and run it. |
| 173 |
.P |
.P |
| 174 |
|
Just-in-time compiler support is an optional feature of PCRE that can be built |
| 175 |
|
in appropriate hardware environments. It greatly speeds up the matching |
| 176 |
|
performance of many patterns. Simple programs can easily request that it be |
| 177 |
|
used if available, by setting an option that is ignored when it is not |
| 178 |
|
relevant. More complicated programs might need to make use of the functions |
| 179 |
|
\fBpcre_jit_stack_alloc()\fP, \fBpcre_jit_stack_free()\fP, and |
| 180 |
|
\fBpcre_assign_jit_stack()\fP in order to control the JIT code's memory usage. |
| 181 |
|
These functions are discussed in the |
| 182 |
|
.\" HREF |
| 183 |
|
\fBpcrejit\fP |
| 184 |
|
.\" |
| 185 |
|
documentation. |
| 186 |
|
.P |
| 187 |
A second matching function, \fBpcre_dfa_exec()\fP, which is not |
A second matching function, \fBpcre_dfa_exec()\fP, which is not |
| 188 |
Perl-compatible, is also provided. This uses a different algorithm for the |
Perl-compatible, is also provided. This uses a different algorithm for the |
| 189 |
matching. The alternative algorithm finds all possible matches (at a given |
matching. The alternative algorithm finds all possible matches (at a given |
| 314 |
.P |
.P |
| 315 |
The compiled form of a regular expression is not altered during matching, so |
The compiled form of a regular expression is not altered during matching, so |
| 316 |
the same compiled pattern can safely be used by several threads at once. |
the same compiled pattern can safely be used by several threads at once. |
| 317 |
|
.P |
| 318 |
|
If the just-in-time optimization feature is being used, it needs separate |
| 319 |
|
memory stack areas for each thread. See the |
| 320 |
|
.\" HREF |
| 321 |
|
\fBpcrejit\fP |
| 322 |
|
.\" |
| 323 |
|
documentation for more details. |
| 324 |
. |
. |
| 325 |
. |
. |
| 326 |
.SH "SAVING PRECOMPILED PATTERNS FOR LATER USE" |
.SH "SAVING PRECOMPILED PATTERNS FOR LATER USE" |
| 363 |
The output is an integer that is set to one if support for Unicode character |
The output is an integer that is set to one if support for Unicode character |
| 364 |
properties is available; otherwise it is set to zero. |
properties is available; otherwise it is set to zero. |
| 365 |
.sp |
.sp |
| 366 |
|
PCRE_CONFIG_JIT |
| 367 |
|
.sp |
| 368 |
|
The output is an integer that is set to one if support for just-in-time |
| 369 |
|
compiling is available; otherwise it is set to zero. |
| 370 |
|
.sp |
| 371 |
PCRE_CONFIG_NEWLINE |
PCRE_CONFIG_NEWLINE |
| 372 |
.sp |
.sp |
| 373 |
The output is an integer whose value specifies the default character sequence |
The output is an integer whose value specifies the default character sequence |
| 643 |
string (by default this causes the current matching alternative to fail). A |
string (by default this causes the current matching alternative to fail). A |
| 644 |
pattern such as (\e1)(a) succeeds when this option is set (assuming it can find |
pattern such as (\e1)(a) succeeds when this option is set (assuming it can find |
| 645 |
an "a" in the subject), whereas it fails by default, for Perl compatibility. |
an "a" in the subject), whereas it fails by default, for Perl compatibility. |
| 646 |
|
.P |
| 647 |
|
(3) \eU matches an upper case "U" character; by default \eU causes a compile |
| 648 |
|
time error (Perl uses \eU to upper case subsequent characters). |
| 649 |
|
.P |
| 650 |
|
(4) \eu matches a lower case "u" character unless it is followed by four |
| 651 |
|
hexadecimal digits, in which case the hexadecimal number defines the code point |
| 652 |
|
to match. By default, \eu causes a compile time error (Perl uses it to upper |
| 653 |
|
case the following character). |
| 654 |
|
.P |
| 655 |
|
(5) \ex matches a lower case "x" character unless it is followed by two |
| 656 |
|
hexadecimal digits, in which case the hexadecimal number defines the code point |
| 657 |
|
to match. By default, as in Perl, a hexadecimal number is always expected after |
| 658 |
|
\ex, but it may have zero, one, or two digits (so, for example, \exz matches a |
| 659 |
|
binary zero character followed by z). |
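As a rough illustration of rules (4) and (5), the following C sketch shows one way the fixed-digit test could be expressed. The helper names (parse_fixed_hex, compat_u_escape, compat_x_escape) are hypothetical and are not part of the PCRE API or its source; the sketch assumes ASCII input and only models the behaviour when this option is set.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: read exactly ndigits hexadecimal
   digits starting at p. Returns 1 and stores the value on success, or 0 if
   fewer than ndigits hex digits are present (including end of string). */
int parse_fixed_hex(const char *p, int ndigits, unsigned int *value)
{
    unsigned int v = 0;
    int i;
    assert(p != NULL && value != NULL && ndigits > 0);
    for (i = 0; i < ndigits; i++) {
        unsigned char c = (unsigned char)p[i];
        if (!isxdigit(c))
            return 0;                     /* '\0' is not a hex digit */
        v = v * 16 + (unsigned int)(isdigit(c) ? c - '0'
                                               : tolower(c) - 'a' + 10);
    }
    *value = v;
    return 1;
}

/* p points just past "\u". With four hex digits the escape names a code
   point and 4 is returned; otherwise it is a literal 'u' and 0 is returned. */
int compat_u_escape(const char *p, unsigned int *codepoint)
{
    if (parse_fixed_hex(p, 4, codepoint))
        return 4;
    *codepoint = 'u';
    return 0;
}

/* p points just past "\x". Same idea as above, with two hex digits. */
int compat_x_escape(const char *p, unsigned int *codepoint)
{
    if (parse_fixed_hex(p, 2, codepoint))
        return 2;
    *codepoint = 'x';
    return 0;
}
```

The return value is the number of characters consumed after the escape letter, so a caller can tell a code point escape from a literal letter.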
| 660 |
.sp |
.sp |
| 661 |
PCRE_MULTILINE |
PCRE_MULTILINE |
| 662 |
.sp |
.sp |
| 759 |
available only when PCRE is built to include UTF-8 support. If not, the use |
available only when PCRE is built to include UTF-8 support. If not, the use |
| 760 |
of this option provokes an error. Details of how this option changes the |
of this option provokes an error. Details of how this option changes the |
| 761 |
behaviour of PCRE are given in the |
behaviour of PCRE are given in the |
|
.\" HTML <a href="pcre.html#utf8support"> |
|
|
.\" </a> |
|
|
section on UTF-8 support |
|
|
.\" |
|
|
in the main |
|
| 762 |
.\" HREF |
.\" HREF |
| 763 |
\fBpcre\fP |
\fBpcreunicode\fP |
| 764 |
.\" |
.\" |
| 765 |
page. |
page. |
| 766 |
.sp |
.sp |
| 830 |
34 character value in \ex{...} sequence is too large |
34 character value in \ex{...} sequence is too large |
| 831 |
35 invalid condition (?(0) |
35 invalid condition (?(0) |
| 832 |
36 \eC not allowed in lookbehind assertion |
36 \eC not allowed in lookbehind assertion |
| 833 |
37 PCRE does not support \eL, \el, \eN, \eU, or \eu |
37 PCRE does not support \eL, \el, \eN{name}, \eU, or \eu |
| 834 |
38 number after (?C is > 255 |
38 number after (?C is > 255 |
| 835 |
39 closing ) for (?C expected |
39 closing ) for (?C expected |
| 836 |
40 recursive call could loop indefinitely |
40 recursive call could loop indefinitely |
| 864 |
not allowed |
not allowed |
| 865 |
66 (*MARK) must have an argument |
66 (*MARK) must have an argument |
| 866 |
67 this version of PCRE is not compiled with PCRE_UCP support |
67 this version of PCRE is not compiled with PCRE_UCP support |
| 867 |
|
68 \ec must be followed by an ASCII character |
| 868 |
|
69 \ek is not followed by a braced, angle-bracketed, or quoted name |
| 869 |
.sp |
.sp |
| 870 |
The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may |
The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may |
| 871 |
be used if the limits were changed when PCRE was built. |
be used if the limits were changed when PCRE was built. |
| 872 |
. |
. |
| 873 |
. |
. |
| 874 |
|
.\" HTML <a name="studyingapattern"></a> |
| 875 |
.SH "STUDYING A PATTERN" |
.SH "STUDYING A PATTERN" |
| 876 |
.rs |
.rs |
| 877 |
.sp |
.sp |
| 902 |
wants to pass any of the other fields to \fBpcre_exec()\fP or |
wants to pass any of the other fields to \fBpcre_exec()\fP or |
| 903 |
\fBpcre_dfa_exec()\fP, it must set up its own \fBpcre_extra\fP block. |
\fBpcre_dfa_exec()\fP, it must set up its own \fBpcre_extra\fP block. |
| 904 |
.P |
.P |
| 905 |
The second argument of \fBpcre_study()\fP contains option bits. At present, no |
The second argument of \fBpcre_study()\fP contains option bits. There is only |
| 906 |
options are defined, and this argument should always be zero. |
one option: PCRE_STUDY_JIT_COMPILE. If this is set, and the just-in-time |
| 907 |
|
compiler is available, the pattern is further compiled into machine code that |
| 908 |
|
executes much faster than the \fBpcre_exec()\fP matching function. If |
| 909 |
|
the just-in-time compiler is not available, this option is ignored. All other |
| 910 |
|
bits in the \fIoptions\fP argument must be zero. |
| 911 |
|
.P |
| 912 |
|
JIT compilation is a heavyweight optimization. It can take some time for |
| 913 |
|
patterns to be analyzed, and for one-off matches and simple patterns the |
| 914 |
|
benefit of faster execution might be offset by a much slower study time. |
| 915 |
|
Not all patterns can be optimized by the JIT compiler. For those that cannot be |
| 916 |
|
handled, matching automatically falls back to the \fBpcre_exec()\fP |
| 917 |
|
interpreter. For more details, see the |
| 918 |
|
.\" HREF |
| 919 |
|
\fBpcrejit\fP |
| 920 |
|
.\" |
| 921 |
|
documentation. |
| 922 |
.P |
.P |
| 923 |
The third argument for \fBpcre_study()\fP is a pointer for an error message. If |
The third argument for \fBpcre_study()\fP is a pointer for an error message. If |
| 924 |
studying succeeds (even if no data is returned), the variable it points to is |
studying succeeds (even if no data is returned), the variable it points to is |
| 927 |
should test the error pointer for NULL after calling \fBpcre_study()\fP, to be |
should test the error pointer for NULL after calling \fBpcre_study()\fP, to be |
| 928 |
sure that it has run successfully. |
sure that it has run successfully. |
| 929 |
.P |
.P |
| 930 |
This is a typical call to \fBpcre_study\fP(): |
When you are finished with a pattern, you can free the memory used for the |
| 931 |
|
study data by calling \fBpcre_free_study()\fP. This function was added to the |
| 932 |
|
API for release 8.20. For earlier versions, the memory could be freed with |
| 933 |
|
\fBpcre_free()\fP, just like the pattern itself. This will still work in cases |
| 934 |
|
where PCRE_STUDY_JIT_COMPILE is not used, but it is advisable to change to the |
| 935 |
|
new function when convenient. |
| 936 |
|
.P |
| 937 |
|
This is a typical way in which \fBpcre_study()\fP is used (except that in a
| 938 |
|
real application there should be tests for errors): |
| 939 |
.sp |
.sp |
| 940 |
pcre_extra *pe; |
int rc; |
| 941 |
pe = pcre_study( |
pcre *re; |
| 942 |
|
pcre_extra *sd; |
| 943 |
|
re = pcre_compile("pattern", 0, &error, &erroroffset, NULL); |
| 944 |
|
sd = pcre_study( |
| 945 |
re, /* result of pcre_compile() */ |
re, /* result of pcre_compile() */ |
| 946 |
0, /* no options exist */ |
0, /* no options */ |
| 947 |
&error); /* set to NULL or points to a message */ |
&error); /* set to NULL or points to a message */ |
| 948 |
|
rc = pcre_exec( /* see below for details of pcre_exec() options */ |
| 949 |
|
re, sd, "subject", 7, 0, 0, ovector, 30); |
| 950 |
|
... |
| 951 |
|
pcre_free_study(sd); |
| 952 |
|
pcre_free(re); |
| 953 |
.sp |
.sp |
| 954 |
Studying a pattern does two things: first, a lower bound for the length of |
Studying a pattern does two things: first, a lower bound for the length of |
| 955 |
subject string that is needed to match the pattern is computed. This does not |
subject string that is needed to match the pattern is computed. This does not |
| 964 |
created. This speeds up finding a position in the subject at which to start |
created. This speeds up finding a position in the subject at which to start |
| 965 |
matching. |
matching. |
| 966 |
.P |
.P |
| 967 |
The two optimizations just described can be disabled by setting the |
These two optimizations apply to both \fBpcre_exec()\fP and |
| 968 |
PCRE_NO_START_OPTIMIZE option when calling \fBpcre_exec()\fP or |
\fBpcre_dfa_exec()\fP. However, they are not used by \fBpcre_exec()\fP if |
| 969 |
|
\fBpcre_study()\fP is called with the PCRE_STUDY_JIT_COMPILE option, and |
| 970 |
|
just-in-time compiling is successful. The optimizations can be disabled by |
| 971 |
|
setting the PCRE_NO_START_OPTIMIZE option when calling \fBpcre_exec()\fP or |
| 972 |
\fBpcre_dfa_exec()\fP. You might want to do this if your pattern contains |
\fBpcre_dfa_exec()\fP. You might want to do this if your pattern contains |
| 973 |
callouts or (*MARK), and you want to make use of these facilities in cases |
callouts or (*MARK) (which cannot be handled by the JIT compiler), and you want |
| 974 |
where matching fails. See the discussion of PCRE_NO_START_OPTIMIZE |
to make use of these facilities in cases where matching fails. See the |
| 975 |
|
discussion of PCRE_NO_START_OPTIMIZE |
| 976 |
.\" HTML <a href="#execoptions"> |
.\" HTML <a href="#execoptions"> |
| 977 |
.\" </a> |
.\" </a> |
| 978 |
below. |
below. |
| 1069 |
size_t length; |
size_t length; |
| 1070 |
rc = pcre_fullinfo( |
rc = pcre_fullinfo( |
| 1071 |
re, /* result of pcre_compile() */ |
re, /* result of pcre_compile() */ |
| 1072 |
pe, /* result of pcre_study(), or NULL */ |
sd, /* result of pcre_study(), or NULL */ |
| 1073 |
PCRE_INFO_SIZE, /* what is required */ |
PCRE_INFO_SIZE, /* what is required */ |
| 1074 |
&length); /* where to put the data */ |
&length); /* where to put the data */ |
| 1075 |
.sp |
.sp |
| 1134 |
0. The fourth argument should point to an \fBint\fP variable. (?J) and |
0. The fourth argument should point to an \fBint\fP variable. (?J) and |
| 1135 |
(?-J) set and unset the local PCRE_DUPNAMES option, respectively. |
(?-J) set and unset the local PCRE_DUPNAMES option, respectively. |
| 1136 |
.sp |
.sp |
| 1137 |
|
PCRE_INFO_JIT |
| 1138 |
|
.sp |
| 1139 |
|
Return 1 if the pattern was studied with the PCRE_STUDY_JIT_COMPILE option, and |
| 1140 |
|
just-in-time compiling was successful. The fourth argument should point to an |
| 1141 |
|
\fBint\fP variable. A return value of 0 means that JIT support is not available |
| 1142 |
|
in this version of PCRE, or that the pattern was not studied with the |
| 1143 |
|
PCRE_STUDY_JIT_COMPILE option, or that the JIT compiler could not handle this |
| 1144 |
|
particular pattern. See the |
| 1145 |
|
.\" HREF |
| 1146 |
|
\fBpcrejit\fP |
| 1147 |
|
.\" |
| 1148 |
|
documentation for details of what can and cannot be handled. |
| 1149 |
|
.sp |
| 1150 |
PCRE_INFO_LASTLITERAL |
PCRE_INFO_LASTLITERAL |
| 1151 |
.sp |
.sp |
| 1152 |
Return the value of the rightmost literal byte that must exist in any matched |
Return the value of the rightmost literal byte that must exist in any matched |
| 1269 |
.sp |
.sp |
| 1270 |
PCRE_INFO_STUDYSIZE |
PCRE_INFO_STUDYSIZE |
| 1271 |
.sp |
.sp |
| 1272 |
Return the size of the data block pointed to by the \fIstudy_data\fP field in |
Return the size of the data block pointed to by the \fIstudy_data\fP field in a |
| 1273 |
a \fBpcre_extra\fP block. That is, it is the value that was passed to |
\fBpcre_extra\fP block. If \fBpcre_extra\fP is NULL, or there is no study data, |
| 1274 |
\fBpcre_malloc()\fP when PCRE was getting memory into which to place the data |
zero is returned. The fourth argument should point to a \fBsize_t\fP variable. |
| 1275 |
created by \fBpcre_study()\fP. If \fBpcre_extra\fP is NULL, or there is no |
The \fIstudy_data\fP field is set by \fBpcre_study()\fP to record information |
| 1276 |
study data, zero is returned. The fourth argument should point to a |
that will speed up matching (see the section entitled |
| 1277 |
\fBsize_t\fP variable. |
.\" HTML <a href="#studyingapattern"> |
| 1278 |
|
.\" </a> |
| 1279 |
|
"Studying a pattern" |
| 1280 |
|
.\" |
| 1281 |
|
above). The format of the \fIstudy_data\fP block is private, but its length |
| 1282 |
|
is made available via this option so that it can be saved and restored (see the |
| 1283 |
|
.\" HREF |
| 1284 |
|
\fBpcreprecompile\fP |
| 1285 |
|
.\" |
| 1286 |
|
documentation for details). |
| 1287 |
. |
. |
| 1288 |
. |
. |
| 1289 |
.SH "OBSOLETE INFO FUNCTION" |
.SH "OBSOLETE INFO FUNCTION" |
| 1345 |
The function \fBpcre_exec()\fP is called to match a subject string against a |
The function \fBpcre_exec()\fP is called to match a subject string against a |
| 1346 |
compiled pattern, which is passed in the \fIcode\fP argument. If the |
compiled pattern, which is passed in the \fIcode\fP argument. If the |
| 1347 |
pattern was studied, the result of the study should be passed in the |
pattern was studied, the result of the study should be passed in the |
| 1348 |
\fIextra\fP argument. This function is the main matching facility of the |
\fIextra\fP argument. You can call \fBpcre_exec()\fP with the same \fIcode\fP |
| 1349 |
library, and it operates in a Perl-like manner. For specialist use there is |
and \fIextra\fP arguments as many times as you like, in order to match |
| 1350 |
also an alternative matching function, which is described |
different subject strings with the same pattern. |
| 1351 |
|
.P |
| 1352 |
|
This function is the main matching facility of the library, and it operates in |
| 1353 |
|
a Perl-like manner. For specialist use there is also an alternative matching |
| 1354 |
|
function, which is described |
| 1355 |
.\" HTML <a href="#dfamatch"> |
.\" HTML <a href="#dfamatch"> |
| 1356 |
.\" </a> |
.\" </a> |
| 1357 |
below |
below |
| 1382 |
ovector, /* vector of integers for substring information */ |
ovector, /* vector of integers for substring information */ |
| 1383 |
30); /* number of elements (NOT size in bytes) */ |
30); /* number of elements (NOT size in bytes) */ |
| 1384 |
. |
. |
| 1385 |
|
. |
| 1386 |
.\" HTML <a name="extradata"></a> |
.\" HTML <a name="extradata"></a> |
| 1387 |
.SS "Extra data for \fBpcre_exec()\fR" |
.SS "Extra data for \fBpcre_exec()\fR" |
| 1388 |
.rs |
.rs |
| 1395 |
.sp |
.sp |
| 1396 |
unsigned long int \fIflags\fP; |
unsigned long int \fIflags\fP; |
| 1397 |
void *\fIstudy_data\fP; |
void *\fIstudy_data\fP; |
| 1398 |
|
void *\fIexecutable_jit\fP; |
| 1399 |
unsigned long int \fImatch_limit\fP; |
unsigned long int \fImatch_limit\fP; |
| 1400 |
unsigned long int \fImatch_limit_recursion\fP; |
unsigned long int \fImatch_limit_recursion\fP; |
| 1401 |
void *\fIcallout_data\fP; |
void *\fIcallout_data\fP; |
| 1406 |
are set. The flag bits are: |
are set. The flag bits are: |
| 1407 |
.sp |
.sp |
| 1408 |
PCRE_EXTRA_STUDY_DATA |
PCRE_EXTRA_STUDY_DATA |
| 1409 |
|
PCRE_EXTRA_EXECUTABLE_JIT |
| 1410 |
PCRE_EXTRA_MATCH_LIMIT |
PCRE_EXTRA_MATCH_LIMIT |
| 1411 |
PCRE_EXTRA_MATCH_LIMIT_RECURSION |
PCRE_EXTRA_MATCH_LIMIT_RECURSION |
| 1412 |
PCRE_EXTRA_CALLOUT_DATA |
PCRE_EXTRA_CALLOUT_DATA |
| 1413 |
PCRE_EXTRA_TABLES |
PCRE_EXTRA_TABLES |
| 1414 |
PCRE_EXTRA_MARK |
PCRE_EXTRA_MARK |
| 1415 |
.sp |
.sp |
| 1416 |
Other flag bits should be set to zero. The \fIstudy_data\fP field is set in the |
Other flag bits should be set to zero. The \fIstudy_data\fP field and sometimes |
| 1417 |
\fBpcre_extra\fP block that is returned by \fBpcre_study()\fP, together with |
the \fIexecutable_jit\fP field are set in the \fBpcre_extra\fP block that is |
| 1418 |
the appropriate flag bit. You should not set this yourself, but you may add to |
returned by \fBpcre_study()\fP, together with the appropriate flag bits. You |
| 1419 |
the block by setting the other fields and their corresponding flag bits. |
should not set these yourself, but you may add to the block by setting the |
| 1420 |
|
other fields and their corresponding flag bits. |
| 1421 |
.P |
.P |
| 1422 |
The \fImatch_limit\fP field provides a means of preventing PCRE from using up a |
The \fImatch_limit\fP field provides a means of preventing PCRE from using up a |
| 1423 |
vast amount of resources when running patterns that are not going to match, |
vast amount of resources when running patterns that are not going to match, |
| 1424 |
but which have a very large number of possibilities in their search trees. The |
but which have a very large number of possibilities in their search trees. The |
| 1425 |
classic example is a pattern that uses nested unlimited repeats. |
classic example is a pattern that uses nested unlimited repeats. |
| 1426 |
.P |
.P |
| 1427 |
Internally, PCRE uses a function called \fBmatch()\fP which it calls repeatedly |
Internally, \fBpcre_exec()\fP uses a function called \fBmatch()\fP, which it |
| 1428 |
(sometimes recursively). The limit set by \fImatch_limit\fP is imposed on the |
calls repeatedly (sometimes recursively). The limit set by \fImatch_limit\fP is |
| 1429 |
number of times this function is called during a match, which has the effect of |
imposed on the number of times this function is called during a match, which |
| 1430 |
limiting the amount of backtracking that can take place. For patterns that are |
has the effect of limiting the amount of backtracking that can take place. For |
| 1431 |
not anchored, the count restarts from zero for each position in the subject |
patterns that are not anchored, the count restarts from zero for each position |
| 1432 |
string. |
in the subject string. |
| 1433 |
|
.P |
| 1434 |
|
When \fBpcre_exec()\fP is called with a pattern that was successfully studied |
| 1435 |
|
with the PCRE_STUDY_JIT_COMPILE option, the way that the matching is executed |
| 1436 |
|
is entirely different. However, there is still the possibility of runaway |
| 1437 |
|
matching that goes on for a very long time, and so the \fImatch_limit\fP value |
| 1438 |
|
is also used in this case (but in a different way) to limit how long the |
| 1439 |
|
matching can continue. |
| 1440 |
.P |
.P |
| 1441 |
The default value for the limit can be set when PCRE is built; the default |
The default value for the limit can be set when PCRE is built; the default |
| 1442 |
default is 10 million, which handles all but the most extreme cases. You can |
default is 10 million, which handles all but the most extreme cases. You can |
| 1451 |
total number of calls, because not all calls to \fBmatch()\fP are recursive. |
total number of calls, because not all calls to \fBmatch()\fP are recursive. |
| 1452 |
This limit is of use only if it is set smaller than \fImatch_limit\fP. |
This limit is of use only if it is set smaller than \fImatch_limit\fP. |
| 1453 |
.P |
.P |
| 1454 |
Limiting the recursion depth limits the amount of stack that can be used, or, |
Limiting the recursion depth limits the amount of machine stack that can be |
| 1455 |
when PCRE has been compiled to use memory on the heap instead of the stack, the |
used, or, when PCRE has been compiled to use memory on the heap instead of the |
| 1456 |
amount of heap memory that can be used. |
stack, the amount of heap memory that can be used. This limit is not relevant, |
| 1457 |
|
and is ignored, if the pattern was successfully studied with |
| 1458 |
|
PCRE_STUDY_JIT_COMPILE. |
| 1459 |
.P |
.P |
| 1460 |
The default value for \fImatch_limit_recursion\fP can be set when PCRE is |
The default value for \fImatch_limit_recursion\fP can be set when PCRE is |
| 1461 |
built; the default default is the same value as the default for |
built; the default default is the same value as the default for |
| 1514 |
PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, |
PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, |
| 1515 |
PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_SOFT, and |
PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_SOFT, and |
| 1516 |
PCRE_PARTIAL_HARD. |
PCRE_PARTIAL_HARD. |
| 1517 |
|
.P |
| 1518 |
|
If the pattern was successfully studied with the PCRE_STUDY_JIT_COMPILE option, |
| 1519 |
|
the only supported options for JIT execution are PCRE_NO_UTF8_CHECK, |
| 1520 |
|
PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NOTEMPTY_ATSTART. Note in |
| 1521 |
|
particular that partial matching is not supported. If an unsupported option is |
| 1522 |
|
used, JIT execution is disabled and the normal interpretive code in |
| 1523 |
|
\fBpcre_exec()\fP is run. |
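The fallback rule described above can be pictured as a simple bit-mask test. The sketch below is only a model of that rule: the EX_ names and their bit values are stand-ins invented for illustration, and a real program should use the PCRE_ constants from pcre.h, whose values differ.

```c
#include <assert.h>

/* Stand-in option bits for illustration only. These are NOT the values
   defined in <pcre.h>. */
enum {
    EX_NOTBOL           = 1 << 0,
    EX_NOTEOL           = 1 << 1,
    EX_NOTEMPTY         = 1 << 2,
    EX_NOTEMPTY_ATSTART = 1 << 3,
    EX_NO_UTF8_CHECK    = 1 << 4,
    EX_PARTIAL_SOFT     = 1 << 5,
    EX_PARTIAL_HARD     = 1 << 6
};

/* The options that the text lists as supported for JIT execution. */
#define EX_JIT_SUPPORTED \
    (EX_NOTBOL | EX_NOTEOL | EX_NOTEMPTY | EX_NOTEMPTY_ATSTART | \
     EX_NO_UTF8_CHECK)

/* Returns 1 if every bit set in options is in the supported set, or 0 if
   any other bit is present (in which case the interpreter would be used). */
int ex_options_allow_jit(int options)
{
    assert(options >= 0);
    return (options & ~EX_JIT_SUPPORTED) == 0;
}
```

Any bit outside the supported set, such as either partial matching option, makes the function return 0.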
| 1524 |
.sp |
.sp |
| 1525 |
PCRE_ANCHORED |
PCRE_ANCHORED |
| 1526 |
.sp |
.sp |
| 1685 |
.\" |
.\" |
| 1686 |
page. If an invalid UTF-8 sequence of bytes is found, \fBpcre_exec()\fP returns |
page. If an invalid UTF-8 sequence of bytes is found, \fBpcre_exec()\fP returns |
| 1687 |
the error PCRE_ERROR_BADUTF8 or, if PCRE_PARTIAL_HARD is set and the problem is |
the error PCRE_ERROR_BADUTF8 or, if PCRE_PARTIAL_HARD is set and the problem is |
| 1688 |
a truncated UTF-8 character at the end of the subject, PCRE_ERROR_SHORTUTF8. In |
a truncated UTF-8 character at the end of the subject, PCRE_ERROR_SHORTUTF8. In |
| 1689 |
both cases, information about the precise nature of the error may also be |
both cases, information about the precise nature of the error may also be |
| 1690 |
returned (see the descriptions of these errors in the section entitled \fIError |
returned (see the descriptions of these errors in the section entitled \fIError |
| 1691 |
return values from\fP \fBpcre_exec()\fP |
return values from\fP \fBpcre_exec()\fP |
| 1827 |
.P |
.P |
| 1828 |
If the vector is too small to hold all the captured substring offsets, it is |
If the vector is too small to hold all the captured substring offsets, it is |
| 1829 |
used as far as possible (up to two-thirds of its length), and the function |
used as far as possible (up to two-thirds of its length), and the function |
| 1830 |
returns a value of zero. If the substring offsets are not of interest, |
returns a value of zero. If neither the actual string matched nor any captured
| 1831 |
\fBpcre_exec()\fP may be called with \fIovector\fP passed as NULL and |
substrings are of interest, \fBpcre_exec()\fP may be called with \fIovector\fP |
| 1832 |
\fIovecsize\fP as zero. However, if the pattern contains back references and |
passed as NULL and \fIovecsize\fP as zero. However, if the pattern contains |
| 1833 |
the \fIovector\fP is not big enough to remember the related substrings, PCRE |
back references and the \fIovector\fP is not big enough to remember the related |
| 1834 |
has to get additional memory for use during matching. Thus it is usually |
substrings, PCRE has to get additional memory for use during matching. Thus it |
| 1835 |
advisable to supply an \fIovector\fP. |
is usually advisable to supply an \fIovector\fP of reasonable size. |
| 1836 |
|
.P |
| 1837 |
|
There are some cases where zero is returned (indicating vector overflow) when |
| 1838 |
|
in fact the vector is exactly the right size for the final match. For example, |
| 1839 |
|
consider the pattern |
| 1840 |
|
.sp |
| 1841 |
|
(a)(?:(b)c|bd) |
| 1842 |
|
.sp |
| 1843 |
|
If a vector of 6 elements (allowing for only 1 captured substring) is given |
| 1844 |
|
with subject string "abd", \fBpcre_exec()\fP will try to set the second |
| 1845 |
|
captured string, thereby recording a vector overflow, before failing to match |
| 1846 |
|
"c" and backing up to try the second alternative. The zero return, however, |
| 1847 |
|
does correctly indicate that the maximum number of slots (namely 2) have been |
| 1848 |
|
filled. In similar cases where there is temporary overflow, but the final |
| 1849 |
|
number of used slots is actually less than the maximum, a non-zero value is |
| 1850 |
|
returned. |
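The arithmetic behind this example can be sketched in C as follows. The function names are hypothetical helpers, not PCRE functions; they assume, as described above, that only the first two-thirds of the vector holds offsets and that each substring needs a pair of integers.

```c
#include <assert.h>

/* Number of offset pairs (the whole match plus captured substrings) that
   fit in an ovector of ovecsize ints. Only the first two-thirds holds
   offsets, two ints per pair, so this is ovecsize / 3, rounded down. */
int ex_ovector_pairs(int ovecsize)
{
    assert(ovecsize >= 0);
    return ovecsize / 3;
}

/* Smallest ovecsize that leaves room for the whole match plus ncaptures
   captured substrings. */
int ex_ovector_min_size(int ncaptures)
{
    assert(ncaptures >= 0);
    return (ncaptures + 1) * 3;
}
```

With these helpers, a 6-element vector gives 2 pairs (the whole match and one capture), while the two-group pattern in the example would need 9 elements to avoid any overflow.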
| 1851 |
.P |
.P |
| 1852 |
The \fBpcre_fullinfo()\fP function can be used to find out how many capturing |
The \fBpcre_fullinfo()\fP function can be used to find out how many capturing |
| 1853 |
subpatterns there are in a compiled pattern. The smallest size for |
subpatterns there are in a compiled pattern. The smallest size for |
| 1868 |
number is 1, and the offsets for the second and third capturing subpatterns
number is 1, and the offsets for the second and third capturing subpatterns
| 1869 |
(assuming the vector is large enough, of course) are set to -1. |
(assuming the vector is large enough, of course) are set to -1. |
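A caller therefore has to check for -1 before using a pair of offsets. The sketch below shows one possible check; ex_capture_length is a hypothetical helper, not a PCRE function, and it assumes a positive return value from the match.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length in bytes of capturing subpattern `group`
   (0 is the whole match), or -1 if that subpattern did not take part in
   the match. rc is a positive return value from the matching function. */
int ex_capture_length(const int *ovector, int rc, int group)
{
    assert(ovector != NULL && rc > 0 && group >= 0);
    if (group >= rc)
        return -1;                /* beyond the highest-numbered used pair */
    if (ovector[2 * group] < 0)
        return -1;                /* pair was set to -1: unused subpattern */
    return ovector[2 * group + 1] - ovector[2 * group];
}
```

For a return value of 2, groups 0 and 1 are reported with their lengths, and every higher-numbered group is reported as unset without its offsets being read.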
| 1870 |
.P |
.P |
| 1871 |
\fBNote\fP: Elements of \fIovector\fP that do not correspond to capturing |
\fBNote\fP: Elements in the first two-thirds of \fIovector\fP that do not |
| 1872 |
parentheses in the pattern are never changed. That is, if a pattern contains |
correspond to capturing parentheses in the pattern are never changed. That is, |
| 1873 |
\fIn\fP capturing parentheses, no more than \fIovector[0]\fP to |
if a pattern contains \fIn\fP capturing parentheses, no more than |
| 1874 |
\fIovector[2n+1]\fP are set by \fBpcre_exec()\fP. The other elements retain |
\fIovector[0]\fP to \fIovector[2n+1]\fP are set by \fBpcre_exec()\fP. The other |
| 1875 |
whatever values they previously had. |
elements (in the first two-thirds) retain whatever values they previously had. |
| 1876 |
.P |
.P |
| 1877 |
Some convenience functions are provided for extracting the captured substrings |
Some convenience functions are provided for extracting the captured substrings |
| 1878 |
as separate strings. These are described below. |
as separate strings. These are described below. |
| 1962 |
.sp |
.sp |
| 1963 |
PCRE_ERROR_BADUTF8_OFFSET (-11) |
PCRE_ERROR_BADUTF8_OFFSET (-11) |
| 1964 |
.sp |
.sp |
| 1965 |
The UTF-8 byte sequence that was passed as a subject was checked and found to |
The UTF-8 byte sequence that was passed as a subject was checked and found to |
| 1966 |
be valid (the PCRE_NO_UTF8_CHECK option was not set), but the value of |
be valid (the PCRE_NO_UTF8_CHECK option was not set), but the value of |
| 1967 |
\fIstartoffset\fP did not point to the beginning of a UTF-8 character or the |
\fIstartoffset\fP did not point to the beginning of a UTF-8 character or the |
| 1968 |
end of the subject. |
end of the subject. |
| 2017 |
.sp |
.sp |
| 2018 |
PCRE_ERROR_RECURSELOOP (-26) |
PCRE_ERROR_RECURSELOOP (-26) |
| 2019 |
.sp |
.sp |
| 2020 |
This error is returned when \fBpcre_exec()\fP detects a recursion loop within |
This error is returned when \fBpcre_exec()\fP detects a recursion loop within |
| 2021 |
the pattern. Specifically, it means that either the whole pattern or a |
the pattern. Specifically, it means that either the whole pattern or a |
| 2022 |
subpattern has been called recursively for the second time at the same position |
subpattern has been called recursively for the second time at the same position |
| 2023 |
in the subject string. Some simple patterns that might do this are detected and |
in the subject string. Some simple patterns that might do this are detected and |
| 2024 |
faulted at compile time, but more complicated cases, in particular mutual |
faulted at compile time, but more complicated cases, in particular mutual |
| 2025 |
recursions between two different subpatterns, cannot be detected until run |
recursions between two different subpatterns, cannot be detected until run |
| 2026 |
time. |
time. |
| 2027 |
|
.sp |
| 2028 |
|
PCRE_ERROR_JIT_STACKLIMIT (-27) |
| 2029 |
|
.sp |
| 2030 |
|
This error is returned when a pattern that was successfully studied using the |
| 2031 |
|
PCRE_STUDY_JIT_COMPILE option is being matched, but the memory available for |
| 2032 |
|
the just-in-time processing stack is not large enough. See the |
| 2033 |
|
.\" HREF |
| 2034 |
|
\fBpcrejit\fP |
| 2035 |
|
.\" |
| 2036 |
|
documentation for more details. |
| 2037 |
.P |
.P |
| 2038 |
Error numbers -16 to -20 and -22 are not used by \fBpcre_exec()\fP. |
Error numbers -16 to -20 and -22 are not used by \fBpcre_exec()\fP. |
| 2039 |
. |
. |
| 2042 |
.SS "Reason codes for invalid UTF-8 strings" |
.SS "Reason codes for invalid UTF-8 strings" |
| 2043 |
.rs |
.rs |
| 2044 |
.sp |
.sp |
| 2045 |
When \fBpcre_exec()\fP returns either PCRE_ERROR_BADUTF8 or |
When \fBpcre_exec()\fP returns either PCRE_ERROR_BADUTF8 or |
| 2046 |
PCRE_ERROR_SHORTUTF8, and the size of the output vector (\fIovecsize\fP) is at |
PCRE_ERROR_SHORTUTF8, and the size of the output vector (\fIovecsize\fP) is at |
| 2047 |
least 2, the offset of the start of the invalid UTF-8 character is placed in |
least 2, the offset of the start of the invalid UTF-8 character is placed in |
| 2048 |
the first output vector element (\fIovector[0]\fP) and a reason code is placed |
the first output vector element (\fIovector[0]\fP) and a reason code is placed |
| 2049 |
in the second element (\fIovector[1]\fP). The reason codes are given names in |
in the second element (\fIovector[1]\fP). The reason codes are given names in |
| 2050 |
the \fBpcre.h\fP header file: |
the \fBpcre.h\fP header file: |
| 2051 |
.sp |
.sp |
| 2055 |
PCRE_UTF8_ERR4 |
PCRE_UTF8_ERR4 |
| 2056 |
PCRE_UTF8_ERR5 |
PCRE_UTF8_ERR5 |
| 2057 |
.sp |
.sp |
| 2058 |
The string ends with a truncated UTF-8 character; the code specifies how many |
The string ends with a truncated UTF-8 character; the code specifies how many |
| 2059 |
bytes are missing (1 to 5). Although RFC 3629 restricts UTF-8 characters to be |
bytes are missing (1 to 5). Although RFC 3629 restricts UTF-8 characters to be |
| 2060 |
no longer than 4 bytes, the encoding scheme (originally defined by RFC 2279) |
no longer than 4 bytes, the encoding scheme (originally defined by RFC 2279) |
| 2061 |
allows for up to 6 bytes, and this is checked first; hence the possibility of |
allows for up to 6 bytes, and this is checked first; hence the possibility of |
| 2062 |
4 or 5 missing bytes. |
4 or 5 missing bytes. |
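.P
The count of missing bytes follows from the lead byte alone. The sketch below
shows one way to compute it under the older 6-byte scheme; the function names
are hypothetical and this is not PCRE's own validation code.

```c
#include <assert.h>
#include <stddef.h>

/* Total length (1 to 6) implied by a lead byte under the RFC 2279 scheme,
 * or 0 if the byte cannot start a character. */
int utf8_expected_length(unsigned char lead)
{
    if (lead < 0x80) return 1;     /* 0xxxxxxx: single byte */
    if (lead < 0xc0) return 0;     /* 10xxxxxx: continuation byte */
    if (lead < 0xe0) return 2;
    if (lead < 0xf0) return 3;
    if (lead < 0xf8) return 4;
    if (lead < 0xfc) return 5;
    if (lead < 0xfe) return 6;
    return 0;                      /* 0xfe and 0xff are never valid */
}

/* Number of bytes missing when only `available` bytes remain in the
 * subject, starting at the lead byte tail[0]. Returns 0 if the character
 * fits or if tail[0] is not a lead byte. */
int utf8_missing_bytes(const unsigned char *tail, int available)
{
    assert(tail != NULL && available > 0);
    int need = utf8_expected_length(tail[0]);
    return (need > available) ? need - available : 0;
}
```

.P
With at least one byte present and at most six expected, the result for a
truncated character is always between 1 and 5.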
| 2063 |
.sp |
.sp |
| 2064 |
PCRE_UTF8_ERR6 |
PCRE_UTF8_ERR6 |
| 2067 |
PCRE_UTF8_ERR9 |
PCRE_UTF8_ERR9 |
| 2068 |
PCRE_UTF8_ERR10 |
PCRE_UTF8_ERR10 |
| 2069 |
.sp |
.sp |
| 2070 |
The two most significant bits of the 2nd, 3rd, 4th, 5th, or 6th byte of the |
The two most significant bits of the 2nd, 3rd, 4th, 5th, or 6th byte of the |
| 2071 |
character do not have the binary value 0b10 (that is, either the most |
character do not have the binary value 0b10 (that is, either the most |
| 2072 |
significant bit is 0, or the next bit is 1). |
significant bit is 0, or the next bit is 1). |
| 2073 |
.sp |
.sp |
| 2074 |
PCRE_UTF8_ERR11 |
PCRE_UTF8_ERR11 |
| 2075 |
PCRE_UTF8_ERR12 |
PCRE_UTF8_ERR12 |
| 2076 |
.sp |
.sp |
| 2077 |
A character that is valid by the RFC 2279 rules is either 5 or 6 bytes long; |
A character that is valid by the RFC 2279 rules is either 5 or 6 bytes long; |
| 2078 |
these code points are excluded by RFC 3629. |
these code points are excluded by RFC 3629. |
| 2079 |
.sp |
.sp |
| 2080 |
PCRE_UTF8_ERR13 |
PCRE_UTF8_ERR13 |
| 2081 |
.sp |
.sp |
| 2082 |
A 4-byte character has a value greater than 0x10ffff; these code points are |
A 4-byte character has a value greater than 0x10ffff; these code points are |
| 2083 |
excluded by RFC 3629. |
excluded by RFC 3629. |
| 2084 |
.sp |
.sp |
| 2085 |
PCRE_UTF8_ERR14 |
PCRE_UTF8_ERR14 |
| 2086 |
.sp |
.sp |
| 2087 |
A 3-byte character has a value in the range 0xd800 to 0xdfff; this range of |
A 3-byte character has a value in the range 0xd800 to 0xdfff; this range of |
| 2088 |
code points is reserved by RFC 3629 for use with UTF-16, and so is excluded |
code points is reserved by RFC 3629 for use with UTF-16, and so is excluded |
| 2089 |
from UTF-8. |
from UTF-8. |
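.P
The two range restrictions above (the RFC 3629 upper limit of 0x10ffff and the
surrogate range) can be expressed as checks on the decoded value. This is a
minimal sketch; the enum and function names are hypothetical and do not
correspond to identifiers in \fBpcre.h\fP.

```c
#include <assert.h>

/* Hypothetical status values for this sketch only. */
enum cp_status { CP_OK, CP_TOO_LARGE, CP_SURROGATE };

/* Decode a 3-byte sequence laid out as 1110xxxx 10yyyyyy 10zzzzzz. */
unsigned long decode_3byte(unsigned char b0, unsigned char b1,
                           unsigned char b2)
{
    assert((b0 & 0xf0) == 0xe0);
    assert((b1 & 0xc0) == 0x80 && (b2 & 0xc0) == 0x80);
    return ((unsigned long)(b0 & 0x0f) << 12) |
           ((unsigned long)(b1 & 0x3f) << 6) |
            (unsigned long)(b2 & 0x3f);
}

/* Apply the RFC 3629 range rules to a decoded code point. */
enum cp_status classify_code_point(unsigned long cp)
{
    if (cp > 0x10fffful)
        return CP_TOO_LARGE;       /* beyond the last Unicode code point */
    if (cp >= 0xd800ul && cp <= 0xdffful)
        return CP_SURROGATE;       /* reserved for use with UTF-16 */
    return CP_OK;
}
```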
| 2090 |
.sp |
.sp |
| 2091 |
PCRE_UTF8_ERR15 |
PCRE_UTF8_ERR15 |
| 2092 |
PCRE_UTF8_ERR16 |
PCRE_UTF8_ERR16 |
| 2093 |
PCRE_UTF8_ERR17 |
PCRE_UTF8_ERR17 |
| 2094 |
PCRE_UTF8_ERR18 |
PCRE_UTF8_ERR18 |
| 2095 |
PCRE_UTF8_ERR19 |
PCRE_UTF8_ERR19 |
| 2096 |
.sp |
.sp |
| 2097 |
A 2-, 3-, 4-, 5-, or 6-byte character is "overlong", that is, it codes for a |
A 2-, 3-, 4-, 5-, or 6-byte character is "overlong", that is, it codes for a |
| 2098 |
value that can be represented by fewer bytes, which is invalid. For example, |
value that can be represented by fewer bytes, which is invalid. For example, |
| 2099 |
the two bytes 0xc0, 0xae give the value 0x2e, whose correct coding uses just |
the two bytes 0xc0, 0xae give the value 0x2e, whose correct coding uses just |
| 2100 |
one byte. |
one byte. |
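.P
The 0xc0, 0xae example can be worked through in code. The sketch below decodes
a 2-byte sequence and flags it as overlong when the value would fit in a
single byte; the function names are hypothetical, and only the 2-byte case is
covered.

```c
#include <assert.h>

/* Decode a 2-byte sequence laid out as 110xxxxx 10yyyyyy. */
unsigned int decode_2byte(unsigned char b0, unsigned char b1)
{
    assert((b0 & 0xe0) == 0xc0 && (b1 & 0xc0) == 0x80);
    return ((unsigned int)(b0 & 0x1f) << 6) | (unsigned int)(b1 & 0x3f);
}

/* A 2-byte character is overlong if its value is below 0x80, because
 * such values have a 1-byte encoding. */
int is_overlong_2byte(unsigned char b0, unsigned char b1)
{
    return decode_2byte(b0, b1) < 0x80;
}
```

.P
For 0xc0, 0xae the lead byte contributes no value bits and the second byte
contributes 0x2e, so the decoded value is 0x2e and the sequence is overlong.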
| 2101 |
.sp |
.sp |
| 2102 |
PCRE_UTF8_ERR20 |
PCRE_UTF8_ERR20 |
| 2103 |
.sp |
.sp |
| 2104 |
The two most significant bits of the first byte of a character have the binary |
The two most significant bits of the first byte of a character have the binary |
| 2105 |
value 0b10 (that is, the most significant bit is 1 and the second is 0). Such a |
value 0b10 (that is, the most significant bit is 1 and the second is 0). Such a |
| 2106 |
byte can only validly occur as the second or subsequent byte of a multi-byte |
byte can only validly occur as the second or subsequent byte of a multi-byte |
| 2107 |
character. |
character. |
| 2108 |
.sp |
.sp |
| 2274 |
numbers. For this reason, the use of different names for subpatterns of the |
numbers. For this reason, the use of different names for subpatterns of the |
| 2275 |
same number causes an error at compile time. |
same number causes an error at compile time. |
| 2276 |
. |
. |
| 2277 |
|
. |
| 2278 |
.SH "DUPLICATE SUBPATTERN NAMES" |
.SH "DUPLICATE SUBPATTERN NAMES" |
| 2279 |
.rs |
.rs |
| 2280 |
.sp |
.sp |
| 2474 |
The strings are returned in reverse order of length; that is, the longest |
The strings are returned in reverse order of length; that is, the longest |
| 2475 |
matching string is given first. If there were too many matches to fit into |
matching string is given first. If there were too many matches to fit into |
| 2476 |
\fIovector\fP, the yield of the function is zero, and the vector is filled with |
\fIovector\fP, the yield of the function is zero, and the vector is filled with |
| 2477 |
the longest matches. |
the longest matches. Unlike \fBpcre_exec()\fP, \fBpcre_dfa_exec()\fP can use |
| 2478 |
|
the entire \fIovector\fP for returning matched strings. |
| 2479 |
. |
. |
| 2480 |
. |
. |
| 2481 |
.SS "Error returns from \fBpcre_dfa_exec()\fP" |
.SS "Error returns from \fBpcre_dfa_exec()\fP" |
| 2505 |
PCRE_ERROR_DFA_UMLIMIT (-18) |
PCRE_ERROR_DFA_UMLIMIT (-18) |
| 2506 |
.sp |
.sp |
| 2507 |
This return is given if \fBpcre_dfa_exec()\fP is called with an \fIextra\fP |
This return is given if \fBpcre_dfa_exec()\fP is called with an \fIextra\fP |
| 2508 |
block that contains a setting of the \fImatch_limit\fP field. This is not |
block that contains a setting of the \fImatch_limit\fP or |
| 2509 |
supported (it is meaningless). |
\fImatch_limit_recursion\fP fields. This is not supported (these fields are |
| 2510 |
|
meaningless for DFA matching). |
| 2511 |
.sp |
.sp |
| 2512 |
PCRE_ERROR_DFA_WSSIZE (-19) |
PCRE_ERROR_DFA_WSSIZE (-19) |
| 2513 |
.sp |
.sp |
| 2544 |
.rs |
.rs |
| 2545 |
.sp |
.sp |
| 2546 |
.nf |
.nf |
| 2547 |
Last updated: 28 July 2011 |
Last updated: 14 November 2011 |
| 2548 |
Copyright (c) 1997-2011 University of Cambridge. |
Copyright (c) 1997-2011 University of Cambridge. |
| 2549 |
.fi |
.fi |