start with \fBpcre16_\fP instead of \fBpcre_\fP. For every option that has UTF8
in its name (for example, PCRE_UTF8), there is a corresponding 16-bit name with
UTF8 replaced by UTF16. This facility is in fact just cosmetic; the 16-bit
option names define the same bit values.
.P
References to bytes and UTF-8 in this document should be read as references to
16-bit data quantities and UTF-16 when using the 16-bit library, unless
.\" HREF
\fBpcre16\fP
.\"
page.
.
.
.SH "PCRE API OVERVIEW"
  PCRE_CONFIG_UTF8
.sp
The output is an integer that is set to one if UTF-8 support is available;
otherwise it is set to zero. If this option is given to the 16-bit version of
this function, \fBpcre16_config()\fP, the result is PCRE_ERROR_BADOPTION.
.sp
  PCRE_CONFIG_UTF16
  PCRE_CONFIG_JITTARGET
.sp
The output is a pointer to a zero-terminated "const char *" string. If JIT
support is available, the string contains the name of the architecture for
which the JIT compiler is configured, for example "x86 32bit (little endian +
unaligned)". If JIT support is not available, the result is NULL.
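.P
The two kinds of output described above (an integer for PCRE_CONFIG_UTF8, a
string pointer for PCRE_CONFIG_JITTARGET) differ only in the type of the
variable whose address is passed as \fIwhere\fP. So that it is self-contained,
the following sketch replaces \fBpcre_config()\fP with a hypothetical stub,
\fIdemo_config()\fP, and uses made-up DEMO_* constants in place of the real
names from \fBpcre.h\fP; only the pattern on the caller side is meant to carry
over.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real PCRE_CONFIG_* names in pcre.h. */
enum { DEMO_CONFIG_UTF8 = 0, DEMO_CONFIG_JITTARGET = 1 };
#define DEMO_ERROR_BADOPTION (-3)   /* placeholder error value */

/* Hypothetical stub with the same shape as pcre_config(): the type of the
   object that "where" points to depends on "what". */
static int demo_config(int what, void *where)
{
  switch (what) {
    case DEMO_CONFIG_UTF8:
      *(int *)where = 1;              /* pretend UTF-8 support is present */
      return 0;
    case DEMO_CONFIG_JITTARGET:
      *(const char **)where = NULL;   /* pretend JIT support is absent */
      return 0;
    default:
      return DEMO_ERROR_BADOPTION;    /* unknown option */
  }
}

/* Integer output: pass the address of an int. */
static int has_utf8_support(void)
{
  int utf8 = 0;
  return demo_config(DEMO_CONFIG_UTF8, &utf8) == 0 && utf8 == 1;
}

/* String output: pass the address of a const char *, and allow for NULL. */
static const char *jit_target_name(void)
{
  const char *target = NULL;
  if (demo_config(DEMO_CONFIG_JITTARGET, &target) != 0 || target == NULL)
    return "JIT not available";
  return target;
}
```

.P
With the real library the same calls would be written as
\fBpcre_config(PCRE_CONFIG_UTF8, &utf8)\fP and
\fBpcre_config(PCRE_CONFIG_JITTARGET, &target)\fP; as stated above, the first
of these fails with PCRE_ERROR_BADOPTION when made through
\fBpcre16_config()\fP.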
.sp
  PCRE_CONFIG_NEWLINE
that any Unicode newline sequence should be recognized. The Unicode newline
sequences are the three just mentioned, plus the single characters VT (vertical
tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line
separator, U+2028), and PS (paragraph separator, U+2029). For the 8-bit
library, the last two are recognized only in UTF-8 mode.
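.P
As an illustration of the recognition rule just described, here is a small
self-contained sketch. It is not PCRE's code: \fInewline_length()\fP and its
\fIallow_ls_ps\fP parameter are hypothetical, and it assumes that "the three
just mentioned" are CR, LF, and the two-character sequence CRLF. Characters are
passed as code points so that one function covers both libraries.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the PCRE API. Returns the length, in
   characters, of the Unicode newline sequence that starts at s[i], or 0 if
   there is none. allow_ls_ps should be zero for the 8-bit library outside
   UTF-8 mode, where LS and PS are not recognized. */
static size_t newline_length(const uint32_t *s, size_t len, size_t i,
                             int allow_ls_ps)
{
  if (i >= len)
    return 0;
  switch (s[i]) {
    case 0x000d:                    /* CR, or CR followed by LF */
      return (i + 1 < len && s[i + 1] == 0x000a) ? 2 : 1;
    case 0x000a:                    /* LF */
    case 0x000b:                    /* VT */
    case 0x000c:                    /* FF */
    case 0x0085:                    /* NEL */
      return 1;
    case 0x2028:                    /* LS */
    case 0x2029:                    /* PS */
      return allow_ls_ps ? 1 : 0;
    default:
      return 0;
  }
}
```

.P
In the 8-bit library outside UTF-8 mode a subject cannot contain code points
above 0xff anyway, so passing zero for \fIallow_ls_ps\fP there simply makes the
restriction explicit.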
.P
The newline setting in the options word uses three bits that are treated
.sp
  PCRE_NO_UTF8_CHECK
.sp
When PCRE_UTF8 is set, the validity of the pattern as a UTF-8
string is automatically checked. There is a discussion about the
.\" HTML <a href="pcreunicode.html#utf8strings">
.\" </a>
validity of UTF-8 strings
.\"
in the
.\" HREF
.sp
The following table lists the error codes that may be returned by
\fBpcre_compile2()\fP, along with the error messages that may be returned by
both compiling functions. Note that error messages are always 8-bit ASCII
strings, even in 16-bit mode. As PCRE has developed, some error codes have
fallen out of use. To avoid confusion, they have not been re-used.
.sp
  65  different names for subpatterns of the same number are
        not allowed
  66  (*MARK) must have an argument
  67  this version of PCRE is not compiled with Unicode property
        support
  68  \ec must be followed by an ASCII character
  69  \ek is not followed by a braced, angle-bracketed, or quoted name
  70  internal error: unknown opcode in find_fixedlength()
  71  \eN is not supported in a class
  72  too many forward references
  73  disallowed Unicode code point (>= 0xd800 && <= 0xdfff)
  74  invalid UTF-16 string (specifically UTF-16)
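.P
Errors 73 and 74 both concern the UTF-16 surrogate range. The following sketch
(hypothetical helper names, not PCRE's own validation code) shows the two
conditions: a code point from 0xd800 to 0xdfff is not a character in its own
right, and in well-formed UTF-16 every high surrogate (0xd800 to 0xdbff) is
immediately followed by a low surrogate (0xdc00 to 0xdfff).

```c
#include <stddef.h>
#include <stdint.h>

/* The condition behind error 73: surrogate code points are disallowed. */
static int is_surrogate(uint32_t c)
{
  return c >= 0xd800 && c <= 0xdfff;
}

/* Sketch of a UTF-16 well-formedness check. Returns the offset of the first
   bad code unit, or -1 if the whole string is valid. */
static long first_invalid_utf16(const uint16_t *s, size_t len)
{
  for (size_t i = 0; i < len; i++) {
    uint16_t u = s[i];
    if (u >= 0xd800 && u <= 0xdbff) {          /* high surrogate */
      if (i + 1 >= len || s[i + 1] < 0xdc00 || s[i + 1] > 0xdfff)
        return (long)i;                        /* not followed by a low one */
      i++;                                     /* skip the low half */
    } else if (u >= 0xdc00 && u <= 0xdfff) {
      return (long)i;                          /* stray low surrogate */
    }
  }
  return -1;
}
```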
.sp
The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may
  PCRE_ERROR_NULL           the argument \fIcode\fP was NULL
                            the argument \fIwhere\fP was NULL
  PCRE_ERROR_BADMAGIC       the "magic number" was not found
  PCRE_ERROR_BADENDIANNESS  the pattern was compiled with different
                            endianness
  PCRE_ERROR_BADOPTION      the value of \fIwhat\fP was invalid
.sp
The "magic number" is placed at the start of each compiled pattern as a simple
check against passing an arbitrary memory pointer. The endianness error can
occur if a compiled pattern is saved and reloaded on a different host. Here is
a typical call of \fBpcre_fullinfo()\fP to obtain the length of the compiled
pattern:
variable.
.P
If there is a fixed first value, for example, the letter "c" from a pattern
such as (cat|cow|coyote), its value is returned. In the 8-bit library, the
value is always less than 256; in the 16-bit library the value can be up to
0xffff.
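.P
To make the idea of a fixed first value concrete, the sketch below computes it
for the simplest case, an alternation of plain literal strings such as
(cat|cow|coyote). It is a hypothetical illustration of the concept only;
\fIcommon_first_char()\fP is not a PCRE function, and its use of -1 for "no
fixed first value" is an assumption of the sketch. PCRE derives the real value
from the compiled pattern and handles far more than literal alternatives.

```c
#include <stddef.h>

/* Hypothetical illustration, not PCRE's algorithm: return the character
   that every literal alternative starts with, or -1 if there is none. */
static int common_first_char(const char *const *alts, size_t n)
{
  if (n == 0 || alts[0][0] == 0)
    return -1;                        /* no alternatives, or an empty one */
  unsigned char first = (unsigned char)alts[0][0];
  for (size_t i = 1; i < n; i++)
    if ((unsigned char)alts[i][0] != first)
      return -1;                      /* alternatives disagree */
  return first;                       /* < 256 for 8-bit characters */
}
```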
.P
If there is no fixed first value, and if either
  const unsigned char *\fItables\fP;
  unsigned char **\fImark\fP;
.sp
In the 16-bit version of this structure, the \fImark\fP field has type
"PCRE_UCHAR16 **".
.P
The \fIflags\fP field is a bitmap that specifies which of the other fields
.sp
  PCRE_ERROR_BADMODE        (-28)
.sp
This error is given if a pattern that was compiled by the 8-bit library is
passed to a 16-bit library function, or vice versa.
.sp
  PCRE_ERROR_BADENDIANNESS  (-29)
.sp
This error is given if a pattern that was compiled and saved is reloaded on a
host with different endianness. The utility function
\fBpcre_pattern_to_host_byte_order()\fP can be used to convert such a pattern
so that it runs on the new host.
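.P
One simple way to tell a byte-reversed pattern apart from a pointer that does
not refer to a compiled pattern at all is to compare the stored magic number
with both its expected value and its byte-swapped form. The sketch below
assumes a 32-bit magic number; the magic value and the DEMO_* names are
hypothetical placeholders (only -29 matches a code listed above), and it is not
a description of PCRE's internal layout.

```c
#include <stdint.h>

/* Hypothetical placeholder values for illustration only. */
#define DEMO_MAGIC              0x50435245u   /* "PCRE" in ASCII */
#define DEMO_ERR_BADMAGIC       (-4)
#define DEMO_ERR_BADENDIANNESS  (-29)

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
         ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Classify the first word of a saved pattern: 0 if usable as is, a "bad
   endianness" code if it is byte-reversed, otherwise a "bad magic" code. */
static int check_magic(uint32_t stored)
{
  if (stored == DEMO_MAGIC)
    return 0;
  if (stored == swap32(DEMO_MAGIC))
    return DEMO_ERR_BADENDIANNESS;
  return DEMO_ERR_BADMAGIC;
}
```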
.P
Error numbers -16 to -20 and -22 are not used by \fBpcre_exec()\fP.
.SS "Reason codes for invalid UTF-8 strings"
.rs
.sp
This section applies only to the 8-bit library. The corresponding information
for the 16-bit library is given in the
.\" HREF
\fBpcre16\fP
.rs
.sp
Matching certain patterns using \fBpcre_exec()\fP can use a lot of process
stack, which in certain environments can be rather limited in size. Some users
find it helpful to have an estimate of the amount of stack that is used by
\fBpcre_exec()\fP, to help them set recursion limits, as described in the
.\" HREF
\fBpcrestack\fP
.\"
documentation. The estimate that is output by \fBpcretest\fP when called with
the \fB-m\fP and \fB-C\fP options is obtained by calling \fBpcre_exec()\fP with
the values NULL, NULL, NULL, -999, and -999 for its first five arguments.
.P
Normally, if its first argument is NULL, \fBpcre_exec()\fP immediately returns
arguments, it returns instead a negative number whose absolute value is the
approximate stack frame size in bytes. (A negative number is used so that it is
clear that no match has happened.) The value is approximate because in some
cases, recursive calls to \fBpcre_exec()\fP occur when there are one or two
additional variables on the stack.
.P
If PCRE has been compiled to use the heap instead of the stack for recursion,
the value returned is the size of each block that is obtained from the heap.
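.P
A sketch of how the estimate might be used: dividing the stack space that can
be spared for matching by the approximate frame size gives a rough upper bound
for the recursion limit. The helper \fIrecursion_limit_for()\fP and its
\fIstack_budget\fP parameter are hypothetical, and the division ignores the one
or two extra variables mentioned above, so a caller would normally leave some
headroom.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE. "estimate" is the negative value
   returned by the special call described above; with libpcre linked in it
   would be obtained as
     pcre_exec(NULL, NULL, NULL, -999, -999, 0, NULL, 0)
   Its absolute value approximates one stack frame (or, in a heap-based
   build, one heap block). Returns how many recursion levels fit in
   stack_budget bytes, or 0 if no estimate is available. */
static unsigned long recursion_limit_for(int estimate, size_t stack_budget)
{
  if (estimate >= 0)
    return 0;                                 /* not a frame-size estimate */
  unsigned long long frame = (unsigned long long)(-(long long)estimate);
  return (unsigned long)(stack_budget / frame);
}
```

.P
The result could then be stored in the \fImatch_limit_recursion\fP field of a
\fBpcre_extra\fP block, with PCRE_EXTRA_MATCH_LIMIT_RECURSION set in its
\fIflags\fP field.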
.
.