| 601 |
PCRE_NO_UTF8_CHECK |
PCRE_NO_UTF8_CHECK |
| 602 |
.sp |
.sp |
| 603 |
When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is |
When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is |
| 604 |
automatically checked. If an invalid UTF-8 sequence of bytes is found, |
automatically checked. Note that the check is for a syntactically valid UTF-8 |
| 605 |
\fBpcre_compile()\fP returns an error. If you already know that your pattern is |
byte string, as defined by RFC 2279. It is \fInot\fP a check for a UTF-8 string |
| 606 |
valid, and you want to skip this check for performance reasons, you can set the |
of assigned or allowable Unicode code points. |
| 607 |
PCRE_NO_UTF8_CHECK option. When it is set, the effect of passing an invalid |
.P |
| 608 |
UTF-8 string as a pattern is undefined. It may cause your program to crash. |
If an invalid UTF-8 sequence of bytes is found, \fBpcre_compile()\fP returns an |
| 609 |
Note that this option can also be passed to \fBpcre_exec()\fP and |
error. If you already know that your pattern is valid, and you want to skip |
| 610 |
\fBpcre_dfa_exec()\fP, to suppress the UTF-8 validity checking of subject |
this check for performance reasons, you can set the PCRE_NO_UTF8_CHECK option. |
| 611 |
strings. |
When it is set, the effect of passing an invalid UTF-8 string as a pattern is |
| 612 |
|
undefined. It may cause your program to crash. Note that this option can also |
| 613 |
|
be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress the UTF-8 |
| 614 |
|
validity checking of subject strings. |
| 615 |
. |
. |
| 616 |
. |
. |
| 617 |
.SH "COMPILATION ERROR CODES" |
.SH "COMPILATION ERROR CODES" |
| 1234 |
.sp |
.sp |
| 1235 |
When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8 |
When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8 |
| 1236 |
string is automatically checked when \fBpcre_exec()\fP is subsequently called. |
string is automatically checked when \fBpcre_exec()\fP is subsequently called. |
| 1237 |
The value of \fIstartoffset\fP is also checked to ensure that it points to the |
Note that the check is for a syntactically valid UTF-8 byte string, as defined |
| 1238 |
start of a UTF-8 character. If an invalid UTF-8 sequence of bytes is found, |
by RFC 2279. It is \fInot\fP a check for a UTF-8 string of assigned or |
| 1239 |
\fBpcre_exec()\fP returns the error PCRE_ERROR_BADUTF8. If \fIstartoffset\fP |
allowable Unicode code points. The value of \fIstartoffset\fP is also checked |
| 1240 |
contains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned. |
to ensure that it points to the start of a UTF-8 character. If an invalid UTF-8 |
| 1241 |
|
sequence of bytes is found, \fBpcre_exec()\fP returns the error |
| 1242 |
|
PCRE_ERROR_BADUTF8. If \fIstartoffset\fP contains an invalid value, |
| 1243 |
|
PCRE_ERROR_BADUTF8_OFFSET is returned. |
| 1244 |
.P |
.P |
| 1245 |
If you already know that your subject is valid, and you want to skip these |
If you already know that your subject is valid, and you want to skip these |
| 1246 |
checks for performance reasons, you can set the PCRE_NO_UTF8_CHECK option when |
checks for performance reasons, you can set the PCRE_NO_UTF8_CHECK option when |
| 1874 |
.rs |
.rs |
| 1875 |
.sp |
.sp |
| 1876 |
.nf |
.nf |
| 1877 |
Last updated: 30 July 2007 |
Last updated: 07 August 2007 |
| 1878 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |
| 1879 |
.fi |
.fi |
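
The text added by this change distinguishes a syntactic UTF-8 check from a check of assigned or allowable Unicode code points, and also mentions that \fIstartoffset\fP must point to the start of a character. The following is a minimal sketch of what such a syntactic check could look like. It is not PCRE's own code: the function names utf8_trail_count, utf8_syntax_check and utf8_offset_ok are hypothetical, and the rules (sequences of up to six bytes, shortest form required) are an assumption based on a reading of RFC 2279.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE API. Returns the number of
   continuation bytes implied by a lead byte (0 for ASCII, up to 5 under
   RFC 2279), or -1 if the byte cannot start a character. */
int utf8_trail_count(unsigned char c)
{
    if (c < 0x80) return 0;
    if (c < 0xC0) return -1;   /* a continuation byte cannot lead */
    if (c < 0xE0) return 1;
    if (c < 0xF0) return 2;
    if (c < 0xF8) return 3;
    if (c < 0xFC) return 4;
    if (c < 0xFE) return 5;
    return -1;                 /* 0xFE and 0xFF never appear in UTF-8 */
}

/* Hypothetical syntactic check. Returns -1 if the whole string is well
   formed, otherwise the byte offset of the first malformed sequence.
   It does NOT ask whether a decoded value is an assigned or allowable
   Unicode code point. */
long utf8_syntax_check(const unsigned char *s, size_t len)
{
    /* Smallest value that legitimately needs n continuation bytes. */
    static const unsigned long min_value[] =
        { 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000 };
    size_t i = 0;

    while (i < len) {
        int n = utf8_trail_count(s[i]);
        int k;
        unsigned long cp;

        /* Bad lead byte, or the sequence runs past the end. */
        if (n < 0 || (size_t)n > len - i - 1) return (long)i;

        /* Keep only the payload bits of the lead byte. */
        cp = (n == 0) ? s[i] : (unsigned long)(s[i] & (0x3F >> n));
        for (k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return (long)i;
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }

        /* Reject overlong (non-shortest) encodings. */
        if (cp < min_value[n]) return (long)i;
        i += (size_t)n + 1;
    }
    return -1;
}

/* Hypothetical start-offset check: the offset must lie within the
   subject (or at its end) and must not land on a continuation byte. */
int utf8_offset_ok(const unsigned char *s, size_t len, size_t off)
{
    if (off > len) return 0;
    if (off == len) return 1;
    return (s[off] & 0xC0) != 0x80;
}
```

In this sketch the byte sequence ED A0 80 is accepted: it is structurally well formed even though it decodes to U+D800, a surrogate, which illustrates the difference between a syntactic check and a code point check. A caller that has already validated its input this way is the kind of caller for whom skipping the library's own check with PCRE_NO_UTF8_CHECK might be reasonable.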