| Line | Before (07 August 2007) | After (09 August 2007) |
|---|---|---|
| 601 | `PCRE_NO_UTF8_CHECK` | `PCRE_NO_UTF8_CHECK` |
| 602 | `.sp` | `.sp` |
| 603 | `When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is` | `When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is` |
| 604 | `automatically checked. Note that the check is for a syntactically valid UTF-8` | `automatically checked. There is a discussion about the` |
| 605 | `byte string, as defined by RFC 2279. It is \fInot\fP a check for a UTF-8 string` | `.\" HTML <a href="pcre.html#utf8strings">` |
| 606 | `of assigned or allowable Unicode code points.` | `.\" </a>` |
| 607 | `.P` | `validity of UTF-8 strings` |
| 608 | `If an invalid UTF-8 sequence of bytes is found, \fBpcre_compile()\fP returns an` | `.\"` |
| 609 | `error. If you already know that your pattern is valid, and you want to skip` | `in the main` |
| 610 | `this check for performance reasons, you can set the PCRE_NO_UTF8_CHECK option.` | `.\" HREF` |
| 611 | `When it is set, the effect of passing an invalid UTF-8 string as a pattern is` | `\fBpcre\fP` |
| 612 | `undefined. It may cause your program to crash. Note that this option can also` | `.\"` |
| 613 | `be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress the UTF-8` | `page. If an invalid UTF-8 sequence of bytes is found, \fBpcre_compile()\fP` |
| 614 | `validity checking of subject strings.` | `returns an error. If you already know that your pattern is valid, and you want` |
| 615 | | `to skip this check for performance reasons, you can set the PCRE_NO_UTF8_CHECK` |
| 616 | | `option. When it is set, the effect of passing an invalid UTF-8 string as a` |
| 617 | | `pattern is undefined. It may cause your program to crash. Note that this option` |
| 618 | | `can also be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress` |
| 619 | | `the UTF-8 validity checking of subject strings.` |
| 620 | `.` | `.` |
| 621 | `.` | `.` |
| 622 | `.SH "COMPILATION ERROR CODES"` | `.SH "COMPILATION ERROR CODES"` |


| Line | Before (07 August 2007) | After (09 August 2007) |
|---|---|---|
| 1239 | `.sp` | `.sp` |
| 1240 | `When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8` | `When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8` |
| 1241 | `string is automatically checked when \fBpcre_exec()\fP is subsequently called.` | `string is automatically checked when \fBpcre_exec()\fP is subsequently called.` |
| 1242 | `Note that the check is for a syntactically valid UTF-8 byte string, as defined` | `The value of \fIstartoffset\fP is also checked to ensure that it points to the` |
| 1243 | `by RFC 2279. It is \fInot\fP a check for a UTF-8 string of assigned or` | `start of a UTF-8 character. There is a discussion about the validity of UTF-8` |
| 1244 | `allowable Unicode code points. The value of \fIstartoffset\fP is also checked` | `strings in the` |
| 1245 | `to ensure that it points to the start of a UTF-8 character. If an invalid UTF-8` | `.\" HTML <a href="pcre.html#utf8strings">` |
| 1246 | `sequence of bytes is found, \fBpcre_exec()\fP returns the error` | `.\" </a>` |
| 1247 | `PCRE_ERROR_BADUTF8. If \fIstartoffset\fP contains an invalid value,` | `section on UTF-8 support` |
| 1248 | | `.\"` |
| 1249 | | `in the main` |
| 1250 | | `.\" HREF` |
| 1251 | | `\fBpcre\fP` |
| 1252 | | `.\"` |
| 1253 | | `page. If an invalid UTF-8 sequence of bytes is found, \fBpcre_exec()\fP returns` |
| 1254 | | `the error PCRE_ERROR_BADUTF8. If \fIstartoffset\fP contains an invalid value,` |
| 1255 | `PCRE_ERROR_BADUTF8_OFFSET is returned.` | `PCRE_ERROR_BADUTF8_OFFSET is returned.` |
| 1256 | `.P` | `.P` |
| 1257 | `If you already know that your subject is valid, and you want to skip these` | `If you already know that your subject is valid, and you want to skip these` |


| Line | Before (07 August 2007) | After (09 August 2007) |
|---|---|---|
| 1886 | `.rs` | `.rs` |
| 1887 | `.sp` | `.sp` |
| 1888 | `.nf` | `.nf` |
| 1889 | `Last updated: 07 August 2007` | `Last updated: 09 August 2007` |
| 1890 | `Copyright (c) 1997-2007 University of Cambridge.` | `Copyright (c) 1997-2007 University of Cambridge.` |
| 1891 | `.fi` | `.fi` |