--- code/trunk/doc/html/pcreapi.html 2007/02/24 21:40:20 70
+++ code/trunk/doc/html/pcreapi.html 2007/02/24 21:40:24 71
@@ -442,6 +442,21 @@
pcre page.

+

+

+  PCRE_NO_UTF8_CHECK
+
+

+

+When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
+automatically checked. If an invalid UTF-8 sequence of bytes is found,
+pcre_compile() returns an error. If you already know that your pattern is
+valid, and you want to skip this check for performance reasons, you can set the
+PCRE_NO_UTF8_CHECK option. When it is set, the effect of passing an invalid
+UTF-8 string as a pattern is undefined. It may cause your program to crash.
+Note that there is a similar option for suppressing the checking of subject
+strings passed to pcre_exec().
+
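One way to "already know" that a pattern or subject is valid is to validate it once yourself and then reuse it many times. The following is a minimal sketch of such a check, not PCRE's own code: `is_valid_utf8` is a hypothetical helper name, and it assumes the strict RFC 3629 rules (no overlong forms, no surrogates, nothing above U+10FFFF), which may not accept exactly the same set of byte sequences as the library's internal check.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE.
   Returns 1 if the first len bytes of s are well-formed UTF-8
   under RFC 3629 rules (an assumption of this sketch), else 0. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t extra;       /* number of continuation bytes expected */
        size_t k;
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point legal at this length */

        if (c < 0x80) {     /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;       /* stray continuation byte or invalid lead byte */
        }

        if (extra > len - i - 1)
            return 0;       /* sequence is cut off by the end of the buffer */

        for (k = 1; k <= extra; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (unsigned long)(cc & 0x3F);
        }

        if (cp < min)
            return 0;       /* overlong encoding */
        if (cp > 0x10FFFF)
            return 0;       /* beyond the Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;       /* UTF-16 surrogate */

        i += extra + 1;
    }
    return 1;
}
```

A caller might run this once on a pattern and, only if it returns 1, add PCRE_NO_UTF8_CHECK to the options that already include PCRE_UTF8. If there is any doubt about where the string came from, leaving the built-in check enabled is the safer choice.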


STUDYING A PATTERN

pcre_extra *pcre_study(const pcre *code, int options,
@@ -862,6 +877,15 @@
unachored at matching time.

+When PCRE_UTF8 was set at compile time, the validity of the subject as a UTF-8
+string is automatically checked. If an invalid UTF-8 sequence of bytes is
+found, pcre_exec() returns the error PCRE_ERROR_BADUTF8. If you already
+know that your subject is valid, and you want to skip this check for
+performance reasons, you can set the PCRE_NO_UTF8_CHECK option when calling
+pcre_exec(). When this option is set, the effect of passing an invalid
+UTF-8 string as a subject is undefined. It may cause your program to crash.
+

+

There are also three further options that can be set only at matching time:

@@ -1106,6 +1130,14 @@
use by callout functions that want to yield a distinctive error code. See the pcrecallout documentation for details.

+

+

+  PCRE_ERROR_BADUTF8       (-10)
+
+

+

+A string that contains an invalid UTF-8 byte sequence was passed as a subject.
+
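As an illustration of how a caller might tell this new error apart from an ordinary failed match, here is a minimal sketch. `describe_exec_result` is a hypothetical helper, not a PCRE function. The two constants are normally supplied by pcre.h; they are mirrored here under `#ifndef` guards only so the fragment compiles on its own, with the value -10 taken from the text above and -1 being the documented value of PCRE_ERROR_NOMATCH.

```c
/* Normally these come from <pcre.h>; mirrored here so that the
   sketch is self-contained. */
#ifndef PCRE_ERROR_NOMATCH
#define PCRE_ERROR_NOMATCH (-1)
#endif
#ifndef PCRE_ERROR_BADUTF8
#define PCRE_ERROR_BADUTF8 (-10)
#endif

/* Hypothetical helper: turn a pcre_exec() return value into a
   short description. Non-negative values indicate a match. */
const char *describe_exec_result(int rc)
{
    if (rc >= 0)
        return "match";

    switch (rc) {
    case PCRE_ERROR_NOMATCH:
        return "no match";
    case PCRE_ERROR_BADUTF8:
        return "subject is not valid UTF-8";
    default:
        return "other pcre_exec() error";
    }
}
```

Note that PCRE_ERROR_BADUTF8 can only be reported while the check is active. With PCRE_NO_UTF8_CHECK set, an invalid subject is not detected and the behaviour is undefined.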


EXTRACTING CAPTURED SUBSTRINGS BY NUMBER

int pcre_copy_substring(const char *subject, int *ovector,
@@ -1257,6 +1289,6 @@
appropriate.

-Last updated: 03 February 2003
+Last updated: 20 August 2003
Copyright © 1997-2003 University of Cambridge.