| Line | Old | New |
|---|---|---|
| 116 | | |
| 117 | The following comments apply when PCRE is running in UTF-8 mode: | The following comments apply when PCRE is running in UTF-8 mode: |
| 118 | | |
| 119 | 1. PCRE assumes that the strings it is given contain valid UTF-8 codes. It does | 1. When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects |
| 120 | not diagnose invalid UTF-8 strings. If you pass invalid UTF-8 strings to PCRE, | are checked for validity on entry to the relevant functions. If an invalid |
| 121 | the results are undefined. | UTF-8 string is passed, an error return is given. In some situations, you may |
| 122 | | already know that your strings are valid, and therefore want to skip these |
| 123 | | checks in order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag |
| 124 | | at compile time or at run time, PCRE assumes that the pattern or subject it |
| 125 | | is given (respectively) contains only valid UTF-8 codes. In this case, it does |
| 126 | | not diagnose an invalid UTF-8 string. If you pass an invalid UTF-8 string to |
| 127 | | PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined. Your program |
| 128 | | may crash. |
| 129 | | |
| 130 | 2. In a pattern, the escape sequence \\x{...}, where the contents of the braces | 2. In a pattern, the escape sequence \\x{...}, where the contents of the braces |
| 131 | is a string of hexadecimal digits, is interpreted as a UTF-8 character whose | is a string of hexadecimal digits, is interpreted as a UTF-8 character whose |
| Line | Old | New |
|---|---|---|
| 169 | Phone: +44 1223 334714 | Phone: +44 1223 334714 |
| 170 | | |
| 171 | .in 0 | .in 0 |
| 172 | Last updated: 04 February 2003 | Last updated: 20 August 2003 |
| 173 | .br | .br |
| 174 | Copyright (c) 1997-2003 University of Cambridge. | Copyright (c) 1997-2003 University of Cambridge. |