| 33 |
The current implementation of PCRE corresponds approximately with Perl 5.12, |
The current implementation of PCRE corresponds approximately with Perl 5.12, |
| 34 |
including support for UTF-8 encoded strings and Unicode general category |
including support for UTF-8 encoded strings and Unicode general category |
| 35 |
properties. However, UTF-8 and Unicode support has to be explicitly enabled; it |
properties. However, UTF-8 and Unicode support has to be explicitly enabled; it |
| 36 |
is not the default. The Unicode tables correspond to Unicode release 5.2.0. |
is not the default. The Unicode tables correspond to Unicode release 6.0.0. |
| 37 |
</P> |
</P> |
| 38 |
<P> |
<P> |
| 39 |
In addition to the Perl-compatible matching function, PCRE contains an |
In addition to the Perl-compatible matching function, PCRE contains an |
| 207 |
UTF-8.) |
UTF-8.) |
| 208 |
</P> |
</P> |
| 209 |
<P> |
<P> |
| 210 |
If an invalid UTF-8 string is passed to PCRE, an error return |
If an invalid UTF-8 string is passed to PCRE, an error return is given. At |
| 211 |
(PCRE_ERROR_BADUTF8) is given. In some situations, you may already know that |
compile time, the only additional information is the offset to the first byte |
| 212 |
your strings are valid, and therefore want to skip these checks in order to |
of the failing character. The runtime functions (<b>pcre_exec()</b> and |
| 213 |
improve performance. If you set the PCRE_NO_UTF8_CHECK flag at compile time or |
<b>pcre_dfa_exec()</b>) pass back this information as well as a more detailed |
| 214 |
at run time, PCRE assumes that the pattern or subject it is given |
reason code if the caller has provided memory in which to do this. |
| 215 |
(respectively) contains only valid UTF-8 codes. In this case, it does not |
</P> |
| 216 |
diagnose an invalid UTF-8 string. |
<P> |
| 217 |
|
In some situations, you may already know that your strings are valid, and |
| 218 |
|
therefore want to skip these checks in order to improve performance. If you set |
| 219 |
|
the PCRE_NO_UTF8_CHECK flag at compile time or at run time, PCRE assumes that |
| 220 |
|
the pattern or subject it is given (respectively) contains only valid UTF-8 |
| 221 |
|
codes. In this case, it does not diagnose an invalid UTF-8 string. |
| 222 |
</P> |
</P> |
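<P>
The kind of check that PCRE_NO_UTF8_CHECK skips can be pictured with a small standalone validator. The sketch below is not PCRE's code: <b>first_invalid_utf8()</b> is a hypothetical helper, and it assumes the RFC 3629 rules (no overlong forms, no surrogates, nothing above U+10FFFF). It reports the offset of the first byte of the failing character, which is the piece of information the text above describes.
</P>

```c
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE API.
   Returns -1 if s[0..len) is valid UTF-8 under RFC 3629 (an assumption of
   this sketch), otherwise the offset of the first byte of the first
   failing character. */
long first_invalid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t extra;       /* continuation bytes expected after the lead byte */
        size_t k;
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point allowed for this length */

        if (c < 0x80) { i++; continue; }   /* plain ASCII byte */

        if ((c & 0xE0) == 0xC0)      { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return (long)i;   /* stray continuation byte or invalid lead byte */

        if (len - i <= extra) return (long)i;   /* character is truncated */

        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return (long)i;  /* not 10xxxxxx */
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }

        /* Reject overlong encodings, surrogates, and out-of-range values. */
        if (cp < min || cp > 0x10FFFFUL || (cp >= 0xD800UL && cp <= 0xDFFFUL))
            return (long)i;

        i += extra + 1;
    }
    return -1;
}
```

<P>
A caller that has already run a check like this on its subject strings is in the situation the paragraph above describes: the strings are known to be valid, so repeating the check on every match only costs time.
</P>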
| 223 |
<P> |
<P> |
| 224 |
If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, what |
If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, what |
| 310 |
</P> |
</P> |
| 311 |
<br><a name="SEC6" href="#TOC1">REVISION</a><br> |
<br><a name="SEC6" href="#TOC1">REVISION</a><br> |
| 312 |
<P> |
<P> |
| 313 |
Last updated: 13 November 2010 |
Last updated: 07 May 2011 |
| 314 |
<br> |
<br> |
| 315 |
Copyright &copy; 1997-2010 University of Cambridge. |
Copyright &copy; 1997-2011 University of Cambridge. |
| 316 |
<br> |
<br> |
| 317 |
<p> |
<p> |
| 318 |
Return to the <a href="index.html">PCRE index page</a>. |
Return to the <a href="index.html">PCRE index page</a>. |