| 63 |
The original operation of PCRE was on strings of one-byte characters. However, |
The original operation of PCRE was on strings of one-byte characters. However, |
| 64 |
there is now also support for UTF-8 character strings. To use this, you must |
there is now also support for UTF-8 character strings. To use this, you must |
| 65 |
build PCRE to include UTF-8 support, and then call <b>pcre_compile()</b> with |
build PCRE to include UTF-8 support, and then call <b>pcre_compile()</b> with |
| 66 |
the PCRE_UTF8 option. How this affects pattern matching is mentioned in several |
the PCRE_UTF8 option. There is also a special sequence that can be given at the |
| 67 |
places below. There is also a summary of UTF-8 features in the |
start of a pattern: |
| 68 |
|
<pre> |
| 69 |
|
(*UTF8) |
| 70 |
|
</pre> |
| 71 |
|
Starting a pattern with this sequence is equivalent to setting the PCRE_UTF8 |
| 72 |
|
option. This feature is not Perl-compatible. How setting UTF-8 mode affects |
| 73 |
|
pattern matching is mentioned in several places below. There is also a summary |
| 74 |
|
of UTF-8 features in the |
| 75 |
<a href="pcre.html#utf8support">section on UTF-8 support</a> |
<a href="pcre.html#utf8support">section on UTF-8 support</a> |
| 76 |
in the main |
in the main |
| 77 |
<a href="pcre.html"><b>pcre</b></a> |
<a href="pcre.html"><b>pcre</b></a> |
| 1038 |
J, U and X respectively. |
J, U and X respectively. |
| 1039 |
</P> |
</P> |
| 1040 |
<P> |
<P> |
| 1041 |
When an option change occurs at top level (that is, not inside subpattern |
When one of these option changes occurs at top level (that is, not inside |
| 1042 |
parentheses), the change applies to the remainder of the pattern that follows. |
subpattern parentheses), the change applies to the remainder of the pattern |
| 1043 |
If the change is placed right at the start of a pattern, PCRE extracts it into |
that follows. If the change is placed right at the start of a pattern, PCRE |
| 1044 |
the global options (and it will therefore show up in data extracted by the |
extracts it into the global options (and it will therefore show up in data |
| 1045 |
<b>pcre_fullinfo()</b> function). |
extracted by the <b>pcre_fullinfo()</b> function). |
| 1046 |
</P> |
</P> |
| 1047 |
<P> |
<P> |
| 1048 |
An option change within a subpattern (see below for a description of |
An option change within a subpattern (see below for a description of |
| 1065 |
<P> |
<P> |
| 1066 |
<b>Note:</b> There are other PCRE-specific options that can be set by the |
<b>Note:</b> There are other PCRE-specific options that can be set by the |
| 1067 |
application when the compile or match functions are called. In some cases the |
application when the compile or match functions are called. In some cases the |
| 1068 |
pattern can contain special leading sequences to override what the application |
pattern can contain special leading sequences such as (*CRLF) to override what |
| 1069 |
has set or what has been defaulted. Details are given in the section entitled |
the application has set or what has been defaulted. Details are given in the |
| 1070 |
|
section entitled |
| 1071 |
<a href="#newlineseq">"Newline sequences"</a> |
<a href="#newlineseq">"Newline sequences"</a> |
| 1072 |
above. |
above. There is also the (*UTF8) leading sequence that can be used to set UTF-8 |
| 1073 |
|
mode; this is equivalent to setting the PCRE_UTF8 option. |
| 1074 |
<a name="subpattern"></a></P> |
<a name="subpattern"></a></P> |
| 1075 |
<br><a name="SEC12" href="#TOC1">SUBPATTERNS</a><br> |
<br><a name="SEC12" href="#TOC1">SUBPATTERNS</a><br> |
| 1076 |
<P> |
<P> |
| 2253 |
</P> |
</P> |
| 2254 |
<br><a name="SEC28" href="#TOC1">REVISION</a><br> |
<br><a name="SEC28" href="#TOC1">REVISION</a><br> |
| 2255 |
<P> |
<P> |
| 2256 |
Last updated: 18 March 2009 |
Last updated: 11 April 2009 |
| 2257 |
<br> |
<br> |
| 2258 |
Copyright © 1997-2009 University of Cambridge. |
Copyright © 1997-2009 University of Cambridge. |
| 2259 |
<br> |
<br> |