PCRE originally operated on strings of one-byte characters. However, it now
also supports UTF-8 character strings. To use this, you must
build PCRE to include UTF-8 support, and then call \fBpcre_compile()\fP with
the PCRE_UTF8 option. There is also a special sequence that can be given at the
start of a pattern:
.sp
  (*UTF8)
.sp
Starting a pattern with this sequence is equivalent to setting the PCRE_UTF8
option. This feature is not Perl-compatible. How setting UTF-8 mode affects
pattern matching is mentioned in several places below. There is also a summary
of UTF-8 features in the
.\" HTML <a href="pcre.html#utf8support">
.\" </a>
section on UTF-8 support
changed in the same way as the Perl-compatible options by using the characters
J, U and X respectively.
.P
When one of these option changes occurs at top level (that is, not inside
subpattern parentheses), the change applies to the remainder of the pattern.
If the change is placed right at the start of a pattern, PCRE extracts it into
the global options (and it will therefore show up in data
extracted by the \fBpcre_fullinfo()\fP function). |
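.P
The extraction of a leading option setting into the global options can be
sketched in C as follows. This is an illustration under stated assumptions, not
PCRE's own code: the function \fBextract_leading_options()\fP and the OPT_ bits
are hypothetical stand-ins for the library's internals and for PCRE_CASELESS,
PCRE_MULTILINE, PCRE_DOTALL and PCRE_EXTENDED. Only the letters i, m, s and x
are recognized, and a setting that does not begin the pattern is left alone.
.sp
.nf
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical option bits standing in for PCRE_CASELESS, PCRE_MULTILINE,
   PCRE_DOTALL and PCRE_EXTENDED; the values are illustrative only. */
enum {
    OPT_CASELESS  = 0x1,
    OPT_MULTILINE = 0x2,
    OPT_DOTALL    = 0x4,
    OPT_EXTENDED  = 0x8
};

/* Map one option letter to its bit; other letters (J, U, X, ...) give 0. */
static unsigned option_bit(char c)
{
    switch (c) {
    case 'i': return OPT_CASELESS;
    case 'm': return OPT_MULTILINE;
    case 's': return OPT_DOTALL;
    case 'x': return OPT_EXTENDED;
    default:  return 0;
    }
}

/* If the pattern starts with a pure option setting such as "(?i)" or
   "(?s-i)", fold it into *global_options and return the number of
   characters consumed. Otherwise leave *global_options unchanged and
   return 0. */
size_t extract_leading_options(const char *pattern, unsigned *global_options)
{
    unsigned set = 0, unset = 0;
    int negate = 0;
    size_t i;

    assert(pattern != NULL && global_options != NULL);
    if (pattern[0] != '(' || pattern[1] != '?')
        return 0;

    for (i = 2; pattern[i] != ')'; i++) {
        unsigned bit;

        if (pattern[i] == '-' && !negate) {
            negate = 1;         /* letters after '-' unset options */
            continue;
        }
        bit = option_bit(pattern[i]);
        if (bit == 0)
            return 0;           /* e.g. "(?:" or "(?=", or end of string */
        if (negate)
            unset |= bit;
        else
            set |= bit;
    }
    if (set == 0 && unset == 0)
        return 0;               /* "(?)" or "(?-)" changes nothing */

    *global_options = (*global_options | set) & ~unset;
    return i + 1;               /* include the closing parenthesis */
}
```
.fi
.P
With this sketch, "(?s-i)abc" consumes six characters, turning on OPT_DOTALL
and clearing OPT_CASELESS in the global options, whereas "(?:abc)" starts a
non-capturing group rather than an option setting, so nothing is extracted.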
.P
An option change within a subpattern (see below for a description of
subpatterns) affects only that part of the current pattern that follows it, so
.P
\fBNote:\fP There are other PCRE-specific options that can be set by the
application when the compile or match functions are called. In some cases the
pattern can contain special leading sequences such as (*CRLF) to override what
the application has set or what has been defaulted. Details are given in the
section entitled
.\" HTML <a href="#newlineseq">
.\" </a>
"Newline sequences"
.\"
above. There is also the (*UTF8) leading sequence that can be used to set UTF-8
mode; this is equivalent to setting the PCRE_UTF8 option. |
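.P
The scanning of such leading sequences can be sketched in C as follows. This is
a hypothetical illustration, not the code inside \fBpcre_compile()\fP: the names
\fBscan_leading_sequences()\fP, struct leading_settings and enum newline_kind
are invented for the example, and only (*UTF8), (*CR), (*LF) and (*CRLF) are
recognized here, although PCRE accepts other leading sequences as well. Several
items may follow one another; if more than one newline item is given, the sketch
keeps the last one.
.sp
.nf
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical newline conventions, for illustration only. */
enum newline_kind { NL_DEFAULT, NL_CR, NL_LF, NL_CRLF };

/* Settings collected from the leading items of a pattern. */
struct leading_settings {
    int utf8;                  /* nonzero once (*UTF8) has been seen */
    enum newline_kind newline; /* most recent newline item, if any */
};

/* Skip any run of leading (*UTF8), (*CR), (*LF) and (*CRLF) items,
   recording their effect in *s, and return the offset at which the
   pattern proper begins. */
size_t scan_leading_sequences(const char *pattern, struct leading_settings *s)
{
    static const struct {
        const char *text;
        int sets_utf8;
        enum newline_kind newline;
    } items[] = {
        { "(*UTF8)", 1, NL_DEFAULT },
        { "(*CRLF)", 0, NL_CRLF },
        { "(*CR)",   0, NL_CR },
        { "(*LF)",   0, NL_LF },
    };
    const size_t n_items = sizeof items / sizeof items[0];
    size_t offset = 0;

    assert(pattern != NULL && s != NULL);
    for (;;) {
        size_t k;
        size_t len = 0;

        /* The closing parenthesis is part of each item, so "(*CR)"
           cannot accidentally match the start of "(*CRLF)". */
        for (k = 0; k < n_items; k++) {
            len = strlen(items[k].text);
            if (strncmp(pattern + offset, items[k].text, len) == 0)
                break;
        }
        if (k == n_items)
            return offset;     /* no more leading items */

        if (items[k].sets_utf8)
            s->utf8 = 1;
        else
            s->newline = items[k].newline;
        offset += len;
    }
}
```
.fi
.P
For "(*CRLF)(*UTF8)a.b" the sketch returns 14, the combined length of the two
leading items, with utf8 set and the newline kind NL_CRLF; the remaining "a.b"
is the pattern proper. A sequence that does not start the pattern, as in
"a(*UTF8)", is not treated as a leading item.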
.
.
.\" HTML <a name="subpattern"></a>
.rs
.sp
.nf
Last updated: 11 April 2009
Copyright (c) 1997-2009 University of Cambridge.
.fi