--- code/trunk/doc/html/pcrepattern.html 2009/04/11 14:22:17 415
+++ code/trunk/doc/html/pcrepattern.html 2009/04/11 14:34:02 416
@@ -63,8 +63,15 @@
The original operation of PCRE was on strings of one-byte characters. However,
there is now also support for UTF-8 character strings. To use this, you must
build PCRE to include UTF-8 support, and then call pcre_compile() with
-the PCRE_UTF8 option. How this affects pattern matching is mentioned in several
-places below. There is also a summary of UTF-8 features in the
+the PCRE_UTF8 option. There is also a special sequence that can be given at the
+start of a pattern:
+
+  (*UTF8)
+
+Starting a pattern with this sequence is equivalent to setting the PCRE_UTF8
+option. This feature is not Perl-compatible. How setting UTF-8 mode affects
+pattern matching is mentioned in several places below. There is also a summary
+of UTF-8 features in the
section on UTF-8 support in the main pcre
@@ -1031,11 +1038,11 @@
J, U and X respectively.
-When an option change occurs at top level (that is, not inside subpattern
-parentheses), the change applies to the remainder of the pattern that follows.
-If the change is placed right at the start of a pattern, PCRE extracts it into
-the global options (and it will therefore show up in data extracted by the
-pcre_fullinfo() function).
+When one of these option changes occurs at top level (that is, not inside
+subpattern parentheses), the change applies to the remainder of the pattern
+that follows. If the change is placed right at the start of a pattern, PCRE
+extracts it into the global options (and it will therefore show up in data
+extracted by the pcre_fullinfo() function).
An option change within a subpattern (see below for a description of
@@ -1058,10 +1065,12 @@
Note: There are other PCRE-specific options that can be set by the
application when the compile or match functions are called. In some cases the
-pattern can contain special leading sequences to override what the application
-has set or what has been defaulted. Details are given in the section entitled
+pattern can contain special leading sequences such as (*CRLF) to override what
+the application has set or what has been defaulted. Details are given in the
+section entitled
"Newline sequences"
-above.
+above. There is also the (*UTF8) leading sequence that can be used to set UTF-8
+mode; this is equivalent to setting the PCRE_UTF8 option.
@@ -2244,7 +2253,7 @@
-Last updated: 18 March 2009
+Last updated: 11 April 2009
Copyright © 1997-2009 University of Cambridge.