Diff of /code/trunk/doc/pcrepattern.3

revision 572 by ph10, Wed Nov 17 17:55:57 2010 UTC
revision 575 by ph10, Sun Nov 21 12:55:42 2010 UTC
# Line 182  The following sections describe the use Line 182  The following sections describe the use
182  .rs  .rs
183  .sp  .sp
184  The backslash character has several uses. Firstly, if it is followed by a  The backslash character has several uses. Firstly, if it is followed by a
185  non-alphanumeric character, it takes away any special meaning that character  character that is not a number or a letter, it takes away any special meaning
186  may have. This use of backslash as an escape character applies both inside and  that character may have. This use of backslash as an escape character applies
187  outside character classes.  both inside and outside character classes.
188  .P  .P
189  For example, if you want to match a * character, you write \e* in the pattern.  For example, if you want to match a * character, you write \e* in the pattern.
190  This escaping action applies whether or not the following character would  This escaping action applies whether or not the following character would
# Line 192  otherwise be interpreted as a metacharac Line 192  otherwise be interpreted as a metacharac
192  non-alphanumeric with backslash to specify that it stands for itself. In  non-alphanumeric with backslash to specify that it stands for itself. In
193  particular, if you want to match a backslash, you write \e\e.  particular, if you want to match a backslash, you write \e\e.
194  .P  .P
195    In UTF-8 mode, only ASCII numbers and letters have any special meaning after a
196    backslash. All other characters (in particular, those whose codepoints are
197    greater than 127) are treated as literals.
198    .P
199  If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the  If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the
200  pattern (other than in a character class) and characters between a # outside  pattern (other than in a character class) and characters between a # outside
201  a character class and the next newline are ignored. An escaping backslash can  a character class and the next newline are ignored. An escaping backslash can
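The escaping rules in the two hunks above can be tried out with a short sketch. It uses Python's `re` module as a stand-in for PCRE, and `re.VERBOSE` as an assumed analogue of PCRE_EXTENDED; neither is part of the documented change, and the helper name `demo` is hypothetical.

```python
import re

# A backslash before a non-alphanumeric character removes any special meaning.
star = re.compile(r"\*")        # matches a literal "*"
backslash = re.compile(r"\\")   # matches a single literal backslash

# Assumption: re.VERBOSE behaves like PCRE_EXTENDED for this purpose.
verbose = re.compile(r"""
    a b      # whitespace outside a character class is ignored
    [ ]      # a space inside a character class is kept
    c
""", re.VERBOSE)


def demo():
    """Return match results for the four cases described in the comments."""
    return (
        star.fullmatch("*") is not None,
        backslash.fullmatch("\\") is not None,
        verbose.fullmatch("ab c") is not None,   # "a b" collapses to "ab"
        verbose.fullmatch("a b c") is not None,  # the space after "a" is not in the pattern
    )
```

With these definitions, `demo()` reports a match for the first three cases and no match for the last one.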
# Line 225  but when a pattern is being prepared by Line 229  but when a pattern is being prepared by
229  one of the following escape sequences than the binary character it represents:  one of the following escape sequences than the binary character it represents:
230  .sp  .sp
231    \ea        alarm, that is, the BEL character (hex 07)    \ea        alarm, that is, the BEL character (hex 07)
232    \ecx       "control-x", where x is any character    \ecx       "control-x", where x is any ASCII character
233    \ee        escape (hex 1B)    \ee        escape (hex 1B)
234    \ef        formfeed (hex 0C)    \ef        formfeed (hex 0C)
235    \en        linefeed (hex 0A)    \en        linefeed (hex 0A)
# Line 237  one of the following escape sequences th Line 241  one of the following escape sequences th
241  .sp  .sp
242  The precise effect of \ecx is as follows: if x is a lower case letter, it  The precise effect of \ecx is as follows: if x is a lower case letter, it
243  is converted to upper case. Then bit 6 of the character (hex 40) is inverted.  is converted to upper case. Then bit 6 of the character (hex 40) is inverted.
244  Thus \ecz becomes hex 1A, but \ec{ becomes hex 3B, while \ec; becomes hex  Thus \ecz becomes hex 1A (z is 7A), but \ec{ becomes hex 3B ({ is 7B), while
245  7B.  \ec; becomes hex 7B (; is 3B). If the byte following \ec has a value greater
246    than 127, a compile-time error occurs. This locks out non-ASCII characters in
247    both byte mode and UTF-8 mode. (When PCRE is compiled in EBCDIC mode, all byte
248    values are valid. A lower case letter is converted to upper case, and then the
249    0xc0 bits are flipped.)
250  .P  .P
251  After \ex, from zero to two hexadecimal digits are read (letters can be in  After \ex, from zero to two hexadecimal digits are read (letters can be in
252  upper or lower case). Any number of hexadecimal digits may appear between \ex{  upper or lower case). Any number of hexadecimal digits may appear between \ex{
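The \ecx arithmetic added in revision 575 can be checked with a minimal sketch of the ASCII (non-EBCDIC) rule as the text states it. The function name `control_char` is hypothetical and this is not PCRE's own code; the EBCDIC variant is deliberately left out.

```python
def control_char(x: str) -> int:
    """Return the code that \\cx would produce for the single character x.

    Sketch of the rule described in the text: reject values above 127,
    convert a lower case letter to upper case, then invert bit 6 (hex 40).
    """
    if len(x) != 1:
        raise ValueError("expected exactly one character")
    code = ord(x)
    if code > 127:
        # The text says this case is a compile-time error in PCRE.
        raise ValueError("character after \\c must be ASCII")
    if "a" <= x <= "z":
        code = ord(x.upper())
    return code ^ 0x40


# The three cases quoted in the text:
# z is hex 7A, { is hex 7B, ; is hex 3B.
examples = {ch: hex(control_char(ch)) for ch in "z{;"}
```

Here `examples` maps `z` to `0x1a`, `{` to `0x3b`, and `;` to `0x7b`, which agrees with the values given in the revised paragraph.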
# Line 1044  characters in both cases. In UTF-8 mode, Line 1052  characters in both cases. In UTF-8 mode,
1052  characters with values greater than 128 only when it is compiled with Unicode  characters with values greater than 128 only when it is compiled with Unicode
1053  property support.  property support.
1054  .P  .P
1055  The character types \ed, \eD, \eh, \eH, \ep, \eP, \es, \eS, \ev, \eV, \ew, and  The character escape sequences \ed, \eD, \eh, \eH, \ep, \eP, \es, \eS, \ev,
1056  \eW may also appear in a character class, and add the characters that they  \eV, \ew, and \eW may appear in a character class, and add the characters that
1057  match to the class. For example, [\edABCDEF] matches any hexadecimal digit. A  they match to the class. For example, [\edABCDEF] matches any hexadecimal
1058  circumflex can conveniently be used with the upper case character types to  digit. In UTF-8 mode, the PCRE_UCP option affects the meanings of \ed, \es, \ew
1059    and their upper case partners, just as it does when they appear outside a
1060    character class, as described in the section entitled
1061    .\" HTML <a href="#genericchartypes">
1062    .\" </a>
1063    "Generic character types"
1064    .\"
1065    above. The escape sequence \eb has a different meaning inside a character
1066    class; it matches the backspace character. The sequences \eB, \eN, \eR, and \eX
1067    are not special inside a character class. Like any other unrecognized escape
1068    sequences, they are treated as the literal characters "B", "N", "R", and "X" by
1069    default, but cause an error if the PCRE_EXTRA option is set.
1070    .P
1071    A circumflex can conveniently be used with the upper case character types to
1072  specify a more restricted set of characters than the matching lower case type.  specify a more restricted set of characters than the matching lower case type.
1073  For example, the class [^\eW_] matches any letter or digit, but not underscore.  For example, the class [^\eW_] matches any letter or digit, but not underscore,
1074    whereas [\ew] includes underscore. A positive character class should be read as
1075    "something OR something OR ..." and a negative class as "NOT something AND NOT
1076    something AND NOT ...".
1077  .P  .P
1078  The only metacharacters that are recognized in character classes are backslash,  The only metacharacters that are recognized in character classes are backslash,
1079  hyphen (only where it can be interpreted as specifying a range), circumflex  hyphen (only where it can be interpreted as specifying a range), circumflex
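The character class behaviour described in this hunk can be illustrated with a short sketch. It again uses Python's `re` module, which happens to follow the same rules for these particular classes; that equivalence is an assumption of the example, not something the PCRE documentation states. `re.ASCII` is used so that \ed and \ew cover only ASCII characters.

```python
import re

# [\dABCDEF]: a digit OR one of the listed upper case letters.
hex_digit = re.compile(r"[\dABCDEF]", re.ASCII)

# [^\W_]: NOT a non-word character AND NOT underscore, so letters and digits only.
alnum = re.compile(r"[^\W_]", re.ASCII)

# [\w]: a word character, which includes underscore.
word = re.compile(r"[\w]", re.ASCII)

# Inside a character class, \b means the backspace character (hex 08).
backspace = re.compile(r"[\b]")


def matches(pattern, text):
    """Return True if the whole of text matches the compiled pattern."""
    return pattern.fullmatch(text) is not None
```

For instance, `matches(alnum, "_")` is false while `matches(word, "_")` is true, which is the distinction the revised text draws between [^\eW_] and [\ew].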
# Line 2718  Cambridge CB2 3QH, England. Line 2742  Cambridge CB2 3QH, England.
2742  .rs  .rs
2743  .sp  .sp
2744  .nf  .nf
2745  Last updated: 17 November 2010  Last updated: 21 November 2010
2746  Copyright (c) 1997-2010 University of Cambridge.  Copyright (c) 1997-2010 University of Cambridge.
2747  .fi  .fi
