Diff of /code/trunk/doc/pcrepattern.3

revision 738 by ph10, Fri Oct 21 09:04:01 2011 UTC revision 745 by ph10, Mon Nov 14 11:41:03 2011 UTC
# Line 241  one of the following escape sequences th Line 241  one of the following escape sequences th
241    \et        tab (hex 09)    \et        tab (hex 09)
242    \eddd      character with octal code ddd, or back reference    \eddd      character with octal code ddd, or back reference
243    \exhh      character with hex code hh    \exhh      character with hex code hh
244    \ex{hhh..} character with hex code hhh..    \ex{hhh..} character with hex code hhh.. (non-JavaScript mode)
245      \euhhhh    character with hex code hhhh (JavaScript mode only)
246  .sp  .sp
247  The precise effect of \ecx is as follows: if x is a lower case letter, it  The precise effect of \ecx is as follows: if x is a lower case letter, it
248  is converted to upper case. Then bit 6 of the character (hex 40) is inverted.  is converted to upper case. Then bit 6 of the character (hex 40) is inverted.
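
The \ecx rule quoted in the hunk above (fold a lower case letter to upper case, then invert hex 40) can be sketched in a few lines. This is only an illustrative model, not PCRE code: the function name control_escape is hypothetical, and the sketch assumes x is a single ASCII character.

```python
def control_escape(x):
    """Model the \\cx rule: fold a lower case ASCII letter to upper
    case, then invert bit 6 (hex 40) of the character code.

    Hypothetical helper for illustration; assumes x is one ASCII character.
    """
    code = ord(x)
    if "a" <= x <= "z":
        code = ord(x.upper())  # lower case letters are converted first
    return code ^ 0x40         # flip the hex 40 bit


if __name__ == "__main__":
    for ch in "zA{;":
        print(repr(ch), hex(control_escape(ch)))
```

Under this model, z (hex 7A) folds to Z (hex 5A) and then becomes hex 1A, while { (hex 7B) is not a letter and becomes hex 3B.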
# Line 252  both byte mode and UTF-8 mode. (When PCR Line 253  both byte mode and UTF-8 mode. (When PCR
253  values are valid. A lower case letter is converted to upper case, and then the  values are valid. A lower case letter is converted to upper case, and then the
254  0xc0 bits are flipped.)  0xc0 bits are flipped.)
255  .P  .P
256  After \ex, from zero to two hexadecimal digits are read (letters can be in  By default, after \ex, from zero to two hexadecimal digits are read (letters
257  upper or lower case). Any number of hexadecimal digits may appear between \ex{  can be in upper or lower case). Any number of hexadecimal digits may appear
258  and }, but the value of the character code must be less than 256 in non-UTF-8  between \ex{ and }, but the value of the character code must be less than 256
259  mode, and less than 2**31 in UTF-8 mode. That is, the maximum value in  in non-UTF-8 mode, and less than 2**31 in UTF-8 mode. That is, the maximum
260  hexadecimal is 7FFFFFFF. Note that this is bigger than the largest Unicode code  value in hexadecimal is 7FFFFFFF. Note that this is bigger than the largest
261  point, which is 10FFFF.  Unicode code point, which is 10FFFF.
262  .P  .P
263  If characters other than hexadecimal digits appear between \ex{ and }, or if  If characters other than hexadecimal digits appear between \ex{ and }, or if
264  there is no terminating }, this form of escape is not recognized. Instead, the  there is no terminating }, this form of escape is not recognized. Instead, the
265  initial \ex will be interpreted as a basic hexadecimal escape, with no  initial \ex will be interpreted as a basic hexadecimal escape, with no
266  following digits, giving a character whose value is zero.  following digits, giving a character whose value is zero.
267  .P  .P
268    If the PCRE_JAVASCRIPT_COMPAT option is set, the interpretation of \ex is
269    as just described only when it is followed by two hexadecimal digits.
270    Otherwise, it matches a literal "x" character. In JavaScript mode, support for
271    code points greater than 256 is provided by \eu, which must be followed by
272    four hexadecimal digits; otherwise it matches a literal "u" character.
273    .P
274  Characters whose value is less than 256 can be defined by either of the two  Characters whose value is less than 256 can be defined by either of the two
275  syntaxes for \ex. There is no difference in the way they are handled. For  syntaxes for \ex (or by \eu in JavaScript mode). There is no difference in the
276  example, \exdc is exactly the same as \ex{dc}.  way they are handled. For example, \exdc is exactly the same as \ex{dc} (or
277    \eu00dc in JavaScript mode).
278  .P  .P
279  After \e0 up to two further octal digits are read. If there are fewer than two  After \e0 up to two further octal digits are read. If there are fewer than two
280  digits, just those that are present are used. Thus the sequence \e0\ex\e07  digits, just those that are present are used. Thus the sequence \e0\ex\e07
# Line 328  unrecognized escape sequences, they are Line 336  unrecognized escape sequences, they are
336  set. Outside a character class, these sequences have different meanings.  set. Outside a character class, these sequences have different meanings.
337  .  .
338  .  .
339    .SS "Unsupported escape sequences"
340    .rs
341    .sp
342    In Perl, the sequences \el, \eL, \eu, and \eU are recognized by its string
343    handler and used to modify the case of following characters. By default, PCRE
344    does not support these escape sequences. However, if the PCRE_JAVASCRIPT_COMPAT
345    option is set, \eU matches a "U" character, and \eu can be used to define a
346    character by code point, as described in the previous section.
347    .
348    .
349  .SS "Absolute and relative back references"  .SS "Absolute and relative back references"
350  .rs  .rs
351  .sp  .sp
# Line 387  This is the same as Line 405  This is the same as
405  .\" </a>  .\" </a>
406  the "." metacharacter  the "." metacharacter
407  .\"  .\"
408  when PCRE_DOTALL is not set.  when PCRE_DOTALL is not set. Perl also uses \eN to match characters by name;
409    PCRE does not support this.
410  .P  .P
411  Each pair of lower and upper case escape sequences partitions the complete set  Each pair of lower and upper case escape sequences partitions the complete set
412  of characters into two disjoint sets. Any given character matches one, and only  of characters into two disjoint sets. Any given character matches one, and only
# Line 964  special meaning in a character class. Line 983  special meaning in a character class.
983  .P  .P
984  The escape sequence \eN behaves like a dot, except that it is not affected by  The escape sequence \eN behaves like a dot, except that it is not affected by
985  the PCRE_DOTALL option. In other words, it matches any character except one  the PCRE_DOTALL option. In other words, it matches any character except one
986  that signifies the end of a line.  that signifies the end of a line. Perl also uses \eN to match characters by
987    name; PCRE does not support this.
988  .  .
989  .  .
990  .SH "MATCHING A SINGLE BYTE"  .SH "MATCHING A SINGLE BYTE"
# Line 2854  Cambridge CB2 3QH, England. Line 2874  Cambridge CB2 3QH, England.
2874  .rs  .rs
2875  .sp  .sp
2876  .nf  .nf
2877  Last updated: 19 October 2011  Last updated: 14 November 2011
2878  Copyright (c) 1997-2011 University of Cambridge.  Copyright (c) 1997-2011 University of Cambridge.
2879  .fi  .fi
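
The \ex and \eu rules added in revision 745 can be summarised with a small model. The sketch below is not PCRE's implementation; it only restates the documented behaviour under a few assumptions: the function name parse_escape and the javascript flag (standing in for PCRE_JAVASCRIPT_COMPAT) are hypothetical, the input is the non-empty text immediately after the backslash and starts with "x" or "u", the 256 and 2**31 range limits are not checked, and an empty \ex{} is treated as unrecognized, which the text above does not specify.

```python
HEX = set("0123456789abcdefABCDEF")


def parse_escape(rest, javascript=False):
    """Return (character value, characters consumed from rest).

    rest is the text right after the backslash and must start with
    'x' or 'u'. Hypothetical model of the documented rules only.
    """
    kind, tail = rest[0], rest[1:]

    if javascript:
        # JavaScript mode: \x needs exactly 2 hex digits, \u exactly 4;
        # otherwise the letter itself is matched literally.
        width = {"x": 2, "u": 4}[kind]
        digits = tail[:width]
        if len(digits) == width and all(c in HEX for c in digits):
            return int(digits, 16), 1 + width
        return ord(kind), 1

    if kind == "u":
        # By default the documentation says \u is not supported.
        raise ValueError("\\u is only modelled in JavaScript mode")

    if tail.startswith("{"):
        close = tail.find("}")
        body = tail[1:close] if close != -1 else ""
        # Assumption: an empty body counts as "not recognized".
        if body and all(c in HEX for c in body):
            return int(body, 16), 1 + close + 1
        # Brace form not recognized: \x with no digits, value zero.
        return 0, 1

    # Basic form: read from zero to two hexadecimal digits.
    n = 0
    while n < 2 and n < len(tail) and tail[n] in HEX:
        n += 1
    return (int(tail[:n], 16) if n else 0), 1 + n


if __name__ == "__main__":
    print(parse_escape("xdc"))                     # basic form
    print(parse_escape("x{dc}"))                   # brace form
    print(parse_escape("u00dc", javascript=True))  # JavaScript mode
```

In this model \exdc, \ex{dc}, and (in JavaScript mode) \eu00dc all yield the value hex DC, matching the equivalence stated in the changed paragraph, and a malformed \ex{ falls back to a value of zero with only the "x" consumed.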
