Diff of /code/trunk/doc/pcrepattern.3

revision 181 by ph10, Wed Jun 13 08:44:34 2007 UTC revision 182 by ph10, Wed Jun 13 15:09:54 2007 UTC
# Line 260  parenthesized subpatterns. Line 260  parenthesized subpatterns.
260  Another use of backslash is for specifying generic character types. The  Another use of backslash is for specifying generic character types. The
261  following are always recognized:  following are always recognized:
262  .sp  .sp
263    \ed     any decimal digit    \ed     any decimal digit
264    \eD     any character that is not a decimal digit    \eD     any character that is not a decimal digit
265    \eh     any horizontal whitespace character    \eh     any horizontal whitespace character
266    \eH     any character that is not a horizontal whitespace character    \eH     any character that is not a horizontal whitespace character
267    \es     any whitespace character    \es     any whitespace character
268    \eS     any character that is not a whitespace character    \eS     any character that is not a whitespace character
269    \ev     any vertical whitespace character    \ev     any vertical whitespace character
270    \eV     any character that is not a vertical whitespace character    \eV     any character that is not a vertical whitespace character
271    \ew     any "word" character    \ew     any "word" character
272    \eW     any "non-word" character    \eW     any "non-word" character
273  .sp  .sp
# Line 287  does. Line 287  does.
287  .P  .P
288  In UTF-8 mode, characters with values greater than 128 never match \ed, \es, or  In UTF-8 mode, characters with values greater than 128 never match \ed, \es, or
289  \ew, and always match \eD, \eS, and \eW. This is true even when Unicode  \ew, and always match \eD, \eS, and \eW. This is true even when Unicode
290  character property support is available. These sequences retain their original  character property support is available. These sequences retain their original
291  meanings from before UTF-8 support was available, mainly for efficiency  meanings from before UTF-8 support was available, mainly for efficiency
292  reasons.  reasons.
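As a rough analogy only, the ASCII-restricted behaviour described above can be observed in Python's `re` module when the `re.ASCII` flag is set. Python's `re` is a different engine from PCRE and its `\s` set is not identical, so this sketch illustrates the idea rather than PCRE itself. The helper names and the sample character are chosen here for illustration.

```python
import re

# ARABIC-INDIC DIGIT THREE: a decimal digit outside the ASCII range.
# (Sample character chosen for this sketch, not taken from the PCRE docs.)
ARABIC_THREE = "\u0663"


def matches_digit_ascii(ch):
    """True if ch matches \\d when the class is restricted to ASCII."""
    return re.fullmatch(r"\d", ch, re.ASCII) is not None


def matches_nondigit_ascii(ch):
    """True if ch matches \\D when the class is restricted to ASCII."""
    return re.fullmatch(r"\D", ch, re.ASCII) is not None


print(matches_digit_ascii("3"))              # True
print(matches_digit_ascii(ARABIC_THREE))     # False: high-valued, never \d
print(matches_nondigit_ascii(ARABIC_THREE))  # True: high-valued, always \D
```

The point of the sketch is the asymmetry: a high-valued character fails the positive class and therefore always satisfies the negated one.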
293  .P  .P
294  The sequences \eh, \eH, \ev, and \eV are Perl 5.10 features. In contrast to the  The sequences \eh, \eH, \ev, and \eV are Perl 5.10 features. In contrast to the
295  other sequences, these do match certain high-valued codepoints in UTF-8 mode.  other sequences, these do match certain high-valued codepoints in UTF-8 mode.
296  The horizontal space characters are:  The horizontal space characters are:
297  .sp  .sp
# Line 1001  the above patterns match "SUNDAY" as wel Line 1001  the above patterns match "SUNDAY" as wel
1001  .SH "DUPLICATE SUBPATTERN NUMBERS"  .SH "DUPLICATE SUBPATTERN NUMBERS"
1002  .rs  .rs
1003  .sp  .sp
1004  Perl 5.10 introduced a feature whereby each alternative in a subpattern uses  Perl 5.10 introduced a feature whereby each alternative in a subpattern uses
1005  the same numbers for its capturing parentheses. Such a subpattern starts with  the same numbers for its capturing parentheses. Such a subpattern starts with
1006  (?| and is itself a non-capturing subpattern. For example, consider this  (?| and is itself a non-capturing subpattern. For example, consider this
1007  pattern:  pattern:
1008  .sp  .sp
1009    (?|(Sat)ur|(Sun))day    (?|(Sat)ur|(Sun))day
1010  .sp  .sp
1011  Because the two alternatives are inside a (?| group, both sets of capturing  Because the two alternatives are inside a (?| group, both sets of capturing
1012  parentheses are numbered one. Thus, when the pattern matches, you can look  parentheses are numbered one. Thus, when the pattern matches, you can look
1013  at captured substring number one, whichever alternative matched. This construct  at captured substring number one, whichever alternative matched. This construct
1014  is useful when you want to capture part, but not all, of one of a number of  is useful when you want to capture part, but not all, of one of a number of
1015  alternatives. Inside a (?| group, parentheses are numbered as usual, but the  alternatives. Inside a (?| group, parentheses are numbered as usual, but the
1016  number is reset at the start of each branch. The numbers of any capturing  number is reset at the start of each branch. The numbers of any capturing
1017  buffers that follow the subpattern start after the highest number used in any  buffers that follow the subpattern start after the highest number used in any
1018  branch. The following example is taken from the Perl documentation.  branch. The following example is taken from the Perl documentation.
1019  The numbers underneath show in which buffer the captured content will be  The numbers underneath show in which buffer the captured content will be
1020  stored.  stored.
1021  .sp  .sp
1022    # before  ---------------branch-reset----------- after    # before  ---------------branch-reset----------- after
1023    / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x    / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
1024    # 1            2         2  3        2     3     4    # 1            2         2  3        2     3     4
1025  .sp  .sp
1026  A backreference or a recursive call to a numbered subpattern always refers to  A backreference or a recursive call to a numbered subpattern always refers to
1027  the first one in the pattern with the given number.  the first one in the pattern with the given number.
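The numbering rule above can be sketched as a small Python function that walks a pattern and assigns a number to each capturing parenthesis. This is an illustrative model, not PCRE's implementation: `number_groups` is a hypothetical helper, and it assumes a simplified syntax in which every group starting with `(?` other than `(?|` is non-capturing, and character classes and named groups are not handled.

```python
def number_groups(pattern):
    """Return the capture number given to each capturing '(' in order.

    Simplified model (assumption): any '(?' group other than '(?|' is
    treated as non-capturing; character classes are not parsed.
    """
    numbers = []
    next_num = 1
    # One entry per open group: None for ordinary groups, or
    # [start, high] for a (?| group, where start is the number each
    # branch restarts from and high is the largest next_num seen so far.
    stack = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":                      # skip escaped character
            i += 2
            continue
        if pattern.startswith("(?|", i):    # branch-reset group opens
            stack.append([next_num, next_num])
            i += 3
            continue
        if pattern.startswith("(?", i):     # other non-capturing group
            stack.append(None)
            i += 2
            continue
        if ch == "(":                       # capturing group
            numbers.append(next_num)
            next_num += 1
            stack.append(None)
        elif ch == "|" and stack and stack[-1] is not None:
            frame = stack[-1]               # new branch of a (?| group
            frame[1] = max(frame[1], next_num)
            next_num = frame[0]             # reset at start of branch
        elif ch == ")" and stack:
            frame = stack.pop()
            if frame is not None:           # leaving a (?| group:
                next_num = max(frame[1], next_num)  # continue after highest
        i += 1
    return numbers


print(number_groups("(?|(Sat)ur|(Sun))day"))
# [1, 1]
print(number_groups("( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z )"))
# [1, 2, 2, 3, 2, 3, 4]
```

The second call reproduces the numbers shown under the Perl example: each branch restarts at 2, and the group after the branch-reset subpattern is numbered 4 because the highest number used in any branch is 3.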
1028  .P  .P
# Line 1079  abbreviation. This pattern (ignoring the Line 1079  abbreviation. This pattern (ignoring the
1079    (?<DN>Sat)(?:urday)?    (?<DN>Sat)(?:urday)?
1080  .sp  .sp
1081  There are five capturing substrings, but only one is ever set after a match.  There are five capturing substrings, but only one is ever set after a match.
1082  (An alternative way of solving this problem is to use a "branch reset"  (An alternative way of solving this problem is to use a "branch reset"
1083  subpattern, as described in the previous section.)  subpattern, as described in the previous section.)
1084  .P  .P
1085  The convenience function for extracting the data by name returns the substring  The convenience function for extracting the data by name returns the substring
