Another use of backslash is for specifying generic character types. The
following are always recognized:
.sp
  \ed     any decimal digit
  \eD     any character that is not a decimal digit
  \eh     any horizontal whitespace character
  \eH     any character that is not a horizontal whitespace character
  \es     any whitespace character
  \eS     any character that is not a whitespace character
  \ev     any vertical whitespace character
  \eV     any character that is not a vertical whitespace character
  \ew     any "word" character
  \eW     any "non-word" character
.sp
.P
In UTF-8 mode, characters with values greater than 127 never match \ed, \es, or
\ew, and always match \eD, \eS, and \eW. This is true even when Unicode
character property support is available. These sequences retain their original
meanings from before UTF-8 support was available, mainly for efficiency
reasons.
.P
The sequences \eh, \eH, \ev, and \eV are Perl 5.10 features. In contrast to the
other sequences, these do match certain high-valued codepoints in UTF-8 mode.
The horizontal space characters are:
.sp
.SH "DUPLICATE SUBPATTERN NUMBERS"
.rs
.sp
Perl 5.10 introduced a feature whereby each alternative in a subpattern uses
the same numbers for its capturing parentheses. Such a subpattern starts with
(?| and is itself a non-capturing subpattern. For example, consider this
pattern:
.sp
  (?|(Sat)ur|(Sun))day
.sp
Because the two alternatives are inside a (?| group, both sets of capturing
parentheses are numbered one. Thus, when the pattern matches, you can look
at captured substring number one, whichever alternative matched. This construct
is useful when you want to capture part, but not all, of one of a number of
alternatives. Inside a (?| group, parentheses are numbered as usual, but the
number is reset at the start of each branch. The numbers of any capturing
buffers that follow the subpattern start after the highest number used in any
branch. The following example is taken from the Perl documentation.
The numbers underneath show in which buffer the captured content will be
stored.
.sp
  # before  ---------------branch-reset----------- after
  / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
  # 1            2         2  3        2     3     4
.sp
A backreference or a recursive call to a numbered subpattern always refers to
the first one in the pattern with the given number.
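.P
The numbering rule for (?| groups described above can be sketched in a few
lines of Python. This is an illustration only and not PCRE's implementation:
the function name number_groups is hypothetical, and the sketch assumes a
pattern that contains only plain capturing parentheses, (?: groups, and (?|
groups, with no escapes, character classes, or named groups.
.sp
.nf
```python
def number_groups(pattern):
    """Return the capture number given to each capturing "(" in order.

    Simplified sketch (an assumption, not PCRE code): only plain "(",
    non-capturing "(?:" and branch-reset "(?|" are understood.
    """
    numbers = []    # capture number for each capturing "(" seen so far
    next_num = 1    # number the next capturing group will receive
    stack = []      # one entry per currently open group
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if pattern.startswith("(?|", i):
            # Branch reset: remember the number in force at its start and
            # the highest "next number" reached by any branch so far.
            stack.append({"reset": True, "start": next_num, "high": next_num})
            i += 3
            continue
        if pattern.startswith("(?", i):
            # Any other "(?" group is treated as non-capturing here.
            stack.append({"reset": False})
            i += 2
            continue
        if ch == "(":
            numbers.append(next_num)
            next_num += 1
            stack.append({"reset": False})
        elif ch == "|" and stack and stack[-1]["reset"]:
            # New branch of a (?| group: record the high-water mark and
            # reset the counter to the value at the start of the group.
            top = stack[-1]
            top["high"] = max(top["high"], next_num)
            next_num = top["start"]
        elif ch == ")" and stack:
            top = stack.pop()
            if top["reset"]:
                # Groups after (?| continue after the highest number used.
                next_num = max(top["high"], next_num)
        i += 1
    return numbers


pattern = "( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z )"
print(number_groups(pattern))  # [1, 2, 2, 3, 2, 3, 4]
```
.fi
.sp
For the pattern (?|(Sat)ur|(Sun))day, the same function returns [1, 1], which
matches the statement that both sets of capturing parentheses are numbered one.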
.P
  (?<DN>Sat)(?:urday)?
.sp
There are five capturing substrings, but only one is ever set after a match.
(An alternative way of solving this problem is to use a "branch reset"
subpattern, as described in the previous section.)
.P
The convenience function for extracting the data by name returns the substring