| 26 | .\" | .\" |
| 27 | page. | page. |
| 28 | .P | .P |
| 29 | | The remainder of this document discusses the patterns that are supported by |
| 30 | | PCRE when its main matching function, \fBpcre_exec()\fP, is used. |
| 31 | | From release 6.0, PCRE offers a second matching function, |
| 32 | | \fBpcre_dfa_exec()\fP, which matches using a different algorithm that is not |
| 33 | | Perl-compatible. The advantages and disadvantages of the alternative function, |
| 34 | | and how it differs from the normal function, are discussed in the |
| 35 | | .\" HREF |
| 36 | | \fBpcrematching\fP |
| 37 | | .\" |
| 38 | | page. |
| 39 | | .P |
| 40 | A regular expression is a pattern that is matched against a subject string from | A regular expression is a pattern that is matched against a subject string from |
| 41 | left to right. Most characters stand for themselves in a pattern, and match the | left to right. Most characters stand for themselves in a pattern, and match the |
| 42 | corresponding characters in the subject. As a trivial example, the pattern | corresponding characters in the subject. As a trivial example, the pattern |
| 43 | .sp | .sp |
| 44 | The quick brown fox | The quick brown fox |
| 45 | .sp | .sp |
| 46 | matches a portion of a subject string that is identical to itself. The power of | matches a portion of a subject string that is identical to itself. When |
| 47 | regular expressions comes from the ability to include alternatives and | caseless matching is specified (the PCRE_CASELESS option), letters are matched |
| 48 | repetitions in the pattern. These are encoded in the pattern by the use of | independently of case. In UTF-8 mode, PCRE always understands the concept of |
| 49 | | case for characters whose values are less than 128, so caseless matching is |
| 50 | | always possible. For characters with higher values, the concept of case is |
| 51 | | supported if PCRE is compiled with Unicode property support, but not otherwise. |
| 52 | | If you want to use caseless matching for characters 128 and above, you must |
| 53 | | ensure that PCRE is compiled with Unicode property support as well as with |
| 54 | | UTF-8 support. |
| 55 | | .P |
| 56 | | The power of regular expressions comes from the ability to include alternatives |
| 57 | | and repetitions in the pattern. These are encoded in the pattern by the use of |
| 58 | \fImetacharacters\fP, which do not stand for themselves but instead are | \fImetacharacters\fP, which do not stand for themselves but instead are |
| 59 | interpreted in some special way. | interpreted in some special way. |
| 60 | .P | .P |
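The literal, caseless, and metacharacter behaviour described in the hunk above can be sketched with a short example. This is only an illustration under stated assumptions: it uses Python's re module as a stand-in rather than PCRE itself, and it treats re.IGNORECASE as a rough analogue of PCRE_CASELESS and re.ASCII as a rough analogue of a build without Unicode property support. The subject strings are made up for the example.

```python
import re

# Hypothetical subject string; note the lower-case "the".
subject = "Yesterday the quick brown fox jumped."

# Caseful: every character stands for itself, so "The" does not match "the".
caseful = re.search(r"The quick brown fox", subject)

# Caseless (stand-in for PCRE_CASELESS): letters match independently of case.
caseless = re.search(r"The quick brown fox", subject, re.IGNORECASE)

# Metacharacters: "|" separates alternatives, "+" repeats the previous item.
meta = re.search(r"fo+x|dog", "a foooox ran by")

# Characters above 127: case is only understood when Unicode case data is
# available. re.ASCII restricts caseless matching to ASCII letters, which is
# used here as an analogy for a build lacking Unicode property support.
with_unicode = re.search("é", "É", re.IGNORECASE)
ascii_only = re.search("é", "É", re.IGNORECASE | re.ASCII)

print(caseful, caseless.group(0), meta.group(0), with_unicode, ascii_only)
```

In this sketch only the caseful search and the ASCII-restricted search fail to match; the others succeed.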
| 547 | When caseless matching is set, any letters in a class represent both their | When caseless matching is set, any letters in a class represent both their |
| 548 | upper case and lower case versions, so for example, a caseless [aeiou] matches | upper case and lower case versions, so for example, a caseless [aeiou] matches |
| 549 | "A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a | "A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a |
| 550 | caseful version would. When running in UTF-8 mode, PCRE supports the concept of | caseful version would. In UTF-8 mode, PCRE always understands the concept of |
| 551 | case for characters with values greater than 128 only when it is compiled with | case for characters whose values are less than 128, so caseless matching is |
| 552 | Unicode property support. | always possible. For characters with higher values, the concept of case is |
| 553 | | supported if PCRE is compiled with Unicode property support, but not otherwise. |
| 554 | | If you want to use caseless matching for characters 128 and above, you must |
| 555 | | ensure that PCRE is compiled with Unicode property support as well as with |
| 556 | | UTF-8 support. |
| 557 | .P | .P |
| 558 | The newline character is never treated in any special way in character classes, | The newline character is never treated in any special way in character classes, |
| 559 | whatever the setting of the PCRE_DOTALL or PCRE_MULTILINE options is. A class | whatever the setting of the PCRE_DOTALL or PCRE_MULTILINE options is. A class |
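The character-class statements in the hunk above (a caseless [aeiou] matches "A", a caseless [^aeiou] does not, and newline gets no special treatment inside a class) can be checked with a minimal sketch. As an assumption, it again uses Python's re module in place of PCRE, with re.IGNORECASE, re.DOTALL, and re.MULTILINE standing in for PCRE_CASELESS, PCRE_DOTALL, and PCRE_MULTILINE; the variable names are hypothetical.

```python
import re

# Caseless classes: each letter stands for both its upper and lower case form.
caseless_vowel = re.compile(r"[aeiou]", re.IGNORECASE)
caseless_non_vowel = re.compile(r"[^aeiou]", re.IGNORECASE)

# Caseful negated class: "A" is not one of the listed lower-case letters.
caseful_non_vowel = re.compile(r"[^aeiou]")

matches_upper_a = caseless_vowel.fullmatch("A") is not None
negated_caseless_upper_a = caseless_non_vowel.fullmatch("A") is not None
negated_caseful_upper_a = caseful_non_vowel.fullmatch("A") is not None

# Newline is an ordinary character inside a class, whichever of the
# dot-all or multiline flags are set.
newline_results = [
    re.fullmatch(r"[^a]", "\n", flags) is not None
    for flags in (0, re.DOTALL, re.MULTILINE, re.DOTALL | re.MULTILINE)
]

print(matches_upper_a, negated_caseless_upper_a, negated_caseful_upper_a)
print(newline_results)
```

The flag combinations are looped over only to show that the result for the negated class does not depend on them in this stand-in engine.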
| 1475 | documentation. | documentation. |
| 1476 | .P | .P |
| 1477 | .in 0 | .in 0 |
| 1478 | Last updated: 09 September 2004 | Last updated: 28 February 2005 |
| 1479 | .br | .br |
| 1480 | Copyright (c) 1997-2004 University of Cambridge. | Copyright (c) 1997-2005 University of Cambridge. |