page.
</P>
<P>
The remainder of this document discusses the patterns that are supported by
PCRE when its main matching function, <b>pcre_exec()</b>, is used.
From release 6.0, PCRE offers a second matching function,
<b>pcre_dfa_exec()</b>, which matches using a different algorithm that is not
Perl-compatible. The advantages and disadvantages of the alternative function,
and how it differs from the normal function, are discussed in the
<a href="pcrematching.html"><b>pcrematching</b></a>
page.
</P>
<P>
A regular expression is a pattern that is matched against a subject string from
left to right. Most characters stand for themselves in a pattern, and match the
corresponding characters in the subject. As a trivial example, the pattern
<pre>
The quick brown fox
</pre>
matches a portion of a subject string that is identical to itself. When
caseless matching is specified (the PCRE_CASELESS option), letters are matched
independently of case. In UTF-8 mode, PCRE always understands the concept of
case for characters whose values are less than 128, so caseless matching is
always possible. For characters with higher values, the concept of case is
supported if PCRE is compiled with Unicode property support, but not otherwise.
If you want to use caseless matching for characters 128 and above, you must
ensure that PCRE is compiled with Unicode property support as well as with
UTF-8 support.
</P>
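<P>
The distinction between caseless matching below and above character value 128
can be sketched with an analogy. The example below does not use PCRE: it
assumes Python's standard <b>re</b> module as a stand-in, where
<b>re.IGNORECASE</b> plays roughly the role of PCRE_CASELESS, and
<b>re.ASCII</b> loosely imitates a build without Unicode property support. The
variable names are illustrative only.
</P>

```python
import re

# Caseless matching of ASCII letters (roughly what PCRE_CASELESS requests).
pattern = re.compile(r"the quick brown fox", re.IGNORECASE)
ascii_match = pattern.search("See THE QUICK BROWN FOX run")

# With re.ASCII, Python folds case only for characters below 128.
# Assumption: this loosely mirrors PCRE built without Unicode property support.
ascii_only = re.compile("\u00e9", re.IGNORECASE | re.ASCII)  # e with acute
no_fold = ascii_only.search("\u00c9")                         # E with acute

# Without re.ASCII, Python also folds case for characters 128 and above.
# Assumption: this loosely mirrors PCRE built with Unicode property support.
unicode_aware = re.compile("\u00e9", re.IGNORECASE)
folded = unicode_aware.search("\u00c9")

print(ascii_match is not None, no_fold is None, folded is not None)
```

<P>
In this sketch the ASCII subject matches in every configuration, whereas the
accented letter matches its other case only when case information for higher
character values is available.
</P>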
<P>
The power of regular expressions comes from the ability to include alternatives
and repetitions in the pattern. These are encoded in the pattern by the use of
<i>metacharacters</i>, which do not stand for themselves but instead are
interpreted in some special way.
</P>
When caseless matching is set, any letters in a class represent both their
upper case and lower case versions, so for example, a caseless [aeiou] matches
"A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a
caseful version would. In UTF-8 mode, PCRE always understands the concept of
case for characters whose values are less than 128, so caseless matching is
always possible. For characters with higher values, the concept of case is
supported if PCRE is compiled with Unicode property support, but not otherwise.
If you want to use caseless matching for characters 128 and above, you must
ensure that PCRE is compiled with Unicode property support as well as with
UTF-8 support.
</P>
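<P>
The behaviour of caseless character classes described above can be checked
with a small sketch. It again assumes Python's standard <b>re</b> module as a
stand-in rather than PCRE itself, with <b>re.IGNORECASE</b> in place of
caseless matching. The variable names are illustrative only.
</P>

```python
import re

# The same classes as in the text, compiled caselessly and casefully.
caseless_class = re.compile(r"[aeiou]", re.IGNORECASE)
caseless_negated = re.compile(r"[^aeiou]", re.IGNORECASE)
caseful_negated = re.compile(r"[^aeiou]")

# A caseless [aeiou] matches "A" as well as "a".
upper_in_class = caseless_class.fullmatch("A") is not None
lower_in_class = caseless_class.fullmatch("a") is not None

# A caseless [^aeiou] does not match "A", whereas a caseful version does.
upper_in_caseless_negated = caseless_negated.fullmatch("A") is not None
upper_in_caseful_negated = caseful_negated.fullmatch("A") is not None

print(upper_in_class, lower_in_class,
      upper_in_caseless_negated, upper_in_caseful_negated)
```

<P>
The negated class is the case that tends to surprise: once matching is
caseless, "A" counts as a member of [aeiou], so it is excluded by [^aeiou].
</P>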
<P>
The newline character is never treated in any special way in character classes,
documentation.
</P>
<P>
Last updated: 28 February 2005
<br>
Copyright © 1997-2005 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>