| 61 |
</P> |
</P> |
| 62 |
<br><a name="SEC3" href="#TOC1">THE STANDARD MATCHING ALGORITHM</a><br> |
<br><a name="SEC3" href="#TOC1">THE STANDARD MATCHING ALGORITHM</a><br> |
| 63 |
<P> |
<P> |
| 64 |
In the terminology of Jeffrey Friedl's book \fIMastering Regular |
In the terminology of Jeffrey Friedl's book "Mastering Regular |
| 65 |
Expressions\fP, the standard algorithm is an "NFA algorithm". It conducts a |
Expressions", the standard algorithm is an "NFA algorithm". It conducts a |
| 66 |
depth-first search of the pattern tree. That is, it proceeds along a single |
depth-first search of the pattern tree. That is, it proceeds along a single |
| 67 |
path through the tree, checking that the subject matches what is required. When |
path through the tree, checking that the subject matches what is required. When |
| 68 |
there is a mismatch, the algorithm tries any alternatives at the current point, |
there is a mismatch, the algorithm tries any alternatives at the current point, |
| 148 |
condition or test for a specific group recursion are not supported. |
condition or test for a specific group recursion are not supported. |
| 149 |
</P> |
</P> |
| 150 |
<P> |
<P> |
| 151 |
5. Callouts are supported, but the value of the <i>capture_top</i> field is |
5. Because many paths through the tree may be active, the \K escape sequence, |
| 152 |
|
which resets the start of the match when encountered (but may be on some paths |
| 153 |
|
and not on others), is not supported. It causes an error if encountered. |
| 154 |
|
</P> |
| 155 |
|
<P> |
| 156 |
|
6. Callouts are supported, but the value of the <i>capture_top</i> field is |
| 157 |
always 1, and the value of the <i>capture_last</i> field is always -1. |
always 1, and the value of the <i>capture_last</i> field is always -1. |
| 158 |
</P> |
</P> |
| 159 |
<P> |
<P> |
| 160 |
6. |
7. The \C escape sequence, which (in the standard algorithm) matches a single |
|
The \C escape sequence, which (in the standard algorithm) matches a single |
|
| 161 |
byte, even in UTF-8 mode, is not supported because the alternative algorithm |
byte, even in UTF-8 mode, is not supported because the alternative algorithm |
| 162 |
moves through the subject string one character at a time, for all active paths |
moves through the subject string one character at a time, for all active paths |
| 163 |
through the tree. |
through the tree. |
| 164 |
</P> |
</P> |
| 165 |
|
<P> |
| 166 |
|
8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not |
| 167 |
|
supported. (*FAIL) is supported, and behaves like a failing negative assertion. |
| 168 |
|
</P> |
| 169 |
<br><a name="SEC5" href="#TOC1">ADVANTAGES OF THE ALTERNATIVE ALGORITHM</a><br> |
<br><a name="SEC5" href="#TOC1">ADVANTAGES OF THE ALTERNATIVE ALGORITHM</a><br> |
| 170 |
<P> |
<P> |
| 171 |
Using the alternative matching algorithm provides the following advantages: |
Using the alternative matching algorithm provides the following advantages: |
| 177 |
callouts. |
callouts. |
| 178 |
</P> |
</P> |
| 179 |
<P> |
<P> |
| 180 |
2. There is much better support for partial matching. The restrictions on the |
2. Because the alternative algorithm scans the subject string just once, and |
|
content of the pattern that apply when using the standard algorithm for partial |
|
|
matching do not apply to the alternative algorithm. For non-anchored patterns, |
|
|
the starting position of a partial match is available. |
|
|
</P> |
|
|
<P> |
|
|
3. Because the alternative algorithm scans the subject string just once, and |
|
| 181 |
never needs to backtrack, it is possible to pass very long subject strings to |
never needs to backtrack, it is possible to pass very long subject strings to |
| 182 |
the matching function in several pieces, checking for partial matching each |
the matching function in several pieces, checking for partial matching each |
| 183 |
time. |
time. |
| 209 |
</P> |
</P> |
| 210 |
<br><a name="SEC8" href="#TOC1">REVISION</a><br> |
<br><a name="SEC8" href="#TOC1">REVISION</a><br> |
| 211 |
<P> |
<P> |
| 212 |
Last updated: 06 March 2007 |
Last updated: 25 August 2009 |
| 213 |
<br> |
<br> |
| 214 |
Copyright © 1997-2007 University of Cambridge. |
Copyright © 1997-2009 University of Cambridge. |
| 215 |
<br> |
<br> |
| 216 |
<p> |
<p> |
| 217 |
Return to the <a href="index.html">PCRE index page</a>. |
Return to the <a href="index.html">PCRE index page</a>. |