# Diff of /code/trunk/doc/html/pcrematching.html

revision 111 by ph10, Thu Mar 8 16:53:09 2007 UTC revision 429 by ph10, Tue Sep 1 16:10:16 2009 UTC
# Line 61  correspond to the two matching algorithm Line 61  correspond to the two matching algorithm
61  </P>  </P>
62  <br><a name="SEC3" href="#TOC1">THE STANDARD MATCHING ALGORITHM</a><br>  <br><a name="SEC3" href="#TOC1">THE STANDARD MATCHING ALGORITHM</a><br>
63  <P>  <P>
64  In the terminology of Jeffrey Friedl's book \fIMastering Regular  In the terminology of Jeffrey Friedl's book "Mastering Regular
65  Expressions\fP, the standard algorithm is an "NFA algorithm". It conducts a  Expressions", the standard algorithm is an "NFA algorithm". It conducts a
66  depth-first search of the pattern tree. That is, it proceeds along a single  depth-first search of the pattern tree. That is, it proceeds along a single
67  path through the tree, checking that the subject matches what is required. When  path through the tree, checking that the subject matches what is required. When
68  there is a mismatch, the algorithm tries any alternatives at the current point,  there is a mismatch, the algorithm tries any alternatives at the current point,
# Line 148  not supported, and cause errors if encou Line 148  not supported, and cause errors if encou
148  condition or test for a specific group recursion are not supported.  condition or test for a specific group recursion are not supported.
149  </P>  </P>
150  <P>  <P>
151  5. Callouts are supported, but the value of the <i>capture_top</i> field is  5. Because many paths through the tree may be active, the \K escape sequence,
152    which resets the start of the match when encountered (but may be on some paths
153    and not on others), is not supported. It causes an error if encountered.
154    </P>
155    <P>
156    6. Callouts are supported, but the value of the <i>capture_top</i> field is
157  always 1, and the value of the <i>capture_last</i> field is always -1.  always 1, and the value of the <i>capture_last</i> field is always -1.
158  </P>  </P>
159  <P>  <P>
160  6.  7. The \C escape sequence, which (in the standard algorithm) matches a single
The \C escape sequence, which (in the standard algorithm) matches a single
161  byte, even in UTF-8 mode, is not supported because the alternative algorithm  byte, even in UTF-8 mode, is not supported because the alternative algorithm
162  moves through the subject string one character at a time, for all active paths  moves through the subject string one character at a time, for all active paths
163  through the tree.  through the tree.
164  </P>  </P>
165    <P>
166    8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not
167    supported. (*FAIL) is supported, and behaves like a failing negative assertion.
168    </P>
169  <br><a name="SEC5" href="#TOC1">ADVANTAGES OF THE ALTERNATIVE ALGORITHM</a><br>  <br><a name="SEC5" href="#TOC1">ADVANTAGES OF THE ALTERNATIVE ALGORITHM</a><br>
170  <P>  <P>
171  Using the alternative matching algorithm provides the following advantages:  Using the alternative matching algorithm provides the following advantages:
# Line 169  match using the standard algorithm, you Line 177  match using the standard algorithm, you
177  callouts.  callouts.
178  </P>  </P>
179  <P>  <P>
180  2. There is much better support for partial matching. The restrictions on the  2. Because the alternative algorithm scans the subject string just once, and
content of the pattern that apply when using the standard algorithm for partial
matching do not apply to the alternative algorithm. For non-anchored patterns,
the starting position of a partial match is available.
</P>
<P>
3. Because the alternative algorithm scans the subject string just once, and
181  never needs to backtrack, it is possible to pass very long subject strings to  never needs to backtrack, it is possible to pass very long subject strings to
182  the matching function in several pieces, checking for partial matching each  the matching function in several pieces, checking for partial matching each
183  time.  time.
# Line 207  Cambridge CB2 3QH, England. Line 209  Cambridge CB2 3QH, England.
209  </P>  </P>
210  <br><a name="SEC8" href="#TOC1">REVISION</a><br>  <br><a name="SEC8" href="#TOC1">REVISION</a><br>
211  <P>  <P>
212  Last updated: 06 March 2007  Last updated: 25 August 2009
213  <br>  <br>
214  Copyright &copy; 1997-2007 University of Cambridge.  Copyright &copy; 1997-2009 University of Cambridge.
215  <br>  <br>
216  <p>  <p>