| 165 |
</P> |
</P> |
| 166 |
<br><a name="SEC4" href="#TOC1">PARTIAL MATCHING AND WORD BOUNDARIES</a><br> |
<br><a name="SEC4" href="#TOC1">PARTIAL MATCHING AND WORD BOUNDARIES</a><br> |
| 167 |
<P> |
<P> |
| 168 |
If a pattern ends with one of sequences \w or \W, which test for word |
If a pattern ends with one of sequences \b or \B, which test for word |
| 169 |
boundaries, partial matching with PCRE_PARTIAL_SOFT can give counter-intuitive |
boundaries, partial matching with PCRE_PARTIAL_SOFT can give counter-intuitive |
| 170 |
results. Consider this pattern: |
results. Consider this pattern: |
| 171 |
<pre> |
<pre> |
| 269 |
data> The date is 23ja\P |
data> The date is 23ja\P |
| 270 |
Partial match: 23ja |
Partial match: 23ja |
| 271 |
</pre> |
</pre> |
| 272 |
The this stage, an application could discard the text preceding "23ja", add on |
At this stage, an application could discard the text preceding "23ja", add on |
| 273 |
text from the next segment, and call <b>pcre_exec()</b> again. Unlike |
text from the next segment, and call <b>pcre_exec()</b> again. Unlike |
| 274 |
<b>pcre_dfa_exec()</b>, the entire matching string must always be available, and |
<b>pcre_dfa_exec()</b>, the entire matching string must always be available, and |
| 275 |
the complete matching process occurs for each call, so more memory and more |
the complete matching process occurs for each call, so more memory and more |
| 347 |
<P> |
<P> |
| 348 |
4. Patterns that contain alternatives at the top level which do not all |
4. Patterns that contain alternatives at the top level which do not all |
| 349 |
start with the same pattern item may not work as expected when |
start with the same pattern item may not work as expected when |
| 350 |
<b>pcre_dfa_exec()</b> is used. For example, consider this pattern: |
PCRE_DFA_RESTART is used with <b>pcre_dfa_exec()</b>. For example, consider this |
| 351 |
|
pattern: |
| 352 |
<pre> |
<pre> |
| 353 |
1234|3789 |
1234|3789 |
| 354 |
</pre> |
</pre> |
| 364 |
1234|ABCD |
1234|ABCD |
| 365 |
</pre> |
</pre> |
| 366 |
where no string can be a partial match for both alternatives. This is not a |
where no string can be a partial match for both alternatives. This is not a |
| 367 |
problem if \fPpcre_exec()\fP is used, because the entire match has to be rerun |
problem if <b>pcre_exec()</b> is used, because the entire match has to be rerun |
| 368 |
each time: |
each time: |
| 369 |
<pre> |
<pre> |
| 370 |
re> /1234|3789/ |
re> /1234|3789/ |
| 372 |
Partial match: 123 |
Partial match: 123 |
| 373 |
data> 1237890 |
data> 1237890 |
| 374 |
0: 3789 |
0: 3789 |
| 375 |
|
</pre> |
| 376 |
</PRE> |
Of course, instead of using PCRE_DFA_RESTART, the same technique of re-running |
| 377 |
|
the entire match can also be used with <b>pcre_dfa_exec()</b>. Another |
| 378 |
|
possibility is to work with two buffers. If a partial match at offset <i>n</i> |
| 379 |
|
in the first buffer is followed by "no match" when PCRE_DFA_RESTART is used on |
| 380 |
|
the second buffer, you can then try a new match starting at offset <i>n+1</i> in |
| 381 |
|
the first buffer. |
| 382 |
</P> |
</P> |
| 383 |
<br><a name="SEC10" href="#TOC1">AUTHOR</a><br> |
<br><a name="SEC10" href="#TOC1">AUTHOR</a><br> |
| 384 |
<P> |
<P> |
| 391 |
</P> |
</P> |
| 392 |
<br><a name="SEC11" href="#TOC1">REVISION</a><br> |
<br><a name="SEC11" href="#TOC1">REVISION</a><br> |
| 393 |
<P> |
<P> |
| 394 |
Last updated: 29 September 2009 |
Last updated: 19 October 2009 |
| 395 |
<br> |
<br> |
| 396 |
Copyright © 1997-2009 University of Cambridge. |
Copyright © 1997-2009 University of Cambridge. |
| 397 |
<br> |
<br> |