though the details differ between the two matching functions. If both options
are set, PCRE_PARTIAL_HARD takes precedence.
.P
Setting a partial matching option disables two of PCRE's optimizations. PCRE
remembers the last literal byte in a pattern, and abandons matching immediately
if such a byte is not present in the subject string. This optimization cannot
be used for a subject string that might match only partially. If the pattern
was studied, PCRE knows the minimum length of a matching string, and does not
bother to run the matching function on shorter strings. This optimization is
also disabled for partial matching.
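.P
The following toy C sketch shows why both optimizations must be skipped. It
assumes the pattern is a plain literal string rather than a regular expression.
The names \fBprechecks_pass()\fP and \fBpartial_match_start()\fP are
hypothetical helpers, not PCRE functions. Take the literal "23jan". The
pre-checks reject the subject "The date is 23ja" because it contains no "n".
They also reject "23" because it is too short. Yet both subjects have a partial
match, at offsets 12 and 0 respectively.
.sp
.nf
```c
#include <assert.h>
#include <string.h>

/* Toy model only: the "pattern" is a literal string, not a PCRE regex,
   and these helpers are hypothetical, not PCRE functions. */

/* Pre-checks in the spirit of the two optimizations: a complete match
   needs at least strlen(pat) bytes and must contain the last literal byte. */
int prechecks_pass(const char *pat, const char *subj)
{
  size_t plen = strlen(pat);
  assert(plen > 0);
  if (strlen(subj) < plen)
    return 0;                                  /* minimum-length check */
  return strchr(subj, pat[plen - 1]) != NULL;  /* last-literal-byte check */
}

/* Offset of the earliest suffix of subj that is a proper prefix of pat,
   that is, where a partial match of the literal starts; -1 if none. */
long partial_match_start(const char *pat, const char *subj)
{
  size_t plen = strlen(pat), slen = strlen(subj);
  for (size_t i = 0; i < slen; i++) {
    size_t tail = slen - i;
    if (tail < plen && memcmp(subj + i, pat, tail) == 0)
      return (long)i;
  }
  return -1;
}
```
.fi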
.
.
.SH "PARTIAL MATCHING USING pcre_exec()"
vector, the first of them is set to the offset of the earliest character that
was inspected when the partial match was found. For convenience, the second
offset points to the end of the string so that a substring can easily be
identified.
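.P
Here is a minimal sketch of copying out the inspected text using the first two
offsets. It assumes \fIovector\fP is the offsets vector that was passed to
\fBpcre_exec()\fP and that the call returned PCRE_ERROR_PARTIAL. The helper
name \fBcopy_partial()\fP is hypothetical. For the subject "The date is 23ja"
with offsets 12 and 16, it yields "23ja".
.sp
.nf
```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE: copies the text identified by
   ovector[0] (earliest inspected character) and ovector[1] (end of the
   subject) after pcre_exec() has returned PCRE_ERROR_PARTIAL.
   Returns the number of bytes copied, or -1 if out is too small. */
int copy_partial(const char *subject, const int *ovector,
                 char *out, size_t outsize)
{
  int start = ovector[0], end = ovector[1];
  assert(start >= 0 && end >= start);
  size_t n = (size_t)(end - start);
  if (outsize == 0 || n > outsize - 1)
    return -1;
  memcpy(out, subject + start, n);
  out[n] = 0;                     /* NUL-terminate the copy */
  return (int)n;
}
```
.fi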
.P
For the majority of patterns, the first offset identifies the start of the
partially matched string. However, for patterns that contain lookbehind
.SH "PARTIAL MATCHING AND WORD BOUNDARIES"
.rs
.sp
If a pattern ends with one of the sequences \eb or \eB, which test for word
boundaries, partial matching with PCRE_PARTIAL_SOFT can give counter-intuitive
results. Consider this pattern:
.sp
data> The date is 23ja\eP
Partial match: 23ja
.sp
At this stage, an application could discard the text preceding "23ja", add on
text from the next segment, and call \fBpcre_exec()\fP again. Unlike
\fBpcre_dfa_exec()\fP, the entire matching string must always be available, and
the complete matching process occurs for each call, so more memory and more
.P
4. Patterns that contain alternatives at the top level which do not all
start with the same pattern item may not work as expected when
PCRE_DFA_RESTART is used with \fBpcre_dfa_exec()\fP. For example, consider this
pattern:
.sp
1234|3789
.sp
1234|ABCD
.sp
where no string can be a partial match for both alternatives. This is not a
problem if \fBpcre_exec()\fP is used, because the entire match has to be rerun
each time:
.sp
re> /1234|3789/
data> 1237890
0: 3789
.sp
Of course, instead of using PCRE_DFA_RESTART, the same technique of re-running
the entire match can also be used with \fBpcre_dfa_exec()\fP. Another
possibility is to work with two buffers. If a partial match at offset \fIn\fP
in the first buffer is followed by "no match" when PCRE_DFA_RESTART is used on
the second buffer, you can then try a new match starting at offset \fIn+1\fP in
the first buffer.
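.P
The two-buffer fallback can be modelled with a toy C sketch. It handles only
patterns that are lists of literal alternatives, such as 1234|3789, and it is
not how \fBpcre_dfa_exec()\fP is implemented. The names \fBfirst_hit()\fP,
\fBrestart_matches()\fP and \fBtwo_buffer_match()\fP are hypothetical. In this
model, a "restart" lets only the alternatives remembered at the partial match's
start offset continue into the second buffer. Take the first buffer "123" and
the second buffer "7890". The partial match of 1234 at offset 0 fails to
continue. The retry from offset 1 then finds a partial match of 3789 at
offset 2, which completes in the second buffer.
.sp
.nf
```c
#include <assert.h>
#include <string.h>

/* Toy model for patterns made of literal alternatives such as 1234|3789.
   None of these functions are part of PCRE. */

enum { NO_MATCH = -1 };

/* Scan buf[from..len) for the first start offset at which some
   alternative either matches completely or runs off the end of the
   buffer (a partial match). *partial_mask receives the set of
   alternatives that are partial at that offset (0 means complete). */
static long first_hit(const char *const *alts, int nalts, const char *buf,
                      size_t len, size_t from, unsigned *partial_mask)
{
  for (size_t i = from; i < len; i++) {
    unsigned mask = 0;
    for (int a = 0; a < nalts; a++) {
      size_t alen = strlen(alts[a]), tail = len - i;
      if (tail >= alen && memcmp(buf + i, alts[a], alen) == 0) {
        *partial_mask = 0;            /* a complete match wins */
        return (long)i;
      }
      if (tail < alen && memcmp(buf + i, alts[a], tail) == 0)
        mask |= 1u << a;              /* remember this alternative */
    }
    if (mask != 0) {
      *partial_mask = mask;
      return (long)i;
    }
  }
  return NO_MATCH;
}

/* Model of a restart: only the remembered alternatives, each having
   matched the last used bytes of the first buffer, may continue into seg. */
static int restart_matches(const char *const *alts, int nalts,
                           unsigned mask, size_t used, const char *seg)
{
  for (int a = 0; a < nalts; a++) {
    if ((mask & (1u << a)) == 0)
      continue;
    const char *rest = alts[a] + used;
    if (strncmp(seg, rest, strlen(rest)) == 0)
      return 1;
  }
  return 0;
}

/* Two-buffer fallback: when a partial match at offset n in buf1 is
   followed by "no match" in seg2, try again from offset n + 1 in buf1.
   Returns the offset in buf1 at which the match starts, or NO_MATCH. */
long two_buffer_match(const char *const *alts, int nalts,
                      const char *buf1, const char *seg2)
{
  size_t len = strlen(buf1), from = 0;
  unsigned mask = 0;
  long n;

  assert(nalts > 0 && nalts <= 16);   /* mask needs one bit each */
  while ((n = first_hit(alts, nalts, buf1, len, from, &mask)) != NO_MATCH) {
    if (mask == 0)
      return n;                       /* complete match inside buf1 */
    if (restart_matches(alts, nalts, mask, len - (size_t)n, seg2))
      return n;                       /* continued into seg2 */
    from = (size_t)n + 1;             /* "no match": retry at n+1 */
  }
  return NO_MATCH;
}
```
.fi
.P
Retrying from \fIn+1\fP, rather than from the end of the first buffer, is what
lets the model find the second alternative. That alternative starts inside the
first alternative's partial match.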
.
.
.SH AUTHOR
.rs
.sp
.nf
Last updated: 19 October 2009
Copyright (c) 1997-2009 University of Cambridge.
.fi