more characters are needed. However, at least one character in the subject must
have been inspected. This character need not form part of the final matched
string; lookbehind assertions and the \eK escape sequence provide ways of
inspecting characters before the start of a matched substring. The requirement
for inspecting at least one character exists because an empty string can always
be matched; without such a restriction there would always be a partial match of
an empty string at the end of the subject.
.P
If there are at least two slots in the offsets vector when \fBpcre_exec()\fP
returns with a partial match, the first slot is set to the offset of the
earliest character that was inspected when the partial match was found. For
convenience, the second offset points to the end of the subject so that a
substring can easily be identified.
.P
For the majority of patterns, the first offset identifies the start of the
partially matched string. However, for patterns that contain lookbehind
with extra characters added to the subject.
.P
What happens when a partial match is identified depends on which of the two
partial matching options is set.
.
.
.SS "PCRE_PARTIAL_SOFT with pcre_exec()"
alternatives in the pattern are tried. If no complete match can be found,
\fBpcre_exec()\fP returns PCRE_ERROR_PARTIAL instead of PCRE_ERROR_NOMATCH.
.P
This option is "soft" because it prefers a complete match over a partial match.
All the various matching items in a pattern behave as if the subject string is
potentially complete. For example, \ez, \eZ, and $ match at the end of the
subject, as normal, and for \eb and \eB the end of the subject is treated as a
non-alphanumeric.
.P
If there is more than one partial match, the first one that was found provides
.sp
If PCRE_PARTIAL_HARD is set for \fBpcre_exec()\fP, it returns
PCRE_ERROR_PARTIAL as soon as a partial match is found, without continuing to
search for possible complete matches. This option is "hard" because it prefers
an earlier partial match over a later complete match. For this reason, the
assumption is made that the end of the supplied subject string may not be the
true end of the available data, and so, if \ez, \eZ, \eb, \eB, or $ are
encountered at the end of the subject, the result is PCRE_ERROR_PARTIAL.
.P
Setting PCRE_PARTIAL_HARD also affects the way \fBpcre_exec()\fP checks UTF-8
subject strings for validity. Normally, an invalid UTF-8 sequence causes the
error PCRE_ERROR_BADUTF8. However, in the special case of a truncated UTF-8
character at the end of the subject, PCRE_ERROR_SHORTUTF8 is returned when
PCRE_PARTIAL_HARD is set.
.
.
matching. Unlike \fBpcre_dfa_exec()\fP, it is not possible to restart the
previous match with a new segment of data. Instead, new data must be added to
the previous subject string, and the entire match re-run, starting from the
point where the partial match occurred. Earlier data can be discarded. It is
best to use PCRE_PARTIAL_HARD in this situation, because it does not treat the
end of a segment as the end of the subject when matching \ez, \eZ, \eb, \eB,
and $. Consider an unanchored pattern that matches dates:
.sp
.P
1. If the pattern contains a test for the beginning of a line, you need to pass
the PCRE_NOTBOL option when the subject string for any call does not start at
the beginning of a line. There is also a PCRE_NOTEOL option, but in practice
when doing multi-segment matching you should be using PCRE_PARTIAL_HARD, which
includes the effect of PCRE_NOTEOL.
.P
2. Lookbehind assertions at the start of a pattern are catered for in the