.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PARTIAL MATCHING IN PCRE"
.rs
.sp
In normal use of PCRE, if the subject string that is passed to
\fBpcre_exec()\fP matches as far as it goes, but is too short to match the
entire pattern, PCRE_ERROR_NOMATCH is returned. There are circumstances where
it might be helpful to distinguish this case from other cases in which there is
no match.
.P
Consider, for example, an application where a human is required to type in data
for a field with specific formatting requirements. An example might be a date
in the form \fIddmmmyy\fP, defined by this pattern:
.sp
  ^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$
.sp
If the application sees the user's keystrokes one by one, and can check that
what has been typed so far is potentially valid, it is able to raise an error
as soon as a mistake is made, possibly beeping and not reflecting the
character that has been typed. This immediate feedback is likely to be a better
user interface than a check that is delayed until the entire string has been
entered.
.P
PCRE supports the concept of partial matching by means of the PCRE_PARTIAL
option, which can be set when calling \fBpcre_exec()\fP. When this is done, the
return code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at any
time during the matching process the entire subject string matched part of the
pattern. No captured data is set when this occurs.
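.P
The keystroke-checking application described above might be organized roughly
as in the following C sketch. It is an illustration under stated assumptions,
not code from PCRE: the names RC_NOMATCH, RC_PARTIAL, match_fn, date_match and
accept_keystroke are hypothetical, and date_match is a hand-written stand-in
that mimics the three possible outcomes for the date pattern so that the
fragment compiles without the library. A trailing newline, which the pattern's
final dollar would also accept, and the empty subject are not handled. In a
real program the matcher would instead wrap a call to \fBpcre_exec()\fP with
PCRE_PARTIAL set.
.sp
.nf
```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Same numeric values as PCRE_ERROR_NOMATCH and PCRE_ERROR_PARTIAL in
   pcre.h, repeated so that the sketch compiles without the library. */
enum { RC_NOMATCH = -1, RC_PARTIAL = -12 };

/* A matcher returns what pcre_exec() would: a positive count on a full
   match, otherwise a negative code. In a real program it would wrap
   pcre_exec(re, NULL, subject, length, 0, PCRE_PARTIAL, ovector, ovecsize). */
typedef int (*match_fn)(const char *subject, int length);

static const char *months[12] = {
    "jan", "feb", "mar", "apr", "may", "jun",
    "jul", "aug", "sep", "oct", "nov", "dec"
};

static int is_digit(char c) { return isdigit((unsigned char)c) != 0; }

/* Match "month name, two digits, end of subject" starting at offset pos. */
static int match_tail(const char *s, size_t len, size_t pos)
{
    size_t avail = len - pos, n = avail < 3 ? avail : 3, i, rest;
    int rc = RC_NOMATCH;
    for (i = 0; i < 12; i++) {
        if (strncmp(s + pos, months[i], n) != 0) continue;
        if (avail < 3) { rc = RC_PARTIAL; continue; }  /* ended inside month */
        rest = avail - 3;               /* characters after the month name */
        if (rest > 2) continue;         /* too long for a two-digit year */
        if (rest >= 1 && !is_digit(s[pos + 3])) continue;
        if (rest == 2 && !is_digit(s[pos + 4])) continue;
        if (rest == 2) return 2;        /* whole match plus one substring */
        rc = RC_PARTIAL;                /* ended inside the year */
    }
    return rc;
}

/* Hand-written stand-in for the date pattern matched with PCRE_PARTIAL. */
int date_match(const char *s, int length)
{
    size_t len = length > 0 ? (size_t)length : 0;
    int two = RC_NOMATCH, one;
    if (len == 0 || !is_digit(s[0])) return RC_NOMATCH;
    if (len >= 2 && is_digit(s[1])) two = match_tail(s, len, 2);
    if (two > 0) return two;            /* full match with a two-digit day */
    one = match_tail(s, len, 1);
    if (one > 0) return one;            /* full match with a one-digit day */
    return (two == RC_PARTIAL || one == RC_PARTIAL) ? RC_PARTIAL : RC_NOMATCH;
}

/* Append c to the NUL-terminated buffer buf (capacity cap) only if the
   result is still a full or partial match. Returns 1 if the keystroke
   was accepted and 0 if it was rejected (the "beep" case). */
int accept_keystroke(match_fn match, char *buf, size_t cap, char c)
{
    size_t len;
    int rc;
    assert(match != NULL && buf != NULL);
    len = strlen(buf);
    if (len + 2 > cap) return 0;        /* no room for c plus the NUL */
    buf[len] = c;
    buf[len + 1] = 0;
    rc = match(buf, (int)(len + 1));
    if (rc > 0 || rc == RC_PARTIAL) return 1;
    buf[len] = 0;                       /* undo the rejected character */
    return 0;
}
```
.fi
.sp
Starting from an empty buffer, the keystrokes 3, j and u are accepted in turn,
and a following j is rejected, leaving the buffer holding 3ju. Keeping the
matcher behind a function pointer means that the acceptance logic does not
change when the stand-in is replaced by a wrapper around \fBpcre_exec()\fP.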
.P
Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers the
last literal byte in a pattern, and abandons matching immediately if such a
byte is not present in the subject string. This optimization cannot be used
for a subject string that might match only partially.
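.P
The reason can be seen from a simplified C sketch of such a check. This is an
illustration only, not PCRE's internal code, and the function name
required_byte_present is hypothetical. Assume a pattern such as ^\ed\edkg$,
for which the last literal byte is taken to be g.
.sp
.nf
```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the byte `required` occurs anywhere in the first `length`
   bytes of the subject, and 0 otherwise. A matcher that uses this as a
   pre-check gives up at once when the result is 0. */
int required_byte_present(const char *subject, size_t length, int required)
{
    return memchr(subject, required, length) != NULL;
}
```
.fi
.sp
For the complete subject 12kg the check succeeds and matching goes ahead. For
the incomplete subject 12k the check fails, so a matcher relying on it would
report no match without ever discovering that 12k matches the start of the
pattern. That is why the check has to be skipped when PCRE_PARTIAL is set.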
.
.
.SH "RESTRICTED PATTERNS FOR PCRE_PARTIAL"
.rs
.sp
Because of the way certain internal optimizations are implemented in PCRE, the
PCRE_PARTIAL option cannot be used with all patterns. Repeated single
characters such as
.sp
  a{2,4}
.sp
and repeated single metasequences such as
.sp
  \ed+
.sp
are not permitted if the maximum number of occurrences is greater than one.
Optional items such as \ed? (where the maximum is one) are permitted.
Quantifiers with any values are permitted after parentheses, so the invalid
examples above can be coded thus:
.sp
  (a){2,4}
  (\ed)+
.sp
These constructions run more slowly, but for the kinds of application that are
envisaged for this facility, this is not felt to be a major restriction.
.P
If PCRE_PARTIAL is set for a pattern that does not conform to the restrictions,
\fBpcre_exec()\fP returns the error code PCRE_ERROR_BADPARTIAL (-13).
.
.
.SH "EXAMPLE OF PARTIAL MATCHING USING PCRETEST"
.rs
.sp
If the escape sequence \eP is present in a \fBpcretest\fP data line, the
PCRE_PARTIAL flag is used for the match. Here is a run of \fBpcretest\fP that
uses the date example quoted above:
.sp
    re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
  data> 25jun04\eP
   0: 25jun04
   1: jun
  data> 25dec3\eP
  Partial match
  data> 3ju\eP
  Partial match
  data> 3juj\eP
  No match
  data> j\eP
  No match
.sp
The first data string is matched completely, so \fBpcretest\fP shows the
matched substrings. The remaining four strings do not match the complete
pattern, but the first two are partial matches.
.
.
.P
.in 0
Last updated: 08 September 2004
.br
Copyright (c) 1997-2004 University of Cambridge.