<html>
<head>
<title>pcrepartial specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrepartial man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PARTIAL MATCHING IN PCRE</a>
<li><a name="TOC2" href="#SEC2">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a>
<li><a name="TOC3" href="#SEC3">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a>
<li><a name="TOC4" href="#SEC4">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a>
<li><a name="TOC5" href="#SEC5">AUTHOR</a>
<li><a name="TOC6" href="#SEC6">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PARTIAL MATCHING IN PCRE</a><br>
<P>
In normal use of PCRE, if the subject string that is passed to
<b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> matches as far as it goes, but is
too short to match the entire pattern, PCRE_ERROR_NOMATCH is returned. There
are circumstances where it might be helpful to distinguish this case from other
cases in which there is no match.
</P>
<P>
Consider, for example, an application where a human is required to type in data
for a field with specific formatting requirements. An example might be a date
in the form <i>ddmmmyy</i>, defined by this pattern:
<pre>
  ^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$
</pre>
If the application sees the user's keystrokes one by one, and can check that
what has been typed so far is potentially valid, it is able to raise an error
as soon as a mistake is made, possibly beeping and not echoing the
character that has been typed. This immediate feedback is likely to be a better
user interface than a check that is delayed until the entire string has been
entered.
</P>
<P>
PCRE supports the concept of partial matching by means of the PCRE_PARTIAL
option, which can be set when calling <b>pcre_exec()</b> or
<b>pcre_dfa_exec()</b>. When this flag is set for <b>pcre_exec()</b>, the return
code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at any time
during the matching process the last part of the subject string matched part of
the pattern. Unfortunately, for non-anchored matching, it is not possible to
obtain the position of the start of the partial match. No captured data is set
when PCRE_ERROR_PARTIAL is returned.
</P>
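<P>
The three outcomes described above (complete match, partial match, no match)
can be modelled for the date pattern without calling PCRE at all. The following
is a minimal Python sketch, not PCRE code: the names <b>classify</b> and
<b>_is_proper_prefix</b> are hypothetical, the digit classes are restricted to
ASCII, and treating an empty string as "no match" is an assumption of the
model.
</P>

```python
import re

MONTHS = "jan feb mar apr may jun jul aug sep oct nov dec".split()
# Same shape as the date pattern above; fullmatch() supplies the ^ and $ anchoring.
DATE_RE = re.compile(r"[0-9]?[0-9](" + "|".join(MONTHS) + r")[0-9][0-9]")


def _is_proper_prefix(typed, template):
    """True if typed is a non-empty proper prefix of a string shaped like
    template, where 'D' in the template stands for any ASCII digit."""
    if not 0 < len(typed) < len(template):
        return False
    return all(
        (ch in "0123456789") if slot == "D" else (ch == slot)
        for ch, slot in zip(typed, template)
    )


def classify(typed):
    """Return 'match', 'partial', or 'nomatch' for the text typed so far."""
    if DATE_RE.fullmatch(typed):
        return "match"
    # Every string the pattern accepts has one of these 24 shapes.
    templates = ("D" * day_len + month + "DD"
                 for day_len in (1, 2) for month in MONTHS)
    if any(_is_proper_prefix(typed, t) for t in templates):
        return "partial"
    return "nomatch"
```

<P>
An input handler built on this model would accept a keystroke while
<b>classify()</b> returns "partial" or "match" and reject it as soon as the
result is "nomatch".
</P>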
<P>
When PCRE_PARTIAL is set for <b>pcre_dfa_exec()</b>, the return code
PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end of the
subject is reached, there have been no complete matches, but there is still at
least one matching possibility. The portion of the string that provided the
partial match is set as the first matching string.
</P>
<P>
Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers the
last literal byte in a pattern, and abandons matching immediately if such a
byte is not present in the subject string. This optimization cannot be used
for a subject string that might match only partially.
</P>
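<P>
The reason this optimization has to be switched off can be shown with a toy
model. The sketch below is Python, not PCRE's implementation; the function
name <b>can_abandon_early</b> and its parameters are hypothetical, and
<b>required_byte</b> is assumed to be a literal that every complete match must
contain (for a pattern such as ab?c, the byte "c").
</P>

```python
def can_abandon_early(subject, required_byte, partial):
    """Model of the 'last literal byte' shortcut.

    Returns True when matching can be skipped because no complete match
    is possible. With partial matching requested the shortcut is unsafe:
    the missing byte may simply not have arrived yet.
    """
    if partial:
        return False
    return required_byte not in subject
```

<P>
For the subject "ab" and required byte "c", the model abandons the match in
normal use, but not when partial matching is requested, because "ab" may be
the start of "abc".
</P>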
<br><a name="SEC2" href="#TOC1">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a><br>
<P>
Because of the way certain internal optimizations are implemented in the
<b>pcre_exec()</b> function, the PCRE_PARTIAL option cannot be used with all
patterns. These restrictions do not apply when <b>pcre_dfa_exec()</b> is used.
For <b>pcre_exec()</b>, repeated single characters such as
<pre>
  a{2,4}
</pre>
and repeated single metasequences such as
<pre>
  \d+
</pre>
are not permitted if the maximum number of occurrences is greater than one.
Optional items such as \d? (where the maximum is one) are permitted.
Quantifiers with any values are permitted after parentheses, so the invalid
examples above can be coded thus:
<pre>
  (a){2,4}
  (\d)+
</pre>
These constructions run more slowly, but for the kinds of application that are
envisaged for this facility, this is not felt to be a major restriction.
</P>
<P>
If PCRE_PARTIAL is set for a pattern that does not conform to the restrictions,
<b>pcre_exec()</b> returns the error code PCRE_ERROR_BADPARTIAL (-13).
</P>
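<P>
The parenthesized forms accept the same strings as the originals, but they add
a capturing group, which renumbers any groups that follow. The sketch below
checks this with Python's <b>re</b> module as a stand-in for PCRE; that the two
engines agree on these simple patterns is an assumption of the example, and
<b>same_overall_match</b> is a hypothetical helper.
</P>

```python
import re


def same_overall_match(plain, grouped, subjects):
    """True if both patterns accept exactly the same subjects when each
    subject must be matched in full."""
    return all(
        bool(re.fullmatch(plain, s)) == bool(re.fullmatch(grouped, s))
        for s in subjects
    )


SUBJECTS = ["", "a", "aa", "aaaa", "aaaaa", "7", "2007", "20x7"]

repeat_ok = same_overall_match(r"a{2,4}", r"(a){2,4}", SUBJECTS)
digits_ok = same_overall_match(r"\d+", r"(\d)+", SUBJECTS)

# The added parentheses capture the last repetition, so a group that was
# number 1 in a{2,4}(b) becomes number 2 in (a){2,4}(b).
groups = re.fullmatch(r"(a){2,4}(b)", "aab").groups()
```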
<br><a name="SEC3" href="#TOC1">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a><br>
<P>
If the escape sequence \P is present in a <b>pcretest</b> data line, the
PCRE_PARTIAL flag is used for the match. Here is a run of <b>pcretest</b> that
uses the date example quoted above:
<pre>
  re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
  data&#62; 25jun04\P
  0: 25jun04
  1: jun
  data&#62; 25dec3\P
  Partial match
  data&#62; 3ju\P
  Partial match
  data&#62; 3juj\P
  No match
  data&#62; j\P
  No match
</pre>
The first data string is matched completely, so <b>pcretest</b> shows the
matched substrings. The remaining four strings do not match the complete
pattern, but the first two are partial matches. The same test, using
<b>pcre_dfa_exec()</b> matching (by means of the \D escape sequence), produces
the following output:
<pre>
  re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
  data&#62; 25jun04\P\D
  0: 25jun04
  data&#62; 23dec3\P\D
  Partial match: 23dec3
  data&#62; 3ju\P\D
  Partial match: 3ju
  data&#62; 3juj\P\D
  No match
  data&#62; j\P\D
  No match
</pre>
Notice that in this case the portion of the string that was matched is made
available.
</P>
<br><a name="SEC4" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a><br>
<P>
When a partial match has been found using <b>pcre_dfa_exec()</b>, it is possible
to continue the match by providing additional subject data and calling
<b>pcre_dfa_exec()</b> again with the same compiled regular expression, this
time setting the PCRE_DFA_RESTART option. You must also pass the same working
space as before, because this is where details of the previous partial match
are stored. Here is an example that uses <b>pcretest</b>, with the \R escape
sequence setting the PCRE_DFA_RESTART option (\P and \D are as above):
<pre>
  re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
  data&#62; 23ja\P\D
  Partial match: 23ja
  data&#62; n05\R\D
  0: n05
</pre>
The first call has "23ja" as the subject, and requests partial matching; the
second call has "n05" as the subject for the continued (restarted) match.
Notice that when the match is complete, only the last part is shown; PCRE does
not retain the previously partially-matched string. It is up to the calling
program to do that if it needs to.
</P>
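<P>
The role of the working space in a restarted match can be illustrated with a
small model in which the matcher object itself holds the state between calls.
This is a Python sketch for the date pattern only, not PCRE code; the class
<b>RestartableDateMatcher</b> and its <b>restart</b> argument are hypothetical
stand-ins for the working space and the PCRE_DFA_RESTART option.
</P>

```python
MONTHS = "jan feb mar apr may jun jul aug sep oct nov dec".split()
# Every string the date pattern accepts has one of these shapes ('D' = digit).
TEMPLATES = ["D" * n + m + "DD" for n in (1, 2) for m in MONTHS]


class RestartableDateMatcher:
    """Toy matcher that keeps its state between calls."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.live = list(TEMPLATES)  # possibilities that are still alive
        self.pos = 0                 # characters consumed by earlier calls

    def feed(self, segment, restart=False):
        """Match one segment; with restart=True, continue from saved state."""
        if not restart:
            self.reset()
        for ch in segment:
            self.live = [
                t for t in self.live
                if self.pos < len(t)
                and (ch in "0123456789" if t[self.pos] == "D"
                     else ch == t[self.pos])
            ]
            self.pos += 1
        if any(len(t) == self.pos for t in self.live):
            return "match"
        return "partial" if self.live and self.pos > 0 else "nomatch"
```

<P>
Feeding "23ja" and then "n05" with <b>restart=True</b> gives "partial" followed
by "match". Feeding "n05" without the restart flag gives "nomatch", because the
saved state is discarded.
</P>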
<P>
You can set PCRE_PARTIAL with PCRE_DFA_RESTART to continue partial matching
over multiple segments. This facility can be used to pass very long subject
strings to <b>pcre_dfa_exec()</b>. However, some care is needed for certain
types of pattern.
</P>
<P>
1. If the pattern contains tests for the beginning or end of a line, you need
to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropriate, when the
subject string for any call does not contain the beginning or end of a line.
</P>
<P>
2. If the pattern contains backward assertions (including \b or \B), you need
to arrange for some overlap in the subject strings to allow for this. For
example, you could pass the subject in chunks that are 500 bytes long, but in
a buffer of 700 bytes, with the starting offset set to 200 and the previous 200
bytes at the start of the buffer.
</P>
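<P>
The buffering scheme in item 2 can be sketched as follows. This is a Python
illustration of the arithmetic only; the generator name
<b>overlapped_chunks</b> is hypothetical, and the sizes 500 and 200 are the
ones used in the example above. Each yielded pair is a buffer and the offset
within it at which matching should start.
</P>

```python
def overlapped_chunks(data, chunk_size=500, overlap=200):
    """Yield (buffer, start_offset) pairs covering data.

    Each buffer holds up to `overlap` bytes that precede the chunk,
    followed by the chunk itself. start_offset is where the new chunk
    begins, so backward assertions can look at the bytes before it.
    """
    for pos in range(0, len(data), chunk_size):
        lead = data[max(0, pos - overlap):pos]  # empty for the first chunk
        yield lead + data[pos:pos + chunk_size], len(lead)
```

<P>
The first buffer has no preceding data, so its starting offset is 0. Every
later buffer starts matching at offset 200 and is at most 700 bytes long.
</P>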
<P>
3. Matching a subject string that is split into multiple segments does not
always produce exactly the same result as matching over one single long string.
The difference arises when there are multiple matching possibilities, because a
partial match result is given only when there are no completed matches in a
call to <b>pcre_dfa_exec()</b>. This means that as soon as the shortest match has
been found, continuation to a new subject segment is no longer possible.
Consider this <b>pcretest</b> example:
<pre>
  re&#62; /dog(sbody)?/
  data&#62; do\P\D
  Partial match: do
  data&#62; gsb\R\P\D
  0: g
  data&#62; dogsbody\D
  0: dogsbody
  1: dog
</pre>
The pattern matches the words "dog" or "dogsbody". When the subject is
presented in several parts ("do" and "gsb" being the first two) the match stops
when "dog" has been found, and it is not possible to continue. On the other
hand, if "dogsbody" is presented as a single string, both matches are found.
</P>
<P>
Because of this phenomenon, it does not usually make sense to end a pattern
that is going to be matched in this way with a variable repeat.
</P>
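<P>
The behaviour in item 3 can be reproduced with a toy model in which one call
reports every completed match it sees and returns a partial result only when
there were none. This is a Python sketch, not PCRE code: <b>dfa_call</b> and
<b>WORDS</b> are hypothetical, the pattern is represented by the list of
strings it can match, and matching is assumed to start at the beginning of the
subject.
</P>

```python
def dfa_call(live, consumed, segment):
    """Model one matching call.

    live:     strings the pattern could still match (all of them at first)
    consumed: number of characters matched by earlier calls
    Returns ("match", [matched parts of this segment, longest first]),
            ("partial", (live, consumed)) to pass to the next call, or
            ("nomatch", None).
    """
    matches = []
    for i, ch in enumerate(segment):
        pos = consumed + i
        live = [w for w in live if pos < len(w) and w[pos] == ch]
        if not live:
            break
        if any(len(w) == pos + 1 for w in live):
            matches.append(segment[: i + 1])
    if matches:
        # A completed match wins: no state is returned for continuation.
        return "match", matches[::-1]
    if live:
        return "partial", (live, consumed + len(segment))
    return "nomatch", None


WORDS = ["dog", "dogsbody"]  # every string that /dog(sbody)?/ can match
```

<P>
With this model, "do" gives a partial result, continuing with "gsb" gives the
single match "g", and the unsplit subject "dogsbody" gives both "dogsbody" and
"dog", in line with the <b>pcretest</b> output above.
</P>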
<P>
4. Patterns that contain alternatives at the top level that do not all
start with the same pattern item may not work as expected. For example,
consider this pattern:
<pre>
  1234|3789
</pre>
If the first part of the subject is "ABC123", a partial match of the first
alternative is found at offset 3. There is no partial match for the second
alternative, because such a match does not start at the same point in the
subject string. Attempting to continue with the string "789" does not yield a
match because only those alternatives that match at one point in the subject
are remembered. The problem arises because the start of the second alternative
matches within the first alternative. There is no problem with anchored
patterns or patterns such as:
<pre>
  1234|ABCD
</pre>
where no string can be a partial match for both alternatives.
</P>
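<P>
The problem in item 4 comes from remembering only the alternatives that are
alive at a single starting point. The following Python sketch models that
rule; it is a simplification rather than PCRE's algorithm, the names
<b>leftmost_partial</b> and <b>continues</b> are hypothetical, and the
continuation check assumes the next segment is long enough to finish the
alternative.
</P>

```python
def leftmost_partial(alternatives, subject):
    """Return (start, live) for the leftmost start offset at which the rest
    of subject is a proper prefix of some alternative, or None."""
    for start in range(len(subject)):
        tail = subject[start:]
        live = [alt for alt in alternatives
                if alt != tail and alt.startswith(tail)]
        if live:
            return start, live  # later start points are not remembered
    return None


def continues(live, matched_len, segment):
    """True if a remembered alternative is completed by the start of segment."""
    return any(segment.startswith(alt[matched_len:]) for alt in live)
```

<P>
For the alternatives "1234" and "3789" and the subject "ABC123", only "1234" is
remembered, at offset 3, so continuing with "789" fails even though "3789"
occurs in the joined string "ABC123789". With "1234" and "ABCD", the subject
"xxAB" followed by "CD" continues successfully.
</P>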
<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QH, England.
<br>
</P>
<br><a name="SEC6" href="#TOC1">REVISION</a><br>
<P>
Last updated: 06 March 2007
<br>
Copyright &copy; 1997-2007 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
