
Diff of /code/trunk/doc/html/pcrepartial.html


revision 75 by nigel, Sat Feb 24 21:40:37 2007 UTC | revision 148 by ph10, Mon Apr 16 13:25:10 2007 UTC
# Line 16  man page, in case the conversion went wr Line 16  man page, in case the conversion went wr
16  <li><a name="TOC1" href="#SEC1">PARTIAL MATCHING IN PCRE</a>  <li><a name="TOC1" href="#SEC1">PARTIAL MATCHING IN PCRE</a>
17  <li><a name="TOC2" href="#SEC2">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a>  <li><a name="TOC2" href="#SEC2">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a>
18  <li><a name="TOC3" href="#SEC3">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a>  <li><a name="TOC3" href="#SEC3">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a>
19    <li><a name="TOC4" href="#SEC4">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a>
20    <li><a name="TOC5" href="#SEC5">AUTHOR</a>
21    <li><a name="TOC6" href="#SEC6">REVISION</a>
22  </ul>  </ul>
23  <br><a name="SEC1" href="#TOC1">PARTIAL MATCHING IN PCRE</a><br>  <br><a name="SEC1" href="#TOC1">PARTIAL MATCHING IN PCRE</a><br>
24  <P>  <P>
25  In normal use of PCRE, if the subject string that is passed to  In normal use of PCRE, if the subject string that is passed to
26  <b>pcre_exec()</b> matches as far as it goes, but is too short to match the  <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> matches as far as it goes, but is
27  entire pattern, PCRE_ERROR_NOMATCH is returned. There are circumstances where  too short to match the entire pattern, PCRE_ERROR_NOMATCH is returned. There
28  it might be helpful to distinguish this case from other cases in which there is  are circumstances where it might be helpful to distinguish this case from other
29  no match.  cases in which there is no match.
30  </P>  </P>
31  <P>  <P>
32  Consider, for example, an application where a human is required to type in data  Consider, for example, an application where a human is required to type in data
# Line 41  entered. Line 44  entered.
44  </P>  </P>
45  <P>  <P>
46  PCRE supports the concept of partial matching by means of the PCRE_PARTIAL  PCRE supports the concept of partial matching by means of the PCRE_PARTIAL
47  option, which can be set when calling <b>pcre_exec()</b>. When this is done, the  option, which can be set when calling <b>pcre_exec()</b> or
48  return code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at any  <b>pcre_dfa_exec()</b>. When this flag is set for <b>pcre_exec()</b>, the return
49  time during the matching process the entire subject string matched part of the  code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at any time
50  pattern. No captured data is set when this occurs.  during the matching process the last part of the subject string matched part of
51    the pattern. Unfortunately, for non-anchored matching, it is not possible to
52    obtain the position of the start of the partial match. No captured data is set
53    when PCRE_ERROR_PARTIAL is returned.
54    </P>
55    <P>
56    When PCRE_PARTIAL is set for <b>pcre_dfa_exec()</b>, the return code
57    PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end of the
58    subject is reached, there have been no complete matches, but there is still at
59    least one matching possibility. The portion of the string that provided the
60    partial match is set as the first matching string.
61  </P>  </P>
62  <P>  <P>
63  Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers the  Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers the
# Line 54  for a subject string that might match on Line 67  for a subject string that might match on
67  </P>  </P>
68  <br><a name="SEC2" href="#TOC1">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a><br>  <br><a name="SEC2" href="#TOC1">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a><br>
69  <P>  <P>
70  Because of the way certain internal optimizations are implemented in PCRE, the  Because of the way certain internal optimizations are implemented in the
71  PCRE_PARTIAL option cannot be used with all patterns. Repeated single  <b>pcre_exec()</b> function, the PCRE_PARTIAL option cannot be used with all
72  characters such as  patterns. These restrictions do not apply when <b>pcre_dfa_exec()</b> is used.
73    For <b>pcre_exec()</b>, repeated single characters such as
74  <pre>  <pre>
75    a{2,4}    a{2,4}
76  </pre>  </pre>
# Line 100  uses the date example quoted above: Line 114  uses the date example quoted above:
114  </pre>  </pre>
115  The first data string is matched completely, so <b>pcretest</b> shows the  The first data string is matched completely, so <b>pcretest</b> shows the
116  matched substrings. The remaining four strings do not match the complete  matched substrings. The remaining four strings do not match the complete
117  pattern, but the first two are partial matches.  pattern, but the first two are partial matches. The same test, using
118    <b>pcre_dfa_exec()</b> matching (by means of the \D escape sequence), produces
119    the following output:
120    <pre>
121        re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
122      data&#62; 25jun04\P\D
123       0: 25jun04
124      data&#62; 23dec3\P\D
125      Partial match: 23dec3
126      data&#62; 3ju\P\D
127      Partial match: 3ju
128      data&#62; 3juj\P\D
129      No match
130      data&#62; j\P\D
131      No match
132    </pre>
133    Notice that in this case the portion of the string that was matched is made
134    available.
135    </P>
136    <br><a name="SEC4" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a><br>
137    <P>
138    When a partial match has been found using <b>pcre_dfa_exec()</b>, it is possible
139    to continue the match by providing additional subject data and calling
140    <b>pcre_dfa_exec()</b> again with the same compiled regular expression, this
141    time setting the PCRE_DFA_RESTART option. You must also pass the same working
142    space as before, because this is where details of the previous partial match
143    are stored. Here is an example using <b>pcretest</b>, using the \R escape
144    sequence to set the PCRE_DFA_RESTART option (\P and \D are as above):
145    <pre>
146        re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
147      data&#62; 23ja\P\D
148      Partial match: 23ja
149      data&#62; n05\R\D
150       0: n05
151    </pre>
152    The first call has "23ja" as the subject, and requests partial matching; the
153    second call has "n05" as the subject for the continued (restarted) match.
154    Notice that when the match is complete, only the last part is shown; PCRE does
155    not retain the previously partially-matched string. It is up to the calling
156    program to do that if it needs to.
157    </P>
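Because PCRE does not keep the earlier segments, a caller that wants the whole matched text has to accumulate it. The following is a minimal sketch of one way to do that in C. The names match_history, history_reset, history_append, and HISTORY_MAX are hypothetical and not part of PCRE, and the fixed-size buffer is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity for the accumulated text; not a PCRE constant. */
#define HISTORY_MAX 256

/* Caller-side record of the text matched so far across segments. */
typedef struct {
    char   text[HISTORY_MAX];  /* always NUL-terminated */
    size_t length;             /* bytes stored, excluding the NUL */
} match_history;

/* Forget everything, for example before starting a fresh match. */
static void history_reset(match_history *h)
{
    assert(h != NULL);
    h->length = 0;
    h->text[0] = '\0';
}

/* Append one matched portion. Returns 1 on success, or 0 (leaving the
   history unchanged) if the portion does not fit in the buffer. */
static int history_append(match_history *h, const char *part, size_t part_len)
{
    assert(h != NULL && part != NULL);
    if (part_len >= HISTORY_MAX - h->length)
        return 0;  /* no room for the data plus the terminating NUL */
    memcpy(h->text + h->length, part, part_len);
    h->length += part_len;
    h->text[h->length] = '\0';
    return 1;
}
```

In the example above, the caller would append "23ja" after the partial match and "n05" after the restarted call completes, leaving "23jan05" in the history.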
158    <P>
159    You can set PCRE_PARTIAL with PCRE_DFA_RESTART to continue partial matching
160    over multiple segments. This facility can be used to pass very long subject
161    strings to <b>pcre_dfa_exec()</b>. However, some care is needed for certain
162    types of pattern.
163    </P>
164    <P>
165    1. If the pattern contains tests for the beginning or end of a line, you need
166    to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropriate, when the
167    subject string for any call does not contain the beginning or end of a line.
168    </P>
169    <P>
170    2. If the pattern contains backward assertions (including \b or \B), you need
171    to arrange for some overlap in the subject strings to allow for this. For
172    example, you could pass the subject in chunks that are 500 bytes long, but in
173    a buffer of 700 bytes, with the starting offset set to 200 and the previous 200
174    bytes at the start of the buffer.
175    </P>
176    <P>
177    3. Matching a subject string that is split into multiple segments does not
178    always produce exactly the same result as matching over one single long string.
179    The difference arises when there are multiple matching possibilities, because a
180    partial match result is given only when there are no completed matches in a
181    call to <b>pcre_dfa_exec()</b>. This means that as soon as the shortest match has
182    been found, continuation to a new subject segment is no longer possible.
183    Consider this <b>pcretest</b> example:
184    <pre>
185        re&#62; /dog(sbody)?/
186      data&#62; do\P\D
187      Partial match: do
188      data&#62; gsb\R\P\D
189       0: g
190      data&#62; dogsbody\D
191       0: dogsbody
192       1: dog
193    </pre>
194    The pattern matches the words "dog" or "dogsbody". When the subject is
195    presented in several parts ("do" and "gsb" being the first two) the match stops
196    when "dog" has been found, and it is not possible to continue. On the other
197    hand, if "dogsbody" is presented as a single string, both matches are found.
198  </P>  </P>
199  <P>  <P>
200  Last updated: 08 September 2004  Because of this phenomenon, it does not usually make sense to end a pattern
201    that is going to be matched in this way with a variable repeat.
202    </P>
203    <P>
204    4. Patterns that contain alternatives at the top level which do not all
205    start with the same pattern item may not work as expected. For example,
206    consider this pattern:
207    <pre>
208      1234|3789
209    </pre>
210    If the first part of the subject is "ABC123", a partial match of the first
211    alternative is found at offset 3. There is no partial match for the second
212    alternative, because such a match does not start at the same point in the
213    subject string. Attempting to continue with the string "789" does not yield a
214    match because only those alternatives that match at one point in the subject
215    are remembered. The problem arises because the start of the second alternative
216    matches within the first alternative. There is no problem with anchored
217    patterns or patterns such as:
218    <pre>
219      1234|ABCD
220    </pre>
221    where no string can be a partial match for both alternatives.
222    </P>
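The overlap arrangement described in item 2 above (500-byte chunks in a 700-byte buffer, with the previous 200 bytes kept at the front) can be sketched in plain C as follows. This is only an illustration of the buffer handling; segment_buffer, segment_init, segment_load, and the three size macros are hypothetical names, and no PCRE function is called.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sizes taken from the example in the text; the macro names are hypothetical. */
#define CHUNK_SIZE   500
#define OVERLAP_SIZE 200
#define BUFFER_SIZE  (CHUNK_SIZE + OVERLAP_SIZE)

typedef struct {
    char   data[BUFFER_SIZE];
    size_t length;        /* number of valid bytes in data */
    size_t start_offset;  /* where matching should begin in data */
} segment_buffer;

/* Start with an empty buffer, before the first chunk is loaded. */
static void segment_init(segment_buffer *sb)
{
    assert(sb != NULL);
    sb->length = 0;
    sb->start_offset = 0;
}

/* Load the next chunk. Up to OVERLAP_SIZE bytes from the end of the
   previous contents are moved to the front of the buffer so that
   backward assertions can look at them; the new chunk follows. */
static void segment_load(segment_buffer *sb, const char *chunk, size_t chunk_len)
{
    size_t keep;

    assert(sb != NULL && chunk != NULL);
    assert(chunk_len <= CHUNK_SIZE);

    keep = sb->length < OVERLAP_SIZE ? sb->length : OVERLAP_SIZE;
    memmove(sb->data, sb->data + (sb->length - keep), keep);
    memcpy(sb->data + keep, chunk, chunk_len);

    sb->start_offset = keep;
    sb->length = keep + chunk_len;
}
```

After each call to segment_load, the data, length, and start_offset fields would supply the subject, length, and starting offset arguments for the next matching call. For the first chunk the starting offset is 0, because there is no earlier text to keep.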
223    <br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
224    <P>
225    Philip Hazel
226    <br>
227    University Computing Service
228    <br>
229    Cambridge CB2 3QH, England.
230    <br>
231    </P>
232    <br><a name="SEC6" href="#TOC1">REVISION</a><br>
233    <P>
234    Last updated: 06 March 2007
235    <br>
236    Copyright &copy; 1997-2007 University of Cambridge.
237  <br>  <br>
 Copyright &copy; 1997-2004 University of Cambridge.  
238  <p>  <p>
239  Return to the <a href="index.html">PCRE index page</a>.  Return to the <a href="index.html">PCRE index page</a>.
240  </p>  </p>

