Diff of /code/trunk/doc/html/pcrepartial.html

revision 76 by nigel, Sat Feb 24 21:40:37 2007 UTC revision 77 by nigel, Sat Feb 24 21:40:45 2007 UTC
# Line 16  man page, in case the conversion went wr Line 16  man page, in case the conversion went wr
16  <li><a name="TOC1" href="#SEC1">PARTIAL MATCHING IN PCRE</a>  <li><a name="TOC1" href="#SEC1">PARTIAL MATCHING IN PCRE</a>
17  <li><a name="TOC2" href="#SEC2">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a>  <li><a name="TOC2" href="#SEC2">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a>
18  <li><a name="TOC3" href="#SEC3">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a>  <li><a name="TOC3" href="#SEC3">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a>
19    <li><a name="TOC4" href="#SEC4">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a>
20  </ul>  </ul>
21  <br><a name="SEC1" href="#TOC1">PARTIAL MATCHING IN PCRE</a><br>  <br><a name="SEC1" href="#TOC1">PARTIAL MATCHING IN PCRE</a><br>
22  <P>  <P>
23  In normal use of PCRE, if the subject string that is passed to  In normal use of PCRE, if the subject string that is passed to
24  <b>pcre_exec()</b> matches as far as it goes, but is too short to match the  <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> matches as far as it goes, but is
25  entire pattern, PCRE_ERROR_NOMATCH is returned. There are circumstances where  too short to match the entire pattern, PCRE_ERROR_NOMATCH is returned. There
26  it might be helpful to distinguish this case from other cases in which there is  are circumstances where it might be helpful to distinguish this case from other
27  no match.  cases in which there is no match.
28  </P>  </P>
29  <P>  <P>
30  Consider, for example, an application where a human is required to type in data  Consider, for example, an application where a human is required to type in data
# Line 41  entered. Line 42  entered.
42  </P>  </P>
43  <P>  <P>
44  PCRE supports the concept of partial matching by means of the PCRE_PARTIAL  PCRE supports the concept of partial matching by means of the PCRE_PARTIAL
45  option, which can be set when calling <b>pcre_exec()</b>. When this is done, the  option, which can be set when calling <b>pcre_exec()</b> or
46  return code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at any  <b>pcre_dfa_exec()</b>. When this flag is set for <b>pcre_exec()</b>, the return
47  time during the matching process the entire subject string matched part of the  code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at any time
48  pattern. No captured data is set when this occurs.  during the matching process the last part of the subject string matched part of
49    the pattern. Unfortunately, for non-anchored matching, it is not possible to
50    obtain the position of the start of the partial match. No captured data is set
51    when PCRE_ERROR_PARTIAL is returned.
52    </P>
53    <P>
54    When PCRE_PARTIAL is set for <b>pcre_dfa_exec()</b>, the return code
55    PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end of the
56    subject is reached, there have been no complete matches, but there is still at
57    least one matching possibility. The portion of the string that provided the
58    partial match is set as the first matching string.
59  </P>  </P>
60  <P>  <P>
61  Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers the  Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers the
# Line 54  for a subject string that might match on Line 65  for a subject string that might match on
65  </P>  </P>
66  <br><a name="SEC2" href="#TOC1">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a><br>  <br><a name="SEC2" href="#TOC1">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a><br>
67  <P>  <P>
68  Because of the way certain internal optimizations are implemented in PCRE, the  Because of the way certain internal optimizations are implemented in the
69  PCRE_PARTIAL option cannot be used with all patterns. Repeated single  <b>pcre_exec()</b> function, the PCRE_PARTIAL option cannot be used with all
70  characters such as  patterns. These restrictions do not apply when <b>pcre_dfa_exec()</b> is used.
71    For <b>pcre_exec()</b>, repeated single characters such as
72  <pre>  <pre>
73    a{2,4}    a{2,4}
74  </pre>  </pre>
# Line 100  uses the date example quoted above: Line 112  uses the date example quoted above:
112  </pre>  </pre>
113  The first data string is matched completely, so <b>pcretest</b> shows the  The first data string is matched completely, so <b>pcretest</b> shows the
114  matched substrings. The remaining four strings do not match the complete  matched substrings. The remaining four strings do not match the complete
115  pattern, but the first two are partial matches.  pattern, but the first two are partial matches. The same test, using DFA
116    matching (by means of the \D escape sequence), produces the following output:
117    <pre>
118        re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
119      data&#62; 25jun04\P\D
120       0: 25jun04
121      data&#62; 23dec3\P\D
122      Partial match: 23dec3
123      data&#62; 3ju\P\D
124      Partial match: 3ju
125      data&#62; 3juj\P\D
126      No match
127      data&#62; j\P\D
128      No match
129    </pre>
130    Notice that in this case the portion of the string that was matched is made
131    available.
132    </P>
133    <br><a name="SEC4" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a><br>
134    <P>
135    When a partial match has been found using <b>pcre_dfa_exec()</b>, it is possible
136    to continue the match by providing additional subject data and calling
137    <b>pcre_dfa_exec()</b> again with the PCRE_DFA_RESTART option and the same
138    working space (where details of the previous partial match are stored). Here is
139    an example using <b>pcretest</b>, where the \R escape sequence sets the
140    PCRE_DFA_RESTART option and the \D escape sequence requests the use of
141    <b>pcre_dfa_exec()</b>:
142    <pre>
143        re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
144      data&#62; 23ja\P\D
145      Partial match: 23ja
146      data&#62; n05\R\D
147       0: n05
148    </pre>
149    The first call has "23ja" as the subject, and requests partial matching; the
150    second call has "n05" as the subject for the continued (restarted) match.
151    Notice that when the match is complete, only the last part is shown; PCRE does
152    not retain the previously partially-matched string. It is up to the calling
153    program to do that if it needs to.
154    </P>
155    <P>
156    This facility can be used to pass very long subject strings to
157    <b>pcre_dfa_exec()</b>. However, some care is needed for certain types of
158    pattern.
159    </P>
160    <P>
161    1. If the pattern contains tests for the beginning or end of a line, you need
162    to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropriate, when the
163    subject string for any call does not contain the beginning or end of a line.
164    </P>
165    <P>
166    2. If the pattern contains backward assertions (including \b or \B), you need
167    to arrange for some overlap in the subject strings to allow for this. For
168    example, you could pass the subject in chunks that were 500 bytes long, but in
169    a buffer of 700 bytes, with the starting offset set to 200 and the previous 200
170    bytes at the start of the buffer.
171    </P>
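The overlap arrangement described in point 2 can be sketched as follows. This is only an illustration under stated assumptions: the names segment_buffer, segment_init and segment_load are hypothetical helpers, not part of PCRE, and the sizes 200 and 500 are the ones from the example above. After each load, the caller would pass data, length and start_offset as the subject, length and starting offset arguments of pcre_dfa_exec().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OVERLAP 200   /* bytes of earlier subject kept at the buffer start */
#define CHUNK   500   /* bytes of new subject data per call */

/* Hypothetical helper type, not part of PCRE. */
typedef struct {
    char   data[OVERLAP + CHUNK];  /* the 700-byte buffer */
    size_t length;                 /* number of valid bytes in data */
    size_t start_offset;           /* where the new data begins */
} segment_buffer;

static void segment_init(segment_buffer *sb)
{
    sb->length = 0;
    sb->start_offset = 0;
}

/* Load up to CHUNK new bytes, keeping up to OVERLAP bytes of the
   previous contents in front of them. Returns the bytes consumed. */
static size_t segment_load(segment_buffer *sb, const char *src, size_t n)
{
    size_t keep;

    assert(sb->length <= sizeof sb->data);
    keep = sb->length < OVERLAP ? sb->length : OVERLAP;
    if (n > CHUNK)
        n = CHUNK;

    /* Move the tail of the old data to the front, then append the new data. */
    memmove(sb->data, sb->data + sb->length - keep, keep);
    memcpy(sb->data + keep, src, n);

    sb->start_offset = keep;   /* 0 on the first call, 200 afterwards */
    sb->length = keep + n;
    return n;
}
```

On the first call there is no earlier data, so the starting offset is 0; on later calls the backward assertions can look at the 200 retained bytes before the starting offset.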
172    <P>
173    3. Matching a subject string that is split into multiple segments does not
174    always produce exactly the same result as matching over one single long string.
175    The difference arises when there are multiple matching possibilities, because a
176    partial match result is given only when there are no completed matches in a
177    call to <b>pcre_dfa_exec()</b>. This means that as soon as the shortest match has
178    been found, continuation to a new subject segment is no longer possible.
179    Consider this <b>pcretest</b> example:
180    <pre>
181        re&#62; /dog(sbody)?/
182      data&#62; do\P\D
183      Partial match: do
184      data&#62; gsb\R\P\D
185       0: g
186      data&#62; dogsbody\D
187       0: dogsbody
188       1: dog
189    </pre>
190    The pattern matches the words "dog" or "dogsbody". When the subject is
191    presented in several parts ("do" and "gsb" being the first two) the match stops
192    when "dog" has been found, and it is not possible to continue. On the other
193    hand, if "dogsbody" is presented as a single string, both matches are found.
194    </P>
195    <P>
196    Because of this phenomenon, it does not usually make sense to end a pattern
197    that is going to be matched in this way with a variable repeat.
198  </P>  </P>
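The behaviour in point 3 can be imitated with a small hand-written matcher. This is a toy sketch, not PCRE code: toy_match, toy_state and the TOY_ result codes are invented for illustration, the pattern dog(sbody)? is hard-coded, and matching is anchored at the start of the first segment. The saved position plays the role of the working space, and the restart argument plays the role of PCRE_DFA_RESTART. As in the text above, a partial result is given only when the call found no complete match.

```c
#include <assert.h>
#include <stddef.h>

/* Result codes for this toy matcher only; they are not PCRE's constants. */
enum toy_result { TOY_NOMATCH, TOY_PARTIAL, TOY_MATCH };

/* Saved between calls, standing in for the DFA working space. */
typedef struct { size_t pos; } toy_state;

static const char   toy_word[] = "dogsbody";
static const size_t toy_short  = 3;   /* end of "dog" */
static const size_t toy_long   = 8;   /* end of "dogsbody" */

/* Match one segment. If restart is non-zero, continue from the saved
   position. On TOY_MATCH, *matched is the number of bytes of this
   segment that belong to the longest match found in this call. */
static enum toy_result toy_match(toy_state *st, const char *seg, size_t len,
                                 int restart, size_t *matched)
{
    size_t pos, i, best = 0;
    int found = 0;

    assert(st != NULL && matched != NULL);
    pos = restart ? st->pos : 0;

    for (i = 0; i < len && pos < toy_long && seg[i] == toy_word[pos]; i++) {
        pos++;
        if (pos == toy_short || pos == toy_long) {
            found = 1;        /* a complete match ends here */
            best = i + 1;
        }
    }

    st->pos = 0;              /* state survives only a partial match */
    if (found) {
        *matched = best;
        return TOY_MATCH;
    }
    if (i == len && pos > 0) {
        st->pos = pos;        /* segment ran out while still matching */
        return TOY_PARTIAL;
    }
    return TOY_NOMATCH;
}
```

Feeding "do" and then "gsb" with restart set gives a partial result followed by a match of one byte ("g"), after which no state is left to continue into "ody"; feeding "dogsbody" in one call gives a match of all eight bytes.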
199  <P>  <P>
200  Last updated: 08 September 2004  Last updated: 28 February 2005
201  <br>  <br>
202  Copyright &copy; 1997-2004 University of Cambridge.  Copyright &copy; 1997-2005 University of Cambridge.
203  <p>  <p>
204  Return to the <a href="index.html">PCRE index page</a>.  Return to the <a href="index.html">PCRE index page</a>.
205  </p>  </p>
