Diff of /code/trunk/doc/html/pcrepattern.html

revision 507 by ph10, Wed Mar 10 16:08:01 2010 UTC  revision 512 by ph10, Tue Mar 30 11:11:52 2010 UTC
# Line 2289  may cause matching to proceed, to backtr Line 2289  may cause matching to proceed, to backtr
2289  description of the interface to the callout function is given in the  description of the interface to the callout function is given in the
2290  <a href="pcrecallout.html"><b>pcrecallout</b></a>  <a href="pcrecallout.html"><b>pcrecallout</b></a>
2291  documentation.  documentation.
2292  </P>  <a name="backtrackcontrol"></a></P>
2293  <br><a name="SEC25" href="#TOC1">BACKTRACKING CONTROL</a><br>  <br><a name="SEC25" href="#TOC1">BACKTRACKING CONTROL</a><br>
2294  <P>  <P>
2295  Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which  Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which
# Line 2313  processed as anchored at the point where Line 2313  processed as anchored at the point where
2313  </P>  </P>
2314  <P>  <P>
2315  The new verbs make use of what was previously invalid syntax: an opening  The new verbs make use of what was previously invalid syntax: an opening
2316  parenthesis followed by an asterisk. In Perl, they are generally of the form  parenthesis followed by an asterisk. They are generally of the form
2317  (*VERB:ARG) but PCRE does not support the use of arguments, so its general  (*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour,
2318  form is just (*VERB). Any number of these verbs may occur in a pattern. There  depending on whether or not an argument is present. A name is a sequence of
2319  are two kinds:  letters, digits, and underscores. If the name is empty, that is, if the closing
2320    parenthesis immediately follows the colon, the effect is as if the colon were
2321    not there. Any number of these verbs may occur in a pattern.
2322    </P>
2323    <P>
2324    PCRE contains some optimizations that are used to speed up matching by running
2325    some checks at the start of each match attempt. For example, it may know the
2326    minimum length of matching subject, or that a particular character must be
2327    present. When one of these optimizations suppresses the running of a match, any
2328    included backtracking verbs will not, of course, be processed. You can suppress
2329    the start-of-match optimizations by setting the PCRE_NO_START_OPTIMIZE option
2330    when calling <b>pcre_exec()</b>.
2331  </P>  </P>
2332  <br><b>  <br><b>
2333  Verbs that act immediately  Verbs that act immediately
2334  </b><br>  </b><br>
2335  <P>  <P>
2336  The following verbs act as soon as they are encountered:  The following verbs act as soon as they are encountered. They may not be
2337    followed by a name.
2338  <pre>  <pre>
2339     (*ACCEPT)     (*ACCEPT)
2340  </pre>  </pre>
# Line 2350  A match with the string "aaaa" always fa Line 2362  A match with the string "aaaa" always fa
2362  each backtrack happens (in this example, 10 times).  each backtrack happens (in this example, 10 times).
2363  </P>  </P>
2364  <br><b>  <br><b>
2365    Recording which path was taken
2366    </b><br>
2367    <P>
2368    There is one verb whose main purpose is to track how a match was arrived at,
2369    though it also has a secondary use in conjunction with advancing the match
2370    starting point (see (*SKIP) below).
2371    <pre>
2372      (*MARK:NAME) or (*:NAME)
2373    </pre>
2374    A name is always required with this verb. There may be as many instances of
2375    (*MARK) as you like in a pattern, and their names do not have to be unique.
2376    </P>
2377    <P>
2378    When a match succeeds, the name of the last-encountered (*MARK) is passed back
2379    to the caller via the <i>pcre_extra</i> data structure, as described in the
2380    <a href="pcreapi.html#extradata">section on <i>pcre_extra</i></a>
2381    in the
2382    <a href="pcreapi.html"><b>pcreapi</b></a>
2383    documentation. No data is returned for a partial match. Here is an example of
2384    <b>pcretest</b> output, where the /K modifier requests the retrieval and
2385    outputting of (*MARK) data:
2386    <pre>
2387      /X(*MARK:A)Y|X(*MARK:B)Z/K
2388      XY
2389       0: XY
2390      MK: A
2391      XZ
2392       0: XZ
2393      MK: B
2394    </pre>
2395    The (*MARK) name is tagged with "MK:" in this output, and in this example it
2396    indicates which of the two alternatives matched. This is a more efficient way
2397    of obtaining this information than putting each alternative in its own
2398    capturing parentheses.
2399    </P>
2400    <P>
2401    A name may also be returned after a failed match if the final path through the
2402    pattern involves (*MARK). However, unless (*MARK) is used in conjunction with
2403    (*COMMIT), this is unlikely to happen for an unanchored pattern because, as the
2404    starting point for matching is advanced, the final check is often with an empty
2405    string, causing a failure before (*MARK) is reached. For example:
2406    <pre>
2407      /X(*MARK:A)Y|X(*MARK:B)Z/K
2408      XP
2409      No match
2410    </pre>
2411    There are three potential starting points for this match (starting with X,
2412    starting with P, and with an empty string). If the pattern is anchored, the
2413    result is different:
2414    <pre>
2415      /^X(*MARK:A)Y|^X(*MARK:B)Z/K
2416      XP
2417      No match, mark = B
2418    </pre>
2419    PCRE's start-of-match optimizations can also interfere with this. For example,
2420    if, as a result of a call to <b>pcre_study()</b>, it knows the minimum
2421    subject length for a match, a shorter subject will not be scanned at all.
2422    </P>
2423    <P>
2424    Note that similar anomalies (though different in detail) exist in Perl, no
2425    doubt for the same reasons. The use of (*MARK) data after a failed match of an
2426    unanchored pattern is not recommended, unless (*COMMIT) is involved.
2427    </P>
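
The anchored and unanchored (*MARK) results described in the added text can be reproduced with a toy model. The following is a minimal Python sketch, not PCRE itself: match_with_mark is a hypothetical helper that hard-codes the pattern X(*MARK:A)Y|X(*MARK:B)Z, assumes that no start-of-match optimizations are in effect, and assumes that the recorded mark is cleared at the beginning of each match attempt.

```python
def match_with_mark(subject, anchored):
    """Toy simulation of X(*MARK:A)Y|X(*MARK:B)Z (hypothetical helper, not PCRE).

    Returns (matched_text, mark); matched_text is None when nothing matches.
    """
    # Each alternative: text before the mark, the mark name, text after it.
    alternatives = [("X", "A", "Y"), ("X", "B", "Z")]
    # An anchored pattern is tried at offset 0 only; an unanchored one is
    # tried at every offset, including the empty string at the very end.
    last_start = 0 if anchored else len(subject)
    mark = None
    for start in range(last_start + 1):
        mark = None  # assumption: every match attempt starts with no mark
        for before, name, after in alternatives:
            if not subject.startswith(before, start):
                continue  # this path failed before (*MARK) was reached
            mark = name  # (*MARK:name) has been passed on this path
            end = start + len(before)
            if subject.startswith(after, end):
                return subject[start:end + len(after)], mark
    # No match: report whatever mark the final attempt left behind.
    return None, mark


if __name__ == "__main__":
    for subject, anchored in [("XY", False), ("XZ", False),
                              ("XP", False), ("XP", True)]:
        print(subject, anchored, match_with_mark(subject, anchored))
```

Under these assumptions, the anchored call on "XP" fails with mark B, while the unanchored call on "XP" fails with no mark, because its final attempt is at the empty string after "P" and never reaches a (*MARK).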
2428    <br><b>
2429  Verbs that act after backtracking  Verbs that act after backtracking
2430  </b><br>  </b><br>
2431  <P>  <P>
2432  The following verbs do nothing when they are encountered. Matching continues  The following verbs do nothing when they are encountered. Matching continues
2433  with what follows, but if there is no subsequent match, a failure is forced.  with what follows, but if there is no subsequent match, causing a backtrack to
2434  The verbs differ in exactly what kind of failure occurs.  the verb, a failure is forced. That is, backtracking cannot pass to the left of
2435    the verb. However, when one of these verbs appears inside an atomic group, its
2436    effect is confined to that group, because once the group has been matched,
2437    there is never any backtracking into it. In this situation, backtracking can
2438    "jump back" to the left of the entire atomic group. (Remember, as stated
2439    above, that this localization also applies in subroutine calls and assertions.)
2440    </P>
2441    <P>
2442    These verbs differ in exactly what kind of failure occurs when backtracking
2443    reaches them.
2444  <pre>  <pre>
2445    (*COMMIT)    (*COMMIT)
2446  </pre>  </pre>
2447  This verb causes the whole match to fail outright if the rest of the pattern  This verb, which may not be followed by a name, causes the whole match to fail
2448  does not match. Even if the pattern is unanchored, no further attempts to find  outright if the rest of the pattern does not match. Even if the pattern is
2449  a match by advancing the starting point take place. Once (*COMMIT) has been  unanchored, no further attempts to find a match by advancing the starting point
2450  passed, <b>pcre_exec()</b> is committed to finding a match at the current  take place. Once (*COMMIT) has been passed, <b>pcre_exec()</b> is committed to
2451  starting point, or not at all. For example:  finding a match at the current starting point, or not at all. For example:
2452  <pre>  <pre>
2453    a+(*COMMIT)b    a+(*COMMIT)b
2454  </pre>  </pre>
2455  This matches "xxaab" but not "aacaab". It can be thought of as a kind of  This matches "xxaab" but not "aacaab". It can be thought of as a kind of
2456  dynamic anchor, or "I've started, so I must finish."  dynamic anchor, or "I've started, so I must finish." The name of the most
2457  <pre>  recently passed (*MARK) in the path is passed back when (*COMMIT) forces a
2458    (*PRUNE)  match failure.
2459  </pre>  </P>
2460  This verb causes the match to fail at the current position if the rest of the  <P>
2461  pattern does not match. If the pattern is unanchored, the normal "bumpalong"  Note that (*COMMIT) at the start of a pattern is not the same as an anchor,
2462  advance to the next starting character then happens. Backtracking can occur as  unless PCRE's start-of-match optimizations are turned off, as shown in this
2463  usual to the left of (*PRUNE), or when matching to the right of (*PRUNE), but  <b>pcretest</b> example:
2464  if there is no match to the right, backtracking cannot cross (*PRUNE).  <pre>
2465  In simple cases, the use of (*PRUNE) is just an alternative to an atomic    /(*COMMIT)abc/
2466  group or possessive quantifier, but there are some uses of (*PRUNE) that cannot    xyzabc
2467  be expressed in any other way.     0: abc
2468      xyzabc\Y
2469      No match
2470    </pre>
2471    PCRE knows that any match must start with "a", so the optimization skips along
2472    the subject to "a" before running the first match attempt, which succeeds. When
2473    the optimization is disabled by the \Y escape in the second subject, the match
2474    starts at "x" and so the (*COMMIT) causes it to fail without trying any other
2475    starting points.
2476    <pre>
2477      (*PRUNE) or (*PRUNE:NAME)
2478    </pre>
2479    This verb causes the match to fail at the current starting position in the
2480    subject if the rest of the pattern does not match. If the pattern is
2481    unanchored, the normal "bumpalong" advance to the next starting character then
2482    happens. Backtracking can occur as usual to the left of (*PRUNE), before it is
2483    reached, or when matching to the right of (*PRUNE), but if there is no match to
2484    the right, backtracking cannot cross (*PRUNE). In simple cases, the use of
2485    (*PRUNE) is just an alternative to an atomic group or possessive quantifier,
2486    but there are some uses of (*PRUNE) that cannot be expressed in any other way.
2487    The behaviour of (*PRUNE:NAME) is the same as (*MARK:NAME)(*PRUNE) when the
2488    match fails completely; the name is passed back if this is the final attempt.
2489    (*PRUNE:NAME) does not pass back a name if the match succeeds. In an anchored
2490    pattern (*PRUNE) has the same effect as (*COMMIT).
2491  <pre>  <pre>
2492    (*SKIP)    (*SKIP)
2493  </pre>  </pre>
2494  This verb is like (*PRUNE), except that if the pattern is unanchored, the  This verb, when given without a name, is like (*PRUNE), except that if the
2495  "bumpalong" advance is not to the next character, but to the position in the  pattern is unanchored, the "bumpalong" advance is not to the next character,
2496  subject where (*SKIP) was encountered. (*SKIP) signifies that whatever text  but to the position in the subject where (*SKIP) was encountered. (*SKIP)
2497  was matched leading up to it cannot be part of a successful match. Consider:  signifies that whatever text was matched leading up to it cannot be part of a
2498    successful match. Consider:
2499  <pre>  <pre>
2500    a+(*SKIP)b    a+(*SKIP)b
2501  </pre>  </pre>
# Line 2397  effect as this example; although it woul Line 2506  effect as this example; although it woul
2506  first match attempt, the second attempt would start at the second character  first match attempt, the second attempt would start at the second character
2507  instead of skipping on to "c".  instead of skipping on to "c".
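
The differing "bumpalong" behaviour of (*COMMIT), (*PRUNE) and (*SKIP) can be sketched for the single pattern shape a+(*VERB)b. This is a hand-written Python simulation rather than a call into PCRE: the function name match_a_plus_verb_b and its return shape are illustrative assumptions, and start-of-match optimizations are assumed to be disabled.

```python
def match_a_plus_verb_b(subject, verb):
    """Toy simulation of a+(*VERB)b for VERB in COMMIT, PRUNE, SKIP.

    Hypothetical helper, not PCRE. Returns (span, starts_tried), where span
    is (start, end) of the match or None, and starts_tried lists every
    starting offset at which a match attempt was made.
    """
    if verb not in ("COMMIT", "PRUNE", "SKIP"):
        raise ValueError("unsupported verb: " + verb)
    starts_tried = []
    start = 0
    while start <= len(subject):
        starts_tried.append(start)
        # a+ : consume the longest run of "a" at the starting offset.
        pos = start
        while pos < len(subject) and subject[pos] == "a":
            pos += 1
        if pos == start:
            # a+ failed before the verb was reached: ordinary bumpalong.
            start += 1
            continue
        # The verb has now been passed at offset `pos`; try to match "b".
        if pos < len(subject) and subject[pos] == "b":
            return (start, pos + 1), starts_tried
        # Failure to the right of the verb. Backtracking reaches the verb
        # before it can shorten a+, so the verb decides what happens next.
        if verb == "COMMIT":
            return None, starts_tried  # whole match fails outright
        if verb == "PRUNE":
            start += 1                 # advance to the next character
        else:
            start = pos                # SKIP: resume where the verb was met
    return None, starts_tried


if __name__ == "__main__":
    for verb in ("COMMIT", "PRUNE", "SKIP"):
        print(verb, match_a_plus_verb_b("aaaac", verb))
```

For the subject "aaaac" this model tries only offset 0 with COMMIT, offsets 0 through 5 with PRUNE, and offsets 0, 4 and 5 with SKIP, which jumps straight to the "c". With COMMIT it also matches "xxaab" and fails on "aacaab", in line with the a+(*COMMIT)b example.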
2508  <pre>  <pre>
2509    (*THEN)    (*SKIP:NAME)
2510    </pre>
2511    When (*SKIP) has an associated name, its behaviour is modified. If the
2512    following pattern fails to match, the previous path through the pattern is
2513    searched for the most recent (*MARK) that has the same name. If one is found,
2514    the "bumpalong" advance is to the subject position that corresponds to that
2515    (*MARK) instead of to where (*SKIP) was encountered. If no (*MARK) with a
2516    matching name is found, normal "bumpalong" of one character happens (the
2517    (*SKIP) is ignored).
2518    <pre>
2519      (*THEN) or (*THEN:NAME)
2520  </pre>  </pre>
2521  This verb causes a skip to the next alternative if the rest of the pattern does  This verb causes a skip to the next alternative if the rest of the pattern does
2522  not match. That is, it cancels pending backtracking, but only within the  not match. That is, it cancels pending backtracking, but only within the
# Line 2408  for a pattern-based if-then-else block: Line 2527  for a pattern-based if-then-else block:
2527  </pre>  </pre>
2528  If the COND1 pattern matches, FOO is tried (and possibly further items after  If the COND1 pattern matches, FOO is tried (and possibly further items after
2529  the end of the group if FOO succeeds); on failure the matcher skips to the  the end of the group if FOO succeeds); on failure the matcher skips to the
2530  second alternative and tries COND2, without backtracking into COND1. If (*THEN)  second alternative and tries COND2, without backtracking into COND1. The
2531  is used outside of any alternation, it acts exactly like (*PRUNE).  behaviour of (*THEN:NAME) is exactly the same as (*MARK:NAME)(*THEN) if the
2532    overall match fails. If (*THEN) is not directly inside an alternation, it acts
2533    like (*PRUNE).
2534  </P>  </P>
2535  <br><a name="SEC26" href="#TOC1">SEE ALSO</a><br>  <br><a name="SEC26" href="#TOC1">SEE ALSO</a><br>
2536  <P>  <P>
# Line 2427  Cambridge CB2 3QH, England. Line 2548  Cambridge CB2 3QH, England.
2548  </P>  </P>
2549  <br><a name="SEC28" href="#TOC1">REVISION</a><br>  <br><a name="SEC28" href="#TOC1">REVISION</a><br>
2550  <P>  <P>
2551  Last updated: 06 March 2010  Last updated: 27 March 2010
2552  <br>  <br>
2553  Copyright &copy; 1997-2010 University of Cambridge.  Copyright &copy; 1997-2010 University of Cambridge.
2554  <br>  <br>

