Re: [pcre-dev] Multisegment matching with pcre_exec()

Author: Philip Hazel
Date:  
To: ND
CC: Pcre-dev
Subject: Re: [pcre-dev] Multisegment matching with pcre_exec()
On Sun, 24 May 2009, ND wrote:

> What do you think about adding following PCRE behavior:
>
> The return code PCRE_ERROR_MULTISEGMENT raised, and matching abandons
> immediately if at any time during the matching process PCRE needs to
> check (not bumpalong) the next symbol of subject string, but discovers
> an end of string. An extra parameter - last_bumpalong_offset - is
> returned.
>
> IMHO, it will allow to organize true multisegment matching.


No. Multisegment matching is impossible with pcre_exec() because it has
to be able to backtrack to any part of the string. Consider

^(a.*z|something else)

If it reads "a", then lots of characters, but never finds a "z", it has
to backtrack right to the start of the string so that it can try
"something else". That is how Perl-style, depth-first matching works.
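
To make the backtracking concrete, here is a minimal hand-written sketch
in C. It is not PCRE's code. Because the literal text "something else"
can never match a subject that starts with "a", the sketch assumes a
stand-in pattern, ^(a.*z|a.*y). The names match_alternatives and
struct trace are hypothetical. The function records how far the first
alternative read and whether the second alternative then had to start
again from offset 0.

```c
#include <stddef.h>

/* Records what the matcher touched (hypothetical helper type). */
struct trace {
    size_t furthest;          /* furthest offset read by any alternative */
    int restarted_from_zero;  /* 1 if a later alternative re-read offset 0 */
};

/* Depth-first match of the stand-in pattern ^(a.*z|a.*y).
 * Returns the match length, or -1 if neither alternative matches. */
static int match_alternatives(const char *s, size_t len, struct trace *t)
{
    static const char last[2] = { 'z', 'y' };
    t->furthest = 0;
    t->restarted_from_zero = 0;

    for (int alt = 0; alt < 2; alt++) {
        /* Trying another alternative means going back to offset 0. */
        if (alt > 0 && t->furthest > 0)
            t->restarted_from_zero = 1;
        if (len == 0 || s[0] != 'a')
            continue;

        /* Greedy .* : consume everything up to a newline or the end. */
        size_t end = 1;
        while (end < len && s[end] != '\n')
            end++;
        if (end > t->furthest)
            t->furthest = end;

        /* Backtrack one character at a time looking for the final letter. */
        for (size_t i = end; i > 1; i--)
            if (s[i - 1] == last[alt])
                return (int)i;
    }
    return -1;
}
```

With the subject "a123y45" the first alternative reads all 7 characters,
finds no "z", and fails. The second alternative then starts again at
offset 0 and matches 5 characters. If the start of the subject had
already been discarded as an earlier segment, that second attempt could
not be made.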

To do multi-segment matching, you need a searching strategy that scans
the data string just once. This is provided by pcre_dfa_exec(). However,
that imposes restrictions, such as no support for capturing parentheses.
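
As a rough illustration of a strategy that scans the data just once,
here is a hand-compiled state machine in C for ^(a.*z|something else).
It is only a sketch of the idea and not how pcre_dfa_exec() is
implemented. The names scan_state, scan_init and scan_feed are
hypothetical. With PCRE itself you would instead call pcre_dfa_exec()
with PCRE_PARTIAL and, for later segments, PCRE_DFA_RESTART. Both
alternatives are tracked at the same time, so no character is ever
re-read and each segment can be thrown away once it has been fed in.

```c
#include <stddef.h>

static const char LITERAL[] = "something else";
#define LITERAL_LEN ((int)(sizeof LITERAL - 1))

/* All state that must survive between segments (hypothetical type). */
struct scan_state {
    int started;      /* has the first character been consumed? */
    int in_dotstar;   /* alternative 1: seen 'a', now inside .* */
    int literal_pos;  /* alternative 2: literal chars matched, -1 if dead */
    long consumed;    /* characters consumed over all segments */
    long match_end;   /* end offset of the longest match so far, -1 if none */
};

static void scan_init(struct scan_state *st)
{
    st->started = 0;
    st->in_dotstar = 0;
    st->literal_pos = -1;
    st->consumed = 0;
    st->match_end = -1;
}

/* Feed one segment. Returns 1 while more data could still produce or
 * extend a match, 0 once both alternatives are dead. */
static int scan_feed(struct scan_state *st, const char *seg, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        char c = seg[i];
        st->consumed++;
        if (!st->started) {
            /* Anchored: only the very first character can start a match. */
            st->started = 1;
            st->in_dotstar = (c == 'a');
            st->literal_pos = (c == LITERAL[0]) ? 1 : -1;
            continue;
        }
        if (st->in_dotstar) {
            if (c == '\n')
                st->in_dotstar = 0;            /* '.' does not match newline */
            else if (c == 'z')
                st->match_end = st->consumed;  /* longest match so far */
        }
        if (st->literal_pos >= 0) {
            st->literal_pos = (c == LITERAL[st->literal_pos])
                                  ? st->literal_pos + 1 : -1;
            if (st->literal_pos == LITERAL_LEN) {
                if (st->consumed > st->match_end)
                    st->match_end = st->consumed;
                st->literal_pos = -1;          /* literal fully matched */
            }
        }
    }
    return !st->started || st->in_dotstar || st->literal_pos >= 0;
}
```

Feeding "a123" and then "45z6" leaves match_end at 7, the offset just
past the "z", counted from the start of the first segment. Note that
the state holds only positions and flags. Nothing in it records where a
parenthesised group started or ended, which is the kind of information
a single-pass scan gives up.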

Philip

--
Philip Hazel