Re: [pcre-dev] 'Hard' partial matching don't work with some …

Author: ND
Date:  
To: Pcre-dev
Subject: Re: [pcre-dev] 'Hard' partial matching don't work with some assertions

> As for "correcting": I have been writing software for over 40 years, and
> one thing I learned very early on was that making an incompatible change
> always causes a problem for *somebody*, however much you think "nobody
> will notice this change". That is why I try very hard not to make
> incompatible changes, and introduce new options instead. That is why I
> added "hard" rather than change the way the previous partial worked.


> I can imagine
> that somebody who is using partial matching would want to be sure of
> finding a longer partial match rather than a shorter complete match. For
> example, the pattern abc(def?) applied to the string "abc".
>

Your example demonstrates that "abc" is the first segment and that the
user supposes a second one may arrive. This is the case of a multisegment
string. IMHO there are no other applications of the 'hard' partial option.
And if we take this viewpoint, then PCRE's behaviour with the lookaheads
'\z', '\Z', '$', '\b' is a flaw, and a correction (not the addition of new
functionality) is needed. From this point of view there are no
"incompatible changes", only a bug fix. IMHO the bug is not in the
implementation stage but in the concept formulation stage.

You suppose that there are other applications that are equivalent to
multisegment string matching in every respect but want one small
difference: that lookaheads must work without really trying to look ahead
into the next possible string segment. Can such applications exist?

These are my arguments. But the choice is yours.

I propose that in 'hard' partial mode (or in some new mode, if you choose
to create one):
1. applying '\z', '\Z', '\b', '$' at the end position of the subject string
must (subject to 2.) produce a partial match
2. a partial match can be an empty string if and only if (offset of the
earliest character that was inspected when the partial match candidate was
found) is less than (end-of-subject-string offset)
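
To make the two rules concrete, here is a minimal sketch in Python of the
decision they describe. It is only a model of the proposal, not PCRE code:
the function name, its parameters and the Outcome type are hypothetical,
and the fallback to a complete match when rule 2 forbids an empty partial
match is my assumption, since the proposal does not say what should happen
in that case.

```python
from enum import Enum


class Outcome(Enum):
    PARTIAL = "partial"
    COMPLETE = "complete"


def end_assertion_outcome(match_start: int,
                          earliest_inspected: int,
                          subject_end: int) -> Outcome:
    # Models the moment when an end-sensitive assertion (\z, \Z, \b or $)
    # is tested at subject_end in the proposed 'hard' partial mode.
    # All three arguments are offsets into the current subject string.
    candidate_is_empty = (match_start == subject_end)

    # Rule 1: a non-empty candidate that reaches the end of the subject
    # is reported as a partial match.
    if not candidate_is_empty:
        return Outcome.PARTIAL

    # Rule 2: an empty partial match is allowed only if some character
    # before the end of the subject was inspected (e.g. by a lookbehind).
    if earliest_inspected < subject_end:
        return Outcome.PARTIAL

    # Assumption (not stated in the proposal): otherwise the engine
    # behaves as it does today and reports a complete match.
    return Outcome.COMPLETE
```

For example, with a subject of length 3, a candidate starting at offset 0
gives PARTIAL under rule 1, while an empty candidate at offset 3 gives
PARTIAL only if something before offset 3 was inspected.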

PS Adding the 'hard' option in 2009 was a great thing. Thanx. I applied
PCRE to analyzing a data flow. Data is transferred in chunks, and my
application does not know beforehand when it ends. But the application
analyzes the arrived parts in real time and acts accordingly. So an
important practical application of PCRE was born when the 'hard' option
appeared: the possibility to analyze multisegment strings and endless data
flows. There is a wide spectrum of such data, first of all internet and
network transmissions. But recently I discovered bugs in my application's
flow analysis, because some lookahead assertions are not really lookaheads
and don't try to look ahead. So now my application can't "be sure of
finding a longer partial match rather than a shorter complete match" (your
words).
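
A rough sketch of the chunk handling described above, in Python. It does
not use PCRE: toy_hard_partial_literal is a hypothetical stand-in that
only knows literal strings and mimics the match / partial / no-match
results of a 'hard' partial call, and scan_chunks and the result tuples
are likewise illustrative names, not part of any real API.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

# ("match" or "partial", start offset, end offset), or None for no match.
Result = Optional[Tuple[str, int, int]]


def toy_hard_partial_literal(needle: str, buf: str) -> Result:
    # Toy stand-in for a regex engine: finds the leftmost complete match
    # of a literal, or a partial match if the buffer ends inside it.
    for start in range(len(buf)):
        tail = buf[start:]
        if tail.startswith(needle):
            return ("match", start, start + len(needle))
        if needle.startswith(tail):
            return ("partial", start, len(buf))
    return None


def scan_chunks(chunks: Iterable[str],
                matcher: Callable[[str], Result]) -> Iterator[str]:
    # Feed chunks as they arrive; keep only the text that a partial
    # match says may still grow into a complete match.
    buf = ""
    for chunk in chunks:
        buf += chunk
        while buf:
            result = matcher(buf)
            if result is None:
                buf = ""            # nothing here can start a match
                break
            kind, start, end = result
            if kind == "partial":
                buf = buf[start:]   # wait for the next chunk
                break
            yield buf[start:end]    # complete match: act on it
            buf = buf[end:]         # assumes matches are never empty
```

With this loop, the chunks "xxab", "cab", "c!" and the literal "abc" yield
two matches, because each "ab" left at the end of a chunk is kept as a
partial match until the next chunk arrives. The loop is only as reliable
as the partial results it receives: if an assertion at the end of a chunk
reports a complete match instead of a partial one, the text is consumed
too early.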


> That is slightly odd. I would expect them BOTH to return MATCH, with the
> first returning "t" and the second "" (which it does). I have made a
> note to investigate this when I next work on PCRE (not soon).
>

For the purposes of multisegment string matching they must both return
'ERROR_PARTIAL' as described.

Thanx.