Re: [pcre-dev] 'Hard' partial matching don't work with some …

Author: Philip Hazel
Date:  
To: ND
CC: Pcre-dev
Subject: Re: [pcre-dev] 'Hard' partial matching don't work with some assertions
On Sun, 29 Aug 2010, ND wrote:

> No. May be my English is poor, sorry. I don't ask about how it works. What
> practical tasks needs to use PCRE with this form of 'hard' option?


As all I ever actually use regex for is searching in text editors or
grep, I don't know of any actual application. However, I can imagine
that somebody who is using partial matching would want to be sure of
finding a longer partial match rather than a shorter complete match. For
example, the pattern abc(def)? applied to the string "abc".
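A toy sketch of that distinction, assuming the intended pattern is one with
an optional tail, written here as abc(def)?. It is not PCRE and does not call
it: the function name toy_partial and the idea of listing every string the
pattern can match are assumptions made purely for illustration, and the
model ignores the order in which a backtracking matcher tries alternatives.

```python
def toy_partial(candidates, subject, hard):
    """Toy model of soft vs. hard partial matching; this is not PCRE.

    candidates: every string the pattern can match in full, e.g.
                ["abc", "abcdef"] for abc(def)?.
    subject:    the text received so far, matched from its first character.
    hard:       True prefers a partial match whenever more input could
                still produce a longer match.
    """
    # Candidates that already match in full at the start of the subject.
    complete = [c for c in candidates if subject.startswith(c)]
    # True if some candidate needs characters beyond the end of the subject.
    # An empty subject is excluded, so a partial match is never empty.
    could_extend = subject != "" and any(
        len(c) > len(subject) and c.startswith(subject) for c in candidates
    )

    if hard and could_extend:
        return ("PARTIAL", subject)
    if complete:
        return ("MATCH", max(complete, key=len))
    if could_extend:
        return ("PARTIAL", subject)
    return ("NO_MATCH", "")


pattern = ["abc", "abcdef"]  # the strings abc(def)? can match
print(toy_partial(pattern, "abc", hard=False))  # ('MATCH', 'abc')
print(toy_partial(pattern, "abc", hard=True))   # ('PARTIAL', 'abc')
```

In this model the "soft" form settles for the complete match "abc", while
the "hard" form reports a partial match because "abcdef" is still possible
if more text arrives.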

> And present 'hard' option needs very little correction to deal right
> with certain lookahead assertions.


Send me a patch, and I'll look at it. I suspect it is more than "a very
little correction". The code of PCRE is the trickiest I have ever
written.

As for "correcting": I have been writing software for over 40 years, and
one thing I learned very early on was that making an incompatible change
always causes a problem for *somebody*, however much you think "nobody
will notice this change". That is why I try very hard not to make
incompatible changes, and introduce new options instead. That is why I
added "hard" rather than change the way the previous partial worked.

> As example I really don't understant *practical reasons* WHY for subject
> string 'cat' analogous patterns returns different result:
> pattern 't\b' returns 'ERROR_PARTIAL'
> pattern 't\K\b' returns 'MATCH'


That is slightly odd. I would expect them BOTH to return MATCH, with the
first returning "t" and the second "" (which it does). I have made a
note to investigate this when I next work on PCRE (not soon).

> I read in documentation that partial match can never be an empty string.
> But WHY? What idea is? What benefits are?


You can always match an empty string at the end of the subject. So if
empty partial matches were allowed, you would (except for an anchored
pattern) always get a partial match. That seemed like a bad idea.
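
The point can be sketched with a toy partial matcher for literal,
unanchored patterns. This is an illustration only, not how PCRE is
implemented: the function find_partial and its allow_empty flag are
hypothetical names, and the sketch looks only for partial matches, not
complete ones.

```python
def find_partial(literal, subject, allow_empty=False):
    """Return (start, text) for the earliest partial match, or None.

    A partial match here is a tail of `subject` that is a proper prefix
    of `literal`, i.e. text that more input could turn into a match.
    """
    for start in range(len(subject) + 1):
        tail = subject[start:]
        if not tail and not allow_empty:
            # Empty partial matches are rejected by default.
            continue
        if len(tail) < len(literal) and literal.startswith(tail):
            return (start, tail)
    return None


print(find_partial("dog", "hot do"))                     # (4, 'do')
print(find_partial("dog", "the cat"))                    # None
print(find_partial("dog", "the cat", allow_empty=True))  # (7, '')
```

With allow_empty=True every non-empty literal "partially matches" at the
end of every subject, so the result no longer says anything useful about
the text.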

Philip

--
Philip Hazel