Re: [pcre-dev] Oniguruma subroutines

Author: Sheri
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] Oniguruma subroutines
Philip Hazel wrote:
> On Wed, 21 May 2008, Sheri wrote:
>
>
>> This pattern (which works) is cited in pcrepattern:
>>
>> (abc)(?i:\g<-1>)
>>
>> but the documentation also says that case sensitivity can't be
>> controlled in the subroutine call.
>>
>> so I tried:
>>
>> (abc)(\g<-1>)
>>
>> and got a compilation error, that a recursive call could loop
>> indefinitely. (?)
>>
>
> To remove the case sensitivity setting, you need
>
> (abc)(?:\g<-1>)
>
> The ?: is important; it is specifying that the second parentheses are
> not capturing. Alternatively, try
>
> (abc)(\g<-2>)
>
> The point is that a negative number refers to the nth most recently
> opened capturing parentheses. So with (abc)(\g<-1>) you are indeed
> making a recursive call to the second set.
>
> I hope that makes sense.
>
> Philip
>
>
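
The relative numbering rule quoted above can be checked mechanically. The following is a minimal Python sketch, not PCRE's own logic: resolve_relative and targets are hypothetical helper names, and the sketch assumes that only a plain "(" opens a capturing group (named groups and character classes are not handled).

```python
import re

def resolve_relative(pattern, ref_pos, n):
    """Absolute number of the group that a reference -n at ref_pos targets.

    Assumption: a "(" not followed by "?" opens a capturing group;
    named groups and character classes are ignored in this sketch.
    """
    opened = 0
    i = 0
    while i < ref_pos:
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if ch == "(" and pattern[i + 1:i + 2] != "?":
            opened += 1  # one more capturing group opened before the reference
        i += 1
    # -1 is the most recently opened group, -2 the one before it, and so on
    return opened - n + 1

def targets(pattern):
    """Resolve every \\g<-n> reference found in the pattern text."""
    return [resolve_relative(pattern, m.start(), int(m.group(1)))
            for m in re.finditer(r"\\g<-(\d+)>", pattern)]

print(targets(r"(abc)(\g<-1>)"))    # reference sits inside group 2 and targets group 2
print(targets(r"(abc)(?:\g<-1>)"))  # second parentheses do not capture; targets group 1
print(targets(r"(abc)(\g<-2>)"))    # targets group 1
```

In the first pattern the reference resolves to the group that contains it, which is the recursive call behind the compilation error.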

Thanks Philip,

I was trying to make it capturing, so your second suggestion answers my
question. I didn't realize that a negative number could end up referring
to the current subpattern if it is capturing. I suppose it is necessary
that the 1st subpattern itself be capturing in order to give it the
capacity to serve as a subroutine.

Suppose one's goal is to capture each repeat of (abc) and it is unknown
how many there will be. Is this about the best that can be done?

1. Choose the maximum number of repeats that will be supported
2. Construct the repeating pattern
3. Add supported repeats as (\g<1>)?
Example (abc)(\g1)?(\g1)?(\g1)?(\g1)?(\g1)?(\g1)?(\g1)?(\g1)?(\g1)?
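
The three steps above can be sketched in Python. Python's standard re module has no subroutine calls, so this sketch repeats the subpattern text itself instead of using \g<1>; that substitution and the helper name build_bounded_repeat are assumptions for illustration, not PCRE behaviour.

```python
import re

def build_bounded_repeat(sub, max_repeats):
    """Return a pattern with one capturing group per supported repeat.

    The first group is mandatory; every further group is optional.
    """
    if max_repeats < 1:
        raise ValueError("max_repeats must be at least 1")
    return "(" + sub + ")" + ("(" + sub + ")?") * (max_repeats - 1)

# Step 1: choose the maximum number of repeats (10, as in the example above).
# Steps 2 and 3: build the pattern from the repeating subpattern.
pattern = build_bounded_repeat("abc", 10)

m = re.match(pattern, "abcabcabc")
# Groups that did not take part in the match are reported as None.
captured = [g for g in m.groups() if g is not None]
print(captured)
```

Each repeat that is present lands in its own group, and the unused optional groups are filtered out.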

Is there a workable approach if the repeating pattern itself has desired
subpatterns? With the following I find the "b" captured only once:
/(a(b)c)(\g1)?/ with data abcabc
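
One possible workaround, sketched here in Python rather than in a single PCRE pattern, is to match the whole run of repeats first and then iterate over it with the unit pattern, so the inner group is reported once per repeat. This is a swapped-in two-pass technique, not something the pattern above does by itself; the names unit, run and repeats are illustrative.

```python
import re

unit = r"a(b)c"  # the repeating pattern with its inner subpattern

# Pass 1: match the whole run of repeats, however many there are.
run = re.match(r"(?:" + unit + r")+", "abcabc")

# Pass 2: walk the run one repeat at a time, collecting the whole
# repeat and its inner capture for each occurrence.
repeats = [(m.group(0), m.group(1)) for m in re.finditer(unit, run.group(0))]
print(repeats)
```

This avoids fixing a maximum number of repeats in advance, at the cost of a second pass over the matched text.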

Regards,
Sheri