Back references
---------------

OP_REF is followed by two bytes containing the reference number.
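This is a minimal C sketch of how a back reference item could be laid out: the opcode byte, then the reference number in two bytes. The opcode value OP_REF_SKETCH and the functions emit_ref() and read_ref() are hypothetical. Storing the most significant byte first is an assumption about byte order. Two bytes hold numbers up to 65535, whereas a single byte stops at 255.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode value; the real OP_REF value is not given here. */
enum { OP_REF_SKETCH = 0x50 };

/* Write an OP_REF item: the opcode, then the reference number in two
   bytes, most significant byte first (assumed byte order).
   Returns the number of bytes written. */
static size_t emit_ref(unsigned char *code, unsigned int refno)
{
  assert(refno <= 0xFFFF);                  /* must fit in two bytes */
  code[0] = OP_REF_SKETCH;
  code[1] = (unsigned char)(refno >> 8);    /* high byte */
  code[2] = (unsigned char)(refno & 0xFF);  /* low byte */
  return 3;
}

/* Read the reference number back out of an OP_REF item. */
static unsigned int read_ref(const unsigned char *code)
{
  assert(code[0] == OP_REF_SKETCH);
  return ((unsigned int)code[1] << 8) | code[2];
}
```

For reference number 300 the two stored bytes are 1 and 44, since 300 = 1 * 256 + 44.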


Repeating character classes and back references
-----------------------------------------------

[...]

A pair of non-capturing (round) brackets is wrapped round each expression at
compile time, so alternation always happens in the context of brackets.

Non-capturing brackets use the opcode OP_BRA, while capturing brackets use
OP_BRA+1, OP_BRA+2, etc. [Note for North Americans: "bracket" to some English
speakers, including myself, can be round, square, curly, or pointy. Hence this
usage.]

Originally PCRE was limited to 99 capturing brackets (so as not to use up all
the opcodes). From release 3.5, there is no limit. What happens is that the
first ones, up to EXTRACT_BASIC_MAX, are handled with separate opcodes, as
above. If there are more, the opcode is set to OP_BRA+EXTRACT_BASIC_MAX+1,
and the first operation in the bracket is OP_BRANUMBER, followed by a 2-byte
bracket number. This opcode is ignored while matching, but is fished out when
handling the bracket itself. (They could have all been done like this, but I
was making minimal changes.)
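The numbering scheme can be sketched in C as follows. Everything named here is hypothetical: the opcode values, the value chosen for EXTRACT_BASIC_MAX_SKETCH, and the helper names. The sketch also makes two assumptions. First, the two offset bytes that follow every bracket opcode come directly after it, so OP_BRANUMBER is the first item after them. Second, the 2-byte bracket number is stored high byte first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values for illustration only; the real opcode numbers and
   EXTRACT_BASIC_MAX are defined inside PCRE. */
enum {
  OP_BRANUMBER_SKETCH      = 0x40,
  OP_BRA_SKETCH            = 0x60,
  EXTRACT_BASIC_MAX_SKETCH = 50
};

/* Write the start of bracket number n (0 = non-capturing). The two offset
   bytes after the opcode are left as zero, to be patched later.
   Returns the number of bytes written. */
static size_t emit_bracket_start(unsigned char *code, unsigned int n)
{
  assert(n <= 0xFFFF);
  unsigned int k = (n <= EXTRACT_BASIC_MAX_SKETCH)
                   ? n : EXTRACT_BASIC_MAX_SKETCH + 1;
  code[0] = (unsigned char)(OP_BRA_SKETCH + k);
  code[1] = 0;                          /* offset to next OP_ALT or KET, */
  code[2] = 0;                          /* filled in once it is known    */
  if (k <= EXTRACT_BASIC_MAX_SKETCH)
    return 3;                           /* number is carried by the opcode */
  code[3] = OP_BRANUMBER_SKETCH;        /* first operation in the bracket */
  code[4] = (unsigned char)(n >> 8);    /* 2-byte number, high byte first */
  code[5] = (unsigned char)(n & 0xFF);
  return 6;
}

/* Recover the bracket number when handling the bracket itself. */
static unsigned int bracket_number(const unsigned char *code)
{
  unsigned int k = (unsigned int)(code[0] - OP_BRA_SKETCH);
  if (k <= EXTRACT_BASIC_MAX_SKETCH)
    return k;
  assert(code[3] == OP_BRANUMBER_SKETCH);
  return ((unsigned int)code[4] << 8) | code[5];
}
```

With these values, bracket 7 is encoded in the opcode alone. Bracket 300 needs the extra OP_BRANUMBER item. bracket_number() returns the original number in both cases.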

A bracket opcode is followed by two bytes which give the offset to the next
alternative OP_ALT or, if there are no alternatives, to the matching KET
opcode. Each OP_ALT is followed by two bytes giving the offset to the next one,
[...]
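Following the offset chain from a bracket opcode through its OP_ALT items can be sketched in C. The opcode values, get2() and count_branches() are hypothetical. The sketch makes three assumptions:

- Each offset is measured from the opcode that carries it.
- Offsets are stored high byte first.
- The last OP_ALT points at the KET, just as the bracket opcode does when there are no alternatives.

```c
#include <assert.h>

/* Hypothetical opcode values for illustration. */
enum { OP_ALT_SKETCH = 0x20, OP_KET_SKETCH = 0x21, OP_BRA_SKETCH = 0x60 };

/* Read a 2-byte value stored high byte first. */
static unsigned int get2(const unsigned char *p)
{
  return ((unsigned int)p[0] << 8) | p[1];
}

/* Count the branches of the bracket starting at code[0] by hopping along
   the chain: bracket opcode -> OP_ALT -> ... -> KET. Branch contents are
   skipped over by the offsets and never inspected. */
static int count_branches(const unsigned char *code)
{
  assert(code[0] >= OP_BRA_SKETCH);
  int branches = 1;
  const unsigned char *p = code + get2(code + 1);  /* first hop */
  while (*p == OP_ALT_SKETCH) {
    branches++;
    p += get2(p + 1);                              /* next OP_ALT or KET */
  }
  assert(*p == OP_KET_SKETCH);
  return branches;
}
```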

A subpattern with a bounded maximum repetition is replicated in a nested
fashion up to the maximum number of times, with BRAZERO or BRAMINZERO before
each replication after the minimum, so that, for example, (abc){2,5} is
compiled as (abc)(abc)((abc)((abc)(abc)?)?)?. The 99- and 200-bracket limits
do not apply to these internally generated brackets.
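The nested replication can be reproduced at the level of pattern text with a short C sketch. It builds only the string form, not compiled code. It covers only the greedy form, always emitting "?". expand_bounded() and its helpers are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Append src to the NUL-terminated string in dst (capacity cap). */
static void append(char *dst, size_t cap, const char *src)
{
  size_t used = strlen(dst);
  size_t len = strlen(src);
  assert(used + len < cap);
  memcpy(dst + used, src, len + 1);
}

/* Optional tail for k copies beyond the minimum:
   k == 1 gives "G?", k > 1 gives "(G" + tail(k - 1) + ")?". */
static void optional_tail(char *out, size_t cap, const char *group, int k)
{
  if (k == 1) {
    append(out, cap, group);
    append(out, cap, "?");
    return;
  }
  append(out, cap, "(");
  append(out, cap, group);
  optional_tail(out, cap, group, k - 1);
  append(out, cap, ")?");
}

/* Expand group{min,max}: min plain copies, then nested optional copies. */
static void expand_bounded(char *out, size_t cap, const char *group,
                           int min, int max)
{
  assert(0 <= min && min <= max);
  out[0] = '\0';
  for (int i = 0; i < min; i++)
    append(out, cap, group);
  if (max > min)
    optional_tail(out, cap, group, max - min);
}
```

For (abc){2,5} this produces (abc)(abc)((abc)((abc)(abc)?)?)?, matching the example above. A fixed count such as {3,3} produces three plain copies with no optional part.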


Assertions
----------

[...]

Conditional subpatterns are like other subpatterns, but they start with the
opcode OP_COND. If the condition is a back reference, this is stored at the
start of the subpattern using the opcode OP_CREF followed by two bytes
containing the reference number. Otherwise, a conditional subpattern will
always start with one of the assertions.
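Inspecting the start of a conditional subpattern can be sketched in C. The opcode values and cond_reference() are hypothetical. The sketch assumes that OP_COND, like other bracket opcodes, is followed by two offset bytes before the condition. It also assumes the reference number after OP_CREF is stored high byte first.

```c
#include <assert.h>

/* Hypothetical opcode values for illustration. */
enum {
  OP_COND_SKETCH   = 0x70,
  OP_CREF_SKETCH   = 0x71,
  OP_ASSERT_SKETCH = 0x72
};

/* code points at OP_COND. Skipping the opcode and its two offset bytes
   (assumed layout), return the back reference number if the condition is
   OP_CREF, or -1 if the condition is an assertion instead. */
static int cond_reference(const unsigned char *code)
{
  assert(code[0] == OP_COND_SKETCH);
  const unsigned char *cond = code + 3;
  if (cond[0] == OP_CREF_SKETCH)
    return (cond[1] << 8) | cond[2];   /* 2-byte reference number */
  return -1;                           /* starts with an assertion */
}
```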

[...]


Philip Hazel
August 2001