| 1090 |
|
|
| 1091 |
(?<=ab(c|de)) |
(?<=ab(c|de)) |
| 1092 |
|
|
| 1093 |
is not permitted, because its single branch can match two different lengths, |
is not permitted, because its single top-level branch can match two different |
| 1094 |
but it is acceptable if rewritten to use two branches: |
lengths, but it is acceptable if rewritten to use two top-level branches: |
| 1095 |
|
|
| 1096 |
(?<=abc|abde) |
(?<=abc|abde) |
| 1097 |
|
|
| 1098 |
The implementation of lookbehind assertions is, for each alternative, to |
The implementation of lookbehind assertions is, for each alternative, to |
| 1099 |
temporarily move the current position back by the fixed width and then try to |
temporarily move the current position back by the fixed width and then try to |
| 1100 |
match. If there are insufficient characters before the current position, the |
match. If there are insufficient characters before the current position, the |
| 1101 |
match is deemed to fail. |
match is deemed to fail. Lookbehinds in conjunction with once-only subpatterns |
| 1102 |
|
can be particularly useful for matching at the ends of strings; an example is |
| 1103 |
|
given at the end of the section on once-only subpatterns. |
| 1104 |
|
|
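The procedure described above can be sketched in a few lines of Python. This is an illustration only, not PCRE's code: the function name lookbehind_matches is hypothetical, and each alternative is assumed to be a literal string so that its fixed width is simply its length.

```python
def lookbehind_matches(subject, pos, alternatives):
    """Return True if any fixed-width alternative matches ending at pos.

    Simplifying assumption: every alternative is a literal string, so its
    fixed width is len(alt). A real engine would match a subpattern here.
    """
    for alt in alternatives:
        # Temporarily move the current position back by the fixed width.
        start = pos - len(alt)
        if start < 0:
            # Insufficient characters before pos: this alternative fails.
            continue
        if subject[start:pos] == alt:
            return True
    return False


# (?<=abc|abde) tested at the end of two short subjects
print(lookbehind_matches("xabde", 5, ["abc", "abde"]))
print(lookbehind_matches("de", 2, ["abc", "abde"]))
```

Because each top-level alternative carries its own width, the two-branch form (?<=abc|abde) fits this scheme, whereas a single branch that can match two different lengths gives no single distance to step back.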
| 1105 |
Assertions can be nested in any combination. For example, |
Several assertions (of any sort) may occur in succession. For example, |
| 1106 |
|
|
| 1107 |
|
(?<=\\d{3})(?<!999)foo |
| 1108 |
|
|
| 1109 |
|
matches "foo" preceded by three digits that are not "999". Furthermore, |
| 1110 |
|
assertions can be nested in any combination. For example, |
| 1111 |
|
|
| 1112 |
(?<=(?<!foo)bar)baz |
(?<=(?<!foo)bar)baz |
| 1113 |
|
|
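The effect of two assertions in succession can be tried out with Python's re module, which accepts the same syntax for this particular pattern. This is only a sketch: Python's engine is not PCRE, and the helper name find_foo is hypothetical.

```python
import re

# Both assertions are tested at the same position, just before "foo":
# the first requires three digits there, the second rejects "999".
PATTERN = re.compile(r"(?<=\d{3})(?<!999)foo")


def find_foo(subject):
    """Return the offset of the first qualifying "foo", or None."""
    m = PATTERN.search(subject)
    return m.start() if m else None


print(find_foo("123foo"))   # three digits that are not "999"
print(find_foo("999foo"))   # rejected by the negative lookbehind
print(find_foo("12foo"))    # too few digits for the positive lookbehind
```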
| 1164 |
This construction can of course contain arbitrarily complicated subpatterns, |
This construction can of course contain arbitrarily complicated subpatterns, |
| 1165 |
and it can be nested. |
and it can be nested. |
| 1166 |
|
|
| 1167 |
|
Once-only subpatterns can be used in conjunction with lookbehind assertions to |
| 1168 |
|
specify efficient matching at the end of the subject string. Consider a simple |
| 1169 |
|
pattern such as |
| 1170 |
|
|
| 1171 |
|
abcd$ |
| 1172 |
|
|
| 1173 |
|
when applied to a long string which does not match it. Because matching |
| 1174 |
|
proceeds from left to right, PCRE will look for each "a" in the subject and |
| 1175 |
|
then see if what follows matches the rest of the pattern. If the pattern is |
| 1176 |
|
specified as |
| 1177 |
|
|
| 1178 |
|
.*abcd$ |
| 1179 |
|
|
| 1180 |
|
then the initial .* matches the entire string at first, but when this fails, it |
| 1181 |
|
backtracks to match all but the last character, then all but the last two |
| 1182 |
|
characters, and so on. Once again the search for "a" covers the entire string, |
| 1183 |
|
from right to left, so we are no better off. However, if the pattern is written |
| 1184 |
|
as |
| 1185 |
|
|
| 1186 |
|
(?>.*)(?<=abcd) |
| 1187 |
|
|
| 1188 |
|
then there can be no backtracking for the .* item; it can match only the entire |
| 1189 |
|
string. The subsequent lookbehind assertion does a single test on the last four |
| 1190 |
|
characters. If it fails, the match fails immediately. For long strings, this |
| 1191 |
|
approach makes a significant difference to the processing time. |
| 1192 |
|
|
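The same idea can be sketched in Python, with some caveats. Python's re module is not PCRE, and it accepts the (?>...) syntax only from version 3.11 onwards, so the sketch falls back to a lookahead-plus-backreference idiom on older interpreters. The names ENDS_WITH_ABCD and ends_with_abcd are hypothetical, and single-line subjects are assumed because "." does not match a newline by default.

```python
import re
import sys

# Atomic groups, (?>...), arrived in Python's re module in 3.11. On older
# interpreters use (?=(.*))\1 instead: the lookahead captures the longest
# run, and the engine does not backtrack into a lookahead once it succeeds.
if sys.version_info >= (3, 11):
    ENDS_WITH_ABCD = re.compile(r"(?>.*)(?<=abcd)")
else:
    ENDS_WITH_ABCD = re.compile(r"(?=(.*))\1(?<=abcd)")


def ends_with_abcd(subject):
    """Return True if a single-line subject ends with "abcd"."""
    # match() tries the pattern at position 0 only, so .* runs to the end
    # of the subject once and the lookbehind is then tested once.
    return ENDS_WITH_ABCD.match(subject) is not None


print(ends_with_abcd("xxabcd"))
print(ends_with_abcd("abcdx"))
```

Using match() rather than search() is a choice made for this sketch: it keeps the attempt at a single starting position, which is the situation the text above describes.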
| 1193 |
|
|
| 1194 |
.SH CONDITIONAL SUBPATTERNS |
.SH CONDITIONAL SUBPATTERNS |
| 1195 |
It is possible to cause the matching process to obey a subpattern |
It is possible to cause the matching process to obey a subpattern |
| 1269 |
.br |
.br |
| 1270 |
Phone: +44 1223 334714 |
Phone: +44 1223 334714 |
| 1271 |
|
|
| 1272 |
Copyright (c) 1998 University of Cambridge. |
Copyright (c) 1997-1999 University of Cambridge. |