| 205 |
string, the next call is done with the PCRE_NOTEMPTY_ATSTART and |
string, the next call is done with the PCRE_NOTEMPTY_ATSTART and |
| 206 |
PCRE_ANCHORED flags set in order to search for another, non-empty, |
PCRE_ANCHORED flags set in order to search for another, non-empty, |
| 207 |
match at the same point. If this second match fails, the start offset |
match at the same point. If this second match fails, the start offset |
| 208 |
is advanced by one character, and the normal match is retried. This |
is advanced, and the normal match is retried. This imitates the way |
| 209 |
imitates the way Perl handles such cases when using the /g modifier or |
Perl handles such cases when using the /g modifier or the split() func- |
| 210 |
the split() function. |
tion. Normally, the start offset is advanced by one character, but if |
| 211 |
|
the newline convention recognizes CRLF as a newline, and the current |
| 212 |
|
character is CR followed by LF, an advance of two is used. |
| 213 |
|
|
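The advance rule described above can be sketched as follows. This is a minimal illustration in Python, not pcretest's own code: the function name and the `crlf_is_newline` parameter are hypothetical, and the sketch assumes a byte subject in which one character is one byte (no UTF-8 mode).

```python
def advance_after_empty_match(subject: bytes, offset: int, crlf_is_newline: bool) -> int:
    """Return the next start offset after the anchored, non-empty retry has failed.

    Hypothetical helper: normally advance by one character, but step over a
    CR LF pair as a unit when the newline convention recognizes CRLF.
    """
    if crlf_is_newline and subject[offset:offset + 2] == b"\r\n":
        return offset + 2  # current character is CR followed by LF
    return offset + 1      # the normal case: advance by one character
```

For the subject `b"a\r\nb"` and an empty match at offset 1, the sketch returns 3 when CRLF is recognized as a newline and 2 otherwise.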
| 214 |
Other modifiers |
Other modifiers |
| 215 |
|
|
| 372 |
or pcre_dfa_exec() |
or pcre_dfa_exec() |
| 373 |
\? pass the PCRE_NO_UTF8_CHECK option to |
\? pass the PCRE_NO_UTF8_CHECK option to |
| 374 |
pcre_exec() or pcre_dfa_exec() |
pcre_exec() or pcre_dfa_exec() |
| 375 |
\>dd start the match at offset dd (any number of digits); |
\>dd start the match at offset dd (optional "-"; then |
| 376 |
this sets the startoffset argument for pcre_exec() |
any number of digits); this sets the startoffset |
| 377 |
or pcre_dfa_exec() |
argument for pcre_exec() or pcre_dfa_exec() |
| 378 |
\<cr> pass the PCRE_NEWLINE_CR option to pcre_exec() |
\<cr> pass the PCRE_NEWLINE_CR option to pcre_exec() |
| 379 |
or pcre_dfa_exec() |
or pcre_dfa_exec() |
| 380 |
\<lf> pass the PCRE_NEWLINE_LF option to pcre_exec() |
\<lf> pass the PCRE_NEWLINE_LF option to pcre_exec() |
| 451 |
matched the whole pattern. Otherwise, it outputs "No match" when the |
matched the whole pattern. Otherwise, it outputs "No match" when the |
| 452 |
return is PCRE_ERROR_NOMATCH, and "Partial match:" followed by the par- |
return is PCRE_ERROR_NOMATCH, and "Partial match:" followed by the par- |
| 453 |
tially matching substring when pcre_exec() returns PCRE_ERROR_PARTIAL. |
tially matching substring when pcre_exec() returns PCRE_ERROR_PARTIAL. |
| 454 |
For any other returns, it outputs the PCRE negative error number. Here |
(Note that this is the entire substring that was inspected during the |
| 455 |
is an example of an interactive pcretest run. |
partial match; it may include characters before the actual match start |
| 456 |
|
if a lookbehind assertion, \K, \b, or \B was involved.) For any other |
| 457 |
|
returns, it outputs the PCRE negative error number. Here is an example |
| 458 |
|
of an interactive pcretest run. |
| 459 |
|
|
| 460 |
$ pcretest |
$ pcretest |
| 461 |
PCRE version 7.0 30-Nov-2006 |
PCRE version 7.0 30-Nov-2006 |
| 467 |
data> xyz |
data> xyz |
| 468 |
No match |
No match |
| 469 |
|
|
| 470 |
Note that unset capturing substrings that are not followed by one that |
Note that unset capturing substrings that are not followed by one that |
| 471 |
is set are not returned by pcre_exec(), and are not shown by pcretest. |
is set are not returned by pcre_exec(), and are not shown by pcretest. |
| 472 |
In the following example, there are two capturing substrings, but when |
In the following example, there are two capturing substrings, but when |
| 473 |
the first data line is matched, the second, unset substring is not |
the first data line is matched, the second, unset substring is not |
| 474 |
shown. An "internal" unset substring is shown as "<unset>", as for the |
shown. An "internal" unset substring is shown as "<unset>", as for the |
| 475 |
second data line. |
second data line. |
| 476 |
|
|
| 477 |
re> /(a)|(b)/ |
re> /(a)|(b)/ |
| 483 |
1: <unset> |
1: <unset> |
| 484 |
2: b |
2: b |
| 485 |
|
|
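The treatment of unset substrings can be imitated with Python's `re` module, which is used here only as a stand-in for pcre_exec(). The helper name and the two-column number format are assumptions made for illustration. Trailing unset groups are dropped, and an unset group that precedes a set one is shown as "<unset>".

```python
import re

def pcretest_style_lines(match):
    """Format a match roughly the way the text describes (hypothetical helper)."""
    groups = [match.group(0), *match.groups()]
    # Index of the highest-numbered substring that is actually set;
    # substring 0 is always set, so max() never sees an empty sequence.
    last_set = max(i for i, g in enumerate(groups) if g is not None)
    lines = []
    for i in range(last_set + 1):
        text = groups[i] if groups[i] is not None else "<unset>"
        lines.append(f"{i:2d}: {text}")
    return lines
```

With the pattern `(a)|(b)`, the subject "a" yields two lines (substrings 0 and 1), while the subject "b" yields three, with substring 1 shown as "<unset>".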
| 486 |
If the strings contain any non-printing characters, they are output as |
If the strings contain any non-printing characters, they are output as |
| 487 |
\0x escapes, or as \x{...} escapes if the /8 modifier was present on |
\0x escapes, or as \x{...} escapes if the /8 modifier was present on |
| 488 |
the pattern. See below for the definition of non-printing characters. |
the pattern. See below for the definition of non-printing characters. |
| 489 |
If the pattern has the /+ modifier, the output for substring 0 is |
If the pattern has the /+ modifier, the output for substring 0 is |
| 490 |
followed by the rest of the subject string, identified by "0+" like |
followed by the rest of the subject string, identified by "0+" like |
| 491 |
this: |
this: |
| 492 |
|
|
| 493 |
re> /cat/+ |
re> /cat/+ |
| 495 |
0: cat |
0: cat |
| 496 |
0+ aract |
0+ aract |
| 497 |
|
|
| 498 |
If the pattern has the /g or /G modifier, the results of successive |
If the pattern has the /g or /G modifier, the results of successive |
| 499 |
matching attempts are output in sequence, like this: |
matching attempts are output in sequence, like this: |
| 500 |
|
|
| 501 |
re> /\Bi(\w\w)/g |
re> /\Bi(\w\w)/g |
| 509 |
|
|
| 510 |
"No match" is output only if the first match attempt fails. |
"No match" is output only if the first match attempt fails. |
| 511 |
|
|
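The effect of /g can be approximated in Python with `re.finditer`, which also reports successive non-overlapping matches. This is a sketch only: the subject "Mississippi" is an assumed data line, the helper name is hypothetical, and Python's handling of empty matches is not guaranteed to agree with PCRE's in every case.

```python
import re

def successive_matches(pattern, subject):
    """Return (substring 0, captured substrings) for each successive match."""
    return [(m.group(0), m.groups()) for m in re.finditer(pattern, subject)]

# Each search resumes where the previous match ended.
results = successive_matches(r"\Bi(\w\w)", "Mississippi")
```

Here `results` holds three matches: "iss", "iss", and "ipp", with "ss", "ss", and "pp" as the respective captured substrings.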
| 512 |
If any of the sequences \C, \G, or \L are present in a data line that |
If any of the sequences \C, \G, or \L are present in a data line that |
| 513 |
is successfully matched, the substrings extracted by the convenience |
is successfully matched, the substrings extracted by the convenience |
| 514 |
functions are output with C, G, or L after the string number instead of |
functions are output with C, G, or L after the string number instead of |
| 515 |
a colon. This is in addition to the normal full list. The string length |
a colon. This is in addition to the normal full list. The string length |
| 516 |
(that is, the return from the extraction function) is given in paren- |
(that is, the return from the extraction function) is given in paren- |
| 517 |
theses after each string for \C and \G. |
theses after each string for \C and \G. |
| 518 |
|
|
| 519 |
Note that whereas patterns can be continued over several lines (a plain |
Note that whereas patterns can be continued over several lines (a plain |
| 520 |
">" prompt is used for continuations), data lines may not. However, |
">" prompt is used for continuations), data lines may not. However, |
| 521 |
newlines can be included in data by means of the \n escape (or \r, \r\n, |
newlines can be included in data by means of the \n escape (or \r, \r\n, |
| 522 |
etc., depending on the newline sequence setting). |
etc., depending on the newline sequence setting). |
| 523 |
|
|
| 524 |
|
|
| 525 |
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION |
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION |
| 526 |
|
|
| 527 |
When the alternative matching function, pcre_dfa_exec(), is used (by |
When the alternative matching function, pcre_dfa_exec(), is used (by |
| 528 |
means of the \D escape sequence or the -dfa command line option), the |
means of the \D escape sequence or the -dfa command line option), the |
| 529 |
output consists of a list of all the matches that start at the first |
output consists of a list of all the matches that start at the first |
| 530 |
point in the subject where there is at least one match. For example: |
point in the subject where there is at least one match. For example: |
| 531 |
|
|
| 532 |
re> /(tang|tangerine|tan)/ |
re> /(tang|tangerine|tan)/ |
| 535 |
1: tang |
1: tang |
| 536 |
2: tan |
2: tan |
| 537 |
|
|
| 538 |
(Using the normal matching function on this data finds only "tang".) |
(Using the normal matching function on this data finds only "tang".) |
| 539 |
The longest matching string is always given first (and numbered zero). |
The longest matching string is always given first (and numbered zero). |
| 540 |
After a PCRE_ERROR_PARTIAL return, the output is "Partial match:", fol- |
After a PCRE_ERROR_PARTIAL return, the output is "Partial match:", fol- |
| 541 |
lowed by the partially matching substring. |
lowed by the partially matching substring. (Note that this is the |
| 542 |
|
entire substring that was inspected during the partial match; it may |
| 543 |
|
include characters before the actual match start if a lookbehind asser- |
| 544 |
|
tion, \K, \b, or \B was involved.) |
| 545 |
|
|
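The "all matches at the first matching point, longest first" behaviour can be illustrated without PCRE for the special case of a pattern that is a plain alternation of literal strings. The sketch below is an assumption-laden simplification, not how pcre_dfa_exec() works internally; the function name and the subject "yellow tangerine" are hypothetical.

```python
def all_matches_at_first_point(alternatives, subject):
    """List every literal alternative that matches at the first point in the
    subject where at least one of them matches, longest first."""
    for start in range(len(subject) + 1):
        found = [alt for alt in alternatives if subject.startswith(alt, start)]
        if found:
            # Deduplicate, then order by length so the longest comes first
            # (it would be numbered zero in the output described above).
            return sorted(set(found), key=len, reverse=True)
    return []  # corresponds to "No match"
```

For the alternatives "tang", "tangerine", and "tan" and the assumed subject, the result is "tangerine", "tang", "tan", in that order.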
| 546 |
If /g is present on the pattern, the search for further matches resumes |
If /g is present on the pattern, the search for further matches resumes |
| 547 |
at the end of the longest match. For example: |
at the end of the longest match. For example: |
| 700 |
|
|
| 701 |
REVISION |
REVISION |
| 702 |
|
|
| 703 |
Last updated: 14 June 2010 |
Last updated: 06 November 2010 |
| 704 |
Copyright (c) 1997-2010 University of Cambridge. |
Copyright (c) 1997-2010 University of Cambridge. |