SYNOPSIS

pcretest [options] [input file [output file]]

pcretest was written as a test program for the PCRE regular expression
library itself, but it can also be used for experimenting with regular
expressions. This document describes the features of the test program;
for details of the regular expressions themselves, see the pcrepattern
documentation. For details of the PCRE library function calls and their
options, see the pcreapi documentation. The input for pcretest is a
sequence of regular expression patterns and strings to be matched, as
described below. The output shows the result of each match. Options on
the command line and modifiers on the patterns control which PCRE
options are used and exactly what is output.


COMMAND LINE OPTIONS

-b        Behave as if each pattern has the /B (show byte code)
          modifier; the internal form is output after compilation.

-C        Output the version number of the PCRE library, and all
          available information about the optional features that are
          included, and then exit.

-d        Behave as if each pattern has the /D (debug) modifier; the
          internal form and information about the compiled pattern are
          output after compilation; -d is equivalent to -b -i.

[...]

-help     Output a brief summary of these options and then exit.

-i        Behave as if each pattern has the /I modifier; information
          about the compiled pattern is given after compilation.

-M        Behave as if each data line contains the \M escape sequence;
          [...]

-m        Output the size of each compiled pattern after it has been
          compiled. This is equivalent to adding /M to each regular
          expression.

-o osize  Set the number of elements in the output vector that is used
          when calling pcre_exec() or pcre_dfa_exec() to be osize. The
          default value is 45, which is enough for 14 capturing
          subexpressions for pcre_exec() or 22 different matches for
          pcre_dfa_exec(). The vector size can be changed for
          individual matching calls by including \O in the data line
          (see below).

-p        Behave as if each pattern has the /P modifier; the POSIX
          wrapper API is used to call PCRE. None of the other options
          has any effect when -p is set.

-q        Do not output the version number of pcretest at the start of
          execution.

-S size   On Unix-like systems, set the size of the run-time stack to
          size megabytes.

-s        Behave as if each pattern has the /S modifier; in other words,
          force each pattern to be studied. If the /I or /D option is
          present on a pattern (requesting output about the compiled
          pattern), information about the result of studying is not
          included when studying is caused only by -s and neither -i nor
          -d is present on the command line. This behaviour means that
          the output from tests that are run with and without -s should
          be identical, except when options that output information
          about the actual running of a match are set. The -M, -t, and
          -tm options, which give information about resources used, are
          likely to produce different output with and without -s.
          Output may also differ if the /C option is present on an
          individual pattern. This uses callouts to trace the matching
          process, and this may be different between studied and
          non-studied patterns. If the pattern contains (*MARK) items
          there may also be differences, for the same reason. The -s
          command line option can be overridden for specific patterns
          that should never be studied (see the /S option below).

-t        Run each compile, study, and match many times with a timer,
          and output the resulting time per compile or match (in
          milliseconds). Do not set -m with -t, because you will then
          get the size output a zillion times, and the timing will be
          distorted. You can control the number of iterations that are
          used for timing by following -t with a number (as a separate
          item on the command line). For example, "-t 1000" would
          iterate 1000 times. The default is to iterate 500000 times.

[...]
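
The defaults quoted for -o can be checked with a little arithmetic. The
C sketch below assumes the output vector layout described in the
pcreapi documentation: pcre_exec() passes back (start, end) offset
pairs in the first two-thirds of the vector, with pair 0 used for the
whole match, while pcre_dfa_exec() stores one pair per match. The
helper names max_exec_captures() and max_dfa_matches() are hypothetical
and are not part of the PCRE API.

```c
/* Hypothetical helpers for illustration; not part of PCRE. */

/* pcre_exec(): the first two-thirds of the vector hold (start, end)
 * pairs and the last third is workspace. Pair 0 is the whole match,
 * so one pair fewer is left for capturing subexpressions. */
static int max_exec_captures(int osize)
{
    int pairs = osize / 3;
    return pairs > 0 ? pairs - 1 : 0;
}

/* pcre_dfa_exec(): one (start, end) pair per match, using the whole
 * vector. */
static int max_dfa_matches(int osize)
{
    return osize / 2;
}
```

With the default of 45 these give 14 and 22, as stated above; by the
same rules, -o 60 would allow 19 capturing subexpressions or 30
matches from pcre_dfa_exec().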


DESCRIPTION

If pcretest is given two filename arguments, it reads from the first
and writes to the second. If it is given only one filename argument, it
reads from that file and writes to stdout. Otherwise, it reads from
stdin and writes to stdout, and prompts for each line of input, using
"re>" to prompt for regular expressions, and "data>" to prompt for data
lines.
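
The argument rules in the previous paragraph can be sketched in C as
follows. The function open_streams() is hypothetical and is not taken
from pcretest; it only maps zero, one, or two file names onto input and
output streams and reports whether prompting applies.

```c
#include <stdio.h>

/* files holds nfiles names from the command line (0, 1 or 2).
 * Returns 0 on success or -1 if a file cannot be opened. *prompt is
 * set when input comes from stdin, so that the caller knows to print
 * "re>" or "data>" before reading each line. */
static int open_streams(int nfiles, char *const files[],
                        FILE **in, FILE **out, int *prompt)
{
    *in = stdin;
    *out = stdout;
    *prompt = (nfiles == 0);

    if (nfiles >= 1) {                 /* read from the first file */
        *in = fopen(files[0], "rb");
        if (*in == NULL)
            return -1;
    }
    if (nfiles >= 2) {                 /* write to the second file */
        *out = fopen(files[1], "wb");
        if (*out == NULL) {
            fclose(*in);
            *in = NULL;
            return -1;
        }
    }
    return 0;
}
```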

When pcretest is built, a configuration option can specify that it
should be linked with the libreadline library. When this is done, if
the input is from a terminal, it is read using the readline() function.
This provides line-editing and history facilities. The output from the
-help option states whether or not readline() will be used.

The program handles any number of sets of input on a single input file.
Each set starts with a regular expression, and continues with any
number of data lines to be matched against the pattern.

Each data line is matched separately and independently. If you want to
do multi-line matches, you have to use the \n escape sequence (or \r or
\r\n, etc., depending on the newline setting) in a single line of input
to encode the newline sequences. There is no limit on the length of
data lines; the input buffer is automatically extended if it is too
small.

An empty line signals the end of the data lines, at which point a new
regular expression is read. The regular expressions are given enclosed
in any non-alphanumeric delimiters other than backslash, for example:

    /(a|bc)x+yz/

White space before the initial delimiter is ignored. A regular
expression may be continued over several input lines, in which case the
newline characters are included within it. It is possible to include
the delimiter within the pattern by escaping it, for example

    /abc\/def/

If you do so, the escape and the delimiter form part of the pattern,
but since delimiters are always non-alphanumeric, this does not affect
its interpretation. If the terminating delimiter is immediately
followed by a backslash, for example,

    /abc/\

then a backslash is added to the end of the pattern. This is done to
provide a way of testing the error condition that arises if a pattern
finishes with a backslash, because

    /abc\/

is interpreted as the first line of a pattern that starts with "abc/",
causing pcretest to read the next line as a continuation of the regular
expression.
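
The delimiter rules in this section can be summarized by a small
parser. This is a minimal C sketch under stated assumptions, not
pcretest's own code: parse_pattern_line() and its result codes are
hypothetical, and when it returns PAT_CONTINUES the caller is assumed
to append the next input line (newline included) and parse the joined
text again.

```c
#include <ctype.h>
#include <string.h>

/* Result codes for this sketch only. */
enum { PAT_OK = 0, PAT_CONTINUES = 1, PAT_BAD_DELIMITER = -1,
       PAT_TOO_LONG = -2 };

/* On PAT_OK, pat receives the pattern text and mods the modifier
 * characters with white space removed. */
static int parse_pattern_line(const char *line, char *pat, size_t patsz,
                              char *mods, size_t modsz)
{
    size_t np = 0, nm = 0;
    unsigned char delim;

    if (patsz == 0 || modsz == 0)
        return PAT_TOO_LONG;
    while (isspace((unsigned char)*line))     /* leading space is ignored */
        line++;
    delim = (unsigned char)*line;
    if (delim == '\0' || isalnum(delim) || delim == '\\')
        return PAT_BAD_DELIMITER;
    line++;

    for (;;) {
        if (*line == '\0') {                  /* no closing delimiter yet */
            pat[np] = '\0';
            return PAT_CONTINUES;
        }
        if ((unsigned char)*line == delim)
            break;
        /* An escaped character (including the delimiter) is kept,
         * together with its backslash, as part of the pattern. */
        size_t take = (line[0] == '\\' && line[1] != '\0') ? 2 : 1;
        if (np + take >= patsz)
            return PAT_TOO_LONG;
        memcpy(pat + np, line, take);
        np += take;
        line += take;
    }
    line++;                                   /* skip closing delimiter */

    if (*line == '\\') {                      /* "/abc/\" -> "abc\" */
        if (np + 1 >= patsz)
            return PAT_TOO_LONG;
        pat[np++] = '\\';
        line++;
    }
    pat[np] = '\0';

    for (; *line != '\0'; line++) {           /* modifiers */
        if (isspace((unsigned char)*line))
            continue;
        if (nm + 1 >= modsz)
            return PAT_TOO_LONG;
        mods[nm++] = *line;
    }
    mods[nm] = '\0';
    return PAT_OK;
}
```

Given the line /abc\/def/i this yields the pattern abc\/def and the
modifier i; given /abc\/ it reports that the pattern continues on the
next line.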


PATTERN MODIFIERS

A pattern may be followed by any number of modifiers, which are mostly
single characters. Following Perl usage, these are referred to below
as, for example, "the /i modifier", even though the delimiter of the
pattern need not always be a slash, and no slash is used when writing
modifiers. White space may appear between the final pattern delimiter
and the first modifier, and between the modifiers themselves.

The /i, /m, /s, and /x modifiers set the PCRE_CASELESS, PCRE_MULTILINE,
PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when pcre_compile()
is called. These four modifier letters have the same effect as they do
in Perl. For example:

    /caseless/i

The following table shows additional modifiers for setting PCRE
compile-time options that do not correspond to anything in Perl:

    /8              PCRE_UTF8
    [...]
    /<bsr_anycrlf>  PCRE_BSR_ANYCRLF
    /<bsr_unicode>  PCRE_BSR_UNICODE

The modifiers that are enclosed in angle brackets are literal strings
as shown, including the angle brackets, but the letters within can be
in either case. This example sets multiline matching with CRLF as the
line ending sequence:

    /^abc/m<CRLF>

As well as turning on the PCRE_UTF8 option, the /8 modifier also causes
any non-printing characters in output strings to be printed using the
\x{hh...} notation if they are valid UTF-8 sequences. Full details of
the PCRE options are given in the pcreapi documentation.
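
A sketch of modifier parsing, covering only the modifiers shown in this
section, might look like this. The OPT_* bits are local stand-ins, not
the real PCRE_* values from pcre.h, and mapping <crlf> to
PCRE_NEWLINE_CRLF is an assumption based on the /^abc/m<CRLF> example,
since that row of the table is not reproduced here.

```c
#include <ctype.h>
#include <string.h>

/* Option bits local to this sketch (not the values in pcre.h). */
enum {
    OPT_CASELESS     = 1 << 0,  /* /i  -> PCRE_CASELESS  */
    OPT_MULTILINE    = 1 << 1,  /* /m  -> PCRE_MULTILINE */
    OPT_DOTALL       = 1 << 2,  /* /s  -> PCRE_DOTALL    */
    OPT_EXTENDED     = 1 << 3,  /* /x  -> PCRE_EXTENDED  */
    OPT_UTF8         = 1 << 4,  /* /8  -> PCRE_UTF8      */
    OPT_NEWLINE_CRLF = 1 << 5,  /* /<crlf>, assumed PCRE_NEWLINE_CRLF */
    OPT_BSR_ANYCRLF  = 1 << 6,  /* /<bsr_anycrlf> -> PCRE_BSR_ANYCRLF */
    OPT_BSR_UNICODE  = 1 << 7   /* /<bsr_unicode> -> PCRE_BSR_UNICODE */
};

/* Compare n characters ignoring case; stops at a mismatch, so a
 * shorter string a is never read past its terminator. */
static int same_ci(const char *a, const char *b, size_t n)
{
    for (size_t k = 0; k < n; k++)
        if (tolower((unsigned char)a[k]) != tolower((unsigned char)b[k]))
            return 0;
    return 1;
}

/* Returns the option bits, or -1 for a modifier this sketch does not
 * know (including /S, /B, and the rows elided from the table). */
static int parse_modifiers(const char *mods)
{
    static const struct { const char *name; int bit; } bracketed[] = {
        { "<crlf>",        OPT_NEWLINE_CRLF },
        { "<bsr_anycrlf>", OPT_BSR_ANYCRLF },
        { "<bsr_unicode>", OPT_BSR_UNICODE },
    };
    const size_t nbracketed = sizeof bracketed / sizeof bracketed[0];
    int opts = 0;

    while (*mods != '\0') {
        if (isspace((unsigned char)*mods)) {   /* white space allowed */
            mods++;
            continue;
        }
        if (*mods == '<') {                    /* literal <...> modifier */
            size_t i;
            for (i = 0; i < nbracketed; i++) {
                size_t len = strlen(bracketed[i].name);
                if (same_ci(mods, bracketed[i].name, len)) {
                    opts |= bracketed[i].bit;
                    mods += len;
                    break;
                }
            }
            if (i == nbracketed)
                return -1;
            continue;
        }
        switch (*mods++) {                     /* single characters */
        case 'i': opts |= OPT_CASELESS;  break;
        case 'm': opts |= OPT_MULTILINE; break;
        case 's': opts |= OPT_DOTALL;    break;
        case 'x': opts |= OPT_EXTENDED;  break;
        case '8': opts |= OPT_UTF8;      break;
        default:  return -1;
        }
    }
    return opts;
}
```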

Finding all matches in a string

Searching for all possible matches within each subject string can be
requested by the /g or /G modifier. After finding a match, PCRE is
called again to search the remainder of the subject string. The
difference between /g and /G is that the former uses the startoffset
argument to pcre_exec() to start searching at a new point within the
entire string (which is in effect what Perl does), whereas the latter
passes a shortened substring, starting at the new point, as the
subject. This makes a difference to the matching process if the pattern
begins with a lookbehind assertion (including \b or \B).

If any call to pcre_exec() in a /g or /G sequence matches an empty
string, the next call is done with the PCRE_NOTEMPTY_ATSTART and
PCRE_ANCHORED flags set in order to search for another, non-empty,
match at the same point. If this second match fails, the start offset
is advanced, and the normal match is retried. This imitates the way
Perl handles such cases when using the /g modifier or the split()
function. Normally, the start offset is advanced by one character, but
if the newline convention recognizes CRLF as a newline, and the current
character is CR followed by LF, an advance of two is used.
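
The retry rule for empty matches is easiest to see in code. In this C
sketch a toy matcher for the single pattern a* stands in for
pcre_exec(), and NOTEMPTY_ATSTART and ANCHORED are local stand-ins for
the PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED options. The loop follows
the /g behaviour (advancing the start offset within the whole subject),
and one "character" is one byte here.

```c
#include <stddef.h>

/* Local stand-ins for PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED. */
enum { NOTEMPTY_ATSTART = 1, ANCHORED = 2 };

/* Toy matcher for the pattern a* (greedy), standing in for pcre_exec().
 * Tries successive start positions from 'start' (only 'start' itself
 * when ANCHORED is set); returns 1 and sets *ms and *me on success. */
static int match_a_star(const char *s, size_t len, size_t start, int flags,
                        size_t *ms, size_t *me)
{
    for (size_t p = start; p <= len; p++) {
        size_t e = p;
        while (e < len && s[e] == 'a')
            e++;
        /* NOTEMPTY_ATSTART rejects an empty match at the start offset. */
        if (!(e == p && p == start && (flags & NOTEMPTY_ATSTART))) {
            *ms = p;
            *me = e;
            return 1;
        }
        if (flags & ANCHORED)
            break;
    }
    return 0;
}

/* /g-style loop: stores up to max (start, end) pairs, returns the count. */
static int find_all(const char *s, size_t len, int crlf_is_newline,
                    size_t *starts, size_t *ends, int max)
{
    size_t offset = 0;
    int flags = 0, count = 0;

    while (offset <= len && count < max) {
        size_t ms, me;

        if (!match_a_star(s, len, offset, flags, &ms, &me)) {
            if (flags == 0)
                break;                  /* an ordinary match failed: done */
            /* The non-empty retry failed: advance one byte (two at CR LF
             * when CRLF is a newline) and retry without the flags. */
            if (crlf_is_newline && offset + 1 < len &&
                s[offset] == '\r' && s[offset + 1] == '\n')
                offset += 2;
            else
                offset += 1;
            flags = 0;
            continue;
        }
        starts[count] = ms;
        ends[count] = me;
        count++;
        offset = me;
        /* After an empty match, look for a non-empty one at the same point. */
        flags = (ms == me) ? (NOTEMPTY_ATSTART | ANCHORED) : 0;
    }
    return count;
}
```

For the subject "baaac", find_all() reports four matches: an empty
match at offset 0, "aaa" at offsets 1 to 4, and empty matches at
offsets 4 and 5, which is what Perl's /a*/g gives for the same string.
With crlf_is_newline set, the subject "\r\n" gives two empty matches
(before the CR and at the end) instead of three.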

Other modifiers

There are yet more modifiers for controlling the way pcretest operates.

The /+ modifier requests that as well as outputting the substring that
matched the entire pattern, pcretest should in addition output the
remainder of the subject string. This is useful for tests where the
subject contains multiple copies of the same substring. If the +
modifier appears twice, the same action is taken for captured
substrings. In each case the remainder is output on the following line
with a plus character following the capture number.

The /= modifier requests that the values of all potential captured
parentheses be output after a match by pcre_exec(). By default, only
those up to the highest one actually used in the match are output
(corresponding to the return code from pcre_exec()). Values in the
offsets vector corresponding to higher numbers should be set to -1, and
these are output as "<unset>". This modifier gives a way of checking
that this is happening.
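
The effect of /+ and /= on the printed substrings can be sketched as
follows. This C sketch assumes the line layout of the examples in this
document (substring number, then a colon or "+", then the text) rather
than copying pcretest's code; it ignores non-printing characters and
the doubled /++ form, and format_captures() and append_fmt() are
hypothetical names. Here rc is the return code from pcre_exec() and
ncaps is the number of capturing parentheses in the pattern.

```c
#include <stdarg.h>
#include <stdio.h>

/* Append printf-style text to buf without overflowing it.
 * Returns 0, or -1 if the text would not fit. */
static int append_fmt(char *buf, size_t size, size_t *used,
                      const char *fmt, ...)
{
    va_list ap;
    int w;

    va_start(ap, fmt);
    w = vsnprintf(buf + *used, size - *used, fmt, ap);
    va_end(ap);
    if (w < 0 || (size_t)w >= size - *used)
        return -1;
    *used += (size_t)w;
    return 0;
}

/* ovector holds (start, end) pairs, with -1 marking an unset pair.
 * show_all mimics /= (print all ncaps + 1 pairs; ovector must then
 * hold that many pairs); show_rest mimics a single /+. */
static int format_captures(char *buf, size_t size, const char *subject,
                           const int *ovector, int rc, int ncaps,
                           int show_all, int show_rest)
{
    size_t used = 0;
    int n = show_all ? ncaps + 1 : rc;

    if (size == 0)
        return -1;
    buf[0] = '\0';
    for (int i = 0; i < n; i++) {
        int start = ovector[2 * i], end = ovector[2 * i + 1];

        if (start < 0) {                       /* unset substring */
            if (append_fmt(buf, size, &used, "%2d: <unset>\n", i) != 0)
                return -1;
            continue;
        }
        if (append_fmt(buf, size, &used, "%2d: %.*s\n", i, end - start,
                       subject + start) != 0)
            return -1;
        if (i == 0 && show_rest &&             /* /+: rest of the subject */
            append_fmt(buf, size, &used, "%2d+ %s\n", i, subject + end) != 0)
            return -1;
    }
    return 0;
}
```

For example, with /(a)|(b)/ and the subject "a", pcre_exec() returns 2,
so only substrings 0 and 1 are printed by default; with show_all set,
substring 2 is also printed, as "<unset>".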

The /B modifier is a debugging feature. It requests that pcretest
output a representation of the compiled byte code after compilation.
[...]

The /M modifier causes the size of the memory block used to hold the
compiled pattern to be output.

If the /S modifier appears once, it causes pcre_study() to be called
after the expression has been compiled, and the results used when the
expression is matched. If /S appears twice, it suppresses studying,
even if it was requested externally by the -s command line option.
This makes it possible to specify that certain patterns are always
studied, and others are never studied, independently of -s. This
feature is used in the test files in a few cases where the output is
different when the pattern is studied.
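
The interaction between -s, /S, and the information options (/I, /D,
-i, -d) can be expressed as two small predicates. This is a C sketch of
the rules as stated in this document, not pcretest's source; struct
study_inputs and both function names are hypothetical.

```c
#include <stdbool.h>

/* Per-pattern inputs (hypothetical names, for illustration only). */
struct study_inputs {
    int  s_count;          /* number of /S modifiers on the pattern: 0, 1, 2 */
    bool cmdline_s;        /* -s was given */
    bool cmdline_i_or_d;   /* -i or -d was given */
    bool pattern_i_or_d;   /* the pattern has /I or /D */
};

/* /S forces studying, /SS forbids it, otherwise -s decides. */
static bool should_study(const struct study_inputs *in)
{
    if (in->s_count >= 2)
        return false;
    if (in->s_count == 1)
        return true;
    return in->cmdline_s;
}

/* Study information is shown only if studying happened and /I, /D,
 * -i, or -d asked for information, except that studying caused only
 * by -s is reported only when -i or -d is also on the command line. */
static bool show_study_info(const struct study_inputs *in)
{
    bool info_wanted = in->pattern_i_or_d || in->cmdline_i_or_d;
    bool only_from_s = in->s_count == 0 && in->cmdline_s;

    if (!should_study(in) || !info_wanted)
        return false;
    return !only_from_s || in->cmdline_i_or_d;
}
```

The second predicate is what keeps the output of a test file with /I
patterns unchanged when the whole file is rerun with -s.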

The /T modifier must be followed by a single digit. It causes a
specific set of built-in character tables to be passed to
pcre_compile(). [...]


DATA LINES

Before each data line is passed to pcre_exec(), leading and trailing
white space is removed, and it is then scanned for \ escapes. Some of
these are pretty esoteric features, intended for checking out some of
the more complicated features of PCRE. If you are just testing
"ordinary" regular expressions, you probably don't need any of these.
The [...]

    \a         alarm (BEL, \x07)
    \b         backspace (\x08)
    \e         escape (\x1b)
    \f         form feed (\x0c)
    \n         newline (\x0a)
    \qdd       set the PCRE_MATCH_LIMIT limit to dd
                 (any number of digits)
[...]
(Note that this is the entire substring that was inspected during the
partial match; it may include characters before the actual match start
if a lookbehind assertion, \K, \b, or \B was involved.) For any other
return, pcretest outputs the PCRE negative error number and a short
descriptive phrase. If the error is a failed UTF-8 string check, the
byte offset of the start of the failing character and the reason code
are also output, provided that the size of the output vector is at
least two. Here is an example of an interactive pcretest run.

    $ pcretest
    PCRE version 8.13 2011-04-30

      re> /^abc(\d+)/
    data> abc123
    [...]
    data> xyz
    No match

Unset capturing substrings that are not followed by one that is set are
not returned by pcre_exec(), and are not shown by pcretest. In the
following example, there are two capturing substrings, but when the
first data line is matched, the second, unset substring is not shown.
An "internal" unset substring is shown as "<unset>", as for the second
data line.

      re> /(a)|(b)/
    data> a
    [...]
     1: <unset>
     2: b

If the strings contain any non-printing characters, they are output as
\0x escapes, or as \x{...} escapes if the /8 modifier was present on
the pattern. See below for the definition of non-printing characters.
If the pattern has the /+ modifier, the output for substring 0 is
followed by the rest of the subject string, identified by "0+" like
this:

      re> /cat/+
    [...]
     0: cat
     0+ aract

If the pattern has the /g or /G modifier, the results of successive
matching attempts are output in sequence, like this:

      re> /\Bi(\w\w)/g
    [...]
     0: ipp
     1: pp

"No match" is output only if the first match attempt fails. Here is an
example of a failure message (the offset 4 that is specified by \>4 is
past the end of the subject string):

      re> /xyz/
    data> xyz\>4
    Error -24 (bad offset value)

If any of the sequences \C, \G, or \L is present in a data line that
is successfully matched, the substrings extracted by the convenience
functions are output with C, G, or L after the string number instead of
a colon. This is in addition to the normal full list. The string length
(that is, the return from the extraction function) is given in
parentheses after each string for \C and \G.

Note that whereas patterns can be continued over several lines (a plain
">" prompt is used for continuations), data lines may not. However,
newlines can be included in data by means of the \n escape (or \r,
\r\n, etc., depending on the newline sequence setting).
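
Escape processing for data lines can be sketched as follows. This C
sketch handles only the escapes listed under DATA LINES, plus \r, which
the description above mentions for encoding newlines; any other escape
is copied through unchanged, which is a simplification of this sketch.
decode_data_line() is a hypothetical name. Because data may contain
binary zeros, the result is returned with an explicit length rather
than as a C string.

```c
#include <ctype.h>
#include <limits.h>
#include <string.h>

/* Trims white space from both ends of line, then decodes escapes into
 * out. \qdd stores dd in *match_limit and produces no data.
 * Returns 0, or -1 if out is too small. */
static int decode_data_line(const char *line, char *out, size_t outsz,
                            size_t *outlen, long *match_limit)
{
    const char *p = line;
    const char *end = line + strlen(line);
    size_t n = 0;

    while (p < end && isspace((unsigned char)*p))
        p++;
    while (end > p && isspace((unsigned char)end[-1]))
        end--;

    while (p < end) {
        char c = *p++;
        if (c == '\\' && p < end) {
            char e = *p++;
            switch (e) {
            case 'a': c = '\x07'; break;       /* alarm (BEL) */
            case 'b': c = '\x08'; break;       /* backspace */
            case 'e': c = '\x1b'; break;       /* escape */
            case 'f': c = '\x0c'; break;       /* form feed */
            case 'n': c = '\x0a'; break;       /* newline */
            case 'r': c = '\x0d'; break;       /* carriage return */
            case 'q': {                        /* \qdd: match limit */
                long v = 0;
                while (p < end && isdigit((unsigned char)*p)) {
                    if (v <= (LONG_MAX - 9) / 10)
                        v = v * 10 + (*p - '0');
                    p++;
                }
                *match_limit = v;
                continue;
            }
            default:                           /* keep "\" + char as written */
                if (n >= outsz)
                    return -1;
                out[n++] = '\\';
                c = e;
                break;
            }
        }
        if (n >= outsz)
            return -1;
        out[n++] = c;
    }
    *outlen = n;
    return 0;
}
```

For example, the data line "  abc\n123 " decodes to the seven bytes
a, b, c, newline, 1, 2, 3, since the surrounding spaces are trimmed
before the escape is processed.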


OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION

When the alternative matching function, pcre_dfa_exec(), is used (by
means of the \D escape sequence or the -dfa command line option), the
output consists of a list of all the matches that start at the first
point in the subject where there is at least one match. For example:

      re> /(tang|tangerine|tan)/
    [...]
     1: tang
     2: tan

(Using the normal matching function on this data finds only "tang".)
The longest matching string is always given first (and numbered zero).
After a PCRE_ERROR_PARTIAL return, the output is "Partial match:",
followed by the partially matching substring. (Note that this is the
entire substring that was inspected during the partial match; it may
include characters before the actual match start if a lookbehind
assertion, \K, \b, or \B was involved.)

    [...]
     1: tan
     0: tan

Since the alternative matching function does not support substring
capture, the escape sequences that are concerned with captured
substrings are not relevant.
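
The rule that pcre_dfa_exec() reports every match starting at the first
matching position, longest first, can be illustrated for the simple
case of a pattern that is just a list of literal alternatives, such as
(tang|tangerine|tan). This C sketch is not how pcre_dfa_exec() works
internally; dfa_literals() is a hypothetical helper that tries each
alternative at successive positions and sorts the match lengths in
descending order.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator: larger lengths first. */
static int cmp_desc(const void *a, const void *b)
{
    size_t x = *(const size_t *)a, y = *(const size_t *)b;
    return (x < y) - (x > y);
}

/* Finds the first subject position where any literal alternative
 * matches, stores that position in *pos and the lengths of all
 * alternatives matching there in lens[] (longest first; lens needs
 * room for nalts entries). Returns the number of matches, 0 if none. */
static int dfa_literals(const char *subject, const char *const *alts,
                        int nalts, size_t *pos, size_t *lens)
{
    size_t slen = strlen(subject);

    for (size_t p = 0; p <= slen; p++) {
        int count = 0;
        for (int i = 0; i < nalts; i++) {
            size_t len = strlen(alts[i]);
            if (len <= slen - p && memcmp(subject + p, alts[i], len) == 0)
                lens[count++] = len;
        }
        if (count > 0) {
            qsort(lens, (size_t)count, sizeof lens[0], cmp_desc);
            *pos = p;
            return count;
        }
    }
    return 0;
}
```

For a subject such as "yellow tangerine" (chosen here for
illustration), it reports lengths 9, 4, and 3 at offset 7, that is,
"tangerine", "tang", and "tan", numbered 0, 1, and 2.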
| 619 |
|
|
| 620 |
|
|
| 621 |
RESTARTING AFTER A PARTIAL MATCH |
RESTARTING AFTER A PARTIAL MATCH |
| 622 |
|
|
| 623 |
When the alternative matching function has given the PCRE_ERROR_PARTIAL |
When the alternative matching function has given the PCRE_ERROR_PARTIAL |
| 624 |
return, indicating that the subject partially matched the pattern, you |
return, indicating that the subject partially matched the pattern, you |
| 625 |
can restart the match with additional subject data by means of the \R |
can restart the match with additional subject data by means of the \R |
| 626 |
escape sequence. For example: |
escape sequence. For example: |
| 627 |
|
|
| 628 |
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ |
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ |
| 631 |
data> n05\R\D |
data> n05\R\D |
| 632 |
0: n05 |
0: n05 |
| 633 |
|
|
| 634 |
For further information about partial matching, see the pcrepartial |
For further information about partial matching, see the pcrepartial |
| 635 |
documentation. |
documentation. |
| 636 |
|
|
| 637 |
|
|
| 638 |
CALLOUTS |
CALLOUTS |
| 639 |
|
|
| 640 |

If the pattern contains any callout requests, pcretest's callout
function is called during matching. This works with both matching
functions. By default, the called function displays the callout number,
the start and current positions in the text at the callout time, and
the next pattern item to be tested. For example, the output

  --->pqrabcdef
    0    ^  ^     \d

indicates that callout number 0 occurred for a match attempt starting
at the fourth character of the subject string, when the pointer was at
the seventh character of the data, and when the next pattern item was
\d. Just one circumflex is output if the start and current positions
are the same.

Callouts numbered 255 are assumed to be automatic callouts, inserted as
a result of the /C pattern modifier. In this case, instead of showing
the callout number, the offset in the pattern, preceded by a plus, is
output. For example:

    re> /\d?[A-E]\*/C
  [...]
  +10 ^ ^
   0: E*
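
The layout of a trace line can be approximated in a few lines of C. This is a minimal sketch under stated assumptions. format_callout_line is a hypothetical helper, not part of PCRE or pcretest. It assumes each line starts with a 4-character label, so that column 4 + k falls under subject character k of the "--->" line. Following the text above, it treats callout number 255 as automatic and prints the pattern offset with a leading plus. The exact spacing before the next pattern item in pcretest's own output may differ.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not pcretest's source): build one callout trace line.
   Column 4 + k sits under subject character k, because the "--->" prefix of
   the subject line is 4 characters wide. Assumes 0 <= start <= current <=
   subject_len and a label of at most 3 characters. */
static void format_callout_line(char *out, size_t outsize,
                                int callout_number, int pattern_offset,
                                int subject_len, int start, int current,
                                const char *next_item)
{
    char label[16];
    size_t width = (size_t)subject_len + 6;   /* label area + subject + gap */
    size_t i;

    /* Callout 255 is treated as automatic (/C): show "+offset" instead. */
    if (callout_number == 255)
        snprintf(label, sizeof label, "%+3d", pattern_offset);
    else
        snprintf(label, sizeof label, "%3d", callout_number);

    if (width + strlen(next_item) + 1 > outsize) {
        if (outsize > 0)
            out[0] = '\0';                    /* buffer too small: empty line */
        return;
    }

    for (i = 0; i < width; i++)
        out[i] = ' ';
    memcpy(out, label, strlen(label));
    out[4 + start] = '^';                     /* start of this match attempt */
    if (current > start)
        out[4 + current] = '^';               /* current position, if different */
    out[width] = '\0';
    strcat(out, next_item);                   /* next pattern item to be tested */
}
```

For the first example above, format_callout_line(buf, sizeof buf, 0, 0, 9, 3, 6, "\\d") puts the carets in columns 7 and 10. Those columns sit under the "a" and the "d" of "--->pqrabcdef". When the start and current positions are equal, only one circumflex is written.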

If a pattern contains (*MARK) items, an additional line is output
whenever a change to the latest mark is passed to the callout function.
For example:

    re> /a(*MARK:X)bc/C
  data> abc
  --->abc
   +0 ^       a
   +1 ^^      (*MARK:X)
  +10 ^^      b
  Latest Mark: X
  +11 ^ ^     c
  +12 ^  ^
   0: abc

The mark changes between matching "a" and "b", but stays the same for
the rest of the match, so nothing more is output. If, as a result of
backtracking, the mark reverts to being unset, the text "<unset>" is
output.
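
The rule for printing a mark line amounts to comparing the mark seen at the previous callout with the current one. The helper below is a hypothetical sketch, not pcretest's code. It assumes that "no mark" is represented as NULL and that "<unset>" takes the place of the mark name on the output line. It compares names as strings, which may differ from how pcretest itself detects a change.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: decide whether a "Latest Mark:" line is due.
   NULL means "no mark set". Returns 1 and fills out when the mark differs
   from the one last reported, otherwise returns 0 and leaves out alone. */
static int report_mark_change(const char **last_mark, const char *mark,
                              char *out, size_t outsize)
{
    int same;

    if (*last_mark == NULL || mark == NULL)
        same = (*last_mark == mark);          /* both unset: no change */
    else
        same = (strcmp(*last_mark, mark) == 0);
    if (same)
        return 0;

    /* Reverting to "no mark" (e.g. after backtracking) prints <unset>. */
    snprintf(out, outsize, "Latest Mark: %s", mark != NULL ? mark : "<unset>");
    *last_mark = mark;
    return 1;
}
```

Applied to the trace above, the callouts at +0 and +1 see no mark and print nothing. The callout at +10 is the first to see mark X, so the line is printed there. The callouts at +11 and +12 see the same mark and stay silent.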

The callout function in pcretest returns zero (carry on matching) by
default, but you can use a \C item in a data line (as described above)
to change this and other parameters of the callout.
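
The return value follows PCRE's general convention for callout functions:

- zero continues matching;
- a positive value makes the match fail at the current point, so backtracking goes on to try other possibilities;
- a negative value abandons the match, and the matching function returns that value.

The sketch below models a callout that reports failure when a chosen callout number is reached. The struct and its fields are hypothetical and illustrative only. They do not reflect pcretest's internal variables or the exact syntax of the \C data escapes.

```c
#include <stddef.h>

/* PCRE callout return convention:
     0    carry on matching
     > 0  fail at the current point; other possibilities are still tried
     < 0  abandon the match; the matching function returns this value */

/* Hypothetical per-test settings (illustrative only). */
struct callout_policy {
    int fail_at_number;  /* callout number that should report failure, or -1 */
    int fail_value;      /* value to return when that callout is reached */
};

/* Decide what a callout function would return for one callout. */
static int callout_decision(const struct callout_policy *policy,
                            int callout_number)
{
    if (policy != NULL && policy->fail_at_number >= 0 &&
        callout_number == policy->fail_at_number)
        return policy->fail_value;
    return 0;  /* default, as in pcretest: carry on matching */
}
```

With fail_value set to 1, only the paths that reach the chosen callout fail. A negative fail_value would end the whole match attempt instead.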

Inserting callouts can be helpful when using pcretest to check
complicated regular expressions. For further information about
callouts, see [...]


SAVING AND RELOADING COMPILED PATTERNS

The facilities described in this section are not available when the
POSIX interface to PCRE is being used, that is, when the /P pattern
modifier is specified.

When the POSIX interface is not in use, you can cause pcretest to write
a compiled pattern to a file, by following the modifiers with > and a
[...]
diately after the compiled pattern. After writing the file, pcretest
expects to read a new pattern.

A saved pattern can be reloaded into pcretest by specifying < and a
file name instead of a pattern. The name of the file must not contain
a < character, as otherwise pcretest will interpret the line as a
pattern delimited by < characters. For example:

    re> </some/file
  Compiled pattern loaded from /some/file
  No study data
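
The rule about < characters can be sketched as a small classifier for pattern lines. reload_file_name is a hypothetical name used for illustration, and this is not pcretest's own code. The sketch assumes two things: leading white space is skipped, and any trailing newline has already been removed from the line.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: if a pattern line is a reload request ("<" followed
   by a file name that contains no further "<"), return a pointer to the
   file name. Otherwise return NULL: the line is an ordinary pattern, for
   example one delimited by "<" characters. */
static const char *reload_file_name(const char *line)
{
    while (isspace((unsigned char)*line))
        line++;                               /* assumed: skip leading blanks */
    if (*line != '<')
        return NULL;
    line++;
    if (*line == '\0' || strchr(line, '<') != NULL)
        return NULL;                          /* empty name, or a second '<' */
    return line;
}
```

With this rule, "</some/file" yields "/some/file", while a line such as "<abc<" is left to be compiled as a pattern delimited by < characters.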

When the pattern has been loaded, pcretest proceeds to read data lines
[...]


REVISION

Last updated: 01 August 2011
Copyright (c) 1997-2011 University of Cambridge.