| 85 |
"re>" to prompt for regular expressions, and "data>" to prompt for data |
"re>" to prompt for regular expressions, and "data>" to prompt for data |
| 86 |
lines. |
lines. |
| 87 |
|
|
| 88 |
|
When pcretest is built, a configuration option can specify that it |
| 89 |
|
should be linked with the libreadline library. When this is done, if |
| 90 |
|
the input is from a terminal, it is read using the readline() function. |
| 91 |
|
This provides line-editing and history facilities. The output from the |
| 92 |
|
-help option states whether or not readline() will be used. |
| 93 |
|
|
| 94 |
The program handles any number of sets of input on a single input file. |
The program handles any number of sets of input on a single input file. |
| 95 |
Each set starts with a regular expression, and continues with any num- |
Each set starts with a regular expression, and continues with any num- |
| 96 |
ber of data lines to be matched against the pattern. |
ber of data lines to be matched against the pattern. |
| 152 |
The following table shows additional modifiers for setting PCRE options |
The following table shows additional modifiers for setting PCRE options |
| 153 |
that do not correspond to anything in Perl: |
that do not correspond to anything in Perl: |
| 154 |
|
|
| 155 |
/A PCRE_ANCHORED |
/A PCRE_ANCHORED |
| 156 |
/C PCRE_AUTO_CALLOUT |
/C PCRE_AUTO_CALLOUT |
| 157 |
/E PCRE_DOLLAR_ENDONLY |
/E PCRE_DOLLAR_ENDONLY |
| 158 |
/f PCRE_FIRSTLINE |
/f PCRE_FIRSTLINE |
| 159 |
/J PCRE_DUPNAMES |
/J PCRE_DUPNAMES |
| 160 |
/N PCRE_NO_AUTO_CAPTURE |
/N PCRE_NO_AUTO_CAPTURE |
| 161 |
/U PCRE_UNGREEDY |
/U PCRE_UNGREEDY |
| 162 |
/X PCRE_EXTRA |
/X PCRE_EXTRA |
| 163 |
/<cr> PCRE_NEWLINE_CR |
/<cr> PCRE_NEWLINE_CR |
| 164 |
/<lf> PCRE_NEWLINE_LF |
/<lf> PCRE_NEWLINE_LF |
| 165 |
/<crlf> PCRE_NEWLINE_CRLF |
/<crlf> PCRE_NEWLINE_CRLF |
| 166 |
/<anycrlf> PCRE_NEWLINE_ANYCRLF |
/<anycrlf> PCRE_NEWLINE_ANYCRLF |
| 167 |
/<any> PCRE_NEWLINE_ANY |
/<any> PCRE_NEWLINE_ANY |
| 168 |
|
/<bsr_anycrlf> PCRE_BSR_ANYCRLF |
| 169 |
Those specifying line ending sequences are literal strings as shown. |
/<bsr_unicode> PCRE_BSR_UNICODE |
| 170 |
This example sets multiline matching with CRLF as the line ending |
|
| 171 |
sequence: |
Those specifying line ending sequences are literal strings as shown, |
| 172 |
|
but the letters can be in either case. This example sets multiline |
| 173 |
|
matching with CRLF as the line ending sequence: |
| 174 |
|
|
| 175 |
/^abc/m<crlf> |
/^abc/m<crlf> |
| 176 |
|
|
| 377 |
The use of \x{hh...} to represent UTF-8 characters is not dependent on |
The use of \x{hh...} to represent UTF-8 characters is not dependent on |
| 378 |
the use of the /8 modifier on the pattern. It is recognized always. |
the use of the /8 modifier on the pattern. It is recognized always. |
| 379 |
There may be any number of hexadecimal digits inside the braces. The |
There may be any number of hexadecimal digits inside the braces. The |
| 380 |
result is from one to six bytes, encoded according to the UTF-8 rules. |
result is from one to six bytes, encoded according to the original |
| 381 |
|
UTF-8 rules of RFC 2279. This allows for values in the range 0 to |
| 382 |
|
0x7FFFFFFF. Note that not all of those are valid Unicode code points, |
| 383 |
|
or indeed valid UTF-8 characters according to the later rules in RFC |
| 384 |
|
3629. |
| 385 |
|
|
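The byte layout described above can be sketched as follows. This is a minimal illustration of the original RFC 2279 encoding rules, not code taken from pcretest; the function name encode_rfc2279 is hypothetical and chosen only for this example.

```python
def encode_rfc2279(value: int) -> bytes:
    """Encode an integer in 0..0x7FFFFFFF as one to six bytes (RFC 2279 rules)."""
    if not 0 <= value <= 0x7FFFFFFF:
        raise ValueError("value out of range for RFC 2279 UTF-8")
    if value < 0x80:
        # Single byte: the value itself.
        return bytes([value])

    # (exclusive upper limit, total bytes, marker bits of the leading byte)
    table = (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    )
    for limit, length, marker in table:
        if value < limit:
            break

    # Build continuation bytes (10xxxxxx) from the low-order end.
    out = []
    for _ in range(length - 1):
        out.append(0x80 | (value & 0x3F))
        value >>= 6
    # The remaining high-order bits go into the leading byte.
    out.append(marker | value)
    return bytes(reversed(out))
```

For values up to 0x10FFFF that are not surrogates, the result agrees with the modern RFC 3629 encoding; the five-byte and six-byte forms exist only under the older rules.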
| 386 |
|
|
| 387 |
THE ALTERNATIVE MATCHING FUNCTION |
THE ALTERNATIVE MATCHING FUNCTION |
| 388 |
|
|
| 389 |
By default, pcretest uses the standard PCRE matching function, |
By default, pcretest uses the standard PCRE matching function, |
| 390 |
pcre_exec() to match each data line. From release 6.0, PCRE supports an |
pcre_exec() to match each data line. From release 6.0, PCRE supports an |
| 391 |
alternative matching function, pcre_dfa_exec(), which operates in a |
alternative matching function, pcre_dfa_exec(), which operates in a |
| 392 |
different way, and has some restrictions. The differences between the |
different way, and has some restrictions. The differences between the |
| 393 |
two functions are described in the pcrematching documentation. |
two functions are described in the pcrematching documentation. |
| 394 |
|
|
| 395 |
If a data line contains the \D escape sequence, or if the command line |
If a data line contains the \D escape sequence, or if the command line |
| 396 |
contains the -dfa option, the alternative matching function is called. |
contains the -dfa option, the alternative matching function is called. |
| 397 |
This function finds all possible matches at a given point. If, however, |
This function finds all possible matches at a given point. If, however, |
| 398 |
the \F escape sequence is present in the data line, it stops after the |
the \F escape sequence is present in the data line, it stops after the |
| 399 |
first match is found. This is always the shortest possible match. |
first match is found. This is always the shortest possible match. |
| 400 |
|
|
| 401 |
|
|
| 402 |
DEFAULT OUTPUT FROM PCRETEST |
DEFAULT OUTPUT FROM PCRETEST |
| 403 |
|
|
| 404 |
This section describes the output when the normal matching function, |
This section describes the output when the normal matching function, |
| 405 |
pcre_exec(), is being used. |
pcre_exec(), is being used. |
| 406 |
|
|
| 407 |
When a match succeeds, pcretest outputs the list of captured substrings |
When a match succeeds, pcretest outputs the list of captured substrings |
| 408 |
that pcre_exec() returns, starting with number 0 for the string that |
that pcre_exec() returns, starting with number 0 for the string that |
| 409 |
matched the whole pattern. Otherwise, it outputs "No match" or "Partial |
matched the whole pattern. Otherwise, it outputs "No match" or "Partial |
| 410 |
match" when pcre_exec() returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PAR- |
match" when pcre_exec() returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PAR- |
| 411 |
TIAL, respectively, and otherwise the PCRE negative error number. Here |
TIAL, respectively, and otherwise the PCRE negative error number. Here |
| 412 |
is an example of an interactive pcretest run. |
is an example of an interactive pcretest run. |
| 413 |
|
|
| 414 |
$ pcretest |
$ pcretest |
| 421 |
data> xyz |
data> xyz |
| 422 |
No match |
No match |
| 423 |
|
|
| 424 |
|
Note that unset capturing substrings that are not followed by one that |
| 425 |
|
is set are not returned by pcre_exec(), and are not shown by pcretest. |
| 426 |
|
In the following example, there are two capturing substrings, but when |
| 427 |
|
the first data line is matched, the second, unset substring is not |
| 428 |
|
shown. An "internal" unset substring is shown as "<unset>", as for the |
| 429 |
|
second data line. |
| 430 |
|
|
| 431 |
|
re> /(a)|(b)/ |
| 432 |
|
data> a |
| 433 |
|
0: a |
| 434 |
|
1: a |
| 435 |
|
data> b |
| 436 |
|
0: b |
| 437 |
|
1: <unset> |
| 438 |
|
2: b |
| 439 |
|
|
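The rule for trailing and internal unset substrings can be imitated with a short sketch. It uses Python's re module as a stand-in, which is an assumption made for illustration: re is not PCRE, and the helper name pcretest_style_lines is hypothetical. The output layout mimics the example above but is not produced by pcretest itself.

```python
import re


def pcretest_style_lines(pattern: str, subject: str) -> list[str]:
    """Return lines resembling pcretest's capture listing for one data line."""
    compiled = re.compile(pattern)
    m = compiled.search(subject)
    if m is None:
        return ["No match"]

    # Highest-numbered group that actually took part in the match.
    # Groups after it are unset and trailing, so they are not listed.
    highest = max(
        (i for i in range(1, compiled.groups + 1) if m.group(i) is not None),
        default=0,
    )

    lines = [f" 0: {m.group(0)}"]
    for i in range(1, highest + 1):
        text = m.group(i)
        # An unset group before a set one is "internal" and shown as <unset>.
        lines.append(f"{i:2d}: {'<unset>' if text is None else text}")
    return lines
```

With the pattern (a)|(b), the data line "a" gives two lines (0 and 1), while "b" gives three, with group 1 reported as <unset>.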
| 440 |
If the strings contain any non-printing characters, they are output as |
If the strings contain any non-printing characters, they are output as |
| 441 |
\0x escapes, or as \x{...} escapes if the /8 modifier was present on |
\0x escapes, or as \x{...} escapes if the /8 modifier was present on |
| 442 |
the pattern. See below for the definition of non-printing characters. |
the pattern. See below for the definition of non-printing characters. |
| 649 |
|
|
| 650 |
REVISION |
REVISION |
| 651 |
|
|
| 652 |
Last updated: 24 April 2007 |
Last updated: 18 December 2007 |
| 653 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |