| 14 |
<br> |
<br> |
| 15 |
<ul> |
<ul> |
| 16 |
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a> |
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a> |
| 17 |
<li><a name="TOC2" href="#SEC2">OPTIONS</a> |
<li><a name="TOC2" href="#SEC2">COMMAND LINE OPTIONS</a> |
| 18 |
<li><a name="TOC3" href="#SEC3">DESCRIPTION</a> |
<li><a name="TOC3" href="#SEC3">DESCRIPTION</a> |
| 19 |
<li><a name="TOC4" href="#SEC4">PATTERN MODIFIERS</a> |
<li><a name="TOC4" href="#SEC4">PATTERN MODIFIERS</a> |
| 20 |
<li><a name="TOC5" href="#SEC5">DATA LINES</a> |
<li><a name="TOC5" href="#SEC5">DATA LINES</a> |
| 31 |
</ul> |
</ul> |
| 32 |
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br> |
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br> |
| 33 |
<P> |
<P> |
| 34 |
<b>pcretest [options] [source] [destination]</b> |
<b>pcretest [options] [input file [output file]]</b> |
| 35 |
<br> |
<br> |
| 36 |
<br> |
<br> |
| 37 |
<b>pcretest</b> was written as a test program for the PCRE regular expression |
<b>pcretest</b> was written as a test program for the PCRE regular expression |
| 42 |
documentation. For details of the PCRE library function calls and their |
documentation. For details of the PCRE library function calls and their |
| 43 |
options, see the |
options, see the |
| 44 |
<a href="pcreapi.html"><b>pcreapi</b></a> |
<a href="pcreapi.html"><b>pcreapi</b></a> |
| 45 |
documentation. |
documentation. The input for <b>pcretest</b> is a sequence of regular expression |
| 46 |
|
patterns and strings to be matched, as described below. The output shows the |
| 47 |
|
result of each match. Options on the command line and the patterns control PCRE |
| 48 |
|
options and exactly what is output. |
| 49 |
</P> |
</P> |
| 50 |
<br><a name="SEC2" href="#TOC1">OPTIONS</a><br> |
<br><a name="SEC2" href="#TOC1">COMMAND LINE OPTIONS</a><br> |
| 51 |
<P> |
<P> |
| 52 |
<b>-b</b> |
<b>-b</b> |
| 53 |
Behave as if each regex has the <b>/B</b> (show bytecode) modifier; the internal |
Behave as if each pattern has the <b>/B</b> (show byte code) modifier; the |
| 54 |
form is output after compilation. |
internal form is output after compilation. |
| 55 |
</P> |
</P> |
| 56 |
<P> |
<P> |
| 57 |
<b>-C</b> |
<b>-C</b> |
| 60 |
</P> |
</P> |
| 61 |
<P> |
<P> |
| 62 |
<b>-d</b> |
<b>-d</b> |
| 63 |
Behave as if each regex has the <b>/D</b> (debug) modifier; the internal |
Behave as if each pattern has the <b>/D</b> (debug) modifier; the internal |
| 64 |
form and information about the compiled pattern is output after compilation; |
form and information about the compiled pattern is output after compilation; |
| 65 |
<b>-d</b> is equivalent to <b>-b -i</b>. |
<b>-d</b> is equivalent to <b>-b -i</b>. |
| 66 |
</P> |
</P> |
| 76 |
</P> |
</P> |
| 77 |
<P> |
<P> |
| 78 |
<b>-i</b> |
<b>-i</b> |
| 79 |
Behave as if each regex has the <b>/I</b> modifier; information about the |
Behave as if each pattern has the <b>/I</b> modifier; information about the |
| 80 |
compiled pattern is given after compilation. |
compiled pattern is given after compilation. |
| 81 |
</P> |
</P> |
| 82 |
<P> |
<P> |
| 88 |
<P> |
<P> |
| 89 |
<b>-m</b> |
<b>-m</b> |
| 90 |
Output the size of each compiled pattern after it has been compiled. This is |
Output the size of each compiled pattern after it has been compiled. This is |
| 91 |
equivalent to adding <b>/M</b> to each regular expression. For compatibility |
equivalent to adding <b>/M</b> to each regular expression. |
|
with earlier versions of pcretest, <b>-s</b> is a synonym for <b>-m</b>. |
|
| 92 |
</P> |
</P> |
| 93 |
<P> |
<P> |
| 94 |
<b>-o</b> <i>osize</i> |
<b>-o</b> <i>osize</i> |
| 101 |
</P> |
</P> |
| 102 |
<P> |
<P> |
| 103 |
<b>-p</b> |
<b>-p</b> |
| 104 |
Behave as if each regex has the <b>/P</b> modifier; the POSIX wrapper API is |
Behave as if each pattern has the <b>/P</b> modifier; the POSIX wrapper API is |
| 105 |
used to call PCRE. None of the other options has any effect when <b>-p</b> is |
used to call PCRE. None of the other options has any effect when <b>-p</b> is |
| 106 |
set. |
set. |
| 107 |
</P> |
</P> |
| 111 |
</P> |
</P> |
| 112 |
<P> |
<P> |
| 113 |
<b>-S</b> <i>size</i> |
<b>-S</b> <i>size</i> |
| 114 |
On Unix-like systems, set the size of the runtime stack to <i>size</i> |
On Unix-like systems, set the size of the run-time stack to <i>size</i> |
| 115 |
megabytes. |
megabytes. |
| 116 |
</P> |
</P> |
| 117 |
<P> |
<P> |
| 118 |
|
<b>-s</b> |
| 119 |
|
Behave as if each pattern has the <b>/S</b> modifier; in other words, force each |
| 120 |
|
pattern to be studied. If the <b>/I</b> or <b>/D</b> option is present on a |
| 121 |
|
pattern (requesting output about the compiled pattern), information about the |
| 122 |
|
result of studying is not included when studying is caused only by <b>-s</b> and |
| 123 |
|
neither <b>-i</b> nor <b>-d</b> is present on the command line. This behaviour |
| 124 |
|
means that the output from tests that are run with and without <b>-s</b> should |
| 125 |
|
be identical, except when options that output information about the actual |
| 126 |
|
running of a match are set. The <b>-M</b>, <b>-t</b>, and <b>-tm</b> options, |
| 127 |
|
which give information about resources used, are likely to produce different |
| 128 |
|
output with and without <b>-s</b>. Output may also differ if the <b>/C</b> option |
| 129 |
|
is present on an individual pattern. This uses callouts to trace the |
| 130 |
|
matching process, and this may be different between studied and non-studied |
| 131 |
|
patterns. If the pattern contains (*MARK) items there may also be differences, |
| 132 |
|
for the same reason. The <b>-s</b> command line option can be overridden for |
| 133 |
|
specific patterns that should never be studied (see the /S option below). |
| 134 |
|
</P> |
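The rule above, which decides when study information appears in /I or /D output, can be restated as a small predicate. This is a minimal C sketch written for illustration, not code taken from pcretest. The function and parameter names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical predicate restating the documented -s rule; not pcretest code.
   info_requested: the pattern has /I or /D (directly, or via -i / -d)
   studied_by_S:   the pattern itself carries /S
   force_study:    -s was given on the command line
   cmdline_info:   -i or -d was given on the command line */
static bool show_study_info(bool info_requested, bool studied_by_S,
                            bool force_study, bool cmdline_info)
{
    if (!info_requested)
        return false;          /* no /I or /D: no compile information at all */
    if (studied_by_S)
        return true;           /* the pattern asked to be studied itself */
    if (force_study)
        return cmdline_info;   /* studied only because of -s */
    return false;              /* not studied: this sketch reports nothing */
}
```

With these assumptions, `show_study_info(true, false, true, false)` is false, which is the case that keeps test output identical with and without `-s`.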
| 135 |
|
<P> |
| 136 |
<b>-t</b> |
<b>-t</b> |
| 137 |
Run each compile, study, and match many times with a timer, and output |
Run each compile, study, and match many times with a timer, and output |
| 138 |
resulting time per compile or match (in milliseconds). Do not set <b>-m</b> with |
resulting time per compile or match (in milliseconds). Do not set <b>-m</b> with |
| 209 |
A pattern may be followed by any number of modifiers, which are mostly single |
A pattern may be followed by any number of modifiers, which are mostly single |
| 210 |
characters. Following Perl usage, these are referred to below as, for example, |
characters. Following Perl usage, these are referred to below as, for example, |
| 211 |
"the <b>/i</b> modifier", even though the delimiter of the pattern need not |
"the <b>/i</b> modifier", even though the delimiter of the pattern need not |
| 212 |
always be a slash, and no slash is used when writing modifiers. Whitespace may |
always be a slash, and no slash is used when writing modifiers. White space may |
| 213 |
appear between the final pattern delimiter and the first modifier, and between |
appear between the final pattern delimiter and the first modifier, and between |
| 214 |
the modifiers themselves. |
the modifiers themselves. |
| 215 |
</P> |
</P> |
| 246 |
<b>/<bsr_unicode></b> PCRE_BSR_UNICODE |
<b>/<bsr_unicode></b> PCRE_BSR_UNICODE |
| 247 |
</pre> |
</pre> |
| 248 |
The modifiers that are enclosed in angle brackets are literal strings as shown, |
The modifiers that are enclosed in angle brackets are literal strings as shown, |
| 249 |
including the angle brackets, but the letters can be in either case. This |
including the angle brackets, but the letters within can be in either case. |
| 250 |
example sets multiline matching with CRLF as the line ending sequence: |
This example sets multiline matching with CRLF as the line ending sequence: |
| 251 |
<pre> |
<pre> |
| 252 |
/^abc/m<crlf> |
/^abc/m<CRLF> |
| 253 |
</pre> |
</pre> |
| 254 |
As well as turning on the PCRE_UTF8 option, the <b>/8</b> modifier also causes |
As well as turning on the PCRE_UTF8 option, the <b>/8</b> modifier also causes |
| 255 |
any non-printing characters in output strings to be printed using the |
any non-printing characters in output strings to be printed using the |
| 291 |
</P> |
</P> |
| 292 |
<P> |
<P> |
| 293 |
The <b>/+</b> modifier requests that as well as outputting the substring that |
The <b>/+</b> modifier requests that as well as outputting the substring that |
| 294 |
matched the entire pattern, pcretest should in addition output the remainder of |
matched the entire pattern, <b>pcretest</b> should in addition output the |
| 295 |
the subject string. This is useful for tests where the subject contains |
remainder of the subject string. This is useful for tests where the subject |
| 296 |
multiple copies of the same substring. |
contains multiple copies of the same substring. If the <b>+</b> modifier appears |
| 297 |
|
twice, the same action is taken for captured substrings. In each case the |
| 298 |
|
remainder is output on the following line with a plus character following the |
| 299 |
|
capture number. |
| 300 |
|
</P> |
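A rough sketch of how the /+ remainder line could be built from the offsets vector that pcre_exec() fills in (start/end byte offsets in pairs). The helper name is hypothetical, and the two-column number width is an assumption chosen to line up with the " 0:" capture lines; this is not the pcretest source.

```c
#include <stdio.h>

/* Hypothetical helper, not pcretest source: format the "/+" remainder line
   for capture n. ovector holds start/end byte offsets in pairs, as filled
   in by pcre_exec(); a negative start means the capture is unset.
   Assumes a NUL-terminated subject (pcretest itself tracks the length). */
static int format_remainder(char *buf, size_t size, const char *subject,
                            const int *ovector, int n)
{
    int end = ovector[2 * n + 1];
    if (ovector[2 * n] < 0)
        return 0;              /* unset capture: no remainder line, buf untouched */
    /* Capture number, then '+', then everything after the captured text. */
    return snprintf(buf, size, "%2d+ %s", n, subject + end);
}
```

For the subject "abcxyz" with ovector {0, 3}, capture 0 would produce " 0+ xyz".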
| 301 |
|
<P> |
| 302 |
|
The <b>/=</b> modifier requests that the values of all potential captured |
| 303 |
|
parentheses be output after a match by <b>pcre_exec()</b>. By default, only |
| 304 |
|
those up to the highest one actually used in the match are output |
| 305 |
|
(corresponding to the return code from <b>pcre_exec()</b>). Values in the |
| 306 |
|
offsets vector corresponding to higher numbers should be set to -1, and these |
| 307 |
|
are output as "<unset>". This modifier gives a way of checking that this is |
| 308 |
|
happening. |
| 309 |
</P> |
</P> |
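To make the /= behaviour concrete, here is a minimal C sketch of formatting one entry of the capture list, assuming unset slots in the offsets vector hold -1 as described above. The helper is hypothetical and only imitates the output shape. Without /= a caller would loop up to the positive return code from pcre_exec(); with /= it would loop over every potential capture.

```c
#include <stdio.h>

/* Hypothetical helper, not pcretest source: format one line of the capture
   list for capture n. ovector holds start/end byte offsets in pairs, as
   pcre_exec() fills it in; -1 marks a capture that is unset. */
static int format_capture(char *buf, size_t size, const char *subject,
                          const int *ovector, int n)
{
    int start = ovector[2 * n];
    int end = ovector[2 * n + 1];
    if (start < 0)
        return snprintf(buf, size, "%2d: <unset>", n);
    return snprintf(buf, size, "%2d: %.*s", n, end - start, subject + start);
}
```

For /(a)|(b)/ matched against "a", the return code is 2, so only captures 0 and 1 are listed by default; with /= capture 2 would also be shown, as " 2: <unset>".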
| 310 |
<P> |
<P> |
| 311 |
The <b>/B</b> modifier is a debugging feature. It requests that <b>pcretest</b> |
The <b>/B</b> modifier is a debugging feature. It requests that <b>pcretest</b> |
| 363 |
pattern to be output. |
pattern to be output. |
| 364 |
</P> |
</P> |
| 365 |
<P> |
<P> |
| 366 |
The <b>/S</b> modifier causes <b>pcre_study()</b> to be called after the |
If the <b>/S</b> modifier appears once, it causes <b>pcre_study()</b> to be |
| 367 |
expression has been compiled, and the results used when the expression is |
called after the expression has been compiled, and the results used when the |
| 368 |
matched. |
expression is matched. If <b>/S</b> appears twice, it suppresses studying, even |
| 369 |
|
if it was requested externally by the <b>-s</b> command line option. This makes |
| 370 |
|
it possible to specify that certain patterns are always studied, and others are |
| 371 |
|
never studied, independently of <b>-s</b>. This feature is used in the test |
| 372 |
|
files in a few cases where the output is different when the pattern is studied. |
| 373 |
</P> |
</P> |
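The interaction between /S and the -s option described above can be summarised as follows. This is a sketch for illustration only: the names are hypothetical, and it models just the three documented cases (no /S, /S once, /S twice).

```c
#include <stdbool.h>

/* Hypothetical restatement of the documented /S rule; not pcretest code.
   s_count: how many times /S follows the pattern (0, 1 or 2; other counts
   are not modelled). force_study: -s was given on the command line. */
static bool should_study(int s_count, bool force_study)
{
    if (s_count == 1)
        return true;        /* /S once: always study */
    if (s_count == 2)
        return false;       /* /S twice: never study, even with -s */
    return force_study;     /* no /S: study only if -s was given */
}
```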
| 374 |
<P> |
<P> |
| 375 |
The <b>/T</b> modifier must be followed by a single digit. It causes a specific |
The <b>/T</b> modifier must be followed by a single digit. It causes a specific |
| 406 |
<br><a name="SEC5" href="#TOC1">DATA LINES</a><br> |
<br><a name="SEC5" href="#TOC1">DATA LINES</a><br> |
| 407 |
<P> |
<P> |
| 408 |
Before each data line is passed to <b>pcre_exec()</b>, leading and trailing |
Before each data line is passed to <b>pcre_exec()</b>, leading and trailing |
| 409 |
whitespace is removed, and it is then scanned for \ escapes. Some of these are |
white space is removed, and it is then scanned for \ escapes. Some of these |
| 410 |
pretty esoteric features, intended for checking out some of the more |
are pretty esoteric features, intended for checking out some of the more |
| 411 |
complicated features of PCRE. If you are just testing "ordinary" regular |
complicated features of PCRE. If you are just testing "ordinary" regular |
| 412 |
expressions, you probably don't need any of these. The following escapes are |
expressions, you probably don't need any of these. The following escapes are |
| 413 |
recognized: |
recognized: |
| 415 |
\a alarm (BEL, \x07) |
\a alarm (BEL, \x07) |
| 416 |
\b backspace (\x08) |
\b backspace (\x08) |
| 417 |
\e escape (\x1b) |
\e escape (\x1b) |
| 418 |
\f formfeed (\x0c) |
\f form feed (\x0c) |
| 419 |
\n newline (\x0a) |
\n newline (\x0a) |
| 420 |
\qdd set the PCRE_MATCH_LIMIT limit to dd (any number of digits) |
\qdd set the PCRE_MATCH_LIMIT limit to dd (any number of digits) |
| 421 |
\r carriage return (\x0d) |
\r carriage return (\x0d) |
| 534 |
<b>pcre_exec()</b>, is being used. |
<b>pcre_exec()</b>, is being used. |
| 535 |
</P> |
</P> |
| 536 |
<P> |
<P> |
| 537 |
When a match succeeds, pcretest outputs the list of captured substrings that |
When a match succeeds, <b>pcretest</b> outputs the list of captured substrings |
| 538 |
<b>pcre_exec()</b> returns, starting with number 0 for the string that matched |
that <b>pcre_exec()</b> returns, starting with number 0 for the string that |
| 539 |
the whole pattern. Otherwise, it outputs "No match" when the return is |
matched the whole pattern. Otherwise, it outputs "No match" when the return is |
| 540 |
PCRE_ERROR_NOMATCH, and "Partial match:" followed by the partially matching |
PCRE_ERROR_NOMATCH, and "Partial match:" followed by the partially matching |
| 541 |
substring when <b>pcre_exec()</b> returns PCRE_ERROR_PARTIAL. (Note that this is |
substring when <b>pcre_exec()</b> returns PCRE_ERROR_PARTIAL. (Note that this is |
| 542 |
the entire substring that was inspected during the partial match; it may |
the entire substring that was inspected during the partial match; it may |
| 543 |
include characters before the actual match start if a lookbehind assertion, |
include characters before the actual match start if a lookbehind assertion, |
| 544 |
\K, \b, or \B was involved.) For any other returns, it outputs the PCRE |
\K, \b, or \B was involved.) For any other return, <b>pcretest</b> outputs |
| 545 |
negative error number. Here is an example of an interactive <b>pcretest</b> run. |
the PCRE negative error number and a short descriptive phrase. If the error is |
| 546 |
|
a failed UTF-8 string check, the byte offset of the start of the failing |
| 547 |
|
character and the reason code are also output, provided that the size of the |
| 548 |
|
output vector is at least two. Here is an example of an interactive |
| 549 |
|
<b>pcretest</b> run. |
| 550 |
<pre> |
<pre> |
| 551 |
$ pcretest |
$ pcretest |
| 552 |
PCRE version 7.0 30-Nov-2006 |
PCRE version 8.13 2011-04-30 |
| 553 |
|
|
| 554 |
re> /^abc(\d+)/ |
re> /^abc(\d+)/ |
| 555 |
data> abc123 |
data> abc123 |
| 558 |
data> xyz |
data> xyz |
| 559 |
No match |
No match |
| 560 |
</pre> |
</pre> |
| 561 |
Note that unset capturing substrings that are not followed by one that is set |
Unset capturing substrings that are not followed by one that is set are not |
| 562 |
are not returned by <b>pcre_exec()</b>, and are not shown by <b>pcretest</b>. In |
returned by <b>pcre_exec()</b>, and are not shown by <b>pcretest</b>. In the |
| 563 |
the following example, there are two capturing substrings, but when the first |
following example, there are two capturing substrings, but when the first data |
| 564 |
data line is matched, the second, unset substring is not shown. An "internal" |
line is matched, the second, unset substring is not shown. An "internal" unset |
| 565 |
unset substring is shown as "<unset>", as for the second data line. |
substring is shown as "<unset>", as for the second data line. |
| 566 |
<pre> |
<pre> |
| 567 |
re> /(a)|(b)/ |
re> /(a)|(b)/ |
| 568 |
data> a |
data> a |
| 596 |
0: ipp |
0: ipp |
| 597 |
1: pp |
1: pp |
| 598 |
</pre> |
</pre> |
| 599 |
"No match" is output only if the first match attempt fails. |
"No match" is output only if the first match attempt fails. Here is an example |
| 600 |
|
of a failure message (the offset 4 that is specified by \>4 is past the end of |
| 601 |
|
the subject string): |
| 602 |
|
<pre> |
| 603 |
|
re> /xyz/ |
| 604 |
|
data> xyz\>4 |
| 605 |
|
Error -24 (bad offset value) |
| 606 |
|
</pre> |
| 607 |
</P> |
</P> |
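The headline printed for a failed match can be sketched in C as a switch on the negative return code. The numeric values are assumed to mirror PCRE_ERROR_NOMATCH, PCRE_ERROR_PARTIAL and PCRE_ERROR_BADOFFSET in pcre.h and are defined locally so the sketch compiles without the PCRE headers. Only the phrase for -24 comes from the example above. The fallback phrase is a placeholder, not real pcretest output.

```c
#include <stdio.h>

/* Assumed to mirror PCRE_ERROR_NOMATCH, PCRE_ERROR_PARTIAL and
   PCRE_ERROR_BADOFFSET in pcre.h; defined here so no PCRE header is needed. */
enum {
    SKETCH_ERROR_NOMATCH   = -1,
    SKETCH_ERROR_PARTIAL   = -12,
    SKETCH_ERROR_BADOFFSET = -24
};

/* Hypothetical helper: the headline for a negative pcre_exec() return. */
static int describe_failure(char *buf, size_t size, int rc)
{
    switch (rc) {
    case SKETCH_ERROR_NOMATCH:
        return snprintf(buf, size, "No match");
    case SKETCH_ERROR_PARTIAL:
        return snprintf(buf, size, "Partial match:");  /* substring follows */
    case SKETCH_ERROR_BADOFFSET:
        return snprintf(buf, size, "Error %d (bad offset value)", rc);
    default:
        /* Placeholder phrase; pcretest has its own text for each code. */
        return snprintf(buf, size, "Error %d (unknown error)", rc);
    }
}
```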
| 608 |
<P> |
<P> |
| 609 |
If any of the sequences <b>\C</b>, <b>\G</b>, or <b>\L</b> are present in a |
If any of the sequences <b>\C</b>, <b>\G</b>, or <b>\L</b> are present in a |
| 703 |
+10 ^ ^ |
+10 ^ ^ |
| 704 |
0: E* |
0: E* |
| 705 |
</pre> |
</pre> |
| 706 |
|
If a pattern contains (*MARK) items, an additional line is output whenever |
| 707 |
|
a change of latest mark is passed to the callout function. For example: |
| 708 |
|
<pre> |
| 709 |
|
re> /a(*MARK:X)bc/C |
| 710 |
|
data> abc |
| 711 |
|
--->abc |
| 712 |
|
+0 ^ a |
| 713 |
|
+1 ^^ (*MARK:X) |
| 714 |
|
+10 ^^ b |
| 715 |
|
Latest Mark: X |
| 716 |
|
+11 ^ ^ c |
| 717 |
|
+12 ^ ^ |
| 718 |
|
0: abc |
| 719 |
|
</pre> |
| 720 |
|
The mark changes between matching "a" and "b", but stays the same for the rest |
| 721 |
|
of the match, so nothing more is output. If, as a result of backtracking, the |
| 722 |
|
mark reverts to being unset, the text "<unset>" is output. |
| 723 |
|
</P> |
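The "Latest Mark:" behaviour can be sketched as a small piece of state carried from one callout to the next. This is a hypothetical helper, not the pcretest callout function. It compares mark names by content, with NULL standing for an unset mark, which may differ from how pcretest detects a change internally.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: decide whether a callout should print a
   "Latest Mark:" line. *last holds the mark seen at the previous callout
   (NULL = unset) and is updated whenever the mark changes. */
static int report_mark(const char **last, const char *current,
                       char *buf, size_t size)
{
    int same = (*last == NULL || current == NULL)
                   ? (*last == current)             /* equal only if both unset */
                   : (strcmp(*last, current) == 0); /* compare names by content */
    if (same)
        return 0;                                   /* no change: print nothing */
    snprintf(buf, size, "Latest Mark: %s", current ? current : "<unset>");
    *last = current;
    return 1;
}
```

Fed the marks NULL, "X", "X" in turn, only the second call reports a change, matching the single "Latest Mark: X" line in the example above.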
| 724 |
|
<P> |
| 725 |
The callout function in <b>pcretest</b> returns zero (carry on matching) by |
The callout function in <b>pcretest</b> returns zero (carry on matching) by |
| 726 |
default, but you can use a \C item in a data line (as described above) to |
default, but you can use a \C item in a data line (as described above) to |
| 727 |
change this. |
change this and other parameters of the callout. |
| 728 |
</P> |
</P> |
| 729 |
<P> |
<P> |
| 730 |
Inserting callouts can be helpful when using <b>pcretest</b> to check |
Inserting callouts can be helpful when using <b>pcretest</b> to check |
| 748 |
<br><a name="SEC12" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br> |
<br><a name="SEC12" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br> |
| 749 |
<P> |
<P> |
| 750 |
The facilities described in this section are not available when the POSIX |
The facilities described in this section are not available when the POSIX |
| 751 |
inteface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is |
interface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is |
| 752 |
specified. |
specified. |
| 753 |
</P> |
</P> |
| 754 |
<P> |
<P> |
| 773 |
<b>pcretest</b> expects to read a new pattern. |
<b>pcretest</b> expects to read a new pattern. |
| 774 |
</P> |
</P> |
| 775 |
<P> |
<P> |
| 776 |
A saved pattern can be reloaded into <b>pcretest</b> by specifing < and a file |
A saved pattern can be reloaded into <b>pcretest</b> by specifying < and a file |
| 777 |
name instead of a pattern. The name of the file must not contain a < character, |
name instead of a pattern. The name of the file must not contain a < character, |
| 778 |
as otherwise <b>pcretest</b> will interpret the line as a pattern delimited by < |
as otherwise <b>pcretest</b> will interpret the line as a pattern delimited by < |
| 779 |
characters. |
characters. |
| 780 |
For example: |
For example: |
| 781 |
<pre> |
<pre> |
| 782 |
re> </some/file |
re> </some/file |
| 783 |
Compiled regex loaded from /some/file |
Compiled pattern loaded from /some/file |
| 784 |
No study data |
No study data |
| 785 |
</pre> |
</pre> |
| 786 |
When the pattern has been loaded, <b>pcretest</b> proceeds to read data lines in |
When the pattern has been loaded, <b>pcretest</b> proceeds to read data lines in |
| 823 |
</P> |
</P> |
| 824 |
<br><a name="SEC15" href="#TOC1">REVISION</a><br> |
<br><a name="SEC15" href="#TOC1">REVISION</a><br> |
| 825 |
<P> |
<P> |
| 826 |
Last updated: 21 November 2010 |
Last updated: 01 August 2011 |
| 827 |
<br> |
<br> |
| 828 |
Copyright © 1997-2010 University of Cambridge. |
Copyright © 1997-2011 University of Cambridge. |
| 829 |
<br> |
<br> |
| 830 |
<p> |
<p> |
| 831 |
Return to the <a href="index.html">PCRE index page</a>. |
Return to the <a href="index.html">PCRE index page</a>. |