Diff of /code/trunk/doc/pcretest.1

revision 553 by ph10, Fri Oct 22 15:57:50 2010 UTC revision 616 by ph10, Mon Jul 11 15:55:25 2011 UTC
# Line 4  pcretest - a program for testing Perl-co Line 4  pcretest - a program for testing Perl-co
4  .SH SYNOPSIS  .SH SYNOPSIS
5  .rs  .rs
6  .sp  .sp
7  .B pcretest "[options] [source] [destination]"  .B pcretest "[options] [input file [output file]]"
8  .sp  .sp
9  \fBpcretest\fP was written as a test program for the PCRE regular expression  \fBpcretest\fP was written as a test program for the PCRE regular expression
10  library itself, but it can also be used for experimenting with regular  library itself, but it can also be used for experimenting with regular
# Line 18  options, see the Line 18  options, see the
18  .\" HREF  .\" HREF
19  \fBpcreapi\fP  \fBpcreapi\fP
20  .\"  .\"
21  documentation.  documentation. The input for \fBpcretest\fP is a sequence of regular expression
22    patterns and strings to be matched, as described below. The output shows the
23    result of each match. Options on the command line and the patterns control PCRE
24    options and exactly what is output.
25  .  .
26  .  .
27  .SH OPTIONS  .SH COMMAND LINE OPTIONS
28  .rs  .rs
29  .TP 10  .TP 10
30  \fB-b\fP  \fB-b\fP
31  Behave as if each regex has the \fB/B\fP (show bytecode) modifier; the internal  Behave as if each pattern has the \fB/B\fP (show byte code) modifier; the
32  form is output after compilation.  internal form is output after compilation.
33  .TP 10  .TP 10
34  \fB-C\fP  \fB-C\fP
35  Output the version number of the PCRE library, and all available information  Output the version number of the PCRE library, and all available information
36  about the optional features that are included, and then exit.  about the optional features that are included, and then exit.
37  .TP 10  .TP 10
38  \fB-d\fP  \fB-d\fP
39  Behave as if each regex has the \fB/D\fP (debug) modifier; the internal  Behave as if each pattern has the \fB/D\fP (debug) modifier; the internal
40  form and information about the compiled pattern is output after compilation;  form and information about the compiled pattern is output after compilation;
41  \fB-d\fP is equivalent to \fB-b -i\fP.  \fB-d\fP is equivalent to \fB-b -i\fP.
42  .TP 10  .TP 10
# Line 46  standard \fBpcre_exec()\fP function (mor Line 49  standard \fBpcre_exec()\fP function (mor
49  Output a brief summary of these options and then exit.  Output a brief summary of these options and then exit.
50  .TP 10  .TP 10
51  \fB-i\fP  \fB-i\fP
52  Behave as if each regex has the \fB/I\fP modifier; information about the  Behave as if each pattern has the \fB/I\fP modifier; information about the
53  compiled pattern is given after compilation.  compiled pattern is given after compilation.
54  .TP 10  .TP 10
55  \fB-M\fP  \fB-M\fP
# Line 56  calling \fBpcre_exec()\fP repeatedly wit Line 59  calling \fBpcre_exec()\fP repeatedly wit
59  .TP 10  .TP 10
60  \fB-m\fP  \fB-m\fP
61  Output the size of each compiled pattern after it has been compiled. This is  Output the size of each compiled pattern after it has been compiled. This is
62  equivalent to adding \fB/M\fP to each regular expression. For compatibility  equivalent to adding \fB/M\fP to each regular expression.
 with earlier versions of pcretest, \fB-s\fP is a synonym for \fB-m\fP.  
63  .TP 10  .TP 10
64  \fB-o\fP \fIosize\fP  \fB-o\fP \fIosize\fP
65  Set the number of elements in the output vector that is used when calling  Set the number of elements in the output vector that is used when calling
# Line 68  changed for individual matching calls by Line 70  changed for individual matching calls by
70  below).  below).
71  .TP 10  .TP 10
72  \fB-p\fP  \fB-p\fP
73  Behave as if each regex has the \fB/P\fP modifier; the POSIX wrapper API is  Behave as if each pattern has the \fB/P\fP modifier; the POSIX wrapper API is
74  used to call PCRE. None of the other options has any effect when \fB-p\fP is  used to call PCRE. None of the other options has any effect when \fB-p\fP is
75  set.  set.
76  .TP 10  .TP 10
# Line 76  set. Line 78  set.
78  Do not output the version number of \fBpcretest\fP at the start of execution.  Do not output the version number of \fBpcretest\fP at the start of execution.
79  .TP 10  .TP 10
80  \fB-S\fP \fIsize\fP  \fB-S\fP \fIsize\fP
81  On Unix-like systems, set the size of the runtime stack to \fIsize\fP  On Unix-like systems, set the size of the run-time stack to \fIsize\fP
82  megabytes.  megabytes.
83  .TP 10  .TP 10
84    \fB-s\fP
85    Behave as if each pattern has the \fB/S\fP modifier; in other words, force each
86    pattern to be studied. If the \fB/I\fP or \fB/D\fP option is present on a
87    pattern (requesting output about the compiled pattern), information about the
88    result of studying is not included when studying is caused only by \fB-s\fP and
89    neither \fB-i\fP nor \fB-d\fP is present on the command line. This behaviour
90    means that the output from tests that are run with and without \fB-s\fP should
91    be identical, except when options that output information about the actual
92    running of a match are set. The \fB-M\fP, \fB-t\fP, and \fB-tm\fP options,
93    which give information about resources used, are likely to produce different
94    output with and without \fB-s\fP. Output may also differ if the \fB/C\fP option
95    is present on an individual pattern. This uses callouts to trace the
96    matching process, and this may be different between studied and non-studied
97    patterns. If the pattern contains (*MARK) items there may also be differences,
98    for the same reason. The \fB-s\fP command line option can be overridden for
99    specific patterns that should never be studied (see the /S option below).
100    .TP 10
101  \fB-t\fP  \fB-t\fP
102  Run each compile, study, and match many times with a timer, and output  Run each compile, study, and match many times with a timer, and output
103  resulting time per compile or match (in milliseconds). Do not set \fB-m\fP with  resulting time per compile or match (in milliseconds). Do not set \fB-m\fP with
# Line 154  pcretest to read the next line as a cont Line 173  pcretest to read the next line as a cont
173  A pattern may be followed by any number of modifiers, which are mostly single  A pattern may be followed by any number of modifiers, which are mostly single
174  characters. Following Perl usage, these are referred to below as, for example,  characters. Following Perl usage, these are referred to below as, for example,
175  "the \fB/i\fP modifier", even though the delimiter of the pattern need not  "the \fB/i\fP modifier", even though the delimiter of the pattern need not
176  always be a slash, and no slash is used when writing modifiers. Whitespace may  always be a slash, and no slash is used when writing modifiers. White space may
177  appear between the final pattern delimiter and the first modifier, and between  appear between the final pattern delimiter and the first modifier, and between
178  the modifiers themselves.  the modifiers themselves.
179  .P  .P
# Line 179  options that do not correspond to anythi Line 198  options that do not correspond to anythi
198    \fB/U\fP              PCRE_UNGREEDY    \fB/U\fP              PCRE_UNGREEDY
199    \fB/W\fP              PCRE_UCP    \fB/W\fP              PCRE_UCP
200    \fB/X\fP              PCRE_EXTRA    \fB/X\fP              PCRE_EXTRA
201      \fB/Y\fP              PCRE_NO_START_OPTIMIZE
202    \fB/<JS>\fP           PCRE_JAVASCRIPT_COMPAT    \fB/<JS>\fP           PCRE_JAVASCRIPT_COMPAT
203    \fB/<cr>\fP           PCRE_NEWLINE_CR    \fB/<cr>\fP           PCRE_NEWLINE_CR
204    \fB/<lf>\fP           PCRE_NEWLINE_LF    \fB/<lf>\fP           PCRE_NEWLINE_LF
# Line 189  options that do not correspond to anythi Line 209  options that do not correspond to anythi
209    \fB/<bsr_unicode>\fP  PCRE_BSR_UNICODE    \fB/<bsr_unicode>\fP  PCRE_BSR_UNICODE
210  .sp  .sp
211  The modifiers that are enclosed in angle brackets are literal strings as shown,  The modifiers that are enclosed in angle brackets are literal strings as shown,
212  including the angle brackets, but the letters can be in either case. This  including the angle brackets, but the letters within can be in either case.
213  example sets multiline matching with CRLF as the line ending sequence:  This example sets multiline matching with CRLF as the line ending sequence:
214  .sp  .sp
215    /^abc/m<crlf>    /^abc/m<CRLF>
216  .sp  .sp
217  As well as turning on the PCRE_UTF8 option, the \fB/8\fP modifier also causes  As well as turning on the PCRE_UTF8 option, the \fB/8\fP modifier also causes
218  any non-printing characters in output strings to be printed using the  any non-printing characters in output strings to be printed using the
# Line 219  begins with a lookbehind assertion (incl Line 239  begins with a lookbehind assertion (incl
239  If any call to \fBpcre_exec()\fP in a \fB/g\fP or \fB/G\fP sequence matches an  If any call to \fBpcre_exec()\fP in a \fB/g\fP or \fB/G\fP sequence matches an
240  empty string, the next call is done with the PCRE_NOTEMPTY_ATSTART and  empty string, the next call is done with the PCRE_NOTEMPTY_ATSTART and
241  PCRE_ANCHORED flags set in order to search for another, non-empty, match at the  PCRE_ANCHORED flags set in order to search for another, non-empty, match at the
242  same point. If this second match fails, the start offset is advanced by one  same point. If this second match fails, the start offset is advanced, and the
243  character, and the normal match is retried. This imitates the way Perl handles  normal match is retried. This imitates the way Perl handles such cases when
244  such cases when using the \fB/g\fP modifier or the \fBsplit()\fP function.  using the \fB/g\fP modifier or the \fBsplit()\fP function. Normally, the start
245    offset is advanced by one character, but if the newline convention recognizes
246    CRLF as a newline, and the current character is CR followed by LF, an advance
247    of two is used.
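
The advance rule described above can be sketched as follows. This is only an illustration, not the pcretest source: the function name and parameters are hypothetical, and it assumes non-UTF-8 mode, so that one character is one byte.

```python
def next_start_offset(subject: bytes, offset: int, crlf_is_newline: bool) -> int:
    """Offset for the retry after the anchored non-empty match at `offset` failed.

    Hypothetical helper: assumes one byte per character (non-UTF-8 mode).
    """
    # If the newline convention recognizes CRLF and we are sitting on CR LF,
    # step over both bytes so the retry does not start between them.
    if crlf_is_newline and subject[offset:offset + 2] == b"\r\n":
        return offset + 2
    # Otherwise advance by a single character.
    return offset + 1
```

For the subject "a", CR, LF, "b", a failed retry at offset 1 would resume at offset 3 when CRLF is a recognized newline, and at offset 2 otherwise.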
248  .  .
249  .  .
250  .SS "Other modifiers"  .SS "Other modifiers"
# Line 231  There are yet more modifiers for control Line 254  There are yet more modifiers for control
254  operates.  operates.
255  .P  .P
256  The \fB/+\fP modifier requests that as well as outputting the substring that  The \fB/+\fP modifier requests that as well as outputting the substring that
257  matched the entire pattern, pcretest should in addition output the remainder of  matched the entire pattern, \fBpcretest\fP should in addition output the
258  the subject string. This is useful for tests where the subject contains  remainder of the subject string. This is useful for tests where the subject
259  multiple copies of the same substring.  contains multiple copies of the same substring. If the \fB+\fP modifier appears
260    twice, the same action is taken for captured substrings. In each case the
261    remainder is output on the following line with a plus character following the
262    capture number.
263  .P  .P
264  The \fB/B\fP modifier is a debugging feature. It requests that \fBpcretest\fP  The \fB/B\fP modifier is a debugging feature. It requests that \fBpcretest\fP
265  output a representation of the compiled byte code after compilation. Normally  output a representation of the compiled byte code after compilation. Normally
# Line 283  which it appears. Line 309  which it appears.
309  The \fB/M\fP modifier causes the size of the memory block used to hold the compiled  The \fB/M\fP modifier causes the size of the memory block used to hold the compiled
310  pattern to be output.  pattern to be output.
311  .P  .P
312  The \fB/S\fP modifier causes \fBpcre_study()\fP to be called after the  If the \fB/S\fP modifier appears once, it causes \fBpcre_study()\fP to be
313  expression has been compiled, and the results used when the expression is  called after the expression has been compiled, and the results used when the
314  matched.  expression is matched. If \fB/S\fP appears twice, it suppresses studying, even
315    if it was requested externally by the \fB-s\fP command line option. This makes
316    it possible to specify that certain patterns are always studied, and others are
317    never studied, independently of \fB-s\fP. This feature is used in the test
318    files in a few cases where the output is different when the pattern is studied.
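
The interaction between \fB/S\fP and \fB-s\fP can be summarized as a small decision function. This is a sketch of the rule as stated in the text, with hypothetical names. The text only describes \fB/S\fP appearing once or twice, so other counts are rejected here rather than guessed at.

```python
def will_study(s_modifier_count: int, dash_s: bool) -> bool:
    """Decide whether a pattern is studied.

    s_modifier_count: how many times /S appears on the pattern (0, 1 or 2).
    dash_s: True if -s was given on the command line.
    """
    if s_modifier_count not in (0, 1, 2):
        # Behaviour for more than two /S modifiers is not described in the text.
        raise ValueError("only 0, 1 or 2 occurrences of /S are modelled")
    if s_modifier_count == 2:
        # /SS suppresses studying, even when -s requests it.
        return False
    # A single /S forces studying; otherwise -s decides.
    return s_modifier_count == 1 or dash_s
```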
319  .P  .P
320  The \fB/T\fP modifier must be followed by a single digit. It causes a specific  The \fB/T\fP modifier must be followed by a single digit. It causes a specific
321  set of built-in character tables to be passed to \fBpcre_compile()\fP. It is  set of built-in character tables to be passed to \fBpcre_compile()\fP. It is
# Line 323  ignored. Line 353  ignored.
353  .rs  .rs
354  .sp  .sp
355  Before each data line is passed to \fBpcre_exec()\fP, leading and trailing  Before each data line is passed to \fBpcre_exec()\fP, leading and trailing
356  whitespace is removed, and it is then scanned for \e escapes. Some of these are  white space is removed, and it is then scanned for \e escapes. Some of these
357  pretty esoteric features, intended for checking out some of the more  are pretty esoteric features, intended for checking out some of the more
358  complicated features of PCRE. If you are just testing "ordinary" regular  complicated features of PCRE. If you are just testing "ordinary" regular
359  expressions, you probably don't need any of these. The following escapes are  expressions, you probably don't need any of these. The following escapes are
360  recognized:  recognized:
# Line 332  recognized: Line 362  recognized:
362    \ea         alarm (BEL, \ex07)    \ea         alarm (BEL, \ex07)
363    \eb         backspace (\ex08)    \eb         backspace (\ex08)
364    \ee         escape (\ex1b)    \ee         escape (\ex1b)
365    \ef         formfeed (\ex0c)    \ef         form feed (\ex0c)
366    \en         newline (\ex0a)    \en         newline (\ex0a)
367  .\" JOIN  .\" JOIN
368    \eqdd       set the PCRE_MATCH_LIMIT limit to dd    \eqdd       set the PCRE_MATCH_LIMIT limit to dd
# Line 341  recognized: Line 371  recognized:
371    \et         tab (\ex09)    \et         tab (\ex09)
372    \ev         vertical tab (\ex0b)    \ev         vertical tab (\ex0b)
373    \ennn       octal character (up to 3 octal digits)    \ennn       octal character (up to 3 octal digits)
374    \exhh       hexadecimal character (up to 2 hex digits)                 always a byte unless > 255 in UTF-8 mode
375      \exhh       hexadecimal byte (up to 2 hex digits)
376  .\" JOIN  .\" JOIN
377    \ex{hh...}  hexadecimal character, any number of digits    \ex{hh...}  hexadecimal character, any number of digits
378                 in UTF-8 mode                 in UTF-8 mode
# Line 411  recognized: Line 442  recognized:
442  .\" JOIN  .\" JOIN
443    \e?         pass the PCRE_NO_UTF8_CHECK option to    \e?         pass the PCRE_NO_UTF8_CHECK option to
444                 \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP                 \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP
   \e>dd       start the match at offset dd (any number of digits);  
445  .\" JOIN  .\" JOIN
446                 this sets the \fIstartoffset\fP argument for \fBpcre_exec()\fP    \e>dd       start the match at offset dd (optional "-"; then
447                 or \fBpcre_dfa_exec()\fP                 any number of digits); this sets the \fIstartoffset\fP
448                   argument for \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP
449  .\" JOIN  .\" JOIN
450    \e<cr>      pass the PCRE_NEWLINE_CR option to \fBpcre_exec()\fP    \e<cr>      pass the PCRE_NEWLINE_CR option to \fBpcre_exec()\fP
451                 or \fBpcre_dfa_exec()\fP                 or \fBpcre_dfa_exec()\fP
# Line 431  recognized: Line 462  recognized:
462    \e<any>     pass the PCRE_NEWLINE_ANY option to \fBpcre_exec()\fP    \e<any>     pass the PCRE_NEWLINE_ANY option to \fBpcre_exec()\fP
463                 or \fBpcre_dfa_exec()\fP                 or \fBpcre_dfa_exec()\fP
464  .sp  .sp
465    Note that \exhh always specifies one byte, even in UTF-8 mode; this makes it
466    possible to construct invalid UTF-8 sequences for testing purposes. On the
467    other hand, \ex{hh} is interpreted as a UTF-8 character in UTF-8 mode,
468    generating more than one byte if the value is greater than 127. When not in
469    UTF-8 mode, it generates one byte for values less than 256, and causes an error
470    for greater values.
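
The difference between the two hexadecimal escapes can be illustrated with a short sketch. The function below is hypothetical and is not how pcretest is implemented. It relies on Python's UTF-8 encoder, so it only covers values that are valid Unicode code points (surrogates and values above 0x10FFFF raise an error here, which may not match pcretest).

```python
def data_escape_bytes(value: int, braced: bool, utf8_mode: bool) -> bytes:
    """Bytes generated by \\xhh (braced=False) or \\x{hh...} (braced=True)."""
    if not braced:
        # \xhh has at most two hex digits and always yields exactly one byte,
        # even in UTF-8 mode, so it can build invalid UTF-8 on purpose.
        if not 0 <= value <= 0xFF:
            raise ValueError("\\xhh takes at most two hex digits")
        return bytes([value])
    if utf8_mode:
        # \x{hh...} is a character: values above 127 need more than one byte.
        return chr(value).encode("utf-8")
    if value > 0xFF:
        # Outside UTF-8 mode, values above 255 are an error.
        raise ValueError("value too large outside UTF-8 mode")
    return bytes([value])
```

With these assumptions, \exe9 gives the single byte 0xe9 in either mode, while \ex{e9} in UTF-8 mode gives the two bytes 0xc3 0xa9.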
471    .P
472  The escapes that specify line ending sequences are literal strings, exactly as  The escapes that specify line ending sequences are literal strings, exactly as
473  shown. No more than one newline setting should be present in any data line.  shown. No more than one newline setting should be present in any data line.
474  .P  .P
# Line 495  found. This is always the shortest possi Line 533  found. This is always the shortest possi
533  This section describes the output when the normal matching function,  This section describes the output when the normal matching function,
534  \fBpcre_exec()\fP, is being used.  \fBpcre_exec()\fP, is being used.
535  .P  .P
536  When a match succeeds, pcretest outputs the list of captured substrings that  When a match succeeds, \fBpcretest\fP outputs the list of captured substrings
537  \fBpcre_exec()\fP returns, starting with number 0 for the string that matched  that \fBpcre_exec()\fP returns, starting with number 0 for the string that
538  the whole pattern. Otherwise, it outputs "No match" when the return is  matched the whole pattern. Otherwise, it outputs "No match" when the return is
539  PCRE_ERROR_NOMATCH, and "Partial match:" followed by the partially matching  PCRE_ERROR_NOMATCH, and "Partial match:" followed by the partially matching
540  substring when \fBpcre_exec()\fP returns PCRE_ERROR_PARTIAL. (Note that this is  substring when \fBpcre_exec()\fP returns PCRE_ERROR_PARTIAL. (Note that this is
541  the entire substring that was inspected during the partial match; it may  the entire substring that was inspected during the partial match; it may
542  include characters before the actual match start if a lookbehind assertion,  include characters before the actual match start if a lookbehind assertion,
543  \eK, \eb, or \eB was involved.) For any other returns, it outputs the PCRE  \eK, \eb, or \eB was involved.) For any other return, \fBpcretest\fP outputs
544  negative error number. Here is an example of an interactive \fBpcretest\fP run.  the PCRE negative error number and a short descriptive phrase. If the error is
545    a failed UTF-8 string check, the byte offset of the start of the failing
546    character and the reason code are also output, provided that the size of the
547    output vector is at least two. Here is an example of an interactive
548    \fBpcretest\fP run.
549  .sp  .sp
550    $ pcretest    $ pcretest
551    PCRE version 7.0 30-Nov-2006    PCRE version 8.13 2011-04-30
552  .sp  .sp
553      re> /^abc(\ed+)/      re> /^abc(\ed+)/
554    data> abc123    data> abc123
# Line 515  negative error number. Here is an exampl Line 557  negative error number. Here is an exampl
557    data> xyz    data> xyz
558    No match    No match
559  .sp  .sp
560  Note that unset capturing substrings that are not followed by one that is set  Unset capturing substrings that are not followed by one that is set are not
561  are not returned by \fBpcre_exec()\fP, and are not shown by \fBpcretest\fP. In  returned by \fBpcre_exec()\fP, and are not shown by \fBpcretest\fP. In the
562  the following example, there are two capturing substrings, but when the first  following example, there are two capturing substrings, but when the first data
563  data line is matched, the second, unset substring is not shown. An "internal"  line is matched, the second, unset substring is not shown. An "internal" unset
564  unset substring is shown as "<unset>", as for the second data line.  substring is shown as "<unset>", as for the second data line.
565  .sp  .sp
566      re> /(a)|(b)/      re> /(a)|(b)/
567    data> a    data> a
# Line 553  matching attempts are output in sequence Line 595  matching attempts are output in sequence
595     0: ipp     0: ipp
596     1: pp     1: pp
597  .sp  .sp
598  "No match" is output only if the first match attempt fails.  "No match" is output only if the first match attempt fails. Here is an example
599    of a failure message (the offset 4 that is specified by \e>4 is past the end of
600    the subject string):
601    .sp
602        re> /xyz/
603      data> xyz\>4
604      Error -24 (bad offset value)
605  .P  .P
606  If any of the sequences \fB\eC\fP, \fB\eG\fP, or \fB\eL\fP are present in a  If any of the sequences \fB\eC\fP, \fB\eG\fP, or \fB\eL\fP are present in a
607  data line that is successfully matched, the substrings extracted by the  data line that is successfully matched, the substrings extracted by the
# Line 690  function to distinguish printing and non Line 738  function to distinguish printing and non
738  .rs  .rs
739  .sp  .sp
740  The facilities described in this section are not available when the POSIX  The facilities described in this section are not available when the POSIX
741  inteface to PCRE is being used, that is, when the \fB/P\fP pattern modifier is  interface to PCRE is being used, that is, when the \fB/P\fP pattern modifier is
742  specified.  specified.
743  .P  .P
744  When the POSIX interface is not in use, you can cause \fBpcretest\fP to write a  When the POSIX interface is not in use, you can cause \fBpcretest\fP to write a
# Line 714  exact copy of the compiled pattern. If t Line 762  exact copy of the compiled pattern. If t
762  follows immediately after the compiled pattern. After writing the file,  follows immediately after the compiled pattern. After writing the file,
763  \fBpcretest\fP expects to read a new pattern.  \fBpcretest\fP expects to read a new pattern.
764  .P  .P
765  A saved pattern can be reloaded into \fBpcretest\fP by specifing < and a file  A saved pattern can be reloaded into \fBpcretest\fP by specifying < and a file
766  name instead of a pattern. The name of the file must not contain a < character,  name instead of a pattern. The name of the file must not contain a < character,
767  as otherwise \fBpcretest\fP will interpret the line as a pattern delimited by <  as otherwise \fBpcretest\fP will interpret the line as a pattern delimited by <
768  characters.  characters.
769  For example:  For example:
770  .sp  .sp
771     re> </some/file     re> </some/file
772    Compiled regex loaded from /some/file    Compiled pattern loaded from /some/file
773    No study data    No study data
774  .sp  .sp
775  When the pattern has been loaded, \fBpcretest\fP proceeds to read data lines in  When the pattern has been loaded, \fBpcretest\fP proceeds to read data lines in
# Line 767  Cambridge CB2 3QH, England. Line 815  Cambridge CB2 3QH, England.
815  .rs  .rs
816  .sp  .sp
817  .nf  .nf
818  Last updated: 22 October 2010  Last updated: 11 July 2011
819  Copyright (c) 1997-2010 University of Cambridge.  Copyright (c) 1997-2011 University of Cambridge.
820  .fi  .fi
