Diff of /code/trunk/doc/pcretest.1

revision 53 by nigel, Sat Feb 24 21:39:42 2007 UTC revision 63 by nigel, Sat Feb 24 21:40:03 2007 UTC
# Line 6  pcretest - a program for testing Perl-co Line 6  pcretest - a program for testing Perl-co
6    
7  \fBpcretest\fR was written as a test program for the PCRE regular expression  \fBpcretest\fR was written as a test program for the PCRE regular expression
8  library itself, but it can also be used for experimenting with regular  library itself, but it can also be used for experimenting with regular
9  expressions. This man page describes the features of the test program; for  expressions. This document describes the features of the test program; for
10  details of the regular expressions themselves, see the \fBpcre\fR man page.  details of the regular expressions themselves, see the
11    .\" HREF
12    \fBpcrepattern\fR
13    .\"
14    documentation. For details of PCRE and its options, see the
15    .\" HREF
16    \fBpcreapi\fR
17    .\"
18    documentation.
19    
20  .SH OPTIONS  .SH OPTIONS
21    .rs
22    .sp
23    .TP 10
24    \fB-C\fR
25    Output the version number of the PCRE library, and all available information
26    about the optional features that are included, and then exit.
27  .TP 10  .TP 10
28  \fB-d\fR  \fB-d\fR
29  Behave as if each regex had the \fB/D\fR modifier (see below); the internal  Behave as if each regex had the \fB/D\fR modifier (see below); the internal
# Line 35  Behave as if each regex has \fB/P\fR mod Line 49  Behave as if each regex has \fB/P\fR mod
49  to call PCRE. None of the other options has any effect when \fB-p\fR is set.  to call PCRE. None of the other options has any effect when \fB-p\fR is set.
50  .TP 10  .TP 10
51  \fB-t\fR  \fB-t\fR
52  Run each compile, study, and match 20000 times with a timer, and output  Run each compile, study, and match many times with a timer, and output
53  resulting time per compile or match (in milliseconds). Do not set \fB-t\fR with  resulting time per compile or match (in milliseconds). Do not set \fB-t\fR with
54  \fB-m\fR, because you will then get the size output 20000 times and the timing  \fB-m\fR, because you will then get the size output 20000 times and the timing
55  will be distorted.  will be distorted.
56    
   
57  .SH DESCRIPTION  .SH DESCRIPTION
58    .rs
59    .sp
60  If \fBpcretest\fR is given two filename arguments, it reads from the first and  If \fBpcretest\fR is given two filename arguments, it reads from the first and
61  writes to the second. If it is given only one filename argument, it reads from  writes to the second. If it is given only one filename argument, it reads from
62  that file and writes to stdout. Otherwise, it reads from stdin and writes to  that file and writes to stdout. Otherwise, it reads from stdin and writes to
# Line 51  expressions, and "data>" to prompt for d Line 65  expressions, and "data>" to prompt for d
65    
66  The program handles any number of sets of input on a single input file. Each  The program handles any number of sets of input on a single input file. Each
67  set starts with a regular expression, and continues with any number of data  set starts with a regular expression, and continues with any number of data
68  lines to be matched against the pattern. An empty line signals the end of the  lines to be matched against the pattern.
69  data lines, at which point a new regular expression is read. The regular  
70  expressions are given enclosed in any non-alphameric delimiters other than  Each line is matched separately and independently. If you want to do
71  backslash, for example  multiple-line matches, you have to use the \\n escape sequence in a single line
72    of input to encode the newline characters. The maximum length of a data line is
73    30,000 characters.
74    
75    An empty line signals the end of the data lines, at which point a new regular
76    expression is read. The regular expressions are given enclosed in any
77    non-alphameric delimiters other than backslash, for example
78    
79    /(a|bc)x+yz/    /(a|bc)x+yz/
80    
# Line 81  backslash, because Line 101  backslash, because
101  is interpreted as the first line of a pattern that starts with "abc/", causing  is interpreted as the first line of a pattern that starts with "abc/", causing
102  pcretest to read the next line as a continuation of the regular expression.  pcretest to read the next line as a continuation of the regular expression.
103    
   
104  .SH PATTERN MODIFIERS  .SH PATTERN MODIFIERS
105    .rs
106    .sp
107  The pattern may be followed by \fBi\fR, \fBm\fR, \fBs\fR, or \fBx\fR to set the  The pattern may be followed by \fBi\fR, \fBm\fR, \fBs\fR, or \fBx\fR to set the
108  PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options,  PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options,
109  respectively. For example:  respectively. For example:
# Line 138  studied, the results of that are also ou Line 158  studied, the results of that are also ou
158    
159  The \fB/D\fR modifier is a PCRE debugging feature, which also assumes \fB/I\fR.  The \fB/D\fR modifier is a PCRE debugging feature, which also assumes \fB/I\fR.
160  It causes the internal form of compiled regular expressions to be output after  It causes the internal form of compiled regular expressions to be output after
161  compilation.  compilation. If the pattern was studied, the information returned is also
162    output.
163    
164  The \fB/S\fR modifier causes \fBpcre_study()\fR to be called after the  The \fB/S\fR modifier causes \fBpcre_study()\fR to be called after the
165  expression has been compiled, and the results used when the expression is  expression has been compiled, and the results used when the expression is
# Line 154  present, and REG_NEWLINE is set if \fB/m Line 175  present, and REG_NEWLINE is set if \fB/m
175  force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.  force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
176    
177  The \fB/8\fR modifier causes \fBpcretest\fR to call PCRE with the PCRE_UTF8  The \fB/8\fR modifier causes \fBpcretest\fR to call PCRE with the PCRE_UTF8
178  option set. This turns on the (currently incomplete) support for UTF-8  option set. This turns on support for UTF-8 character handling in PCRE,
179  character handling in PCRE, provided that it was compiled with this support  provided that it was compiled with this support enabled. This modifier also
180  enabled. This modifier also causes any non-printing characters in output  causes any non-printing characters in output strings to be printed using the
181  strings to be printed using the \\x{hh...} notation if they are valid UTF-8  \\x{hh...} notation if they are valid UTF-8 sequences.
182  sequences.  
183    .SH CALLOUTS
184    .rs
185    .sp
186    If the pattern contains any callout requests, \fBpcretest\fR's callout function
187    will be called. By default, it displays the callout number, and the start and
188    current positions in the text at the callout time. For example, the output
189    
190      --->pqrabcdef
191        0    ^  ^
192    
193    indicates that callout number 0 occurred for a match attempt starting at the
194    fourth character of the subject string, when the pointer was at the seventh
195    character. The callout function returns zero (carry on matching) by default.
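
The layout of that two-line display can be sketched as follows. This is only an illustration inferred from the example above, not pcretest's own code: the function name callout_display and the exact column widths (a 3-character number field followed by one space) are assumptions.

```python
def callout_display(number: int, subject: str, start: int, current: int) -> str:
    """Build a two-line callout display like the example in the text.

    start and current are zero-based offsets into subject, with
    start <= current. The widths used here are inferred from the
    single example shown, so treat them as an assumption.
    """
    if not 0 <= start <= current <= len(subject):
        raise ValueError("expected 0 <= start <= current <= len(subject)")
    first = "--->" + subject
    # One marker column per subject position up to the current pointer.
    marks = [" "] * (current + 1)
    marks[start] = "^"
    marks[current] = "^"
    # The callout number fills the four columns under the "--->" prefix.
    second = f"{number:3d} " + "".join(marks)
    return first + "\n" + second


if __name__ == "__main__":
    # Fourth character is offset 3, seventh character is offset 6.
    print(callout_display(0, "pqrabcdef", 3, 6))
```

With the values from the text (callout 0, start at the fourth character, pointer at the seventh), the carets fall under "a" and "d" of the subject.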
196    
197    Inserting callouts may be helpful when using \fBpcretest\fR to check
198    complicated regular expressions. For further information about callouts, see
199    the
200    .\" HREF
201    \fBpcrecallout\fR
202    .\"
203    documentation.
204    
205    For testing the PCRE library, additional control of callout behaviour is
206    available via escape sequences in the data, as described in the following
207    section. In particular, it is possible to pass in a number as callout data (the
208    default is zero). If the callout function receives a non-zero number, it
209    returns that value instead of zero.
210    
211  .SH DATA LINES  .SH DATA LINES
212    .rs
213    .sp
214  Before each data line is passed to \fBpcre_exec()\fR, leading and trailing  Before each data line is passed to \fBpcre_exec()\fR, leading and trailing
215  whitespace is removed, and it is then scanned for \\ escapes. The following are  whitespace is removed, and it is then scanned for \\ escapes. Some of these are
216    pretty esoteric features, intended for checking out some of the more
217    complicated features of PCRE. If you are just testing "ordinary" regular
218    expressions, you probably don't need any of these. The following escapes are
219  recognized:  recognized:
220    
221    \\a         alarm (= BEL)    \\a         alarm (= BEL)
# Line 177  recognized: Line 228  recognized:
228    \\v         vertical tab    \\v         vertical tab
229    \\nnn       octal character (up to 3 octal digits)    \\nnn       octal character (up to 3 octal digits)
230    \\xhh       hexadecimal character (up to 2 hex digits)    \\xhh       hexadecimal character (up to 2 hex digits)
231    \\x{hh...}  hexadecimal UTF-8 character    \\x{hh...}  hexadecimal character, any number of digits
232                   in UTF-8 mode
233    \\A         pass the PCRE_ANCHORED option to \fBpcre_exec()\fR    \\A         pass the PCRE_ANCHORED option to \fBpcre_exec()\fR
234    \\B         pass the PCRE_NOTBOL option to \fBpcre_exec()\fR    \\B         pass the PCRE_NOTBOL option to \fBpcre_exec()\fR
235    \\Cdd       call pcre_copy_substring() for substring dd    \\Cdd       call pcre_copy_substring() for substring dd
236                  after a successful match (any decimal number                 after a successful match (any decimal number
237                  less than 32)                 less than 32)
238      \\Cname     call pcre_copy_named_substring() for substring
239                   "name" after a successful match (name
240                   terminated by next non-alphanumeric character)
241      \\C+        show the current captured substrings at callout
242                   time
243      \\C-        do not supply a callout function
244      \\C!n       return 1 instead of 0 when callout number n is
245                   reached
246      \\C!n!m     return 1 instead of 0 when callout number n is
247                   reached for the mth time
248      \\C*n       pass the number n (may be negative) as callout
249                   data
250    \\Gdd       call pcre_get_substring() for substring dd    \\Gdd       call pcre_get_substring() for substring dd
251                  after a successful match (any decimal number                 after a successful match (any decimal number
252                  less than 32)                 less than 32)
253      \\Gname     call pcre_get_named_substring() for substring
254                   "name" after a successful match (name
255                   terminated by next non-alphanumeric character)
256    \\L         call pcre_get_substringlist() after a    \\L         call pcre_get_substringlist() after a
257                  successful match                 successful match
258      \\M         discover the minimum MATCH_LIMIT setting
259    \\N         pass the PCRE_NOTEMPTY option to \fBpcre_exec()\fR    \\N         pass the PCRE_NOTEMPTY option to \fBpcre_exec()\fR
260    \\Odd       set the size of the output vector passed to    \\Odd       set the size of the output vector passed to
261                  \fBpcre_exec()\fR to dd (any number of decimal                 \fBpcre_exec()\fR to dd (any number of decimal
262                  digits)                 digits)
263    \\Z         pass the PCRE_NOTEOL option to \fBpcre_exec()\fR    \\Z         pass the PCRE_NOTEOL option to \fBpcre_exec()\fR
264    
265    If \\M is present, \fBpcretest\fR calls \fBpcre_exec()\fR several times, with
266    different values in the \fImatch_limit\fR field of the \fBpcre_extra\fR data
267    structure, until it finds the minimum number that is needed for
268    \fBpcre_exec()\fR to complete. This number is a measure of the amount of
269    recursion and backtracking that takes place, and checking it out can be
270    instructive. For most simple matches, the number is quite small, but for
271    patterns with very large numbers of matching possibilities, it can become large
272    very quickly with increasing length of subject string.
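
The text does not say which values pcretest tries, so the following is only one plausible search strategy, not the program's actual algorithm. It assumes a hypothetical predicate completes(limit) that stands in for "pcre_exec() finished with this match_limit", and assumes that any limit above the minimum also succeeds.

```python
from typing import Callable


def minimum_limit(completes: Callable[[int], bool], upper: int = 10_000_000) -> int:
    """Return the smallest limit in 1..upper for which completes(limit) is true.

    completes is a hypothetical stand-in for one matching attempt made
    with the given limit. It must be monotonic: once a limit is large
    enough, every larger limit is large enough too.
    """
    lo, hi = 1, 1
    # Phase 1: double the limit until an attempt completes.
    while not completes(hi):
        if hi >= upper:
            raise ValueError("no limit up to 'upper' is sufficient")
        lo = hi + 1
        hi = min(hi * 2, upper)
    # Phase 2: bisect between the last failure and the first success.
    while lo < hi:
        mid = (lo + hi) // 2
        if completes(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    # Pretend the match needs a limit of at least 37.
    print(minimum_limit(lambda limit: limit >= 37))
```

Doubling and then bisecting keeps the number of attempts roughly proportional to the logarithm of the answer, which matters because the required limit can grow quickly with subject length.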
273    
274  When \\O is used, it may be higher or lower than the size set by the \fB-O\fR  When \\O is used, it may be higher or lower than the size set by the \fB-O\fR
275  option (or defaulted to 45); \\O applies only to the call of \fBpcre_exec()\fR  option (or defaulted to 45); \\O applies only to the call of \fBpcre_exec()\fR
276  for the line in which it appears.  for the line in which it appears.
# Line 212  of the \fB/8\fR modifier on the pattern. Line 288  of the \fB/8\fR modifier on the pattern.
288  any number of hexadecimal digits inside the braces. The result is from one to  any number of hexadecimal digits inside the braces. The result is from one to
289  six bytes, encoded according to the UTF-8 rules.  six bytes, encoded according to the UTF-8 rules.
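
As an illustration of how a value can occupy from one to six bytes, here is a minimal sketch of the original UTF-8 encoding scheme, which covers values up to 0x7FFFFFFF. The function name encode_utf8_extended is hypothetical, and this is not pcretest's code.

```python
def encode_utf8_extended(cp: int) -> bytes:
    """Encode cp (0..0x7FFFFFFF) with the original 1-to-6-byte UTF-8 rules."""
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("value out of range for 6-byte UTF-8")
    if cp < 0x80:
        return bytes([cp])  # single byte, same as ASCII
    # (largest value, leading-byte marker) for the 2- to 6-byte forms
    forms = [
        (0x7FF, 0xC0),
        (0xFFFF, 0xE0),
        (0x1FFFFF, 0xF0),
        (0x3FFFFFF, 0xF8),
        (0x7FFFFFFF, 0xFC),
    ]
    for extra, (limit, marker) in enumerate(forms, start=1):
        if cp <= limit:
            break
    # Peel off six bits at a time for the continuation bytes.
    tail = []
    for _ in range(extra):
        tail.append(0x80 | (cp & 0x3F))
        cp >>= 6
    # The remaining high bits go into the leading byte.
    return bytes([marker | cp] + tail[::-1])


if __name__ == "__main__":
    print(encode_utf8_extended(0x20AC).hex())
    print(len(encode_utf8_extended(0x7FFFFFFF)))
```

For values that modern UTF-8 also allows, the result agrees with Python's built-in encoder; the 5- and 6-byte forms exist only in the older definition.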
290    
   
291  .SH OUTPUT FROM PCRETEST  .SH OUTPUT FROM PCRETEST
292    .rs
293    .sp
294  When a match succeeds, pcretest outputs the list of captured substrings that  When a match succeeds, pcretest outputs the list of captured substrings that
295  \fBpcre_exec()\fR returns, starting with number 0 for the string that matched  \fBpcre_exec()\fR returns, starting with number 0 for the string that matched
296  the whole pattern. Here is an example of an interactive pcretest run.  the whole pattern. Here is an example of an interactive pcretest run.
297    
298    $ pcretest    $ pcretest
299    PCRE version 2.06 08-Jun-1999    PCRE version 4.00 08-Jan-2003
300    
301      re> /^abc(\\d+)/      re> /^abc(\\d+)/
302    data> abc123    data> abc123
# Line 265  Note that while patterns can be continue Line 341  Note that while patterns can be continue
341  prompt is used for continuations), data lines may not. However newlines can be  prompt is used for continuations), data lines may not. However newlines can be
342  included in data by means of the \\n escape.  included in data by means of the \\n escape.
343    
   
344  .SH AUTHOR  .SH AUTHOR
345    .rs
346    .sp
347  Philip Hazel <ph10@cam.ac.uk>  Philip Hazel <ph10@cam.ac.uk>
348  .br  .br
349  University Computing Service,  University Computing Service,
350  .br  .br
 New Museums Site,  
 .br  
351  Cambridge CB2 3QG, England.  Cambridge CB2 3QG, England.
 .br  
 Phone: +44 1223 334714  
352    
353  Last updated: 15 August 2001  .in 0
354    Last updated: 03 February 2003
355  .br  .br
356  Copyright (c) 1997-2001 University of Cambridge.  Copyright (c) 1997-2003 University of Cambridge.
