
Diff of /code/trunk/doc/pcretest.1


revision 74 by nigel, Sat Feb 24 21:40:30 2007 UTC revision 75 by nigel, Sat Feb 24 21:40:37 2007 UTC
# Line 2  Line 2 
2  .SH NAME  .SH NAME
3  pcretest - a program for testing Perl-compatible regular expressions.  pcretest - a program for testing Perl-compatible regular expressions.
4  .SH SYNOPSIS  .SH SYNOPSIS
5  .B pcretest "[-d] [-i] [-m] [-o osize] [-p] [-t] [source] [destination]"  .rs
6    .sp
7  \fBpcretest\fR was written as a test program for the PCRE regular expression  .B pcretest "[-C] [-d] [-i] [-m] [-o osize] [-p] [-t] [source]"
8    .ti +5n
9    .B "[destination]"
10    .P
11    \fBpcretest\fP was written as a test program for the PCRE regular expression
12  library itself, but it can also be used for experimenting with regular  library itself, but it can also be used for experimenting with regular
13  expressions. This document describes the features of the test program; for  expressions. This document describes the features of the test program; for
14  details of the regular expressions themselves, see the  details of the regular expressions themselves, see the
15  .\" HREF  .\" HREF
16  \fBpcrepattern\fR  \fBpcrepattern\fP
17  .\"  .\"
18  documentation. For details of PCRE and its options, see the  documentation. For details of the PCRE library function calls and their
19    options, see the
20  .\" HREF  .\" HREF
21  \fBpcreapi\fR  \fBpcreapi\fP
22  .\"  .\"
23  documentation.  documentation.
24    .
25    .
26  .SH OPTIONS  .SH OPTIONS
27  .rs  .rs
 .sp  
28  .TP 10  .TP 10
29  \fB-C\fR  \fB-C\fP
30  Output the version number of the PCRE library, and all available information  Output the version number of the PCRE library, and all available information
31  about the optional features that are included, and then exit.  about the optional features that are included, and then exit.
32  .TP 10  .TP 10
33  \fB-d\fR  \fB-d\fP
34  Behave as if each regex had the \fB/D\fR modifier (see below); the internal  Behave as if each regex had the \fB/D\fP (debug) modifier; the internal
35  form is output after compilation.  form is output after compilation.
36  .TP 10  .TP 10
37  \fB-i\fR  \fB-i\fP
38  Behave as if each regex had the \fB/I\fR modifier; information about the  Behave as if each regex had the \fB/I\fP modifier; information about the
39  compiled pattern is given after compilation.  compiled pattern is given after compilation.
40  .TP 10  .TP 10
41  \fB-m\fR  \fB-m\fP
42  Output the size of each compiled pattern after it has been compiled. This is  Output the size of each compiled pattern after it has been compiled. This is
43  equivalent to adding /M to each regular expression. For compatibility with  equivalent to adding \fB/M\fP to each regular expression. For compatibility
44  earlier versions of pcretest, \fB-s\fR is a synonym for \fB-m\fR.  with earlier versions of pcretest, \fB-s\fP is a synonym for \fB-m\fP.
45  .TP 10  .TP 10
46  \fB-o\fR \fIosize\fR  \fB-o\fP \fIosize\fP
47  Set the number of elements in the output vector that is used when calling PCRE  Set the number of elements in the output vector that is used when calling
48  to be \fIosize\fR. The default value is 45, which is enough for 14 capturing  \fBpcre_exec()\fP to be \fIosize\fP. The default value is 45, which is enough
49  subexpressions. The vector size can be changed for individual matching calls by  for 14 capturing subexpressions. The vector size can be changed for individual
50  including \\O in the data line (see below).  matching calls by including \eO in the data line (see below).
51  .TP 10  .TP 10
52  \fB-p\fR  \fB-p\fP
53  Behave as if each regex has \fB/P\fR modifier; the POSIX wrapper API is used  Behave as if each regex has \fB/P\fP modifier; the POSIX wrapper API is used
54  to call PCRE. None of the other options has any effect when \fB-p\fR is set.  to call PCRE. None of the other options has any effect when \fB-p\fP is set.
55  .TP 10  .TP 10
56  \fB-t\fR  \fB-t\fP
57  Run each compile, study, and match many times with a timer, and output  Run each compile, study, and match many times with a timer, and output
58  resulting time per compile or match (in milliseconds). Do not set \fB-t\fR with  resulting time per compile or match (in milliseconds). Do not set \fB-m\fP with
59  \fB-m\fR, because you will then get the size output 20000 times and the timing  \fB-t\fP, because you will then get the size output a zillion times, and the
60  will be distorted.  timing will be distorted.
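
The sizing rule in the -o entry above (45 elements for 14 capturing subexpressions) follows from how pcre_exec() uses its output vector: each substring takes a pair of offsets, pair 0 holds the whole match, and the final third of the vector is workspace. A minimal Python sketch of that arithmetic follows; the function names are illustrative and are not part of pcretest.

```python
def max_capturing_groups(osize):
    """Capturing subexpressions that fit in an output vector of osize ints."""
    pairs = osize // 3          # two thirds hold offset pairs, one third is workspace
    return max(pairs - 1, 0)    # pair 0 is reserved for the whole match


def min_osize(groups):
    """Smallest output vector size that can report the given number of groups."""
    return 3 * (groups + 1)


if __name__ == "__main__":
    print(max_capturing_groups(45))
    print(min_osize(14))
```

A value passed with -o or \O that is not a multiple of three simply leaves the remainder unused under this rule.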
61    .
62    .
63  .SH DESCRIPTION  .SH DESCRIPTION
64  .rs  .rs
65  .sp  .sp
66  If \fBpcretest\fR is given two filename arguments, it reads from the first and  If \fBpcretest\fP is given two filename arguments, it reads from the first and
67  writes to the second. If it is given only one filename argument, it reads from  writes to the second. If it is given only one filename argument, it reads from
68  that file and writes to stdout. Otherwise, it reads from stdin and writes to  that file and writes to stdout. Otherwise, it reads from stdin and writes to
69  stdout, and prompts for each line of input, using "re>" to prompt for regular  stdout, and prompts for each line of input, using "re>" to prompt for regular
70  expressions, and "data>" to prompt for data lines.  expressions, and "data>" to prompt for data lines.
71    .P
72  The program handles any number of sets of input on a single input file. Each  The program handles any number of sets of input on a single input file. Each
73  set starts with a regular expression, and continues with any number of data  set starts with a regular expression, and continues with any number of data
74  lines to be matched against the pattern.  lines to be matched against the pattern.
75    .P
76  Each line is matched separately and independently. If you want to do  Each data line is matched separately and independently. If you want to do
77  multiple-line matches, you have to use the \\n escape sequence in a single line  multiple-line matches, you have to use the \en escape sequence in a single line
78  of input to encode the newline characters. The maximum length of data line is  of input to encode the newline characters. The maximum length of data line is
79  30,000 characters.  30,000 characters.
80    .P
81  An empty line signals the end of the data lines, at which point a new regular  An empty line signals the end of the data lines, at which point a new regular
82  expression is read. The regular expressions are given enclosed in any  expression is read. The regular expressions are given enclosed in any
83  non-alphameric delimiters other than backslash, for example  non-alphanumeric delimiters other than backslash, for example
84    .sp
85    /(a|bc)x+yz/    /(a|bc)x+yz/
86    .sp
87  White space before the initial delimiter is ignored. A regular expression may  White space before the initial delimiter is ignored. A regular expression may
88  be continued over several input lines, in which case the newline characters are  be continued over several input lines, in which case the newline characters are
89  included within it. It is possible to include the delimiter within the pattern  included within it. It is possible to include the delimiter within the pattern
90  by escaping it, for example  by escaping it, for example
91    .sp
92    /abc\\/def/    /abc\e/def/
93    .sp
94  If you do so, the escape and the delimiter form part of the pattern, but since  If you do so, the escape and the delimiter form part of the pattern, but since
95  delimiters are always non-alphameric, this does not affect its interpretation.  delimiters are always non-alphanumeric, this does not affect its interpretation.
96  If the terminating delimiter is immediately followed by a backslash, for  If the terminating delimiter is immediately followed by a backslash, for
97  example,  example,
98    .sp
99    /abc/\\    /abc/\e
100    .sp
101  then a backslash is added to the end of the pattern. This is done to provide a  then a backslash is added to the end of the pattern. This is done to provide a
102  way of testing the error condition that arises if a pattern finishes with a  way of testing the error condition that arises if a pattern finishes with a
103  backslash, because  backslash, because
104    .sp
105    /abc\\/    /abc\e/
106    .sp
107  is interpreted as the first line of a pattern that starts with "abc/", causing  is interpreted as the first line of a pattern that starts with "abc/", causing
108  pcretest to read the next line as a continuation of the regular expression.  pcretest to read the next line as a continuation of the regular expression.
109    .
110  .SH PATTERN MODIFIERS  .
111    .SH "PATTERN MODIFIERS"
112  .rs  .rs
113  .sp  .sp
114  The pattern may be followed by \fBi\fR, \fBm\fR, \fBs\fR, or \fBx\fR to set the  A pattern may be followed by any number of modifiers, which are mostly single
115  PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options,  characters. Following Perl usage, these are referred to below as, for example,
116  respectively. For example:  "the \fB/i\fP modifier", even though the delimiter of the pattern need not
117    always be a slash, and no slash is used when writing modifiers. Whitespace may
118    appear between the final pattern delimiter and the first modifier, and between
119    the modifiers themselves.
120    .P
121    The \fB/i\fP, \fB/m\fP, \fB/s\fP, and \fB/x\fP modifiers set the PCRE_CASELESS,
122    PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when
123    \fBpcre_compile()\fP is called. These four modifier letters have the same
124    effect as they do in Perl. For example:
125    .sp
126    /caseless/i    /caseless/i
127    .sp
128  These modifier letters have the same effect as they do in Perl. There are  The following table shows additional modifiers for setting PCRE options that do
129  others that set PCRE options that do not correspond to anything in Perl:  not correspond to anything in Perl:
130  \fB/A\fR, \fB/E\fR, \fB/N\fR, \fB/U\fR, and \fB/X\fR set PCRE_ANCHORED,  .sp
131  PCRE_DOLLAR_ENDONLY, PCRE_NO_AUTO_CAPTURE, PCRE_UNGREEDY, and PCRE_EXTRA    \fB/A\fP    PCRE_ANCHORED
132  respectively.    \fB/C\fP    PCRE_AUTO_CALLOUT
133      \fB/E\fP    PCRE_DOLLAR_ENDONLY
134      \fB/N\fP    PCRE_NO_AUTO_CAPTURE
135      \fB/U\fP    PCRE_UNGREEDY
136      \fB/X\fP    PCRE_EXTRA
137    .sp
138  Searching for all possible matches within each subject string can be requested  Searching for all possible matches within each subject string can be requested
139  by the \fB/g\fR or \fB/G\fR modifier. After finding a match, PCRE is called  by the \fB/g\fP or \fB/G\fP modifier. After finding a match, PCRE is called
140  again to search the remainder of the subject string. The difference between  again to search the remainder of the subject string. The difference between
141  \fB/g\fR and \fB/G\fR is that the former uses the \fIstartoffset\fR argument to  \fB/g\fP and \fB/G\fP is that the former uses the \fIstartoffset\fP argument to
142  \fBpcre_exec()\fR to start searching at a new point within the entire string  \fBpcre_exec()\fP to start searching at a new point within the entire string
143  (which is in effect what Perl does), whereas the latter passes over a shortened  (which is in effect what Perl does), whereas the latter passes over a shortened
144  substring. This makes a difference to the matching process if the pattern  substring. This makes a difference to the matching process if the pattern
145  begins with a lookbehind assertion (including \\b or \\B).  begins with a lookbehind assertion (including \eb or \eB).
146    .P
147  If any call to \fBpcre_exec()\fR in a \fB/g\fR or \fB/G\fR sequence matches an  If any call to \fBpcre_exec()\fP in a \fB/g\fP or \fB/G\fP sequence matches an
148  empty string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED  empty string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
149  flags set in order to search for another, non-empty, match at the same point.  flags set in order to search for another, non-empty, match at the same point.
150  If this second match fails, the start offset is advanced by one, and the normal  If this second match fails, the start offset is advanced by one, and the normal
151  match is retried. This imitates the way Perl handles such cases when using the  match is retried. This imitates the way Perl handles such cases when using the
152  \fB/g\fR modifier or the \fBsplit()\fR function.  \fB/g\fP modifier or the \fBsplit()\fP function.
153    .P
154  There are a number of other modifiers for controlling the way \fBpcretest\fR  There are yet more modifiers for controlling the way \fBpcretest\fP
155  operates.  operates.
156    .P
157  The \fB/+\fR modifier requests that as well as outputting the substring that  The \fB/+\fP modifier requests that as well as outputting the substring that
158  matched the entire pattern, pcretest should in addition output the remainder of  matched the entire pattern, pcretest should in addition output the remainder of
159  the subject string. This is useful for tests where the subject contains  the subject string. This is useful for tests where the subject contains
160  multiple copies of the same substring.  multiple copies of the same substring.
161    .P
162  The \fB/L\fR modifier must be followed directly by the name of a locale, for  The \fB/L\fP modifier must be followed directly by the name of a locale, for
163  example,  example,
164    .sp
165    /pattern/Lfr    /pattern/Lfr_FR
166    .sp
167  For this reason, it must be the last modifier letter. The given locale is set,  For this reason, it must be the last modifier. The given locale is set,
168  \fBpcre_maketables()\fR is called to build a set of character tables for the  \fBpcre_maketables()\fP is called to build a set of character tables for the
169  locale, and this is then passed to \fBpcre_compile()\fR when compiling the  locale, and this is then passed to \fBpcre_compile()\fP when compiling the
170  regular expression. Without an \fB/L\fR modifier, NULL is passed as the tables  regular expression. Without an \fB/L\fP modifier, NULL is passed as the tables
171  pointer; that is, \fB/L\fR applies only to the expression on which it appears.  pointer; that is, \fB/L\fP applies only to the expression on which it appears.
172    .P
173  The \fB/I\fR modifier requests that \fBpcretest\fR output information about the  The \fB/I\fP modifier requests that \fBpcretest\fP output information about the
174  compiled expression (whether it is anchored, has a fixed first character, and  compiled pattern (whether it is anchored, has a fixed first character, and
175  so on). It does this by calling \fBpcre_fullinfo()\fR after compiling an  so on). It does this by calling \fBpcre_fullinfo()\fP after compiling a
176  expression, and outputting the information it gets back. If the pattern is  pattern. If the pattern is studied, the results of that are also output.
177  studied, the results of that are also output.  .P
178    The \fB/D\fP modifier is a PCRE debugging feature, which also assumes \fB/I\fP.
 The \fB/D\fR modifier is a PCRE debugging feature, which also assumes \fB/I\fR.  
179  It causes the internal form of compiled regular expressions to be output after  It causes the internal form of compiled regular expressions to be output after
180  compilation. If the pattern was studied, the information returned is also  compilation. If the pattern was studied, the information returned is also
181  output.  output.
182    .P
183  The \fB/S\fR modifier causes \fBpcre_study()\fR to be called after the  The \fB/F\fP modifier causes \fBpcretest\fP to flip the byte order of the
184    fields in the compiled pattern that contain 2-byte and 4-byte numbers. This
185    facility is for testing the feature in PCRE that allows it to execute patterns
186    that were compiled on a host with a different endianness. This feature is not
187    available when the POSIX interface to PCRE is being used, that is, when the
188    \fB/P\fP pattern modifier is specified. See also the section about saving and
189    reloading compiled patterns below.
190    .P
191    The \fB/S\fP modifier causes \fBpcre_study()\fP to be called after the
192  expression has been compiled, and the results used when the expression is  expression has been compiled, and the results used when the expression is
193  matched.  matched.
194    .P
195  The \fB/M\fR modifier causes the size of memory block used to hold the compiled  The \fB/M\fP modifier causes the size of memory block used to hold the compiled
196  pattern to be output.  pattern to be output.
197    .P
198  The \fB/P\fR modifier causes \fBpcretest\fR to call PCRE via the POSIX wrapper  The \fB/P\fP modifier causes \fBpcretest\fP to call PCRE via the POSIX wrapper
199  API rather than its native API. When this is done, all other modifiers except  API rather than its native API. When this is done, all other modifiers except
200  \fB/i\fR, \fB/m\fR, and \fB/+\fR are ignored. REG_ICASE is set if \fB/i\fR is  \fB/i\fP, \fB/m\fP, and \fB/+\fP are ignored. REG_ICASE is set if \fB/i\fP is
201  present, and REG_NEWLINE is set if \fB/m\fR is present. The wrapper functions  present, and REG_NEWLINE is set if \fB/m\fP is present. The wrapper functions
202  force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.  force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
203    .P
204  The \fB/8\fR modifier causes \fBpcretest\fR to call PCRE with the PCRE_UTF8  The \fB/8\fP modifier causes \fBpcretest\fP to call PCRE with the PCRE_UTF8
205  option set. This turns on support for UTF-8 character handling in PCRE,  option set. This turns on support for UTF-8 character handling in PCRE,
206  provided that it was compiled with this support enabled. This modifier also  provided that it was compiled with this support enabled. This modifier also
207  causes any non-printing characters in output strings to be printed using the  causes any non-printing characters in output strings to be printed using the
208  \\x{hh...} notation if they are valid UTF-8 sequences.  \ex{hh...} notation if they are valid UTF-8 sequences.
209    .P
210  If the \fB/?\fR modifier is used with \fB/8\fR, it causes \fBpcretest\fR to  If the \fB/?\fP modifier is used with \fB/8\fP, it causes \fBpcretest\fP to
211  call \fBpcre_compile()\fR with the PCRE_NO_UTF8_CHECK option, to suppress the  call \fBpcre_compile()\fP with the PCRE_NO_UTF8_CHECK option, to suppress the
212  checking of the string for UTF-8 validity.  checking of the string for UTF-8 validity.
213    .
214  .SH CALLOUTS  .
215  .rs  .SH "DATA LINES"
 .sp  
 If the pattern contains any callout requests, \fBpcretest\fR's callout function  
 will be called. By default, it displays the callout number, and the start and  
 current positions in the text at the callout time. For example, the output  
   
   --->pqrabcdef  
     0    ^  ^  
   
 indicates that callout number 0 occurred for a match attempt starting at the  
 fourth character of the subject string, when the pointer was at the seventh  
 character. The callout function returns zero (carry on matching) by default.  
   
 Inserting callouts may be helpful when using \fBpcretest\fR to check  
 complicated regular expressions. For further information about callouts, see  
 the  
 .\" HREF  
 \fBpcrecallout\fR  
 .\"  
 documentation.  
   
 For testing the PCRE library, additional control of callout behaviour is  
 available via escape sequences in the data, as described in the following  
 section. In particular, it is possible to pass in a number as callout data (the  
 default is zero). If the callout function receives a non-zero number, it  
 returns that value instead of zero.  
   
 .SH DATA LINES  
216  .rs  .rs
217  .sp  .sp
218  Before each data line is passed to \fBpcre_exec()\fR, leading and trailing  Before each data line is passed to \fBpcre_exec()\fP, leading and trailing
219  whitespace is removed, and it is then scanned for \\ escapes. Some of these are  whitespace is removed, and it is then scanned for \e escapes. Some of these are
220  pretty esoteric features, intended for checking out some of the more  pretty esoteric features, intended for checking out some of the more
221  complicated features of PCRE. If you are just testing "ordinary" regular  complicated features of PCRE. If you are just testing "ordinary" regular
222  expressions, you probably don't need any of these. The following escapes are  expressions, you probably don't need any of these. The following escapes are
223  recognized:  recognized:
224    .sp
225    \\a         alarm (= BEL)    \ea         alarm (= BEL)
226    \\b         backspace    \eb         backspace
227    \\e         escape    \ee         escape
228    \\f         formfeed    \ef         formfeed
229    \\n         newline    \en         newline
230    \\r         carriage return    \er         carriage return
231    \\t         tab    \et         tab
232    \\v         vertical tab    \ev         vertical tab
233    \\nnn       octal character (up to 3 octal digits)    \ennn       octal character (up to 3 octal digits)
234    \\xhh       hexadecimal character (up to 2 hex digits)    \exhh       hexadecimal character (up to 2 hex digits)
235    \\x{hh...}  hexadecimal character, any number of digits  .\" JOIN
236      \ex{hh...}  hexadecimal character, any number of digits
237                 in UTF-8 mode                 in UTF-8 mode
238    \\A         pass the PCRE_ANCHORED option to \fBpcre_exec()\fR    \eA         pass the PCRE_ANCHORED option to \fBpcre_exec()\fP
239    \\B         pass the PCRE_NOTBOL option to \fBpcre_exec()\fR    \eB         pass the PCRE_NOTBOL option to \fBpcre_exec()\fP
240    \\Cdd       call pcre_copy_substring() for substring dd  .\" JOIN
241                 after a successful match (any decimal number    \eCdd       call pcre_copy_substring() for substring dd
242                 less than 32)                 after a successful match (number less than 32)
243    \\Cname     call pcre_copy_named_substring() for substring  .\" JOIN
244      \eCname     call pcre_copy_named_substring() for substring
245                 "name" after a successful match (name termin-                 "name" after a successful match (name termin-
246                 ated by next non alphanumeric character)                 ated by next non alphanumeric character)
247    \\C+        show the current captured substrings at callout  .\" JOIN
248      \eC+        show the current captured substrings at callout
249                 time                 time
250    \\C-        do not supply a callout function    \eC-        do not supply a callout function
251    \\C!n       return 1 instead of 0 when callout number n is  .\" JOIN
252      \eC!n       return 1 instead of 0 when callout number n is
253                 reached                 reached
254    \\C!n!m     return 1 instead of 0 when callout number n is  .\" JOIN
255      \eC!n!m     return 1 instead of 0 when callout number n is
256                 reached for the mth time                 reached for the mth time
257    \\C*n       pass the number n (may be negative) as callout  .\" JOIN
258                 data    \eC*n       pass the number n (may be negative) as callout
259    \\Gdd       call pcre_get_substring() for substring dd                 data; this is used as the callout return value
260                 after a successful match (any decimal number  .\" JOIN
261                 less than 32)    \eGdd       call pcre_get_substring() for substring dd
262    \\Gname     call pcre_get_named_substring() for substring                 after a successful match (number less than 32)
263    .\" JOIN
264      \eGname     call pcre_get_named_substring() for substring
265                 "name" after a successful match (name termin-                 "name" after a successful match (name termin-
266                 ated by next non-alphanumeric character)                 ated by next non-alphanumeric character)
267    \\L         call pcre_get_substringlist() after a  .\" JOIN
268      \eL         call pcre_get_substringlist() after a
269                 successful match                 successful match
270    \\M         discover the minimum MATCH_LIMIT setting    \eM         discover the minimum MATCH_LIMIT setting
271    \\N         pass the PCRE_NOTEMPTY option to \fBpcre_exec()\fR    \eN         pass the PCRE_NOTEMPTY option to \fBpcre_exec()\fP
272    \\Odd       set the size of the output vector passed to  .\" JOIN
273                 \fBpcre_exec()\fR to dd (any number of decimal    \eOdd       set the size of the output vector passed to
274                 digits)                 \fBpcre_exec()\fP to dd (any number of digits)
275    \\S         output details of memory get/free calls during matching    \eP         pass the PCRE_PARTIAL option to \fBpcre_exec()\fP
276    \\Z         pass the PCRE_NOTEOL option to \fBpcre_exec()\fR    \eS         output details of memory get/free calls during matching
277    \\?         pass the PCRE_NO_UTF8_CHECK option to    \eZ         pass the PCRE_NOTEOL option to \fBpcre_exec()\fP
278                 \fBpcre_exec()\fR  .\" JOIN
279      \e?         pass the PCRE_NO_UTF8_CHECK option to
280  If \\M is present, \fBpcretest\fR calls \fBpcre_exec()\fR several times, with                 \fBpcre_exec()\fP
281  different values in the \fImatch_limit\fR field of the \fBpcre_extra\fR data    \e>dd       start the match at offset dd (any number of digits);
282                   this sets the \fIstartoffset\fP argument for \fBpcre_exec()\fP
283    .sp
284    A backslash followed by anything else just escapes the anything else. If the
285    very last character is a backslash, it is ignored. This gives a way of passing
286    an empty line as data, since a real empty line terminates the data input.
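
The character escapes in the table above can be illustrated with a small decoder. This is a sketch in Python, not pcretest's own code: it covers only the escapes that stand for characters (\a to \v, \nnn, \xhh), rejects the option escapes such as \A or \C and the \x{hh...} form, and the name decode_data_line is hypothetical.

```python
SIMPLE = {"a": 7, "b": 8, "e": 27, "f": 12, "n": 10, "r": 13, "t": 9, "v": 11}
OPTION_ESCAPES = set("ABCGLMNOPSZ?>")   # these set options; not handled here
OCTAL = "01234567"
HEX = "0123456789abcdefABCDEF"


def decode_data_line(line):
    """Decode the character escapes of a data line into bytes."""
    line = line.strip()                 # leading and trailing whitespace is removed
    out = bytearray()
    i = 0
    while i < len(line):
        ch = line[i]
        i += 1
        if ch != "\\":
            out += ch.encode("utf-8")
            continue
        if i == len(line):              # a final backslash is ignored
            break
        ch = line[i]
        i += 1
        if ch in SIMPLE:
            out.append(SIMPLE[ch])
        elif ch in OCTAL:               # up to 3 octal digits
            digits = ch
            while len(digits) < 3 and i < len(line) and line[i] in OCTAL:
                digits += line[i]
                i += 1
            out.append(int(digits, 8) & 0xFF)
        elif ch == "x":                 # up to 2 hex digits
            if i < len(line) and line[i] == "{":
                raise ValueError("\\x{...} is outside the scope of this sketch")
            digits = ""
            while len(digits) < 2 and i < len(line) and line[i] in HEX:
                digits += line[i]
                i += 1
            out.append(int(digits or "0", 16))
        elif ch in OPTION_ESCAPES:
            raise ValueError("option escape \\%s is outside the scope of this sketch" % ch)
        else:                           # anything else just escapes itself
            out += ch.encode("utf-8")
    return bytes(out)
```

A line consisting of a single backslash decodes to the empty byte string, which is how an empty subject can be passed without ending the data section.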
287    .P
288    If \eM is present, \fBpcretest\fP calls \fBpcre_exec()\fP several times, with
289    different values in the \fImatch_limit\fP field of the \fBpcre_extra\fP data
290  structure, until it finds the minimum number that is needed for  structure, until it finds the minimum number that is needed for
291  \fBpcre_exec()\fR to complete. This number is a measure of the amount of  \fBpcre_exec()\fP to complete. This number is a measure of the amount of
292  recursion and backtracking that takes place, and checking it out can be  recursion and backtracking that takes place, and checking it out can be
293  instructive. For most simple matches, the number is quite small, but for  instructive. For most simple matches, the number is quite small, but for
294  patterns with very large numbers of matching possibilities, it can become large  patterns with very large numbers of matching possibilities, it can become large
295  very quickly with increasing length of subject string.  very quickly with increasing length of subject string.
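
The text above does not say how the minimum is located. One plausible strategy is to double a trial limit until the match completes and then bisect. The Python sketch below assumes this; `completes` is a hypothetical stand-in for "pcre_exec() finished without hitting the match limit", and the sketch further assumes that a limit of 0 never suffices and that success is monotonic in the limit.

```python
def find_min_match_limit(completes, start=64):
    """Return the smallest limit for which completes(limit) is true."""
    lo = 0                      # assumed to be too small
    hi = start
    while not completes(hi):    # grow until the match completes
        lo = hi
        hi *= 2
    while hi - lo > 1:          # invariant: lo fails, hi succeeds
        mid = (lo + hi) // 2
        if completes(mid):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Toy predicate: pretend the match needs a limit of at least 1234.
    print(find_min_match_limit(lambda limit: limit >= 1234))
```

The doubling loop does not terminate if no limit ever suffices, so a real driver would cap the upper bound.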
296    .P
297  When \\O is used, it may be higher or lower than the size set by the \fB-o\fR  When \eO is used, the value specified may be higher or lower than the size set
298  option (or defaulted to 45); \\O applies only to the call of \fBpcre_exec()\fR  by the \fB-o\fP command line option (or defaulted to 45); \eO applies only to
299  for the line in which it appears.  the call of \fBpcre_exec()\fP for the line in which it appears.
300    .P
301  A backslash followed by anything else just escapes the anything else. If the  If the \fB/P\fP modifier was present on the pattern, causing the POSIX wrapper
302  very last character is a backslash, it is ignored. This gives a way of passing  API to be used, only \eB and \eZ have any effect, causing REG_NOTBOL and
303  an empty line as data, since a real empty line terminates the data input.  REG_NOTEOL to be passed to \fBregexec()\fP respectively.
304    .P
305  If \fB/P\fR was present on the regex, causing the POSIX wrapper API to be used,  The use of \ex{hh...} to represent UTF-8 characters is not dependent on the use
306  only \fB\B\fR, and \fB\Z\fR have any effect, causing REG_NOTBOL and REG_NOTEOL  of the \fB/8\fP modifier on the pattern. It is recognized always. There may be
 to be passed to \fBregexec()\fR respectively.  
   
 The use of \\x{hh...} to represent UTF-8 characters is not dependent on the use  
 of the \fB/8\fR modifier on the pattern. It is recognized always. There may be  
307  any number of hexadecimal digits inside the braces. The result is from one to  any number of hexadecimal digits inside the braces. The result is from one to
308  six bytes, encoded according to the UTF-8 rules.  six bytes, encoded according to the UTF-8 rules.
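
The "one to six bytes" remark refers to the original UTF-8 scheme, which covers values up to 0x7fffffff. A minimal Python sketch of that encoding is given here for illustration; it is not taken from pcretest, and the function name is hypothetical.

```python
# Largest value that fits with 1..5 continuation bytes, and the matching lead-byte marker.
LIMITS = [0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
LEAD_MARKS = [0xC0, 0xE0, 0xF0, 0xF8, 0xFC]


def utf8_encode_extended(value):
    """Encode a value in the range 0..0x7fffffff as 1 to 6 UTF-8 bytes."""
    if value < 0:
        raise ValueError("negative value")
    if value < 0x80:
        return bytes([value])               # plain ASCII, one byte
    for extra, limit in enumerate(LIMITS, start=1):
        if value <= limit:
            break
    else:
        raise ValueError("value too large for 6-byte UTF-8")
    tail = []
    for _ in range(extra):                  # low 6 bits go into each continuation byte
        tail.append(0x80 | (value & 0x3F))
        value >>= 6
    lead = LEAD_MARKS[extra - 1] | value    # remaining high bits go into the lead byte
    return bytes([lead] + tail[::-1])
```

For values that are valid Unicode code points the result agrees with Python's own `str.encode("utf-8")`; the 5-byte and 6-byte forms exist only in the older definition of UTF-8.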
309    .
310  .SH OUTPUT FROM PCRETEST  .
311    .SH "OUTPUT FROM PCRETEST"
312  .rs  .rs
313  .sp  .sp
314  When a match succeeds, pcretest outputs the list of captured substrings that  When a match succeeds, pcretest outputs the list of captured substrings that
315  \fBpcre_exec()\fR returns, starting with number 0 for the string that matched  \fBpcre_exec()\fP returns, starting with number 0 for the string that matched
316  the whole pattern. Here is an example of an interactive pcretest run.  the whole pattern. Otherwise, it outputs "No match" or "Partial match"
317    when \fBpcre_exec()\fP returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL,
318    respectively, and otherwise the PCRE negative error number. Here is an example
319    of an interactive pcretest run.
320    .sp
321    $ pcretest    $ pcretest
322    PCRE version 4.00 08-Jan-2003    PCRE version 5.00 07-Sep-2004
323    .sp
324      re> /^abc(\\d+)/      re> /^abc(\ed+)/
325    data> abc123    data> abc123
326     0: abc123     0: abc123
327     1: 123     1: 123
328    data> xyz    data> xyz
329    No match    No match
330    .sp
331  If the strings contain any non-printing characters, they are output as \\0x  If the strings contain any non-printing characters, they are output as \e0x
332  escapes, or as \\x{...} escapes if the \fB/8\fR modifier was present on the  escapes, or as \ex{...} escapes if the \fB/8\fP modifier was present on the
333  pattern. If the pattern has the \fB/+\fR modifier, then the output for  pattern. If the pattern has the \fB/+\fP modifier, the output for substring 0
334  substring 0 is followed by the rest of the subject string, identified by  is followed by the rest of the subject string, identified by "0+" like
335  "0+" like this:  this:
336    .sp
337      re> /cat/+      re> /cat/+
338    data> cataract    data> cataract
339     0: cat     0: cat
340     0+ aract     0+ aract
341    .sp
342  If the pattern has the \fB/g\fR or \fB/G\fR modifier, the results of successive  If the pattern has the \fB/g\fP or \fB/G\fP modifier, the results of successive
343  matching attempts are output in sequence, like this:  matching attempts are output in sequence, like this:
344    .sp
345      re> /\\Bi(\\w\\w)/g      re> /\eBi(\ew\ew)/g
346    data> Mississippi    data> Mississippi
347     0: iss     0: iss
348     1: ss     1: ss
# Line 335  matching attempts are output in sequence Line 350  matching attempts are output in sequence
350     1: ss     1: ss
351     0: ipp     0: ipp
352     1: pp     1: pp
353    .sp
354  "No match" is output only if the first match attempt fails.  "No match" is output only if the first match attempt fails.
355    .P
356  If any of the sequences \fB\\C\fR, \fB\\G\fR, or \fB\\L\fR are present in a  If any of the sequences \fB\eC\fP, \fB\eG\fP, or \fB\eL\fP are present in a
357  data line that is successfully matched, the substrings extracted by the  data line that is successfully matched, the substrings extracted by the
358  convenience functions are output with C, G, or L after the string number  convenience functions are output with C, G, or L after the string number
359  instead of a colon. This is in addition to the normal full list. The string  instead of a colon. This is in addition to the normal full list. The string
360  length (that is, the return from the extraction function) is given in  length (that is, the return from the extraction function) is given in
361  parentheses after each string for \fB\\C\fR and \fB\\G\fR.  parentheses after each string for \fB\eC\fP and \fB\eG\fP.
362    .P
363  Note that while patterns can be continued over several lines (a plain ">"  Note that while patterns can be continued over several lines (a plain ">"
364  prompt is used for continuations), data lines may not. However, newlines can be  prompt is used for continuations), data lines may not. However, newlines can be
365  included in data by means of the \\n escape.  included in data by means of the \en escape.
366    .
367    .
368    .SH CALLOUTS
369    .rs
370    .sp
371    If the pattern contains any callout requests, \fBpcretest\fP's callout function
372    is called during matching. By default, it displays the callout number, the
373    start and current positions in the text at the callout time, and the next
374    pattern item to be tested. For example, the output
375    .sp
376      --->pqrabcdef
377        0    ^  ^     \ed
378    .sp
379    indicates that callout number 0 occurred for a match attempt starting at the
380    fourth character of the subject string, when the pointer was at the seventh
381    character of the data, and when the next pattern item was \ed. Just one
382    circumflex is output if the start and current positions are the same.
383    .P
384    Callouts numbered 255 are assumed to be automatic callouts, inserted as a
385    result of the \fB/C\fP pattern modifier. In this case, instead of showing the
386    callout number, the offset in the pattern, preceded by a plus, is output. For
387    example:
388    .sp
389        re> /\ed?[A-E]\e*/C
390      data> E*
391      --->E*
392       +0 ^      \ed?
393       +3 ^      [A-E]
394       +8 ^^     \e*
395      +10 ^ ^
396       0: E*
397    .sp
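The placement of the circumflexes in the two examples above can be sketched as follows. This is an illustrative reconstruction from the sample output, not code taken from pcretest: the function name caret_line is hypothetical, and the trailing column that shows the next pattern item is deliberately omitted.

```python
def caret_line(label: str, start: int, current: int) -> str:
    """Build the marker line printed under '--->subject'.

    label   -- callout number, or '+offset' for an automatic callout
    start   -- offset in the subject where this match attempt began
    current -- offset of the current matching position
    """
    # The label is right-aligned so that offset 0 lines up with the first
    # subject character after the four-character '--->' prefix.
    line = f"{label:>3} " + " " * start + "^"
    # Only one circumflex is shown when start and current coincide.
    if current > start:
        line += " " * (current - start - 1) + "^"
    return line


subject = "pqrabcdef"
print("--->" + subject)
print(caret_line("0", 3, 6))  # attempt began at 'a', pointer now at 'd'
```

With these assumptions, caret_line("+8", 0, 1) and caret_line("+10", 0, 2) give the marker columns shown for the automatic callouts in the second example.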
398    The callout function in \fBpcretest\fP returns zero (carry on matching) by
399    default, but you can use an \eC item in a data line (as described above) to
400    change this.
401    .P
402    Inserting callouts can be helpful when using \fBpcretest\fP to check
403    complicated regular expressions. For further information about callouts, see
404    the
405    .\" HREF
406    \fBpcrecallout\fP
407    .\"
408    documentation.
409    .
410    .
411    .SH "SAVING AND RELOADING COMPILED PATTERNS"
412    .rs
413    .sp
414    The facilities described in this section are not available when the POSIX
415    interface to PCRE is being used, that is, when the \fB/P\fP pattern modifier is
416    specified.
417    .P
418    When the POSIX interface is not in use, you can cause \fBpcretest\fP to write a
419    compiled pattern to a file, by following the modifiers with > and a file name.
420    For example:
421    .sp
422      /pattern/im >/some/file
423    .sp
424    See the
425    .\" HREF
426    \fBpcreprecompile\fP
427    .\"
428    documentation for a discussion about saving and re-using compiled patterns.
429    .P
430    The data that is written is binary. The first eight bytes are the length of the
431    compiled pattern data followed by the length of the optional study data, each
432    written as four bytes in big-endian order (most significant byte first). If
433    there is no study data (either the pattern was not studied, or studying did not
434    return any data), the second length is zero. The lengths are followed by an
435    exact copy of the compiled pattern. If there is additional study data, this
436    follows immediately after the compiled pattern. After writing the file,
437    \fBpcretest\fP expects to read a new pattern.
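The file layout described above can be read back with a few lines of code. The following is a minimal sketch, assuming only what the paragraph states (two four-byte big-endian lengths, then the compiled pattern, then any study data). The function name split_saved_pattern is hypothetical, the sample bytes are synthetic, and the pattern and study blocks are treated as opaque.

```python
import struct


def split_saved_pattern(blob: bytes) -> tuple[bytes, bytes]:
    """Split a saved-pattern file image into (compiled pattern, study data)."""
    if len(blob) < 8:
        raise ValueError("file too short to hold the two length fields")
    # Two unsigned 32-bit integers, most significant byte first.
    pattern_len, study_len = struct.unpack(">II", blob[:8])
    body = blob[8:]
    if len(body) < pattern_len + study_len:
        raise ValueError("file is shorter than its header claims")
    pattern = body[:pattern_len]
    # study_len is zero when the pattern was not studied.
    study = body[pattern_len:pattern_len + study_len]
    return pattern, study


# Synthetic example: a 5-byte "pattern" and no study data.
blob = struct.pack(">II", 5, 0) + b"\x01\x02\x03\x04\x05"
pattern, study = split_saved_pattern(blob)
print(len(pattern), len(study))
```

Because the lengths are stored in a fixed byte order, the header reads the same way on any host, whatever the byte order of the compiled data that follows.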
438    .P
439    A saved pattern can be reloaded into \fBpcretest\fP by specifying < and a file
440    name instead of a pattern. The name of the file must not contain a < character,
441    as otherwise \fBpcretest\fP will interpret the line as a pattern delimited by <
442    characters.
443    For example:
444    .sp
445       re> </some/file
446      Compiled regex loaded from /some/file
447      No study data
448    .sp
449    When the pattern has been loaded, \fBpcretest\fP proceeds to read data lines in
450    the usual way.
451    .P
452    You can copy a file written by \fBpcretest\fP to a different host and reload it
453    there, even if the new host has opposite endianness to the one on which the
454    pattern was compiled. For example, you can compile on an i86 machine and run on
455    a SPARC machine.
456    .P
457    File names for saving and reloading can be absolute or relative, but note that
458    the shell facility of expanding a file name that starts with a tilde (~) is not
459    available.
460    .P
461    The ability to save and reload files in \fBpcretest\fP is intended for testing
462    and experimentation. It is not intended for production use because only a
463    single pattern can be written to a file. Furthermore, there is no facility for
464    supplying custom character tables for use with a reloaded pattern. If the
465    original pattern was compiled with custom tables, an attempt to match a subject
466    string using a reloaded pattern is likely to cause \fBpcretest\fP to crash.
467    Finally, if you attempt to load a file that is not in the correct format, the
468    result is undefined.
469    .
470    .
471  .SH AUTHOR  .SH AUTHOR
472  .rs  .rs
473  .sp  .sp
# Line 357  Philip Hazel Line 476  Philip Hazel
476  University Computing Service,  University Computing Service,
477  .br  .br
478  Cambridge CB2 3QG, England.  Cambridge CB2 3QG, England.
479    .P
480  .in 0  .in 0
481  Last updated: 09 December 2003  Last updated: 10 September 2004
482  .br  .br
483  Copyright (c) 1997-2003 University of Cambridge.  Copyright (c) 1997-2004 University of Cambridge.
