
Diff of /code/trunk/README


revision 25 by nigel, Sat Feb 24 21:38:45 2007 UTC revision 31 by nigel, Sat Feb 24 21:38:57 2007 UTC
# Line 21  README file for PCRE (Perl-compatible re Line 21  README file for PCRE (Perl-compatible re
21  The distribution should contain the following files:  The distribution should contain the following files:
22    
23    ChangeLog         log of changes to the code    ChangeLog         log of changes to the code
24      LICENCE           conditions for the use of PCRE
25    Makefile          for building PCRE    Makefile          for building PCRE
26    README            this file    README            this file
27    RunTest           a shell script for running tests    RunTest           a shell script for running tests
28    Tech.Notes        notes on the encoding    Tech.Notes        notes on the encoding
29    pcre.3            man page for the functions    pcre.3            man page for the functions
30    pcreposix.3       man page for the POSIX wrapper API    pcreposix.3       man page for the POSIX wrapper API
31    deftables.c       auxiliary program for building chartables.c    dftables.c        auxiliary program for building chartables.c
32      get.c             )
33    maketables.c      )    maketables.c      )
34    study.c           ) source of    study.c           ) source of
35    pcre.c            )   the functions    pcre.c            )   the functions
# Line 69  additional features of release 5.005, wh Line 71  additional features of release 5.005, wh
71  main test input, which needs only Perl 5.004. In the long run, when 5.005 is  main test input, which needs only Perl 5.004. In the long run, when 5.005 is
72  widespread, these two test files may get amalgamated.  widespread, these two test files may get amalgamated.
73    
74  The second set of tests checks pcre_info(), pcre_study(), error detection and  The second set of tests checks pcre_info(), pcre_study(), pcre_copy_substring(),
75  run-time flags that are specific to PCRE, as well as the POSIX wrapper API.  pcre_get_substring(), pcre_get_substring_list(), error detection and run-time
76    flags that are specific to PCRE, as well as the POSIX wrapper API.
77    
78  The fourth set of tests checks pcre_maketables(), the facility for building a  The fourth set of tests checks pcre_maketables(), the facility for building a
79  set of character tables for a specific locale and using them instead of the  set of character tables for a specific locale and using them instead of the
# Line 115  is passed as NULL, a set of default tabl Line 118  is passed as NULL, a set of default tabl
118  used.  used.
119    
120  The source file called chartables.c contains the default set of tables. This is  The source file called chartables.c contains the default set of tables. This is
121  not supplied in the distribution, but is built by the program deftables  not supplied in the distribution, but is built by the program dftables
122  (compiled from deftables.c), which uses the ANSI C character handling functions  (compiled from dftables.c), which uses the ANSI C character handling functions
123  such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table  such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table
124  sources. This means that the default C locale set on your system will control the  sources. This means that the default C locale set on your system will control the
125  contents of the tables. You can change the default tables by editing  contents of the tables. You can change the default tables by editing
# Line 157  The program handles any number of sets o Line 160  The program handles any number of sets o
160  set starts with a regular expression, and continues with any number of data  set starts with a regular expression, and continues with any number of data
161  lines to be matched against the pattern. An empty line signals the end of the  lines to be matched against the pattern. An empty line signals the end of the
162  set. The regular expressions are given enclosed in any non-alphameric  set. The regular expressions are given enclosed in any non-alphameric
163  delimiters, for example  delimiters other than backslash, for example
164    
165    /(a|bc)x+yz/    /(a|bc)x+yz/
166    
167  and may be followed by i, m, s, or x to set the PCRE_CASELESS, PCRE_MULTILINE,  White space before the initial delimiter is ignored. A regular expression may
168  PCRE_DOTALL, or PCRE_EXTENDED options, respectively. These options have the  be continued over several input lines, in which case the newline characters are
169  same effect as they do in Perl.  included within it. See the testinput files for many examples. It is possible
170    to include the delimiter within the pattern by escaping it, for example
171    
172      /abc\/def/
173    
174    If you do so, the escape and the delimiter form part of the pattern, but since
175    delimiters are always non-alphameric, this does not affect its interpretation.
176    If the terminating delimiter is immediately followed by a backslash, for
177    example,
178    
179      /abc/\
180    
181    then a backslash is added to the end of the pattern. This provides a way of
182    testing the error condition that arises if a pattern finishes with a backslash,
183    because
184    
185      /abc\/
186    
187    is interpreted as the first line of a pattern that starts with "abc/", causing
188    pcretest to read the next line as a continuation of the regular expression.
189    
190    The pattern may be followed by i, m, s, or x to set the PCRE_CASELESS,
191    PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively. These
192    options have the same effect as they do in Perl.
193    
194  There are also some upper case options that do not match Perl options: /A, /E,  There are also some upper case options that do not match Perl options: /A, /E,
195  and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and PCRE_EXTRA respectively.  and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and PCRE_EXTRA respectively.
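The delimiter rules described in this hunk can be sketched in C as follows. This is a minimal illustration, not code taken from pcretest.c: the function name split_pattern_line and its return codes are hypothetical, and it handles a single input line only (a return value of 1 marks the case where pcretest would go on to read a continuation line).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not from pcretest.c). Splits one input line of the
   form <delim>pattern<delim>[\]options into its pattern and option parts.
   Returns 0 on success, 1 if no terminating delimiter was found on this
   line, and -1 for an invalid delimiter or a buffer that is too small. */
int split_pattern_line(const char *line, char *pattern, size_t pattern_size,
                       char *options, size_t options_size)
{
    const unsigned char *p = (const unsigned char *)line;
    unsigned char delim;
    size_t n = 0, m = 0;

    if (pattern_size == 0 || options_size == 0) return -1;

    /* White space before the initial delimiter is ignored. */
    while (isspace(*p)) p++;

    /* The delimiter must be non-alphameric and must not be a backslash. */
    delim = *p++;
    if (delim == '\0' || isalnum(delim) || delim == '\\') return -1;

    while (*p != delim) {
        if (*p == '\0') return 1;            /* pattern continues on next line */
        if (*p == '\\' && p[1] != '\0') {
            /* An escaped character: both the backslash and the character
               that follows it (possibly the delimiter) stay in the pattern. */
            if (n + 2 >= pattern_size) return -1;
            pattern[n++] = (char)*p++;
            pattern[n++] = (char)*p++;
            continue;
        }
        if (n + 1 >= pattern_size) return -1;
        pattern[n++] = (char)*p++;
    }
    p++;                                     /* skip the terminating delimiter */

    /* A backslash immediately after the terminating delimiter is appended
       to the pattern. */
    if (*p == '\\') {
        if (n + 1 >= pattern_size) return -1;
        pattern[n++] = '\\';
        p++;
    }
    pattern[n] = '\0';

    /* Whatever remains on the line is the option letters (i, m, s, x, ...). */
    while (*p != '\0' && *p != '\n') {
        if (m + 1 >= options_size) return -1;
        options[m++] = (char)*p++;
    }
    options[m] = '\0';
    return 0;
}
```

With this sketch, the line /abc\/def/i yields the pattern abc\/def and the option string i, while /abc\/ returns 1 because the escaped slash is taken as part of the pattern and no terminator follows.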
# Line 190  internal form of compiled regular expres Line 216  internal form of compiled regular expres
216  The /S option causes pcre_study() to be called after the expression has been  The /S option causes pcre_study() to be called after the expression has been
217  compiled, and the results used when the expression is matched.  compiled, and the results used when the expression is matched.
218    
219    The /M option causes information about the size of the memory block used to hold
220    the compiled pattern to be output.
221    
222  Finally, the /P option causes pcretest to call PCRE via the POSIX wrapper API  Finally, the /P option causes pcretest to call PCRE via the POSIX wrapper API
223  rather than its native API. When this is done, all other options except /i and  rather than its native API. When this is done, all other options except /i and
224  /m are ignored. REG_ICASE is set if /i is present, and REG_NEWLINE is set if /m  /m are ignored. REG_ICASE is set if /i is present, and REG_NEWLINE is set if /m
225  is present. The wrapper functions force PCRE_DOLLAR_ENDONLY always, and  is present. The wrapper functions force PCRE_DOLLAR_ENDONLY always, and
226  PCRE_DOTALL unless REG_NEWLINE is set.  PCRE_DOTALL unless REG_NEWLINE is set.
227    
 A regular expression can extend over several lines of input; the newlines are  
 included in it. See the testinput files for many examples.  
   
228  Before each data line is passed to pcre_exec(), leading and trailing whitespace  Before each data line is passed to pcre_exec(), leading and trailing whitespace
229  is removed, and it is then scanned for \ escapes. The following are recognized:  is removed, and it is then scanned for \ escapes. The following are recognized:
230    
# Line 215  is removed, and it is then scanned for \ Line 241  is removed, and it is then scanned for \
241    
242    \A     pass the PCRE_ANCHORED option to pcre_exec()    \A     pass the PCRE_ANCHORED option to pcre_exec()
243    \B     pass the PCRE_NOTBOL option to pcre_exec()    \B     pass the PCRE_NOTBOL option to pcre_exec()
244      \Cdd   call pcre_copy_substring() for substring dd after a successful match
245               (any decimal number less than 32)
246      \Gdd   call pcre_get_substring() for substring dd after a successful match
247               (any decimal number less than 32)
248      \L     call pcre_get_substring_list() after a successful match
249    \Odd   set the size of the output vector passed to pcre_exec() to dd    \Odd   set the size of the output vector passed to pcre_exec() to dd
250             (any number of decimal digits)             (any number of decimal digits)
251    \Z     pass the PCRE_NOTEOL option to pcre_exec()    \Z     pass the PCRE_NOTEOL option to pcre_exec()
# Line 227  If /P was present on the regex, causing Line 258  If /P was present on the regex, causing
258  \B, and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL to be passed to  \B, and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL to be passed to
259  regexec() respectively.  regexec() respectively.
260    
261  When a match succeeds, pcretest outputs the list of identified substrings that  When a match succeeds, pcretest outputs the list of captured substrings that
262  pcre_exec() returns, starting with number 0 for the string that matched the  pcre_exec() returns, starting with number 0 for the string that matched the
263  whole pattern. Here is an example of an interactive pcretest run.  whole pattern. Here is an example of an interactive pcretest run.
264    
# Line 242  whole pattern. Here is an example of an Line 273  whole pattern. Here is an example of an
273    data> xyz    data> xyz
274    No match    No match
275    
276    If any of \C, \G, or \L are present in a data line that is successfully
277    matched, the substrings extracted by the convenience functions are output with
278    C, G, or L after the string number instead of a colon. This is in addition to
279    the normal full list. The string length (that is, the return from the
280    extraction function) is given in parentheses after each string for \C and \G.
281    
282  Note that while patterns can be continued over several lines (a plain ">"  Note that while patterns can be continued over several lines (a plain ">"
283  prompt is used for continuations), data lines may not. However newlines can be  prompt is used for continuations), data lines may not. However newlines can be
284  included in data by means of the \n escape.  included in data by means of the \n escape.
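As a rough picture of what the \C escape exercises, the sketch below copies one captured substring out of an offset vector of the kind pcre_exec() fills in (pairs of start and end offsets, pair 0 being the whole match). It is an illustration only: copy_capture and its error values are hypothetical names for this example and are not the PCRE implementation or its error codes.

```c
#include <string.h>

/* Hypothetical stand-in for the idea behind pcre_copy_substring().
   ovector holds (start, end) offset pairs; stringcount is the number of
   pairs that were set by the match. Returns the substring length, -1 if
   the requested substring does not exist, or -2 if the buffer is too
   small (these codes are this sketch's own). */
int copy_capture(const char *subject, const int *ovector, int stringcount,
                 int stringnumber, char *buffer, int buffersize)
{
    int start, length;

    if (stringnumber < 0 || stringnumber >= stringcount) return -1;

    start = ovector[2 * stringnumber];
    length = ovector[2 * stringnumber + 1] - start;
    if (start < 0 || length < 0) return -1;      /* group did not take part */

    if (buffersize < length + 1) return -2;      /* need room for the NUL */

    memcpy(buffer, subject + start, (size_t)length);
    buffer[length] = '\0';
    return length;
}
```

The returned length corresponds to the figure that pcretest prints in parentheses after each string for \C and \G.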
# Line 257  If the option -i is given to pcretest, i Line 294  If the option -i is given to pcretest, i
294  regular expression: information about the compiled pattern is given after  regular expression: information about the compiled pattern is given after
295  compilation.  compilation.
296    
297  If the option -s is given to pcretest, it outputs the size of each compiled  If the option -m is given to pcretest, it outputs the size of each compiled
298  pattern after it has been compiled.  pattern after it has been compiled. It is equivalent to adding /M to each
299    regular expression. For compatibility with earlier versions of pcretest, -s is
300    a synonym for -m.
301    
302  If the -t option is given, each compile, study, and match is run 10000 times  If the -t option is given, each compile, study, and match is run 20000 times
303  while being timed, and the resulting time per compile or match is output in  while being timed, and the resulting time per compile or match is output in
304  milliseconds. Do not set -t with -s, because you will then get the size output  milliseconds. Do not set -t with -s, because you will then get the size output
305  10000 times and the timing will be distorted. If you want to change the number  20000 times and the timing will be distorted. If you want to change the number
306  of repetitions used for timing, edit the definition of LOOPREPEAT at the top of  of repetitions used for timing, edit the definition of LOOPREPEAT at the top of
307  pcretest.c  pcretest.c
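The timing scheme described for -t can be sketched as a fixed-count loop around the operation being measured. The code below is an assumption-laden illustration rather than an excerpt from pcretest.c: it assumes the standard clock() function as the timer, and time_per_call_ms and count_call are hypothetical names; only LOOPREPEAT and its value of 20000 come from the text above.

```c
#include <time.h>

#define LOOPREPEAT 20000   /* repetition count named in the README */

/* Hypothetical workload used only for demonstration: counts its calls. */
void count_call(void *arg)
{
    ++*(int *)arg;
}

/* Runs fn(arg) LOOPREPEAT times and returns the average processor time
   per call in milliseconds, as measured by clock(). */
double time_per_call_ms(void (*fn)(void *), void *arg)
{
    clock_t start, elapsed;
    int i;

    start = clock();
    for (i = 0; i < LOOPREPEAT; i++)
        fn(arg);
    elapsed = clock() - start;

    return ((double)elapsed * 1000.0 / (double)CLOCKS_PER_SEC) / LOOPREPEAT;
}
```

Anything that produces output inside the timed loop (such as the size report from -m or -s) would be repeated on every iteration and included in the measurement, which is why the README advises against combining those options with -t.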
308    
# Line 291  contains malformed regular expressions, Line 330  contains malformed regular expressions,
330  them correctly.  them correctly.
331    
332  Philip Hazel <ph10@cam.ac.uk>  Philip Hazel <ph10@cam.ac.uk>
333  October 1998  February 1999

