
Diff of /code/tags/pcre-2.05/README


revision 3 by nigel, Sat Feb 24 21:38:01 2007 UTC revision 27 by nigel, Sat Feb 24 21:38:49 2007 UTC
# Line 1  Line 1 
1  README file for PCRE (Perl-compatible regular expressions)  README file for PCRE (Perl-compatible regular expressions)
2  ----------------------------------------------------------  ----------------------------------------------------------
3    
4    *******************************************************************************
5    *           IMPORTANT FOR THOSE UPGRADING FROM VERSIONS BEFORE 2.00           *
6    *                                                                             *
7    * Please note that there has been a change in the API such that a larger      *
8    * ovector is required at matching time, to provide some additional workspace. *
9    * The new man page has details. This change was necessary in order to support *
10    * some of the new functionality in Perl 5.005.                                *
11    *                                                                             *
12    *           IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.00                   *
13    *                                                                             *
14    * Another (I hope this is the last!) change has been made to the API for the  *
15    * pcre_compile() function. An additional argument has been added to make it   *
16    * possible to pass over a pointer to character tables built in the current    *
17    * locale by pcre_maketables(). To use the default tables, this new argument   *
18    * should be passed as NULL.                                                   *
19    *******************************************************************************
20    
21  The distribution should contain the following files:  The distribution should contain the following files:
22    
23    ChangeLog         log of changes to the code    ChangeLog         log of changes to the code
24    Makefile          for building PCRE    Makefile          for building PCRE
   Performance       notes on performance  
25    README            this file    README            this file
26      RunTest           a shell script for running tests
27    Tech.Notes        notes on the encoding    Tech.Notes        notes on the encoding
28    pcre.3            man page for the functions    pcre.3            man page for the functions
29    pcreposix.3       man page for the POSIX wrapper API    pcreposix.3       man page for the POSIX wrapper API
30    maketables.c      auxiliary program for building chartables.c    dftables.c        auxiliary program for building chartables.c
31      maketables.c      )
32    study.c           ) source of    study.c           ) source of
33    pcre.c            )   the functions    pcre.c            )   the functions
34    pcreposix.c       )    pcreposix.c       )
# Line 21  The distribution should contain the foll Line 39  The distribution should contain the foll
39    pgrep.1           man page for pgrep    pgrep.1           man page for pgrep
40    pgrep.c           source of a grep utility that uses PCRE    pgrep.c           source of a grep utility that uses PCRE
41    perltest          Perl test program    perltest          Perl test program
42    testinput         test data, compatible with Perl    testinput         test data, compatible with Perl 5.004 and 5.005
43    testinput2        test data for error messages and non-Perl things    testinput2        test data for error messages and non-Perl things
44      testinput3        test data, compatible with Perl 5.005
45      testinput4        test data for locale-specific tests
46    testoutput        test results corresponding to testinput    testoutput        test results corresponding to testinput
47    testoutput2       test results corresponding to testinput2    testoutput2       test results corresponding to testinput2
48      testoutput3       test results corresponding to testinput3
49      testoutput4       test results corresponding to testinput4
50    
51    To build PCRE, edit Makefile for your system (it is a fairly simple make file,
52    and there are some comments at the top) and then run it. It builds two
53    libraries called libpcre.a and libpcreposix.a, a test program called pcretest,
54    and the pgrep command.
55    
56    To test PCRE, run the RunTest script in the pcre directory. This runs pcretest
57    on each of the testinput files in turn, and compares the output with the
58    contents of the corresponding testoutput file. A file called testtry is used to
59    hold the output from pcretest (which is documented below).
60    
61    To run pcretest on just one of the test files, give its number as an argument
62    to RunTest, for example:
63    
64      RunTest 3
65    
66    The first and third test files can also be fed directly into the perltest
67    program to check that Perl gives the same results. The third file requires the
68    additional features of release 5.005, which is why it is kept separate from the
69    main test input, which needs only Perl 5.004. In the long run, when 5.005 is
70    widespread, these two test files may get amalgamated.
71    
72    The second set of tests check pcre_info(), pcre_study(), error detection and
73    run-time flags that are specific to PCRE, as well as the POSIX wrapper API.
74    
75    The fourth set of tests checks pcre_maketables(), the facility for building a
76    set of character tables for a specific locale and using them instead of the
77    default tables. The tests make use of the "fr" (French) locale. Before running
78    the test, the script checks for the presence of this locale by running the
79    "locale" command. If that command fails, or if it doesn't include "fr" in the
80    list of available locales, the fourth test cannot be run, and a comment is
81    output to say why. If running this test produces instances of the error
82    
83  To build PCRE, edit Makefile for your system (it is a fairly simple make file)    ** Failed to set locale "fr"
84  and then run it. It builds a two libraries called libpcre.a and libpcreposix.a,  
85  a test program called pcretest, and the pgrep command.  in the comparison output, it means that locale is not available on your system,
86    despite being listed by "locale". This does not mean that PCRE is broken.
 To test PCRE, run pcretest on the file testinput, and compare the output with  
 the contents of testoutput. There should be no differences. For example:  
   
   pcretest testinput /tmp/anything  
   diff /tmp/anything testoutput  
   
 Do the same with testinput2, comparing the output with testoutput2, but this  
 time using the -i flag for pcretest, i.e.  
   
   pcretest -i testinput2 /tmp/anything  
   diff /tmp/anything testoutput2  
   
 There are two sets of tests because the first set can also be fed directly into  
 the perltest program to check that Perl gives the same results. The second set  
 of tests check pcre_info(), pcre_study(), error detection and run-time flags  
 that are specific to PCRE, as well as the POSIX wrapper API.  
87    
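The locale failure described for the fourth test can be checked outside PCRE
with the standard C library alone. The following is a minimal sketch, not the
code used by pcretest: the function name try_locale() is hypothetical, and the
use of the LC_CTYPE category is an assumption. Only the text of the error
message is taken from this README.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: try to select a locale for character
   classification. Returns 1 on success, or 0 if the C library
   rejects the name (setlocale() returns NULL in that case). */
int try_locale(const char *name)
{
    /* Assumption: only the character-type category matters here. */
    if (setlocale(LC_CTYPE, name) == NULL) {
        printf("** Failed to set locale \"%s\"\n", name);
        return 0;
    }
    return 1;
}
```

Calling try_locale("fr") on a system where that locale is listed but not
actually installed would print the message shown above and return 0, which is
a property of the system and not a fault in PCRE.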
88  To install PCRE, copy libpcre.a to any suitable library directory (e.g.  To install PCRE, copy libpcre.a to any suitable library directory (e.g.
89  /usr/local/lib), pcre.h to any suitable include directory (e.g.  /usr/local/lib), pcre.h to any suitable include directory (e.g.
# Line 63  themselves still follow Perl syntax and Line 101  themselves still follow Perl syntax and
101  for the POSIX-style functions is called pcreposix.h. The official POSIX name is  for the POSIX-style functions is called pcreposix.h. The official POSIX name is
102  regex.h, but I didn't want to risk possible problems with existing files of  regex.h, but I didn't want to risk possible problems with existing files of
103  that name by distributing it that way. To use it with an existing program that  that name by distributing it that way. To use it with an existing program that
104  uses the POSIX API it will have to be renamed or pointed at by a link.  uses the POSIX API, it will have to be renamed or pointed at by a link.
105    
106    
107  Character tables  Character tables
108  ----------------  ----------------
109    
110  PCRE uses four tables for manipulating and identifying characters. These are  PCRE uses four tables for manipulating and identifying characters. The final
111  compiled from a source file called chartables.c. This is not supplied in  argument of the pcre_compile() function is a pointer to a block of memory
112  the distribution, but is built by the program maketables (compiled from  containing the concatenated tables. A call to pcre_maketables() is used to
113  maketables.c), which uses the ANSI C character handling functions such as  generate a set of tables in the current locale. However, if the final argument
114  isalnum(), isalpha(), isupper(), islower(), etc. to build the table sources.  is passed as NULL, a set of default tables that is built into the binary is
115  This means that the default C locale set in your system may affect the contents  used.
116  of the tables. You can change the tables by editing chartables.c and then  
117  re-building PCRE. If you do this, you should probably also edit Makefile to  The source file called chartables.c contains the default set of tables. This is
118  ensure that the file doesn't ever get re-generated.  not supplied in the distribution, but is built by the program dftables
119    (compiled from dftables.c), which uses the ANSI C character handling functions
120  The first two tables pcre_lcc[] and pcre_fcc[] provide lower casing and a  such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table
121  case flipping functions, respectively. The pcre_cbits[] table consists of four  sources. This means that the default C locale set in your system will control the
122  32-byte bit maps which identify digits, letters, "word" characters, and white  contents of the tables. You can change the default tables by editing
123  space, respectively. These are used when building 32-byte bit maps that  chartables.c and then re-building PCRE. If you do this, you should probably
124  represent character classes.  also edit Makefile to ensure that the file doesn't ever get re-generated.
125    
126    The first two 256-byte tables provide lower casing and case flipping functions,
127    respectively. The next table consists of three 32-byte bit maps which identify
128    digits, "word" characters, and white space, respectively. These are used when
129    building 32-byte bit maps that represent character classes.
130    
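As an illustration of the first three tables in the revised description, here
is a minimal sketch that builds them with the ANSI C character handling
functions. It is not the dftables.c source: the function names, the offset
macros, and the order of the bit maps within the block are assumptions made
for this example, and treating "_" as a "word" character follows Perl's \w.
The final table of character types is not covered.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical layout, for this sketch only. */
#define TABLE_LCC    0            /* 256-byte lower-casing table   */
#define TABLE_FCC    256          /* 256-byte case-flipping table  */
#define TABLE_CBITS  512          /* three 32-byte bit maps follow */
#define CBIT_DIGIT   0
#define CBIT_WORD    32
#define CBIT_SPACE   64
#define TABLE_LENGTH (512 + 96)

/* Fill tables[] (TABLE_LENGTH bytes) using the current C locale. */
void build_tables(unsigned char *tables)
{
    unsigned char *cbits = tables + TABLE_CBITS;
    int c;

    memset(tables, 0, TABLE_LENGTH);
    for (c = 0; c < 256; c++) {
        unsigned char bit = (unsigned char)(1 << (c & 7));

        tables[TABLE_LCC + c] = (unsigned char)tolower(c);
        tables[TABLE_FCC + c] =
            (unsigned char)(islower(c) ? toupper(c) : tolower(c));

        /* One bit per character: byte c/8, bit c%8. */
        if (isdigit(c))             cbits[CBIT_DIGIT + c / 8] |= bit;
        if (isalnum(c) || c == '_') cbits[CBIT_WORD  + c / 8] |= bit;
        if (isspace(c))             cbits[CBIT_SPACE + c / 8] |= bit;
    }
}

/* Return 1 if character c is a member of a 32-byte bit map. */
int in_bitmap(const unsigned char *map, int c)
{
    return (map[c / 8] >> (c & 7)) & 1;
}
```

Because the tables are filled from isdigit(), isspace() and so on, running
build_tables() after a call to setlocale() would give locale-specific
contents, which is the idea behind building tables in the current locale.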
131  The pcre_ctypes[] table has bits indicating various character types, as  The final 256-byte table has bits indicating various character types, as
132  follows:  follows:
133    
134      1   white space character      1   white space character
# Line 124  same effect as they do in Perl. Line 167  same effect as they do in Perl.
167    
168  There are also some upper case options that do not match Perl options: /A, /E,  There are also some upper case options that do not match Perl options: /A, /E,
169  and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and PCRE_EXTRA respectively.  and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and PCRE_EXTRA respectively.
170  The /D option is a PCRE debugging feature. It causes the internal form of  
171  compiled regular expressions to be output after compilation. The /S option  The /L option must be followed directly by the name of a locale, for example,
172  causes pcre_study() to be called after the expression has been compiled, and  
173  the results used when the expression is matched. If /I is present as well as    /pattern/Lfr
174  /S, then pcre_study() is called with the PCRE_CASELESS option.  
175    For this reason, it must be the last option letter. The given locale is set,
176    pcre_maketables() is called to build a set of character tables for the locale,
177    and this is then passed to pcre_compile() when compiling the regular
178    expression. Without an /L option, NULL is passed as the tables pointer; that
179    is, /L applies only to the expression on which it appears.
180    
181    The /I option requests that pcretest output information about the compiled
182    expression (whether it is anchored, has a fixed first character, and so on). It
183    does this by calling pcre_info() after compiling an expression, and outputting
184    the information it gets back. If the pattern is studied, the results of that
185    are also output.
186    
187    The /D option is a PCRE debugging feature, which also assumes /I. It causes the
188    internal form of compiled regular expressions to be output after compilation.
189    
190    The /S option causes pcre_study() to be called after the expression has been
191    compiled, and the results used when the expression is matched.
192    
193  Finally, the /P option causes pcretest to call PCRE via the POSIX wrapper API  Finally, the /P option causes pcretest to call PCRE via the POSIX wrapper API
194  rather than its native API. When this is done, all other options except /i and  rather than its native API. When this is done, all other options except /i and
# Line 137  is present. The wrapper functions force Line 197  is present. The wrapper functions force
197  PCRE_DOTALL unless REG_NEWLINE is set.  PCRE_DOTALL unless REG_NEWLINE is set.
198    
199  A regular expression can extend over several lines of input; the newlines are  A regular expression can extend over several lines of input; the newlines are
200  included in it. See the testinput file for many examples.  included in it. See the testinput files for many examples.
201    
202  Before each data line is passed to pcre_exec(), leading and trailing whitespace  Before each data line is passed to pcre_exec(), leading and trailing whitespace
203  is removed, and it is then scanned for \ escapes. The following are recognized:  is removed, and it is then scanned for \ escapes. The following are recognized:
# Line 155  is removed, and it is then scanned for \ Line 215  is removed, and it is then scanned for \
215    
216    \A     pass the PCRE_ANCHORED option to pcre_exec()    \A     pass the PCRE_ANCHORED option to pcre_exec()
217    \B     pass the PCRE_NOTBOL option to pcre_exec()    \B     pass the PCRE_NOTBOL option to pcre_exec()
   \E     pass the PCRE_DOLLAR_ENDONLY option to pcre_exec()  
   \I     pass the PCRE_CASELESS option to pcre_exec()  
   \M     pass the PCRE_MULTILINE option to pcre_exec()  
   \S     pass the PCRE_DOTALL option to pcre_exec()  
218    \Odd   set the size of the output vector passed to pcre_exec() to dd    \Odd   set the size of the output vector passed to pcre_exec() to dd
219             (any number of decimal digits)             (any number of decimal digits)
220    \Z     pass the PCRE_NOTEOL option to pcre_exec()    \Z     pass the PCRE_NOTEOL option to pcre_exec()
# Line 179  whole pattern. Here is an example of an Line 235  whole pattern. Here is an example of an
235    Testing Perl-Compatible Regular Expressions    Testing Perl-Compatible Regular Expressions
236    PCRE version 0.90 08-Sep-1997    PCRE version 0.90 08-Sep-1997
237    
238        re> /^abc(\d+)/      re> /^abc(\d+)/
239      data> abc123    data> abc123
240     0: abc123      0: abc123
241     1: 123      1: 123
242      data> xyz    data> xyz
243    No match    No match
244    
245  Note that while patterns can be continued over several lines (a plain ">"  Note that while patterns can be continued over several lines (a plain ">"
# Line 197  following flags has any effect in this c Line 253  following flags has any effect in this c
253  If the option -d is given to pcretest, it is equivalent to adding /D to each  If the option -d is given to pcretest, it is equivalent to adding /D to each
254  regular expression: the internal form is output after compilation.  regular expression: the internal form is output after compilation.
255    
256  If the option -i (for "information") is given to pcretest, it calls pcre_info()  If the option -i is given to pcretest, it is equivalent to adding /I to each
257  after compiling an expression, and outputs the information it gets back. If the  regular expression: information about the compiled pattern is given after
258  pattern is studied, the results of that are also output.  compilation.
259    
260  If the option -s is given to pcretest, it outputs the size of each compiled  If the option -s is given to pcretest, it outputs the size of each compiled
261  pattern after it has been compiled.  pattern after it has been compiled.
262    
263  If the -t option is given, each compile, study, and match is run 2000 times  If the -t option is given, each compile, study, and match is run 10000 times
264  while being timed, and the resulting time per compile or match is output in  while being timed, and the resulting time per compile or match is output in
265  milliseconds. Do not set -t with -s, because you will then get the size output  milliseconds. Do not set -t with -s, because you will then get the size output
266  2000 times and the timing will be distorted.  10000 times and the timing will be distorted. If you want to change the number
267    of repetitions used for timing, edit the definition of LOOPREPEAT at the top of
268    pcretest.c.
269    
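The timing scheme described for -t can be sketched as a loop around the
operation being measured. This is an illustration under stated assumptions,
not an extract from pcretest.c: only the name LOOPREPEAT and the value 10000
come from this README, while time_per_call_ms(), count_call() and the use of
clock() are choices made for the example.

```c
#include <time.h>

#define LOOPREPEAT 10000   /* repetition count, as named in the README */

/* Stand-in workload for the sketch: counts how often it is called. */
void count_call(void *arg)
{
    ++*(int *)arg;
}

/* Run fn(arg) LOOPREPEAT times and return the mean CPU time per call
   in milliseconds, measured with the standard clock() function. */
double time_per_call_ms(void (*fn)(void *), void *arg)
{
    clock_t start = clock();
    int i;

    for (i = 0; i < LOOPREPEAT; i++)
        fn(arg);

    return ((double)(clock() - start) * 1000.0) /
           ((double)CLOCKS_PER_SEC * LOOPREPEAT);
}
```

Anything that prints inside the timed loop is repeated LOOPREPEAT times and
adds to the measured time, which is why -s should not be combined with -t.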
270    
271    
# Line 216  The perltest program Line 274  The perltest program
274    
275  The perltest program tests Perl's regular expressions; it has the same  The perltest program tests Perl's regular expressions; it has the same
276  specification as pcretest, and so can be given identical input, except that  specification as pcretest, and so can be given identical input, except that
277  input patterns can be followed only by Perl's lower case options.  input patterns can be followed only by Perl's lower case options. The contents
278    of testinput and testinput3 meet this condition.
279    
280  The data lines are processed as Perl strings, so if they contain $ or @  The data lines are processed as Perl strings, so if they contain $ or @
281  characters, these have to be escaped. For this reason, all such characters in  characters, these have to be escaped. For this reason, all such characters in
# Line 225  for pcretest, and the special upper case Line 284  for pcretest, and the special upper case
284  recognizes are not used in this file. The output should be identical, apart  recognizes are not used in this file. The output should be identical, apart
285  from the initial identifying banner.  from the initial identifying banner.
286    
287  The testinput2 file is not suitable for feeding to Perltest, since it does  The testinput2 and testinput4 files are not suitable for feeding to Perltest,
288  make use of the special upper case options and escapes that pcretest uses to  since they do make use of the special upper case options and escapes that
289  test additional features of PCRE.  pcretest uses to test some features of PCRE. The first of these files also
290    contains malformed regular expressions, in order to check that PCRE diagnoses
291    them correctly.
292    
293  Philip Hazel <ph10@cam.ac.uk>  Philip Hazel <ph10@cam.ac.uk>
294  October 1997  January 1999
