--- code/trunk/README 2007/02/24 21:38:01 3 +++ code/trunk/README 2007/02/24 21:39:21 43 @@ -1,60 +1,121 @@ -README file for PCRE (Perl-compatible regular expressions) ----------------------------------------------------------- +README file for PCRE (Perl-compatible regular expression library) +----------------------------------------------------------------- -The distribution should contain the following files: +The latest release of PCRE is always available from + + ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz + +Please read the NEWS file if you are upgrading from a previous release. + + +Building PCRE on a Unix system +------------------------------ + +To build PCRE on a Unix system, run the "configure" command in the PCRE +distribution directory. This is a standard GNU "autoconf" configuration script, +for which generic instructions are supplied in INSTALL. On many systems just +running "./configure" is sufficient, but the usual methods of changing standard +defaults are available. For example + +CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local + +specifies that the C compiler should be run with the flags '-O2 -Wall' instead +of the default, and that "make install" should install PCRE under /opt/local +instead of the default /usr/local. The "configure" script builds thre files: + +. Makefile is built by copying Makefile.in and making substitutions. +. config.h is built by copying config.in and making substitutions. +. pcre-config is built by copying pcre-config.in and making substitutions. + +Once "configure" has run, you can run "make". It builds two libraries called +libpcre and libpcreposix, a test program called pcretest, and the pgrep +command. You can use "make install" to copy these, and the public header file +pcre.h, to appropriate live directories on your system, in the normal way. + +Running "make install" also installs the command pcre-config, which can be used +to recall information about the PCRE configuration and installation. For +example, + + pcre-config --version + +prints the version number, and + + pcre-config --libs + +outputs information about where the library is installed. This command can be +included in makefiles for programs that use PCRE, saving the programmer from +having to remember too many details. + + +Shared libraries on Unix systems +-------------------------------- + +The default distribution builds PCRE as two shared libraries. This support is +new and experimental and may not work on all systems. It relies on the +"libtool" scripts - these are distributed with PCRE. It should build a +"libtool" script and use this to compile and link shared libraries, which are +placed in a subdirectory called .libs. The programs pcretest and pgrep are +built to use these uninstalled libraries by means of wrapper scripts. When you +use "make install" to install shared libraries, pgrep and pcretest are +automatically re-built to use the newly installed libraries. However, only +pgrep is installed, as pcretest is really just a test program. + +To build PCRE using static libraries you must use --disable-shared when +configuring it. For example - ChangeLog log of changes to the code - Makefile for building PCRE - Performance notes on performance - README this file - Tech.Notes notes on the encoding - pcre.3 man page for the functions - pcreposix.3 man page for the POSIX wrapper API - maketables.c auxiliary program for building chartables.c - study.c ) source of - pcre.c ) the functions - pcreposix.c ) - pcre.h header for the external API - pcreposix.h header for the external POSIX wrapper API - internal.h header for internal use - pcretest.c test program - pgrep.1 man page for pgrep - pgrep.c source of a grep utility that uses PCRE - perltest Perl test program - testinput test data, compatible with Perl - testinput2 test data for error messages and non-Perl things - testoutput test results corresponding to testinput - testoutput2 test results corresponding to testinput2 - -To build PCRE, edit Makefile for your system (it is a fairly simple make file) -and then run it. It builds a two libraries called libpcre.a and libpcreposix.a, -a test program called pcretest, and the pgrep command. - -To test PCRE, run pcretest on the file testinput, and compare the output with -the contents of testoutput. There should be no differences. For example: - - pcretest testinput /tmp/anything - diff /tmp/anything testoutput - -Do the same with testinput2, comparing the output with testoutput2, but this -time using the -i flag for pcretest, i.e. - - pcretest -i testinput2 /tmp/anything - diff /tmp/anything testoutput2 - -There are two sets of tests because the first set can also be fed directly into -the perltest program to check that Perl gives the same results. The second set -of tests check pcre_info(), pcre_study(), error detection and run-time flags -that are specific to PCRE, as well as the POSIX wrapper API. - -To install PCRE, copy libpcre.a to any suitable library directory (e.g. -/usr/local/lib), pcre.h to any suitable include directory (e.g. -/usr/local/include), and pcre.3 to any suitable man directory (e.g. -/usr/local/man/man3). - -To install the pgrep command, copy it to any suitable binary directory, (e.g. -/usr/local/bin) and pgrep.1 to any suitable man directory (e.g. -/usr/local/man/man1). +./configure --prefix=/usr/gnu --disable-shared + +Then run "make" in the usual way. + + +Building on non-Unix systems +---------------------------- + +For a non-Unix system, read the comments in the file NON-UNIX-USE. PCRE has +been compiled on Windows systems and on Macintoshes, but I don't know the +details because I don't use those systems. It should be straightforward to +build PCRE on any system that has a Standard C compiler, because it uses only +Standard C functions. + + +Testing PCRE +------------ + +To test PCRE on a Unix system, run the RunTest script in the pcre directory. +(This can also be run by "make runtest" or "make check".) For other systems, +see the instruction in NON-UNIX-USE. + +The script runs the pcretest test program (which is documented in +doc/pcretest.txt) on each of the testinput files (in the testdata directory) in +turn, and compares the output with the contents of the corresponding testoutput +file. A file called testtry is used to hold the output from pcretest. To run +pcretest on just one of the test files, give its number as an argument to +RunTest, for example: + + RunTest 3 + +The first and third test files can also be fed directly into the perltest +script to check that Perl gives the same results. The third file requires the +additional features of release 5.005, which is why it is kept separate from the +main test input, which needs only Perl 5.004. In the long run, when 5.005 is +widespread, these two test files may get amalgamated. + +The second set of tests check pcre_info(), pcre_study(), pcre_copy_substring(), +pcre_get_substring(), pcre_get_substring_list(), error detection and run-time +flags that are specific to PCRE, as well as the POSIX wrapper API. + +The fourth set of tests checks pcre_maketables(), the facility for building a +set of character tables for a specific locale and using them instead of the +default tables. The tests make use of the "fr" (French) locale. Before running +the test, the script checks for the presence of this locale by running the +"locale" command. If that command fails, or if it doesn't include "fr" in the +list of available locales, the fourth test cannot be run, and a comment is +output to say why. If running this test produces instances of the error + + ** Failed to set locale "fr" + +in the comparison output, it means that locale is not available on your system, +despite being listed by "locale". This does not mean that PCRE is broken. PCRE has its own native API, but a set of "wrapper" functions that are based on the POSIX API are also supplied in the library libpcreposix.a. Note that this @@ -63,29 +124,35 @@ for the POSIX-style functions is called pcreposix.h. The official POSIX name is regex.h, but I didn't want to risk possible problems with existing files of that name by distributing it that way. To use it with an existing program that -uses the POSIX API it will have to be renamed or pointed at by a link. +uses the POSIX API, it will have to be renamed or pointed at by a link. Character tables ---------------- -PCRE uses four tables for manipulating and identifying characters. These are -compiled from a source file called chartables.c. This is not supplied in -the distribution, but is built by the program maketables (compiled from -maketables.c), which uses the ANSI C character handling functions such as -isalnum(), isalpha(), isupper(), islower(), etc. to build the table sources. -This means that the default C locale set in your system may affect the contents -of the tables. You can change the tables by editing chartables.c and then -re-building PCRE. If you do this, you should probably also edit Makefile to -ensure that the file doesn't ever get re-generated. - -The first two tables pcre_lcc[] and pcre_fcc[] provide lower casing and a -case flipping functions, respectively. The pcre_cbits[] table consists of four -32-byte bit maps which identify digits, letters, "word" characters, and white -space, respectively. These are used when building 32-byte bit maps that -represent character classes. +PCRE uses four tables for manipulating and identifying characters. The final +argument of the pcre_compile() function is a pointer to a block of memory +containing the concatenated tables. A call to pcre_maketables() can be used to +generate a set of tables in the current locale. If the final argument for +pcre_compile() is passed as NULL, a set of default tables that is built into +the binary is used. + +The source file called chartables.c contains the default set of tables. This is +not supplied in the distribution, but is built by the program dftables +(compiled from dftables.c), which uses the ANSI C character handling functions +such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table +sources. This means that the default C locale which is set for your system will +control the contents of these default tables. You can change the default tables +by editing chartables.c and then re-building PCRE. If you do this, you should +probably also edit Makefile to ensure that the file doesn't ever get +re-generated. + +The first two 256-byte tables provide lower casing and case flipping functions, +respectively. The next table consists of three 32-byte bit maps which identify +digits, "word" characters, and white space, respectively. These are used when +building 32-byte bit maps that represent character classes. -The pcre_ctypes[] table has bits indicating various character types, as +The final 256-byte table has bits indicating various character types, as follows: 1 white space character @@ -99,135 +166,74 @@ will cause PCRE to malfunction. -The pcretest program --------------------- +Manifest +-------- + +The distribution should contain the following files: + +(A) The actual source files of the PCRE library functions and their + headers: + + dftables.c auxiliary program for building chartables.c + get.c ) + maketables.c ) + study.c ) source of + pcre.c ) the functions + pcreposix.c ) + pcre.in "source" for the header for the external API; pcre.h + is built from this by "configure" + pcreposix.h header for the external POSIX wrapper API + internal.h header for internal use + config.in template for config.h, which is built by configure + +(B) Auxiliary files: + + AUTHORS information about the author of PCRE + ChangeLog log of changes to the code + INSTALL generic installation instructions + LICENCE conditions for the use of PCRE + COPYING the same, using GNU's standard name + Makefile.in template for Unix Makefile, which is built by configure + NEWS important changes in this release + NON-UNIX-USE notes on building PCRE on non-Unix systems + README this file + RunTest a Unix shell script for running tests + config.guess ) files used by libtool, + config.sub ) used only when building a shared library + configure a configuring shell script (built by autoconf) + configure.in the autoconf input used to build configure + doc/Tech.Notes notes on the encoding + doc/pcre.3 man page source for the PCRE functions + doc/pcre.html HTML version + doc/pcre.txt plain text version + doc/pcreposix.3 man page source for the POSIX wrapper API + doc/pcreposix.html HTML version + doc/pcreposix.txt plain text version + doc/pcretest.txt documentation of test program + doc/perltest.txt documentation of Perl test program + doc/pgrep.1 man page source for the pgrep utility + doc/pgrep.html HTML version + doc/pgrep.txt plain text version + install-sh a shell script for installing files + ltconfig ) files used to build "libtool", + ltmain.sh ) used only when building a shared library + pcretest.c test program + perltest Perl test program + pgrep.c source of a grep utility that uses PCRE + pcre-config.in source of script which retains PCRE information + testdata/testinput1 test data, compatible with Perl 5.004 and 5.005 + testdata/testinput2 test data for error messages and non-Perl things + testdata/testinput3 test data, compatible with Perl 5.005 + testdata/testinput4 test data for locale-specific tests + testdata/testoutput1 test results corresponding to testinput1 + testdata/testoutput2 test results corresponding to testinput2 + testdata/testoutput3 test results corresponding to testinput3 + testdata/testoutput4 test results corresponding to testinput4 -This program is intended for testing PCRE, but it can also be used for -experimenting with regular expressions. +(C) Auxiliary files for Win32 DLL -If it is given two filename arguments, it reads from the first and writes to -the second. If it is given only one filename argument, it reads from that file -and writes to stdout. Otherwise, it reads from stdin and writes to stdout, and -prompts for each line of input. - -The program handles any number of sets of input on a single input file. Each -set starts with a regular expression, and continues with any number of data -lines to be matched against the pattern. An empty line signals the end of the -set. The regular expressions are given enclosed in any non-alphameric -delimiters, for example - - /(a|bc)x+yz/ - -and may be followed by i, m, s, or x to set the PCRE_CASELESS, PCRE_MULTILINE, -PCRE_DOTALL, or PCRE_EXTENDED options, respectively. These options have the -same effect as they do in Perl. - -There are also some upper case options that do not match Perl options: /A, /E, -and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and PCRE_EXTRA respectively. -The /D option is a PCRE debugging feature. It causes the internal form of -compiled regular expressions to be output after compilation. The /S option -causes pcre_study() to be called after the expression has been compiled, and -the results used when the expression is matched. If /I is present as well as -/S, then pcre_study() is called with the PCRE_CASELESS option. - -Finally, the /P option causes pcretest to call PCRE via the POSIX wrapper API -rather than its native API. When this is done, all other options except /i and -/m are ignored. REG_ICASE is set if /i is present, and REG_NEWLINE is set if /m -is present. The wrapper functions force PCRE_DOLLAR_ENDONLY always, and -PCRE_DOTALL unless REG_NEWLINE is set. - -A regular expression can extend over several lines of input; the newlines are -included in it. See the testinput file for many examples. - -Before each data line is passed to pcre_exec(), leading and trailing whitespace -is removed, and it is then scanned for \ escapes. The following are recognized: - - \a alarm (= BEL) - \b backspace - \e escape - \f formfeed - \n newline - \r carriage return - \t tab - \v vertical tab - \nnn octal character (up to 3 octal digits) - \xhh hexadecimal character (up to 2 hex digits) - - \A pass the PCRE_ANCHORED option to pcre_exec() - \B pass the PCRE_NOTBOL option to pcre_exec() - \E pass the PCRE_DOLLAR_ENDONLY option to pcre_exec() - \I pass the PCRE_CASELESS option to pcre_exec() - \M pass the PCRE_MULTILINE option to pcre_exec() - \S pass the PCRE_DOTALL option to pcre_exec() - \Odd set the size of the output vector passed to pcre_exec() to dd - (any number of decimal digits) - \Z pass the PCRE_NOTEOL option to pcre_exec() - -A backslash followed by anything else just escapes the anything else. If the -very last character is a backslash, it is ignored. This gives a way of passing -an empty line as data, since a real empty line terminates the data input. - -If /P was present on the regex, causing the POSIX wrapper API to be used, only -\B, and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL to be passed to -regexec() respectively. - -When a match succeeds, pcretest outputs the list of identified substrings that -pcre_exec() returns, starting with number 0 for the string that matched the -whole pattern. Here is an example of an interactive pcretest run. - - $ pcretest - Testing Perl-Compatible Regular Expressions - PCRE version 0.90 08-Sep-1997 - - re> /^abc(\d+)/ - data> abc123 - 0: abc123 - 1: 123 - data> xyz - No match - -Note that while patterns can be continued over several lines (a plain ">" -prompt is used for continuations), data lines may not. However newlines can be -included in data by means of the \n escape. - -If the -p option is given to pcretest, it is equivalent to adding /P to each -regular expression: the POSIX wrapper API is used to call PCRE. None of the -following flags has any effect in this case. - -If the option -d is given to pcretest, it is equivalent to adding /D to each -regular expression: the internal form is output after compilation. - -If the option -i (for "information") is given to pcretest, it calls pcre_info() -after compiling an expression, and outputs the information it gets back. If the -pattern is studied, the results of that are also output. - -If the option -s is given to pcretest, it outputs the size of each compiled -pattern after it has been compiled. - -If the -t option is given, each compile, study, and match is run 2000 times -while being timed, and the resulting time per compile or match is output in -milliseconds. Do not set -t with -s, because you will then get the size output -2000 times and the timing will be distorted. - - - -The perltest program --------------------- - -The perltest program tests Perl's regular expressions; it has the same -specification as pcretest, and so can be given identical input, except that -input patterns can be followed only by Perl's lower case options. - -The data lines are processed as Perl strings, so if they contain $ or @ -characters, these have to be escaped. For this reason, all such characters in -the testinput file are escaped so that it can be used for perltest as well as -for pcretest, and the special upper case options such as /A that pcretest -recognizes are not used in this file. The output should be identical, apart -from the initial identifying banner. - -The testinput2 file is not suitable for feeding to Perltest, since it does -make use of the special upper case options and escapes that pcretest uses to -test additional features of PCRE. + dll.mk + pcre.def Philip Hazel -October 1997 +February 2000