
Contents of /code/trunk/README

Revision 53, Sat Feb 24 21:39:42 2007 UTC, by nigel
File size: 14247 byte(s)
Load pcre-3.5 into code/trunk.

README file for PCRE (Perl-compatible regular expression library)
-----------------------------------------------------------------

The latest release of PCRE is always available from

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz

Please read the NEWS file if you are upgrading from a previous release.

PCRE has its own native API, but a set of "wrapper" functions that are based on
the POSIX API is also supplied in the library libpcreposix. Note that this
just provides a POSIX calling interface to PCRE: the regular expressions
themselves still follow Perl syntax and semantics. The header file
for the POSIX-style functions is called pcreposix.h. The official POSIX name is
regex.h, but I didn't want to risk possible problems with existing files of
that name by distributing it that way. To use pcreposix.h with an existing
program that uses the POSIX API, you will have to rename it to regex.h or make
a link of that name point to it.


Contributions by users of PCRE
------------------------------

You can find contributions from PCRE users in the directory

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib

where there is also a README file giving brief descriptions of what they are.
Several of them provide support for compiling PCRE on various flavours of
Windows systems (I myself do not use Windows). Some are complete in themselves;
others are pointers to URLs containing relevant files.


Building PCRE on a Unix system
------------------------------

To build PCRE on a Unix system, first run the "configure" command from the PCRE
distribution directory, with your current directory set to the directory where
you want the files to be created. This command is a standard GNU "autoconf"
configuration script, for which generic instructions are supplied in INSTALL.

Most commonly, people build PCRE within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient, but the
usual methods of changing standard defaults are available. For example,

  CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local

specifies that the C compiler should be run with the flags '-O2 -Wall' instead
of the default, and that "make install" should install PCRE under /opt/local
instead of the default /usr/local.

If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE source
into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:

  cd /build/pcre/pcre-xxx
  /source/pcre/pcre-xxx/configure

If you want to make use of the experimental, incomplete support for UTF-8
character strings in PCRE, you must add --enable-utf8 to the "configure"
command. Without it, the code for handling UTF-8 is not included in the
library. (Even when included, it still has to be enabled by an option at run
time.)

The "configure" script builds five files:

. libtool is a script that builds shared and/or static libraries.
. Makefile is built by copying Makefile.in and making substitutions.
. config.h is built by copying config.in and making substitutions.
. pcre-config is built by copying pcre-config.in and making substitutions.
. RunTest is a script for running tests.

Once "configure" has run, you can run "make". It builds two libraries called
libpcre and libpcreposix, a test program called pcretest, and the pcregrep
command. You can use "make install" to copy these, the public header files
pcre.h and pcreposix.h, and the man pages to appropriate live directories on
your system, in the normal way.

Running "make install" also installs the command pcre-config, which can be used
to recall information about the PCRE configuration and installation. For
example,

  pcre-config --version

prints the version number, and

  pcre-config --libs

outputs information about where the library is installed. This command can be
included in makefiles for programs that use PCRE, saving the programmer from
having to remember too many details.

There is one esoteric feature that is controlled by "configure". It concerns
the character value used for "newline", and is something that you probably do
not want to change on a Unix system. The default is to use whatever value your
compiler gives to '\n'. By using --enable-newline-is-cr or
--enable-newline-is-lf you can force the value to be CR (13) or LF (10) if you
really want to.


Shared libraries on Unix systems
--------------------------------

The default distribution builds PCRE as two shared libraries and two static
libraries, as long as the operating system supports shared libraries. Shared
library support relies on the "libtool" script which is built as part of the
"configure" process.

The libtool script is used to compile and link both shared and static
libraries. They are placed in a subdirectory called .libs when they are newly
built. The programs pcretest and pcregrep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcregrep and pcretest are
automatically re-built to use the newly installed shared libraries before being
installed themselves. However, the versions left in the source directory still
use the uninstalled libraries.

To build PCRE using static libraries only you must use --disable-shared when
configuring it. For example

  ./configure --prefix=/usr/gnu --disable-shared

Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries.


Building on non-Unix systems
----------------------------

For a non-Unix system, read the comments in the file NON-UNIX-USE. PCRE has
been compiled on Windows systems and on Macintoshes, but I don't know the
details because I don't use those systems. It should be straightforward to
build PCRE on any system that has a Standard C compiler, because it uses only
Standard C functions.


Testing PCRE
------------

To test PCRE on a Unix system, run the RunTest script that is created by the
configuring process. (This can also be run by "make runtest", "make check", or
"make test".) For other systems, see the instructions in NON-UNIX-USE.

The script runs the pcretest test program (which is documented in the doc
directory) on each of the testinput files (in the testdata directory) in turn,
and compares the output with the contents of the corresponding testoutput file.
A file called testtry is used to hold the output from pcretest. To run pcretest
on just one of the test files, give its number as an argument to RunTest, for
example:

  RunTest 3

The first and third test files can also be fed directly into the perltest
script to check that Perl gives the same results. The third file requires the
additional features of release 5.005, which is why it is kept separate from the
main test input, which needs only Perl 5.004. In the long run, when 5.005 (or
higher) is widespread, these two test files may get amalgamated.

The second set of tests checks pcre_fullinfo(), pcre_info(), pcre_study(),
pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
detection, and run-time flags that are specific to PCRE, as well as the POSIX
wrapper API. It also uses the debugging flag to check some of the internals of
pcre_compile().

If you build PCRE with a locale setting that is not the standard C locale, the
character tables may be different (see next paragraph). In some cases, this may
cause failures in the second set of tests. For example, in a locale where the
isprint() function yields TRUE for characters in the range 128-255, the use of
[:isascii:] inside a character class defines a different set of characters, and
this shows up in this test as a difference in the compiled code, which is being
listed for checking. Where the comparison test output contains [\x00-\x7f] the
test output will contain [\x00-\xff], and similarly in some other cases. This
is not a bug in PCRE.

The fourth set of tests checks pcre_maketables(), the facility for building a
set of character tables for a specific locale and using them instead of the
default tables. The tests make use of the "fr" (French) locale. Before running
the test, the script checks for the presence of this locale by running the
"locale" command. If that command fails, or if it doesn't include "fr" in the
list of available locales, the fourth test cannot be run, and a comment is
output to say why. If running this test produces instances of the error

  ** Failed to set locale "fr"

in the comparison output, it means that locale is not available on your system,
despite being listed by "locale". This does not mean that PCRE is broken.
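
As a rough illustration of the situation behind that error, the sketch below
checks whether a named locale can actually be selected with the Standard C
setlocale() function. It is not the code used by pcretest or RunTest; the
function name locale_is_usable is made up for this example, and only LC_CTYPE
is examined, on the assumption that character classification is what matters
for the tables.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE: returns 1 if the named locale can
   be selected for character classification, and 0 if it cannot. The locale
   that was in force on entry is restored before returning. */
int locale_is_usable(const char *name)
{
    char saved[256];
    const char *current = setlocale(LC_CTYPE, NULL);   /* query only */
    int ok;

    /* Keep a private copy, because a later setlocale() call may overwrite
       the string that the query returned. Fall back to "C" if the name is
       missing or too long for the buffer. */
    if (current == NULL || strlen(current) >= sizeof(saved))
        strcpy(saved, "C");
    else
        strcpy(saved, current);

    ok = setlocale(LC_CTYPE, name) != NULL;            /* NULL means failure */
    setlocale(LC_CTYPE, saved);                        /* put things back */
    return ok;
}
```

A call such as locale_is_usable("fr") returning 0 on a system where "locale"
nevertheless lists "fr" corresponds to the case described above.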

The fifth test checks the experimental, incomplete UTF-8 support. It is not run
automatically unless PCRE is built with UTF-8 support. This file can be fed
directly to the perltest8 script, which requires Perl 5.6 or higher. The sixth
file tests internal UTF-8 features of PCRE that are not relevant to Perl.


Character tables
----------------

PCRE uses four tables for manipulating and identifying characters. The final
argument of the pcre_compile() function is a pointer to a block of memory
containing the concatenated tables. A call to pcre_maketables() can be used to
generate a set of tables in the current locale. If the final argument for
pcre_compile() is passed as NULL, a set of default tables that is built into
the binary is used.

The source file called chartables.c contains the default set of tables. This is
not supplied in the distribution, but is built by the program dftables
(compiled from dftables.c), which uses the ANSI C character handling functions
such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table
sources. This means that the default C locale which is set for your system will
control the contents of these default tables. You can change the default tables
by editing chartables.c and then re-building PCRE. If you do this, you should
probably also edit Makefile to ensure that the file doesn't ever get
re-generated.

The first two 256-byte tables provide lower casing and case flipping functions,
respectively. The next table consists of three 32-byte bit maps which identify
digits, "word" characters, and white space, respectively. These are used when
building 32-byte bit maps that represent character classes.
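
The following is a minimal sketch of how tables of this shape could be built
from the Standard C character handling functions. It is an illustration only,
not the code in dftables.c or maketables.c: all of the function names are
invented for the example, and the bit order within each map byte (bit c%8 of
byte c/8) is an assumption.

```c
#include <ctype.h>
#include <string.h>

/* Fill a 256-byte table that maps each character to its lower-case form. */
void build_lower_table(unsigned char *lower)
{
    int c;
    for (c = 0; c < 256; c++)
        lower[c] = (unsigned char)tolower(c);
}

/* Fill a 256-byte table that maps each character to its other-case form;
   characters that are not letters map to themselves. */
void build_flip_table(unsigned char *flip)
{
    int c;
    for (c = 0; c < 256; c++)
        flip[c] = (unsigned char)(islower(c) ? toupper(c) : tolower(c));
}

/* Set the bit for character c in a 32-byte (256-bit) map. */
static void set_bit(unsigned char *map, int c)
{
    map[c / 8] |= (unsigned char)(1u << (c % 8));
}

/* Return 1 if the bit for character c is set in the map, otherwise 0. */
int test_bit(const unsigned char *map, int c)
{
    return (map[c / 8] >> (c % 8)) & 1;
}

/* Build the three 32-byte maps for digits, "word" characters, and white
   space. A "word" character is taken here to be alphanumeric or '_'. */
void build_class_maps(unsigned char *digits, unsigned char *words,
                      unsigned char *spaces)
{
    int c;
    memset(digits, 0, 32);
    memset(words, 0, 32);
    memset(spaces, 0, 32);
    for (c = 0; c < 256; c++) {
        if (isdigit(c)) set_bit(digits, c);
        if (isalnum(c) || c == '_') set_bit(words, c);
        if (isspace(c)) set_bit(spaces, c);
    }
}
```

Because every entry comes from a ctype function, the contents depend on the
locale that is in force when the functions are called, which is the reason the
default tables follow the locale used when PCRE is built.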

The final 256-byte table has bits indicating various character types, as
follows:

    1   white space character
    2   letter
    4   decimal digit
    8   hexadecimal digit
   16   alphanumeric or '_'
  128   regular expression metacharacter or binary zero

You should not alter the set of characters that contain the 128 bit, as that
will cause PCRE to malfunction.
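
To make the bit values concrete, here is a hedged sketch of how such a type
table could be filled in with the Standard C character handling functions. It
is not the code from dftables.c: the CT_ names and build_type_table are
invented for this example, and the string of metacharacters is an assumption
rather than something stated in this file.

```c
#include <ctype.h>
#include <string.h>

/* Bit values as listed above; the CT_ names are hypothetical. */
enum {
    CT_SPACE  = 1,     /* white space character */
    CT_LETTER = 2,     /* letter */
    CT_DIGIT  = 4,     /* decimal digit */
    CT_XDIGIT = 8,     /* hexadecimal digit */
    CT_WORD   = 16,    /* alphanumeric or '_' */
    CT_META   = 128    /* regular expression metacharacter or binary zero */
};

/* Fill a 256-byte table with the type bits for each character value. */
void build_type_table(unsigned char *types)
{
    /* Assumed metacharacter set, for illustration only. */
    static const char meta[] = "*+?{^.$|()[";
    int c;

    for (c = 0; c < 256; c++) {
        unsigned char bits = 0;
        if (isspace(c))  bits |= CT_SPACE;
        if (isalpha(c))  bits |= CT_LETTER;
        if (isdigit(c))  bits |= CT_DIGIT;
        if (isxdigit(c)) bits |= CT_XDIGIT;
        if (isalnum(c) || c == '_') bits |= CT_WORD;
        if (c == 0 || strchr(meta, c) != NULL) bits |= CT_META;
        types[c] = bits;
    }
}
```

In the standard C locale, for example, the entry for 'a' combines the letter,
hexadecimal digit, and alphanumeric bits (2 + 8 + 16 = 26), while the entry
for '*' carries only the 128 bit.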


Manifest
--------

The distribution should contain the following files:

(A) The actual source files of the PCRE library functions and their
    headers:

  dftables.c            auxiliary program for building chartables.c
  get.c                 )
  maketables.c          )
  study.c               ) source of
  pcre.c                )   the functions
  pcreposix.c           )
  pcre.in               "source" for the header for the external API; pcre.h
                          is built from this by "configure"
  pcreposix.h           header for the external POSIX wrapper API
  internal.h            header for internal use
  config.in             template for config.h, which is built by configure

(B) Auxiliary files:

  AUTHORS               information about the author of PCRE
  ChangeLog             log of changes to the code
  INSTALL               generic installation instructions
  LICENCE               conditions for the use of PCRE
  COPYING               the same, using GNU's standard name
  Makefile.in           template for Unix Makefile, which is built by configure
  NEWS                  important changes in this release
  NON-UNIX-USE          notes on building PCRE on non-Unix systems
  README                this file
  RunTest.in            template for a Unix shell script for running tests
  config.guess          ) files used by libtool,
  config.sub            )   used only when building a shared library
  configure             a configuring shell script (built by autoconf)
  configure.in          the autoconf input used to build configure
  doc/Tech.Notes        notes on the encoding
  doc/pcre.3            man page source for the PCRE functions
  doc/pcre.html         HTML version
  doc/pcre.txt          plain text version
  doc/pcreposix.3       man page source for the POSIX wrapper API
  doc/pcreposix.html    HTML version
  doc/pcreposix.txt     plain text version
  doc/pcretest.txt      documentation of test program
  doc/perltest.txt      documentation of Perl test program
  doc/pcregrep.1        man page source for the pcregrep utility
  doc/pcregrep.html     HTML version
  doc/pcregrep.txt      plain text version
  install-sh            a shell script for installing files
  ltmain.sh             file used to build a libtool script
  pcretest.c            comprehensive test program
  pcredemo.c            simple demonstration of coding calls to PCRE
  perltest              Perl test program
  perltest8             Perl test program for UTF-8 tests
  pcregrep.c            source of a grep utility that uses PCRE
  pcre-config.in        source of script which retains PCRE information
  testdata/testinput1   test data, compatible with Perl 5.004 and 5.005
  testdata/testinput2   test data for error messages and non-Perl things
  testdata/testinput3   test data, compatible with Perl 5.005
  testdata/testinput4   test data for locale-specific tests
  testdata/testinput5   test data for UTF-8 tests compatible with Perl 5.6
  testdata/testinput6   test data for other UTF-8 tests
  testdata/testoutput1  test results corresponding to testinput1
  testdata/testoutput2  test results corresponding to testinput2
  testdata/testoutput3  test results corresponding to testinput3
  testdata/testoutput4  test results corresponding to testinput4
  testdata/testoutput5  test results corresponding to testinput5
  testdata/testoutput6  test results corresponding to testinput6

(C) Auxiliary files for Win32 DLL

  dll.mk
  pcre.def

Philip Hazel <ph10@cam.ac.uk>
August 2001
