Contents of /code/trunk/README

Revision 489
Tue Jan 19 16:42:21 2010 UTC (5 years, 3 months ago) by ph10
File size: 36246 byte(s)
File tidies and documentation update for 8.01.

README file for PCRE (Perl-compatible regular expression library)
-----------------------------------------------------------------

The latest release of PCRE is always available in three alternative formats
from:

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz
  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.bz2
  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.zip

There is a mailing list for discussion about the development of PCRE at

  pcre-dev@exim.org

Please read the NEWS file if you are upgrading from a previous release.
The contents of this README file are:

  The PCRE APIs
  Documentation for PCRE
  Contributions by users of PCRE
  Building PCRE on non-Unix systems
  Building PCRE on Unix-like systems
  Retrieving configuration information on Unix-like systems
  Shared libraries on Unix-like systems
  Cross-compiling on Unix-like systems
  Using HP's ANSI C++ compiler (aCC)
  Using Sun's compilers for Solaris
  Using PCRE from MySQL
  Making new tarballs
  Testing PCRE
  Character tables
  File manifest


The PCRE APIs
-------------

PCRE is written in C, and it has its own API. The distribution also includes a
set of C++ wrapper functions (see the pcrecpp man page for details), courtesy
of Google Inc.

In addition, there is a set of C wrapper functions that are based on the POSIX
regular expression API (see the pcreposix man page). These end up in the
library called libpcreposix. Note that this just provides a POSIX calling
interface to PCRE; the regular expressions themselves still follow Perl syntax
and semantics. The POSIX API is restricted, and does not give full access to
all of PCRE's facilities.

The header file for the POSIX-style functions is called pcreposix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems
with existing files of that name by distributing it that way. To use PCRE with
an existing program that uses the POSIX API, pcreposix.h will have to be
renamed or pointed at by a link.

If you are using the POSIX interface to PCRE and there is already a POSIX regex
library installed on your system, as well as worrying about the regex.h header
file (as mentioned above), you must also take care when linking programs to
ensure that they link with PCRE's libpcreposix library. Otherwise they may pick
up the POSIX functions of the same name from the other library.

One way of avoiding this confusion is to compile PCRE with the addition of
-Dregcomp=PCREregcomp (and similarly for the other POSIX functions) to the
compiler flags (CFLAGS if you are using "configure" -- see below). This has the
effect of renaming the functions so that the names no longer clash. Of course,
you have to do the same thing for your applications, or write them using the
new names.
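
As a rough illustration of the renaming approach, the shell fragment below
assembles one -D flag for each of the four POSIX functions (regcomp, regexec,
regerror, and regfree). Only PCREregcomp is named in this README; the other
PCRE-prefixed names are an assumption that follows the same pattern. The
fragment prints the resulting "configure" command rather than running it.

```shell
# Build one -Dname=PCREname flag per POSIX function so that the names
# in libpcreposix no longer clash with another regex library.
# Assumption: the PCRE prefix is applied uniformly to all four names.
posix_rename_flags=""
for fn in regcomp regexec regerror regfree; do
    posix_rename_flags="$posix_rename_flags -D$fn=PCRE$fn"
done
posix_rename_flags=${posix_rename_flags# }   # drop the leading space

# Dry run: show the command that would be used to configure PCRE.
echo "CFLAGS='$posix_rename_flags' ./configure"
```

The same flags would then have to be passed when compiling any application
that calls the POSIX functions, so that it refers to the renamed symbols.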


Documentation for PCRE
----------------------

If you install PCRE in the normal way on a Unix-like system, you will end up
with a set of man pages whose names all start with "pcre". The one that is just
called "pcre" lists all the others. In addition to these man pages, the PCRE
documentation is supplied in two other forms:

  1. There are files called doc/pcre.txt, doc/pcregrep.txt, and
     doc/pcretest.txt in the source distribution. The first of these is a
     concatenation of the text forms of all the section 3 man pages except
     those that summarize individual functions. The other two are the text
     forms of the section 1 man pages for the pcregrep and pcretest commands.
     These text forms are provided for ease of scanning with text editors or
     similar tools. They are installed in <prefix>/share/doc/pcre, where
     <prefix> is the installation prefix (defaulting to /usr/local).

  2. A set of files containing all the documentation in HTML form, hyperlinked
     in various ways, and rooted in a file called index.html, is distributed in
     doc/html and installed in <prefix>/share/doc/pcre/html.

Users of PCRE have contributed files containing the documentation for various
releases in CHM format. These can be found in the Contrib directory of the FTP
site (see next section).


Contributions by users of PCRE
------------------------------

You can find contributions from PCRE users in the directory

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib

There is a README file giving brief descriptions of what they are. Some are
complete in themselves; others are pointers to URLs containing relevant files.
Some of this material is likely to be well out-of-date. Several of the earlier
contributions provided support for compiling PCRE on various flavours of
Windows (I myself do not use Windows). Nowadays there is more Windows support
in the standard distribution, so these contributions have been archived.


Building PCRE on non-Unix systems
---------------------------------

For a non-Unix system, please read the comments in the file NON-UNIX-USE,
though if your system supports the use of "configure" and "make" you may be
able to build PCRE in the same way as for Unix-like systems. PCRE can also be
configured in many platform environments using the GUI facility provided by
CMake's cmake-gui command. This creates Makefiles, solution files, etc.

PCRE has been compiled on many different operating systems. It should be
straightforward to build PCRE on any system that has a Standard C compiler and
library, because it uses only Standard C functions.


Building PCRE on Unix-like systems
----------------------------------

If you are using HP's ANSI C++ compiler (aCC), please see the special note
in the section entitled "Using HP's ANSI C++ compiler (aCC)" below.

The following instructions assume the widely used "configure, make, make
install" process. There is also support for CMake in the PCRE distribution;
there are some comments about using CMake in the NON-UNIX-USE file, though it
can also be used in Unix-like systems.

To build PCRE on a Unix-like system, first run the "configure" command from the
PCRE distribution directory, with your current directory set to the directory
where you want the files to be created. This command is a standard GNU
"autoconf" configuration script, for which generic instructions are supplied in
the file INSTALL.

Most commonly, people build PCRE within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient. However,
the usual methods of changing standard defaults are available. For example:

  CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local

specifies that the C compiler should be run with the flags '-O2 -Wall' instead
of the default, and that "make install" should install PCRE under /opt/local
instead of the default /usr/local.

If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE source
into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:

  cd /build/pcre/pcre-xxx
  /source/pcre/pcre-xxx/configure

PCRE is written in C and is normally compiled as a C library. However, it is
possible to build it as a C++ library, though the provided building apparatus
does not have any features to support this.

There are some optional features that can be included or omitted from the PCRE
library. You can read more about them in the pcrebuild man page.

. If you want to suppress the building of the C++ wrapper library, you can add
  --disable-cpp to the "configure" command. Otherwise, when "configure" is run,
  it will try to find a C++ compiler and C++ header files, and if it succeeds,
  it will try to build the C++ wrapper.

. If you want to make use of the support for UTF-8 Unicode character strings in
  PCRE, you must add --enable-utf8 to the "configure" command. Without it, the
  code for handling UTF-8 is not included in the library. Even when included,
  it still has to be enabled by an option at run time. When PCRE is compiled
  with this option, its input can only be either ASCII or UTF-8, even when
  running on EBCDIC platforms. It is not possible to use both --enable-utf8 and
  --enable-ebcdic at the same time.

. If, in addition to support for UTF-8 character strings, you want to include
  support for the \P, \p, and \X sequences that recognize Unicode character
  properties, you must add --enable-unicode-properties to the "configure"
  command. This adds about 30K to the size of the library (in the form of a
  property table); only the basic two-letter properties such as Lu are
  supported.

. You can build PCRE to recognize either CR or LF or the sequence CRLF or any
  of the preceding, or any of the Unicode newline sequences as indicating the
  end of a line. Whatever you specify at build time is the default; the caller
  of PCRE can change the selection at run time. The default newline indicator
  is a single LF character (the Unix standard). You can specify the default
  newline indicator by adding --enable-newline-is-cr or --enable-newline-is-lf
  or --enable-newline-is-crlf or --enable-newline-is-anycrlf or
  --enable-newline-is-any to the "configure" command, respectively.

  If you specify --enable-newline-is-cr or --enable-newline-is-crlf, some of
  the standard tests will fail, because the lines in the test files end with
  LF. Even if the files are edited to change the line endings, there are likely
  to be some failures. With --enable-newline-is-anycrlf or
  --enable-newline-is-any, many tests should succeed, but there may be some
  failures.

. By default, the sequence \R in a pattern matches any Unicode line ending
  sequence. This is independent of the option specifying what PCRE considers to
  be the end of a line (see above). However, the caller of PCRE can restrict \R
  to match only CR, LF, or CRLF. You can make this the default by adding
  --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").

. When called via the POSIX interface, PCRE uses malloc() to get additional
  storage for processing capturing parentheses if there are more than 10 of
  them in a pattern. You can increase this threshold by setting, for example,

  --with-posix-malloc-threshold=20

  on the "configure" command.

. PCRE has a counter that can be set to limit the amount of resources it uses.
  If the limit is exceeded during a match, the match fails. The default is ten
  million. You can change the default by setting, for example,

  --with-match-limit=500000

  on the "configure" command. This is just the default; individual calls to
  pcre_exec() can supply their own value. There is more discussion on the
  pcreapi man page.

. There is a separate counter that limits the depth of recursive function calls
  during a matching process. This also has a default of ten million, which is
  essentially "unlimited". You can change the default by setting, for example,

  --with-match-limit-recursion=500000

  Recursive function calls use up the runtime stack; running out of stack can
  cause programs to crash in strange ways. There is a discussion about stack
  sizes in the pcrestack man page.

. The default maximum compiled pattern size is around 64K. You can increase
  this by adding --with-link-size=3 to the "configure" command. You can
  increase it even more by setting --with-link-size=4, but this is unlikely
  ever to be necessary. Increasing the internal link size will reduce
  performance.

. You can build PCRE so that its internal match() function that is called from
  pcre_exec() does not call itself recursively. Instead, it uses memory blocks
  obtained from the heap via the special functions pcre_stack_malloc() and
  pcre_stack_free() to save data that would otherwise be saved on the stack. To
  build PCRE like this, use

  --disable-stack-for-recursion

  on the "configure" command. PCRE runs more slowly in this mode, but it may be
  necessary in environments with limited stack sizes. This applies only to the
  pcre_exec() function; it does not apply to pcre_dfa_exec(), which does not
  use deeply nested recursion. There is a discussion about stack sizes in the
  pcrestack man page.

. For speed, PCRE uses four tables for manipulating and identifying characters
  whose code point values are less than 256. By default, it uses a set of
  tables for ASCII encoding that is part of the distribution. If you specify

  --enable-rebuild-chartables

  a program called dftables is compiled and run in the default C locale when
  you run "make". It builds a source file called pcre_chartables.c. If you do
  not specify this option, pcre_chartables.c is created as a copy of
  pcre_chartables.c.dist. See "Character tables" below for further information.

. It is possible to compile PCRE for use on systems that use EBCDIC as their
  character code (as opposed to ASCII) by specifying

  --enable-ebcdic

  This automatically implies --enable-rebuild-chartables (see above). However,
  when PCRE is built this way, it always operates in EBCDIC. It cannot support
  both EBCDIC and UTF-8.

. It is possible to compile pcregrep to use libz and/or libbz2, in order to
  read .gz and .bz2 files (respectively), by specifying one or both of

  --enable-pcregrep-libz
  --enable-pcregrep-libbz2

  Of course, the relevant libraries must be installed on your system.

. It is possible to compile pcretest so that it links with the libreadline
  library, by specifying

  --enable-pcretest-libreadline

  If this is done, when pcretest's input is from a terminal, it reads it using
  the readline() function. This provides line-editing and history facilities.
  Note that libreadline is GPL-licenced, so if you distribute a binary of
  pcretest linked in this way, there may be licensing issues.

  Setting this option causes the -lreadline option to be added to the pcretest
  build. In many operating environments with a system-installed readline
  library this is sufficient. However, in some environments (e.g. if an
  unmodified distribution version of readline is in use), it may be necessary
  to specify something like LIBS="-lncurses" as well. This is because, to quote
  the readline INSTALL, "Readline uses the termcap functions, but does not link
  with the termcap or curses library itself, allowing applications which link
  with readline the to choose an appropriate library." If you get error
  messages about missing functions tgetstr, tgetent, tputs, tgetflag, or tgoto,
  this is the problem, and linking with the ncurses library should fix it.
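
Several of the options above are often combined. The following is a minimal
sketch of a helper that prints a "configure" command line and refuses the one
combination that this README rules out (--enable-utf8 together with
--enable-ebcdic). The function name pcre_configure_cmd and the option values
are illustrative assumptions; nothing is executed.

```shell
# Print a "configure" command line built from the given options.
# Returns non-zero if --enable-utf8 and --enable-ebcdic are both present,
# since PCRE cannot support both at the same time.
pcre_configure_cmd() {
    has_utf8=no
    has_ebcdic=no
    for opt in "$@"; do
        case $opt in
            --enable-utf8)   has_utf8=yes ;;
            --enable-ebcdic) has_ebcdic=yes ;;
        esac
    done
    if [ "$has_utf8" = yes ] && [ "$has_ebcdic" = yes ]; then
        echo "error: --enable-utf8 and --enable-ebcdic cannot be combined" >&2
        return 1
    fi
    echo "./configure $*"
}

# Example: UTF-8 with Unicode properties, ANYCRLF newlines, lower match limit.
pcre_configure_cmd --enable-utf8 --enable-unicode-properties \
    --enable-newline-is-anycrlf --with-match-limit=500000
```

Once the printed line looks right, it can be pasted into a shell in the
directory where PCRE is to be built.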

The "configure" script builds the following files for the basic C library:

. Makefile is the makefile that builds the library
. config.h contains build-time configuration options for the library
. pcre.h is the public PCRE header file
. pcre-config is a script that shows the settings of "configure" options
. libpcre.pc is data for the pkg-config command
. libtool is a script that builds shared and/or static libraries
. RunTest is a script for running tests on the basic C library
. RunGrepTest is a script for running tests on the pcregrep command

Versions of config.h and pcre.h are distributed in the PCRE tarballs under the
names config.h.generic and pcre.h.generic. These are provided for those who
have to build PCRE without using "configure" or CMake. If you use "configure"
or CMake, the .generic versions are not used.

If a C++ compiler is found, the following files are also built:

. libpcrecpp.pc is data for the pkg-config command
. pcrecpparg.h is a header file for programs that call PCRE via the C++ wrapper
. pcre_stringpiece.h is the header for the C++ "stringpiece" functions

The "configure" script also creates config.status, which is an executable
script that can be run to recreate the configuration, and config.log, which
contains compiler output from tests that "configure" runs.

Once "configure" has run, you can run "make". It builds two libraries, called
libpcre and libpcreposix, a test program called pcretest, and the pcregrep
command. If a C++ compiler was found on your system, "make" also builds the C++
wrapper library, which is called libpcrecpp, and some test programs called
pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest.
Building the C++ wrapper can be disabled by adding --disable-cpp to the
"configure" command.

The command "make check" runs all the appropriate tests. Details of the PCRE
tests are given below in a separate section of this document.

You can use "make install" to install PCRE into live directories on your
system. The following are installed (file names are all relative to the
<prefix> that is set when "configure" is run):

  Commands (bin):
    pcretest
    pcregrep
    pcre-config

  Libraries (lib):
    libpcre
    libpcreposix
    libpcrecpp (if C++ support is enabled)

  Configuration information (lib/pkgconfig):
    libpcre.pc
    libpcrecpp.pc (if C++ support is enabled)

  Header files (include):
    pcre.h
    pcreposix.h
    pcre_scanner.h      )
    pcre_stringpiece.h  ) if C++ support is enabled
    pcrecpp.h           )
    pcrecpparg.h        )

  Man pages (share/man/man{1,3}):
    pcregrep.1
    pcretest.1
    pcre.3
    pcre*.3 (lots more pages, all starting "pcre")

  HTML documentation (share/doc/pcre/html):
    index.html
    *.html (lots more pages, hyperlinked from index.html)

  Text file documentation (share/doc/pcre):
    AUTHORS
    COPYING
    ChangeLog
    LICENCE
    NEWS
    README
    pcre.txt       (a concatenation of the man(3) pages)
    pcretest.txt   the pcretest man page
    pcregrep.txt   the pcregrep man page

If you want to remove PCRE from your system, you can run "make uninstall".
This removes all the files that "make install" installed. However, it does not
remove any directories, because these are often shared with other programs.
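
Put together, the steps in this section can be sketched as a small script.
The helper names run_step and pcre_build_and_install, the DRY_RUN variable,
and the /opt/local prefix are all assumptions made for illustration. By
default the commands are only printed; setting DRY_RUN=0 would execute them
in the current directory, which is expected to hold the PCRE sources.

```shell
# With DRY_RUN=1 (the default here) each command is printed, not run.
DRY_RUN=${DRY_RUN:-1}

run_step() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Configure, build, test, and install in order, stopping at the first
# step that fails. $1 is the installation prefix.
pcre_build_and_install() {
    run_step ./configure --prefix="$1" &&
    run_step make &&
    run_step make check &&
    run_step make install
}

pcre_build_and_install /opt/local
```

Because the steps are chained with &&, a failure in "make check" prevents
"make install" from running.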


Retrieving configuration information on Unix-like systems
---------------------------------------------------------

Running "make install" installs the command pcre-config, which can be used to
recall information about the PCRE configuration and installation. For example:

  pcre-config --version

prints the version number, and

  pcre-config --libs

outputs information about where the library is installed. This command can be
included in makefiles for programs that use PCRE, saving the programmer from
having to remember too many details.

The pkg-config command is another system for saving and retrieving information
about installed libraries. Instead of separate commands for each library, a
single command is used. For example:

  pkg-config --cflags libpcre

The data is held in *.pc files that are installed in a directory called
<prefix>/lib/pkgconfig. The module name given to pkg-config is the name of the
corresponding file without its .pc suffix.
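
As a sketch of how a build script might use these two commands, the fragment
below asks pkg-config first, falls back to pcre-config, and finally guesses a
plain -lpcre. The module name libpcre is taken from the libpcre.pc file name;
the variable names and the source file myprog.c are hypothetical. The compile
command is printed rather than run.

```shell
# Look up compiler and linker flags for PCRE.
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists libpcre; then
    PCRE_CFLAGS=$(pkg-config --cflags libpcre)
    PCRE_LIBS=$(pkg-config --libs libpcre)
elif command -v pcre-config >/dev/null 2>&1; then
    PCRE_CFLAGS=$(pcre-config --cflags)
    PCRE_LIBS=$(pcre-config --libs)
else
    # Assumption: the library is on the default search path.
    PCRE_CFLAGS=""
    PCRE_LIBS="-lpcre"
fi

# Dry run: show how a program might be compiled against PCRE.
echo "cc $PCRE_CFLAGS -o myprog myprog.c $PCRE_LIBS"
```

The same two variables could be filled in from a makefile in a similar way.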


Shared libraries on Unix-like systems
-------------------------------------

The default distribution builds PCRE as shared libraries and static libraries,
as long as the operating system supports shared libraries. Shared library
support relies on the "libtool" script which is built as part of the
"configure" process.

The libtool script is used to compile and link both shared and static
libraries. They are placed in a subdirectory called .libs when they are newly
built. The programs pcretest and pcregrep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcregrep and pcretest are
automatically re-built to use the newly installed shared libraries before being
installed themselves. However, the versions left in the build directory still
use the uninstalled libraries.

To build PCRE using static libraries only you must use --disable-shared when
configuring it. For example:

  ./configure --prefix=/usr/gnu --disable-shared

Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries.


Cross-compiling on Unix-like systems
------------------------------------

You can specify CC and CFLAGS in the normal way to the "configure" command, in
order to cross-compile PCRE for some other host. However, you should NOT
specify --enable-rebuild-chartables, because if you do, the dftables.c source
file is compiled and run on the local host, in order to generate the inbuilt
character tables (the pcre_chartables.c file). This will probably not work,
because dftables.c needs to be compiled with the local compiler, not the cross
compiler.

When --enable-rebuild-chartables is not specified, pcre_chartables.c is created
by making a copy of pcre_chartables.c.dist, which is a default set of tables
that assumes ASCII code. Cross-compiling with the default tables should not be
a problem.

If you need to modify the character tables when cross-compiling, you should
move pcre_chartables.c.dist out of the way, then compile dftables.c by hand and
run it on the local host to make a new version of pcre_chartables.c.dist.
Then when you cross-compile PCRE this new version of the tables will be used.


Using HP's ANSI C++ compiler (aCC)
----------------------------------

Unless C++ support is disabled by specifying the "--disable-cpp" option of the
"configure" script, you must include the "-AA" option in the CXXFLAGS
environment variable in order for the C++ components to compile correctly.

Also, note that the aCC compiler on PA-RISC platforms may have a defect whereby
needed libraries fail to get included when specifying the "-AA" compiler
option. If you experience unresolved symbols when linking the C++ programs,
use the workaround of specifying the following environment variable prior to
running the "configure" script:

  CXXLDFLAGS="-lstd_v2 -lCsup_v2"


Using Sun's compilers for Solaris
---------------------------------

A user reports that the following configurations work on Solaris 9 sparcv9 and
Solaris 9 x86 (32-bit):

  Solaris 9 sparcv9: ./configure --disable-cpp CC=/bin/cc CFLAGS="-m64 -g"
  Solaris 9 x86:     ./configure --disable-cpp CC=/bin/cc CFLAGS="-g"


Using PCRE from MySQL
---------------------

On systems where both PCRE and MySQL are installed, it is possible to make use
of PCRE from within MySQL, as an alternative to the built-in pattern matching.
There is a web page that tells you how to do this:

  http://www.mysqludf.org/lib_mysqludf_preg/index.php


Making new tarballs
-------------------

The command "make dist" creates three PCRE tarballs, in tar.gz, tar.bz2, and
zip formats. The command "make distcheck" does the same, but then does a trial
build of the new distribution to ensure that it works.

If you have modified any of the man page sources in the doc directory, you
should first run the PrepareRelease script before making a distribution. This
script creates the .txt and HTML forms of the documentation from the man pages.


Testing PCRE
------------

To test the basic PCRE library on a Unix system, run the RunTest script that is
created by the configuring process. There is also a script called RunGrepTest
that tests the options of the pcregrep command. If the C++ wrapper library is
built, three test programs called pcrecpp_unittest, pcre_scanner_unittest, and
pcre_stringpiece_unittest are also built.

Both the scripts and all the program tests are run if you run "make check" or
"make test". For other systems, see the instructions in NON-UNIX-USE.

The RunTest script runs the pcretest test program (which is documented in its
own man page) on each of the testinput files in the testdata directory in
turn, and compares the output with the contents of the corresponding testoutput
files. A file called testtry is used to hold the main output from pcretest
(testsavedregex is also used as a working file). To run pcretest on just one of
the test files, give its number as an argument to RunTest, for example:

  RunTest 2

The first test file can also be fed directly into the perltest.pl script to
check that Perl gives the same results. The only difference you should see is
in the first few lines, where the Perl version is given instead of the PCRE
version.

The second set of tests check pcre_fullinfo(), pcre_info(), pcre_study(),
pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
detection, and run-time flags that are specific to PCRE, as well as the POSIX
wrapper API. It also uses the debugging flags to check some of the internals of
pcre_compile().

If you build PCRE with a locale setting that is not the standard C locale, the
character tables may be different (see next paragraph). In some cases, this may
cause failures in the second set of tests. For example, in a locale where the
isprint() function yields TRUE for characters in the range 128-255, the use of
[:isascii:] inside a character class defines a different set of characters, and
this shows up in this test as a difference in the compiled code, which is being
listed for checking. Where the comparison test output contains [\x00-\x7f] the
test will contain [\x00-\xff], and similarly in some other cases. This is not a
bug in PCRE.

The third set of tests checks pcre_maketables(), the facility for building a
set of character tables for a specific locale and using them instead of the
default tables. The tests make use of the "fr_FR" (French) locale. Before
running the test, the script checks for the presence of this locale by running
the "locale" command. If that command fails, or if it doesn't include "fr_FR"
in the list of available locales, the third test cannot be run, and a comment
is output to say why. If running this test produces instances of the error

  ** Failed to set locale "fr_FR"

in the comparison output, it means that locale is not available on your system,
despite being listed by "locale". This does not mean that PCRE is broken.
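
The kind of check described above can be approximated by hand before running
the tests. This is only a sketch and not the code used by RunTest itself; the
function name has_locale and the messages are assumptions.

```shell
# Succeed if the "locale" command exists and lists a locale whose name
# starts with the given prefix (for example fr_FR or fr_FR.UTF-8).
has_locale() {
    command -v locale >/dev/null 2>&1 || return 1
    locale -a 2>/dev/null | grep -q "^$1"
}

if has_locale fr_FR; then
    echo "fr_FR locale is listed: the third test can be attempted"
else
    echo "fr_FR locale is not listed: the third test cannot be run"
fi
```

As noted above, a locale that is listed may still fail to be set at run time,
so a positive result here does not guarantee that the test will pass.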
569 ph10 142 [If you are trying to run this test on Windows, you may be able to get it to
570 ph10 260 work by changing "fr_FR" to "french" everywhere it occurs. Alternatively, use
571     RunTest.bat. The version of RunTest.bat included with PCRE 7.4 and above uses
572     Windows versions of test 2. More info on using RunTest.bat is included in the
573     document entitled NON-UNIX-USE.]
574 ph10 139
575 nigel 63 The fourth test checks the UTF-8 support. It is not run automatically unless
576     PCRE is built with UTF-8 support. To do this you must set --enable-utf8 when
577 ph10 461 running "configure". This file can be also fed directly to the perltest.pl
578     script, provided you are running Perl 5.8 or higher.
579 nigel 3
580 nigel 75 The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
581     features of PCRE that are not relevant to Perl.
582 nigel 3
583 ph10 461 The sixth test (which is Perl-5.10 compatible) checks the support for Unicode
584     character properties. It it not run automatically unless PCRE is built with
585     Unicode property support. To to this you must set --enable-unicode-properties
586     when running "configure".
587 nigel 63
588 nigel 77 The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative
589     matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode
590     property support, respectively. The eighth and ninth tests are not run
591     automatically unless PCRE is build with the relevant support.
592 nigel 75
593 ph10 461 The tenth test checks some internal offsets and code size features; it is run
594     only when the default "link size" of 2 is set (in other cases the sizes
595     change).
596 nigel 77
597 ph10 461 The eleventh test checks out features that are new in Perl 5.10, and the
598     twelfth test checks a number of internals and non-Perl features concerned with
599     Unicode property support. The twelfth test is not run automatically unless
600     PCRE is built with Unicode property support. To do this you must set
601     --enable-unicode-properties when running "configure".
604 nigel 3 Character tables
605     ----------------
607 ph10 122 For speed, PCRE uses four tables for manipulating and identifying characters
608     whose code point values are less than 256. The final argument of the
609     pcre_compile() function is a pointer to a block of memory containing the
610     concatenated tables. A call to pcre_maketables() can be used to generate a set
611     of tables in the current locale. If the final argument for pcre_compile() is
612     passed as NULL, a set of default tables that is built into the binary is used.
613 nigel 3
614 ph10 128 The source file called pcre_chartables.c contains the default set of tables. By
615     default, this is created as a copy of pcre_chartables.c.dist, which contains
616     tables for ASCII coding. However, if --enable-rebuild-chartables is specified
617     for ./configure, a different version of pcre_chartables.c is built by the
618     program dftables (compiled from dftables.c), which uses the ANSI C character
619     handling functions such as isalnum(), isalpha(), isupper(), islower(), etc. to
620     build the table sources. This means that the default C locale which is set for
621     your system will control the contents of these default tables. You can change
622     the default tables by editing pcre_chartables.c and then re-building PCRE. If
623     you do this, you should take care to ensure that the file does not get
624     automatically re-generated. The best way to do this is to move
625     pcre_chartables.c.dist out of the way and replace it with your customized
626     tables.
627 nigel 3
628 ph10 128 When the dftables program is run as a result of --enable-rebuild-chartables,
629     it uses the default C locale that is set on your system. It does not pay
630     attention to the LC_xxx environment variables. In other words, it uses the
631     system's default locale rather than whatever the compiling user happens to have
632     set. If you really do want to build a source set of character tables in a
633     locale that is specified by the LC_xxx variables, you can run the dftables
634     program by hand with the -L option. For example:
636     ./dftables -L pcre_chartables.c.special
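
The difference between the default C locale and the locale named by the LC_xxx
variables can be seen with the standard setlocale() function. The following is
a minimal sketch in plain ANSI C, not code taken from dftables.c; the two
helper function names are hypothetical. A C program always starts in the "C"
locale, and it adopts the environment's locale only if it explicitly asks for
it, which is the behaviour that the text above describes for the -L option.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: returns the name of the locale
   that is currently in effect. Passing NULL queries the locale without
   changing it. */
const char *current_locale_name(void)
{
return setlocale(LC_ALL, NULL);
}

/* Hypothetical helper, not part of PCRE: switches to the locale named by
   the LC_xxx (and LANG) environment variables. An empty string asks the C
   library to consult the environment. The result is NULL if the
   environment names a locale that is not installed. */
const char *adopt_environment_locale(void)
{
return setlocale(LC_ALL, "");
}
```

Calling current_locale_name() at program start yields "C" whatever the LC_xxx
variables contain. Only after adopt_environment_locale() succeeds do functions
such as isalpha() and tolower() follow the user's settings.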
638 nigel 25 The first two 256-byte tables provide lower casing and case flipping functions,
639     respectively. The next table consists of three 32-byte bit maps which identify
640     digits, "word" characters, and white space, respectively. These are used when
641 ph10 111 building 32-byte bit maps that represent character classes for code points less
642 ph10 109 than 256.
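
As an illustration of the layout just described, here is a minimal sketch of
how a lower casing table, a case flipping table, and one 32-byte bit map could
be built with the ANSI C character handling functions. The function names are
invented for this example and are not PCRE identifiers, and the bit numbering
within each byte (lowest bit first) is an assumption of the sketch.

```c
#include <ctype.h>

/* Illustrative only: fill a lower casing table and a case flipping table,
   each indexed by a character value in the range 0 to 255. */
void build_case_tables(unsigned char lower[256], unsigned char flip[256])
{
int i;
for (i = 0; i < 256; i++)
  {
  lower[i] = (unsigned char)tolower(i);
  flip[i] = (unsigned char)(islower(i) ? toupper(i) : tolower(i));
  }
}

/* Illustrative only: build a 32-byte (256-bit) map in which bit c is set
   for every character value c that is accepted by the function "test". */
void build_class_bitmap(unsigned char map[32], int (*test)(int))
{
int i;
for (i = 0; i < 32; i++) map[i] = 0;
for (i = 0; i < 256; i++)
  if (test(i)) map[i / 8] |= (unsigned char)(1 << (i % 8));
}

/* Returns 1 if character c is a member of the class described by map,
   and 0 otherwise. */
int bitmap_has(const unsigned char map[32], unsigned char c)
{
return (map[c / 8] >> (c % 8)) & 1;
}
```

For example, build_class_bitmap(map, isdigit) gives a map for which
bitmap_has(map, '7') is 1 and bitmap_has(map, 'x') is 0. Testing membership
costs one shift and one mask, which is why bit maps are convenient for
character classes.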
643 nigel 25
644     The final 256-byte table has bits indicating various character types, as
645 nigel 3 follows:
647     1 white space character
648     2 letter
649     4 decimal digit
650     8 hexadecimal digit
651     16 alphanumeric or '_'
652     128 regular expression metacharacter or binary zero
654     You should not alter the set of characters that have the 128 bit set, as that
655     will cause PCRE to malfunction.
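
The bit values listed above can be combined for one character as in the
following sketch. It is an illustration only: the FLAG_ names are invented for
this example, the list of metacharacters is an assumption, and the real
library code may treat some characters (for instance particular white space
characters) differently.

```c
#include <ctype.h>
#include <string.h>

/* Invented names for the bit values in the list above. */
enum {
  FLAG_SPACE  = 1,    /* white space character */
  FLAG_LETTER = 2,    /* letter */
  FLAG_DIGIT  = 4,    /* decimal digit */
  FLAG_XDIGIT = 8,    /* hexadecimal digit */
  FLAG_WORD   = 16,   /* alphanumeric or '_' */
  FLAG_META   = 128   /* regular expression metacharacter or binary zero */
};

/* Illustrative only: compute the type bits for one character value in the
   range 0 to 255, using the locale that is currently in effect. */
unsigned char char_type_flags(int c)
{
/* Assumed metacharacter list, chosen for this example. */
static const char meta[] = "\\*+?{^.$|()[";
unsigned char flags = 0;
if (isspace(c)) flags |= FLAG_SPACE;
if (isalpha(c)) flags |= FLAG_LETTER;
if (isdigit(c)) flags |= FLAG_DIGIT;
if (isxdigit(c)) flags |= FLAG_XDIGIT;
if (isalnum(c) || c == '_') flags |= FLAG_WORD;
if (c == 0 || strchr(meta, c) != NULL) flags |= FLAG_META;
return flags;
}
```

Calling char_type_flags() for each value from 0 to 255 and storing the results
in an array gives a table of the kind described above. In the C locale the
letter 'a' gets 2 + 8 + 16, because it is a letter, a hexadecimal digit, and a
"word" character.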
658 ph10 109 File manifest
659     -------------
660 nigel 3
661 nigel 41 The distribution should contain the following files:
662 nigel 3
663 ph10 109 (A) Source files of the PCRE library functions and their headers:
664 nigel 3
665 ph10 128 dftables.c auxiliary program for building pcre_chartables.c
666     when --enable-rebuild-chartables is specified
667 ph10 111
668 ph10 128 pcre_chartables.c.dist a default set of character tables that assume ASCII
669     coding; used, unless --enable-rebuild-chartables is
670     specified, by copying to pcre_chartables.c
671 ph10 111
672 ph10 128 pcreposix.c )
673     pcre_compile.c )
674     pcre_config.c )
675     pcre_dfa_exec.c )
676     pcre_exec.c )
677     pcre_fullinfo.c )
678     pcre_get.c ) sources for the functions in the library,
679     pcre_globals.c ) and some internal functions that they use
680     pcre_info.c )
681     pcre_maketables.c )
682     pcre_newline.c )
683     pcre_ord2utf8.c )
684     pcre_refcount.c )
685     pcre_study.c )
686     pcre_tables.c )
687     pcre_try_flipped.c )
688 ph10 374 pcre_ucd.c )
689 ph10 128 pcre_valid_utf8.c )
690     pcre_version.c )
691     pcre_xclass.c )
692     pcre_printint.src ) debugging function that is #included in pcretest,
693     ) and can also be #included in pcre_compile()
694     pcre.h.in template for pcre.h when built by "configure"
695     pcreposix.h header for the external POSIX wrapper API
696     pcre_internal.h header for internal use
697 ph10 374 ucp.h header for Unicode property handling
698 ph10 111
699 ph10 128 config.h.in template for config.h, which is built by "configure"
700 ph10 111
701 ph10 128 pcrecpp.h public header file for the C++ wrapper
702     pcrecpparg.h.in template for another C++ header file
703     pcre_scanner.h public header file for C++ scanner functions
704     pcrecpp.cc )
705     pcre_scanner.cc ) source for the C++ wrapper library
706 ph10 111
707 ph10 128 pcre_stringpiece.h.in template for pcre_stringpiece.h, the header for the
708     C++ stringpiece functions
709     pcre_stringpiece.cc source for the C++ stringpiece functions
711 ph10 109 (B) Source files for programs that use PCRE:
712 nigel 75
713 ph10 128 pcredemo.c simple demonstration of coding calls to PCRE
714     pcregrep.c source of a grep utility that uses PCRE
715     pcretest.c comprehensive test program
716 ph10 111
717     (C) Auxiliary files:
719 ph10 128 132html script to turn "man" pages into HTML
720     AUTHORS information about the author of PCRE
721     ChangeLog log of changes to the code
722     CleanTxt script to clean nroff output for txt man pages
723     Detrail script to remove trailing spaces
724     HACKING some notes about the internals of PCRE
725     INSTALL generic installation instructions
726     LICENCE conditions for the use of PCRE
727     COPYING the same, using GNU's standard name
728     Makefile.in ) template for Unix Makefile, which is built by
729     ) "configure"
730     Makefile.am ) the automake input that was used to create
731     ) Makefile.in
732     NEWS important changes in this release
733     NON-UNIX-USE notes on building PCRE on non-Unix systems
734     PrepareRelease script to make preparations for "make dist"
735     README this file
736 ph10 138 RunTest a Unix shell script for running tests
737     RunGrepTest a Unix shell script for pcregrep tests
738 ph10 128 aclocal.m4 m4 macros (generated by "aclocal")
739     config.guess ) files used by libtool,
740     config.sub ) used only when building a shared library
741     configure a configuring shell script (built by autoconf)
742     configure.ac ) the autoconf input that was used to build
743     ) "configure" and config.h
744     depcomp ) script to find program dependencies, generated by
745     ) automake
746 ph10 429 doc/*.3 man page sources for PCRE
747 ph10 128 doc/*.1 man page sources for pcregrep and pcretest
748     doc/index.html.src the base HTML page
749     doc/html/* HTML documentation
750     doc/pcre.txt plain text version of the man pages
751     doc/pcretest.txt plain text documentation of test program
752     doc/perltest.txt plain text documentation of Perl test program
753     install-sh a shell script for installing files
754     libpcre.pc.in template for libpcre.pc for pkg-config
755 ph10 461 libpcreposix.pc.in template for libpcreposix.pc for pkg-config
756 ph10 128 libpcrecpp.pc.in template for libpcrecpp.pc for pkg-config
757     ltmain.sh file used to build a libtool script
758     missing ) common stub for a few missing GNU programs while
759     ) installing, generated by automake
760     mkinstalldirs script for making install directories
761     perltest.pl Perl test program
762     pcre-config.in source of script which retains PCRE information
763 ph10 111 pcrecpp_unittest.cc )
764     pcre_scanner_unittest.cc ) test programs for the C++ wrapper
765     pcre_stringpiece_unittest.cc )
766 ph10 128 testdata/testinput* test data for main library tests
767     testdata/testoutput* expected test results
768     testdata/grep* input and output for pcregrep tests
769 ph10 111
770 ph10 109 (D) Auxiliary files for cmake support
771 nigel 3
772 ph10 374 cmake/COPYING-CMAKE-SCRIPTS
773     cmake/FindPackageHandleStandardArgs.cmake
774     cmake/FindReadline.cmake
775 ph10 109 CMakeLists.txt
776 ph10 111 config-cmake.h.in
777 nigel 29
778 ph10 109 (E) Auxiliary files for VPASCAL
779 nigel 29
780 nigel 63 makevp.bat
781 ph10 135 makevp_c.txt
782     makevp_l.txt
783 ph10 111 pcregexp.pas
785     (F) Auxiliary files for building PCRE "by hand"
787 ph10 128 pcre.h.generic ) a version of the public PCRE header file
788     ) for use in non-"configure" environments
789     config.h.generic ) a version of config.h for use in non-"configure"
790     ) environments
791 ph10 111
792 ph10 109 (G) Miscellaneous
793 nigel 63
794 ph10 109 RunTest.bat a script for running tests under Windows
796 nigel 77 Philip Hazel
797     Email local part: ph10
798     Email domain: cam.ac.uk
799 ph10 489 Last updated: 19 January 2010

