Contents of /code/trunk/README

Revision 87
Sat Feb 24 21:41:21 2007 UTC (8 years, 1 month ago) by nigel
File size: 23456 byte(s)
Load pcre-6.5 into code/trunk.

README file for PCRE (Perl-compatible regular expression library)
-----------------------------------------------------------------

The latest release of PCRE is always available from

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz

Please read the NEWS file if you are upgrading from a previous release.


The PCRE APIs
-------------

PCRE is written in C, and it has its own API. The distribution now includes a
set of C++ wrapper functions, courtesy of Google Inc. (see the pcrecpp man page
for details).

Also included is a set of C wrapper functions that are based on the POSIX
API. These end up in the library called libpcreposix. Note that this just
provides a POSIX calling interface to PCRE: the regular expressions themselves
still follow Perl syntax and semantics. The header file for the POSIX-style
functions is called pcreposix.h. The official POSIX name is regex.h, but I
didn't want to risk possible problems with existing files of that name by
distributing it that way. To use pcreposix.h with an existing program that uses
the POSIX API, the file will have to be renamed or pointed at by a link.

If you are using the POSIX interface to PCRE and there is already a POSIX regex
library installed on your system, you must take care when linking programs to
ensure that they link with PCRE's libpcreposix library. Otherwise they may pick
up the "real" POSIX functions of the same name.


Documentation for PCRE
----------------------

If you install PCRE in the normal way, you will end up with an installed set of
man pages whose names all start with "pcre". The one that is called "pcre"
lists all the others. In addition to these man pages, the PCRE documentation is
supplied in two other forms; however, as there is no standard place to install
them, they are left in the doc directory of the unpacked source distribution.
These forms are:

  1. Files called doc/pcre.txt, doc/pcregrep.txt, and doc/pcretest.txt. The
     first of these is a concatenation of the text forms of all the section 3
     man pages except those that summarize individual functions. The other two
     are the text forms of the section 1 man pages for the pcregrep and
     pcretest commands. Text forms are provided for ease of scanning with text
     editors or similar tools.

  2. A subdirectory called doc/html contains all the documentation in HTML
     form, hyperlinked in various ways, and rooted in a file called
     doc/index.html.


Contributions by users of PCRE
------------------------------

You can find contributions from PCRE users in the directory

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib

where there is also a README file giving brief descriptions of what they are.
Several of them provide support for compiling PCRE on various flavours of
Windows systems (I myself do not use Windows). Some are complete in themselves;
others are pointers to URLs containing relevant files.


Building PCRE on a Unix-like system
-----------------------------------

If you are using HP's ANSI C++ compiler (aCC), please see the special note
in the section entitled "Using HP's ANSI C++ compiler (aCC)" below.

To build PCRE on a Unix-like system, first run the "configure" command from the
PCRE distribution directory, with your current directory set to the directory
where you want the files to be created. This command is a standard GNU
"autoconf" configuration script, for which generic instructions are supplied in
INSTALL.

Most commonly, people build PCRE within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient, but the
usual methods of changing standard defaults are available. For example:

  CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local

specifies that the C compiler should be run with the flags '-O2 -Wall' instead
of the default, and that "make install" should install PCRE under /opt/local
instead of the default /usr/local.

If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE source
into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:

  cd /build/pcre/pcre-xxx
  /source/pcre/pcre-xxx/configure

PCRE is written in C and is normally compiled as a C library. However, it is
possible to build it as a C++ library, though the provided building apparatus
does not have any features to support this.

There are some optional features that can be included or omitted from the PCRE
library. You can read more about them in the pcrebuild man page.

. If you want to suppress the building of the C++ wrapper library, you can add
  --disable-cpp to the "configure" command. Otherwise, when "configure" is run,
  it will try to find a C++ compiler and C++ header files, and if it succeeds,
  it will try to build the C++ wrapper.

. If you want to make use of the support for UTF-8 character strings in PCRE,
  you must add --enable-utf8 to the "configure" command. Without it, the code
  for handling UTF-8 is not included in the library. (Even when included, it
  still has to be enabled by an option at run time.)

. If, in addition to support for UTF-8 character strings, you want to include
  support for the \P, \p, and \X sequences that recognize Unicode character
  properties, you must add --enable-unicode-properties to the "configure"
  command. This adds about 90K to the size of the library (in the form of a
  property table); only the basic two-letter properties such as Lu are
  supported.

. You can build PCRE to recognize either CR or LF as the newline character,
  instead of whatever your compiler uses for "\n", by adding --newline-is-cr or
  --newline-is-lf to the "configure" command, respectively. Only do this if you
  really understand what you are doing. On traditional Unix-like systems, the
  newline character is LF.

. When called via the POSIX interface, PCRE uses malloc() to get additional
  storage for processing capturing parentheses if there are more than 10 of
  them. You can increase this threshold by setting, for example,

  --with-posix-malloc-threshold=20

  on the "configure" command.

. PCRE has a counter that can be set to limit the amount of resources it uses.
  If the limit is exceeded during a match, the match fails. The default is ten
  million. You can change the default by setting, for example,

  --with-match-limit=500000

  on the "configure" command. This is just the default; individual calls to
  pcre_exec() can supply their own value. There is discussion on the pcreapi
  man page.

. The default maximum compiled pattern size is around 64K. You can increase
  this by adding --with-link-size=3 to the "configure" command. You can
  increase it even more by setting --with-link-size=4, but this is unlikely
  ever to be necessary. If you build PCRE with an increased link size, test 2
  (and 5 if you are using UTF-8) will fail. Part of the output of these tests
  is a representation of the compiled pattern, and this changes with the link
  size.

. You can build PCRE so that its internal match() function that is called from
  pcre_exec() does not call itself recursively. Instead, it uses blocks of data
  from the heap via special functions pcre_stack_malloc() and pcre_stack_free()
  to save data that would otherwise be saved on the stack. To build PCRE like
  this, use

  --disable-stack-for-recursion

  on the "configure" command. PCRE runs more slowly in this mode, but it may be
  necessary in environments with limited stack sizes. This applies only to the
  pcre_exec() function; it does not apply to pcre_dfa_exec(), which does not
  use deeply nested recursion.
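
The relationship between link size and maximum pattern size mentioned in the
list above can be illustrated with a short C sketch. It assumes that offsets
inside a compiled pattern are stored in "link size" bytes and that the default
link size is 2 (neither detail is stated in this file; see the pcrebuild man
page). The function names are hypothetical and are not part of PCRE, and the
byte order used is only an illustrative choice.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of PCRE: the largest offset that fits in
   link_size bytes. Assumes the link size is 2 (assumed default), 3, or 4. */
uint64_t max_link_offset(int link_size)
{
    assert(link_size >= 2 && link_size <= 4);
    return (UINT64_C(1) << (8 * link_size)) - 1;
}

/* Store an offset in link_size bytes, most significant byte first.
   The byte order is an assumption made for this sketch. */
void put_link(unsigned char *buf, int link_size, uint64_t offset)
{
    assert(offset <= max_link_offset(link_size));
    for (int i = link_size - 1; i >= 0; i--) {
        buf[i] = (unsigned char)(offset & 0xff);
        offset >>= 8;
    }
}

/* Read back an offset stored by put_link(). */
uint64_t get_link(const unsigned char *buf, int link_size)
{
    uint64_t offset = 0;
    for (int i = 0; i < link_size; i++)
        offset = (offset << 8) | buf[i];
    return offset;
}
```

Under these assumptions a link size of 2 allows offsets up to 65535, which
matches the "around 64K" default; 3 allows about 16 million and 4 about 4
thousand million, which is why a link size of 4 is unlikely ever to be needed.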

The "configure" script builds eight files for the basic C library:

. pcre.h is the header file for C programs that call PCRE
. Makefile is the makefile that builds the library
. config.h contains build-time configuration options for the library
. pcre-config is a script that shows the settings of "configure" options
. libpcre.pc is data for the pkg-config command
. libtool is a script that builds shared and/or static libraries
. RunTest is a script for running tests on the library
. RunGrepTest is a script for running tests on the pcregrep command

In addition, if a C++ compiler is found, the following are also built:

. pcrecpp.h is the header file for programs that call PCRE via the C++ wrapper
. pcre_stringpiece.h is the header for the C++ "stringpiece" functions

The "configure" script also creates config.status, which is an executable
script that can be run to recreate the configuration, and config.log, which
contains compiler output from tests that "configure" runs.

Once "configure" has run, you can run "make". It builds two libraries, called
libpcre and libpcreposix, a test program called pcretest, and the pcregrep
command. If a C++ compiler was found on your system, it also builds the C++
wrapper library, which is called libpcrecpp, and some test programs called
pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest.

The command "make test" runs all the appropriate tests. Details of the PCRE
tests are given in a separate section of this document, below.

You can use "make install" to copy the libraries, the public header files
pcre.h, pcreposix.h, pcrecpp.h, and pcre_stringpiece.h (the last two only if
the C++ wrapper was built), and the man pages to appropriate live directories
on your system, in the normal way.

If you want to remove PCRE from your system, you can run "make uninstall".
This removes all the files that "make install" installed. However, it does not
remove any directories, because these are often shared with other programs.


Retrieving configuration information on Unix-like systems
---------------------------------------------------------

Running "make install" also installs the command pcre-config, which can be used
to recall information about the PCRE configuration and installation. For
example:

  pcre-config --version

prints the version number, and

  pcre-config --libs

outputs information about where the library is installed. This command can be
included in makefiles for programs that use PCRE, saving the programmer from
having to remember too many details.

The pkg-config command is another system for saving and retrieving information
about installed libraries. Instead of separate commands for each library, a
single command is used. For example:

  pkg-config --cflags pcre

The data is held in *.pc files that are installed in a directory called
pkgconfig.


Shared libraries on Unix-like systems
-------------------------------------

The default distribution builds PCRE as shared libraries and static libraries,
as long as the operating system supports shared libraries. Shared library
support relies on the "libtool" script which is built as part of the
"configure" process.

The libtool script is used to compile and link both shared and static
libraries. They are placed in a subdirectory called .libs when they are newly
built. The programs pcretest and pcregrep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcregrep and pcretest are
automatically re-built to use the newly installed shared libraries before being
installed themselves. However, the versions left in the source directory still
use the uninstalled libraries.

To build PCRE using static libraries only you must use --disable-shared when
configuring it. For example:

  ./configure --prefix=/usr/gnu --disable-shared

Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries.


Cross-compiling on a Unix-like system
-------------------------------------

You can specify CC and CFLAGS in the normal way to the "configure" command, in
order to cross-compile PCRE for some other host. However, during the building
process, the dftables.c source file is compiled *and run* on the local host, in
order to generate the default character tables (the chartables.c file). It
therefore needs to be compiled with the local compiler, not the cross compiler.
You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD;
there are also CXX_FOR_BUILD and CXXFLAGS_FOR_BUILD for the C++ wrapper)
when calling the "configure" command. If they are not specified, they default
to the values of CC and CFLAGS.


Using HP's ANSI C++ compiler (aCC)
----------------------------------

Unless C++ support is disabled by specifying the "--disable-cpp" option of the
"configure" script, you *must* include the "-AA" option in the CXXFLAGS
environment variable in order for the C++ components to compile correctly.

Also, note that the aCC compiler on PA-RISC platforms may have a defect whereby
needed libraries fail to get included when specifying the "-AA" compiler
option. If you experience unresolved symbols when linking the C++ programs,
use the workaround of specifying the following environment variable prior to
running the "configure" script:

  CXXLDFLAGS="-lstd_v2 -lCsup_v2"


Building on non-Unix systems
----------------------------

For a non-Unix system, read the comments in the file NON-UNIX-USE, though if
the system supports the use of "configure" and "make" you may be able to build
PCRE in the same way as for Unix systems.

PCRE has been compiled on Windows systems and on Macintoshes, but I don't know
the details because I don't use those systems. It should be straightforward to
build PCRE on any system that has a Standard C compiler, because it uses only
Standard C functions.


Testing PCRE
------------

To test PCRE on a Unix system, run the RunTest script that is created by the
configuring process. There is also a script called RunGrepTest that tests the
options of the pcregrep command. If the C++ wrapper library is built, three
test programs called pcrecpp_unittest, pcre_scanner_unittest, and
pcre_stringpiece_unittest are provided.

Both the scripts and all the program tests are run if you obey "make runtest",
"make check", or "make test". For other systems, see the instructions in
NON-UNIX-USE.

The RunTest script runs the pcretest test program (which is documented in its
own man page) on each of the testinput files (in the testdata directory) in
turn, and compares the output with the contents of the corresponding testoutput
file. A file called testtry is used to hold the main output from pcretest
(testsavedregex is also used as a working file). To run pcretest on just one of
the test files, give its number as an argument to RunTest, for example:

  RunTest 2

The first file can also be fed directly into the perltest script to check that
Perl gives the same results. The only difference you should see is in the first
few lines, where the Perl version is given instead of the PCRE version.

The second set of tests checks pcre_fullinfo(), pcre_info(), pcre_study(),
pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
detection, and run-time flags that are specific to PCRE, as well as the POSIX
wrapper API. It also uses the debugging flag to check some of the internals of
pcre_compile().

If you build PCRE with a locale setting that is not the standard C locale, the
character tables may be different (see next paragraph). In some cases, this may
cause failures in the second set of tests. For example, in a locale where the
isprint() function yields TRUE for characters in the range 128-255, the use of
[:isascii:] inside a character class defines a different set of characters, and
this shows up in this test as a difference in the compiled code, which is being
listed for checking. Where the comparison test output contains [\x00-\x7f] the
test will contain [\x00-\xff], and similarly in some other cases. This is not a
bug in PCRE.

The third set of tests checks pcre_maketables(), the facility for building a
set of character tables for a specific locale and using them instead of the
default tables. The tests make use of the "fr_FR" (French) locale. Before
running the test, the script checks for the presence of this locale by running
the "locale" command. If that command fails, or if it doesn't include "fr_FR"
in the list of available locales, the third test cannot be run, and a comment
is output to say why. If running this test produces instances of the error

  ** Failed to set locale "fr_FR"

in the comparison output, it means that locale is not available on your system,
despite being listed by "locale". This does not mean that PCRE is broken.

The fourth test checks the UTF-8 support. It is not run automatically unless
PCRE is built with UTF-8 support. To do this you must set --enable-utf8 when
running "configure". This file can also be fed directly to the perltest script,
provided you are running Perl 5.8 or higher. (For Perl 5.6, a small patch,
commented in the script, can be used.)

The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
features of PCRE that are not relevant to Perl.

The sixth test checks the support for Unicode character properties. It is
not run automatically unless PCRE is built with Unicode property support. To do
this you must set --enable-unicode-properties when running "configure".

The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative
matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode
property support, respectively. The eighth and ninth tests are not run
automatically unless PCRE is built with the relevant support.


Character tables
----------------

PCRE uses four tables for manipulating and identifying characters whose values
are less than 256. The final argument of the pcre_compile() function is a
pointer to a block of memory containing the concatenated tables. A call to
pcre_maketables() can be used to generate a set of tables in the current
locale. If the final argument for pcre_compile() is passed as NULL, a set of
default tables that is built into the binary is used.

The source file called chartables.c contains the default set of tables. This is
not supplied in the distribution, but is built by the program dftables
(compiled from dftables.c), which uses the ANSI C character handling functions
such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table
sources. This means that the default C locale which is set for your system will
control the contents of these default tables. You can change the default tables
by editing chartables.c and then re-building PCRE. If you do this, you should
probably also edit Makefile to ensure that the file doesn't ever get
re-generated.

The first two 256-byte tables provide lower casing and case flipping functions,
respectively. The next table consists of three 32-byte bit maps which identify
digits, "word" characters, and white space, respectively. These are used when
building 32-byte bit maps that represent character classes.
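
As an illustration of the first three tables described above, here is a minimal
C sketch that builds a lower-casing table, a case-flipping table, and three
32-byte bit maps from the standard <ctype.h> functions. It is not the code in
dftables.c, which may differ in detail. The names, the offsets, and the
definition of a "word" character as alphanumeric or '_' are assumptions made
for this example only.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical layout for this sketch only: two 256-byte tables followed by
   three 32-byte bit maps (digits, "word" characters, white space). */
enum { LCC_OFFSET = 0, FCC_OFFSET = 256, CBITS_OFFSET = 512,
       CBIT_DIGIT = 0, CBIT_WORD = 32, CBIT_SPACE = 64,
       SKETCH_TABLES_SIZE = 512 + 96 };

/* Fill tables[] using the <ctype.h> functions of the current locale. */
void build_tables_sketch(unsigned char *tables)
{
    unsigned char *bits = tables + CBITS_OFFSET;
    memset(tables, 0, SKETCH_TABLES_SIZE);
    for (int c = 0; c < 256; c++) {
        /* Table 1: lower casing. */
        tables[LCC_OFFSET + c] = (unsigned char)tolower(c);
        /* Table 2: case flipping. */
        tables[FCC_OFFSET + c] =
            (unsigned char)(islower(c) ? toupper(c) : tolower(c));
        /* Bit maps: bit (c % 8) of byte (c / 8) is set for class members. */
        if (isdigit(c))
            bits[CBIT_DIGIT + c / 8] |= (unsigned char)(1 << (c % 8));
        if (isalnum(c) || c == '_')
            bits[CBIT_WORD + c / 8] |= (unsigned char)(1 << (c % 8));
        if (isspace(c))
            bits[CBIT_SPACE + c / 8] |= (unsigned char)(1 << (c % 8));
    }
}

/* Test whether character c is in the bit map starting at map_offset. */
int in_class(const unsigned char *tables, int map_offset, int c)
{
    assert(c >= 0 && c < 256);
    return (tables[CBITS_OFFSET + map_offset + c / 8] >> (c % 8)) & 1;
}
```

Because the tables are filled from the <ctype.h> functions, running the sketch
after a call to setlocale() would give locale-specific contents, which is the
same reason the default tables depend on the locale in force when dftables is
run.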

The final 256-byte table has bits indicating various character types, as
follows:

    1   white space character
    2   letter
    4   decimal digit
    8   hexadecimal digit
   16   alphanumeric or '_'
  128   regular expression metacharacter or binary zero

You should not alter the set of characters that contain the 128 bit, as that
will cause PCRE to malfunction.
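
The character type table above could be generated along the following lines.
This is only a sketch: the flag values come from the list above, but the flag
names, the function name, and in particular the set of characters treated as
regular expression metacharacters are assumptions made for illustration, not
taken from the PCRE source.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Flag values taken from the list above; the names are hypothetical. */
enum { CT_SPACE = 1, CT_LETTER = 2, CT_DIGIT = 4, CT_XDIGIT = 8,
       CT_WORD = 16, CT_META = 128 };

/* Fill a 256-byte table of character type flags for the current locale.
   The metacharacter list below is an assumption made for this sketch. */
void build_ctypes_sketch(unsigned char *ctypes)
{
    const char *meta = "\\*+?{^.$|()[";
    assert(ctypes != NULL);
    for (int c = 0; c < 256; c++) {
        int x = 0;
        if (isspace(c))  x |= CT_SPACE;
        if (isalpha(c))  x |= CT_LETTER;
        if (isdigit(c))  x |= CT_DIGIT;
        if (isxdigit(c)) x |= CT_XDIGIT;
        if (isalnum(c) || c == '_') x |= CT_WORD;
        /* Binary zero also carries the metacharacter bit. */
        if (c == 0 || strchr(meta, c) != NULL) x |= CT_META;
        ctypes[c] = (unsigned char)x;
    }
}
```

A character can carry several flags at once: in the C locale, for example, 'a'
is a letter, a hexadecimal digit, and a "word" character, so its entry in this
sketch is 2 + 8 + 16.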


Manifest
--------

The distribution should contain the following files:

(A) The actual source files of the PCRE library functions and their
    headers:

  dftables.c            auxiliary program for building chartables.c

  pcreposix.c           )
  pcre_compile.c        )
  pcre_config.c         )
  pcre_dfa_exec.c       )
  pcre_exec.c           )
  pcre_fullinfo.c       )
  pcre_get.c            ) sources for the functions in the library,
  pcre_globals.c        )   and some internal functions that they use
  pcre_info.c           )
  pcre_maketables.c     )
  pcre_ord2utf8.c       )
  pcre_printint.c       )
  pcre_study.c          )
  pcre_tables.c         )
  pcre_try_flipped.c    )
  pcre_ucp_findchar.c   )
  pcre_valid_utf8.c     )
  pcre_version.c        )
  pcre_xclass.c         )

  ucp_findchar.c        )
  ucp.h                 ) source for the code that is used for
  ucpinternal.h         )   Unicode property handling
  ucptable.c            )
  ucptypetable.c        )

  pcre.in               "source" for the header for the external API; pcre.h
                          is built from this by "configure"
  pcreposix.h           header for the external POSIX wrapper API
  pcre_internal.h       header for internal use
  config.in             template for config.h, which is built by configure

  pcrecpp.h             the header file for the C++ wrapper
  pcrecpparg.h.in       "source" for another C++ header file
  pcrecpp.cc            )
  pcre_scanner.cc       ) source for the C++ wrapper library

  pcre_stringpiece.h.in "source" for pcre_stringpiece.h, the header for the
                          C++ stringpiece functions
  pcre_stringpiece.cc   source for the C++ stringpiece functions

(B) Auxiliary files:

  AUTHORS                     information about the author of PCRE
  ChangeLog                   log of changes to the code
  INSTALL                     generic installation instructions
  LICENCE                     conditions for the use of PCRE
  COPYING                     the same, using GNU's standard name
  Makefile.in                 template for Unix Makefile, which is built by
                                configure
  NEWS                        important changes in this release
  NON-UNIX-USE                notes on building PCRE on non-Unix systems
  README                      this file
  RunTest.in                  template for a Unix shell script for running
                                tests
  RunGrepTest.in              template for a Unix shell script for pcregrep
                                tests
  config.guess                ) files used by libtool,
  config.sub                  )   used only when building a shared library
  configure                   a configuring shell script (built by autoconf)
  configure.in                the autoconf input used to build configure
  doc/Tech.Notes              notes on the encoding
  doc/*.3                     man page sources for the PCRE functions
  doc/*.1                     man page sources for pcregrep and pcretest
  doc/html/*                  HTML documentation
  doc/pcre.txt                plain text version of the man pages
  doc/pcretest.txt            plain text documentation of test program
  doc/perltest.txt            plain text documentation of Perl test program
  install-sh                  a shell script for installing files
  libpcre.pc.in               "source" for libpcre.pc for pkg-config
  ltmain.sh                   file used to build a libtool script
  mkinstalldirs               script for making install directories
  pcretest.c                  comprehensive test program
  pcredemo.c                  simple demonstration of coding calls to PCRE
  perltest                    Perl test program
  pcregrep.c                  source of a grep utility that uses PCRE
  pcre-config.in              source of script which retains PCRE information
  pcrecpp_unittest.c          )
  pcre_scanner_unittest.c     ) test programs for the C++ wrapper
  pcre_stringpiece_unittest.c )
  testdata/testinput*         test data for main library tests
  testdata/testoutput*        expected test results
  testdata/grep*              input and output for pcregrep tests

(C) Auxiliary files for Win32 DLL

  libpcre.def
  libpcreposix.def
  pcre.def

(D) Auxiliary file for VPASCAL

  makevp.bat

Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
January 2006
