Revision 83
Sat Feb 24 21:41:06 2007 UTC (7 years, 6 months ago) by nigel
File size: 22383 byte(s)
Load pcre-6.3 into code/trunk.

1 README file for PCRE (Perl-compatible regular expression library)
2 -----------------------------------------------------------------
3
4 The latest release of PCRE is always available from
5
6 ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz
7
8 Please read the NEWS file if you are upgrading from a previous release.
9
10
11 The PCRE APIs
12 -------------
13
14 PCRE is written in C, and it has its own API. The distribution now includes a
15 set of C++ wrapper functions, courtesy of Google Inc. (see the pcrecpp man page
16 for details).
17
18 Also included are a set of C wrapper functions that are based on the POSIX
19 API. These end up in the library called libpcreposix. Note that this just
20 provides a POSIX calling interface to PCRE: the regular expressions themselves
21 still follow Perl syntax and semantics. The header file for the POSIX-style
22 functions is called pcreposix.h. The official POSIX name is regex.h, but I
23 didn't want to risk possible problems with existing files of that name by
24 distributing it that way. To use it with an existing program that uses the
25 POSIX API, it will have to be renamed or pointed at by a link.
26
27 If you are using the POSIX interface to PCRE and there is already a POSIX regex
28 library installed on your system, you must take care when linking programs to
29 ensure that they link with PCRE's libpcreposix library. Otherwise they may pick
30 up the "real" POSIX functions of the same name.
31
32
33 Documentation for PCRE
34 ----------------------
35
36 If you install PCRE in the normal way, you will end up with an installed set of
37 man pages whose names all start with "pcre". The one that is called "pcre"
38 lists all the others. In addition to these man pages, the PCRE documentation is
39 supplied in two other forms; however, as there is no standard place to install
40 them, they are left in the doc directory of the unpacked source distribution.
41 These forms are:
42
43 1. Files called doc/pcre.txt, doc/pcregrep.txt, and doc/pcretest.txt. The
44 first of these is a concatenation of the text forms of all the section 3
45 man pages except those that summarize individual functions. The other two
46 are the text forms of the section 1 man pages for the pcregrep and
47 pcretest commands. Text forms are provided for ease of scanning with text
48 editors or similar tools.
49
50 2. A subdirectory called doc/html contains all the documentation in HTML
51 form, hyperlinked in various ways, and rooted in a file called
52 doc/index.html.
53
54
55 Contributions by users of PCRE
56 ------------------------------
57
58 You can find contributions from PCRE users in the directory
59
60 ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib
61
62 where there is also a README file giving brief descriptions of what they are.
63 Several of them provide support for compiling PCRE on various flavours of
64 Windows systems (I myself do not use Windows). Some are complete in themselves;
65 others are pointers to URLs containing relevant files.
66
67
68 Building PCRE on a Unix-like system
69 -----------------------------------
70
71 To build PCRE on a Unix-like system, first run the "configure" command from the
72 PCRE distribution directory, with your current directory set to the directory
73 where you want the files to be created. This command is a standard GNU
74 "autoconf" configuration script, for which generic instructions are supplied in
75 INSTALL.
76
77 Most commonly, people build PCRE within its own distribution directory, and in
78 this case, on many systems, just running "./configure" is sufficient, but the
79 usual methods of changing standard defaults are available. For example:
80
81 CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
82
83 specifies that the C compiler should be run with the flags '-O2 -Wall' instead
84 of the default, and that "make install" should install PCRE under /opt/local
85 instead of the default /usr/local.
86
87 If you want to build in a different directory, just run "configure" with that
88 directory as current. For example, suppose you have unpacked the PCRE source
89 into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:
90
91 cd /build/pcre/pcre-xxx
92 /source/pcre/pcre-xxx/configure
93
94 There are some optional features that can be included or omitted from the PCRE
95 library. You can read more about them in the pcrebuild man page.
96
97 . If you want to suppress the building of the C++ wrapper library, you can add
98 --disable-cpp to the "configure" command. Otherwise, when "configure" is run,
99 it will try to find a C++ compiler and C++ header files, and if it succeeds, it
100 will try to build the C++ wrapper.
101
102 . If you want to make use of the support for UTF-8 character strings in PCRE,
103 you must add --enable-utf8 to the "configure" command. Without it, the code
104 for handling UTF-8 is not included in the library. (Even when included, it
105 still has to be enabled by an option at run time.)
106
107 . If, in addition to support for UTF-8 character strings, you want to include
108 support for the \P, \p, and \X sequences that recognize Unicode character
109 properties, you must add --enable-unicode-properties to the "configure"
110 command. This adds about 90K to the size of the library (in the form of a
111 property table); only the basic two-letter properties such as Lu are
112 supported.
113
114 . You can build PCRE to recognize CR or NL as the newline character, instead
115 of whatever your compiler uses for "\n", by adding --newline-is-cr or
116 --newline-is-nl to the "configure" command, respectively. Only do this if you
117 really understand what you are doing. On traditional Unix-like systems, the
118 newline character is NL.
119
120 . When called via the POSIX interface, PCRE uses malloc() to get additional
121 storage for processing capturing parentheses if there are more than 10 of
122 them. You can increase this threshold by setting, for example,
123
124 --with-posix-malloc-threshold=20
125
126 on the "configure" command.
127
128 . PCRE has a counter that can be set to limit the amount of resources it uses.
129 If the limit is exceeded during a match, the match fails. The default is ten
130 million. You can change the default by setting, for example,
131
132 --with-match-limit=500000
133
134 on the "configure" command. This is just the default; individual calls to
135 pcre_exec() can supply their own value. There is discussion on the pcreapi
136 man page.
137
138 . The default maximum compiled pattern size is around 64K. You can increase
139 this by adding --with-link-size=3 to the "configure" command. You can
140 increase it even more by setting --with-link-size=4, but this is unlikely
141 ever to be necessary. If you build PCRE with an increased link size, test 2
142 (and 5 if you are using UTF-8) will fail. Part of the output of these tests
143 is a representation of the compiled pattern, and this changes with the link
144 size.
145
146 . You can build PCRE so that its internal match() function that is called from
147 pcre_exec() does not call itself recursively. Instead, it uses blocks of data
148 from the heap via special functions pcre_stack_malloc() and pcre_stack_free()
149 to save data that would otherwise be saved on the stack. To build PCRE like
150 this, use
151
152 --disable-stack-for-recursion
153
154 on the "configure" command. PCRE runs more slowly in this mode, but it may be
155 necessary in environments with limited stack sizes. This applies only to the
156 pcre_exec() function; it does not apply to pcre_dfa_exec(), which does not
157 use deeply nested recursion.
158
159 The "configure" script builds eight files for the basic C library:
160
161 . pcre.h is the header file for C programs that call PCRE
162 . Makefile is the makefile that builds the library
163 . config.h contains build-time configuration options for the library
164 . pcre-config is a script that shows the settings of "configure" options
165 . libpcre.pc is data for the pkg-config command
166 . libtool is a script that builds shared and/or static libraries
167 . RunTest is a script for running tests on the library
168 . RunGrepTest is a script for running tests on the pcregrep command
169
170 In addition, if a C++ compiler is found, the following are also built:
171
172 . pcrecpp.h is the header file for programs that call PCRE via the C++ wrapper
173 . pcre_stringpiece.h is the header for the C++ "stringpiece" functions
174
175 The "configure" script also creates config.status, which is an executable
176 script that can be run to recreate the configuration, and config.log, which
177 contains compiler output from tests that "configure" runs.
178
179 Once "configure" has run, you can run "make". It builds two libraries, called
180 libpcre and libpcreposix, a test program called pcretest, and the pcregrep
181 command. If a C++ compiler was found on your system, it also builds the C++
182 wrapper library, which is called libpcrecpp, and some test programs called
183 pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest.
184
185 The command "make test" runs all the appropriate tests. Details of the PCRE
186 tests are given in a separate section of this document, below.
187
188 You can use "make install" to copy the libraries, the public header files
189 pcre.h, pcreposix.h, pcrecpp.h, and pcre_stringpiece.h (the last two only if
190 the C++ wrapper was built), and the man pages to appropriate live directories
191 on your system, in the normal way.
192
193 If you want to remove PCRE from your system, you can run "make uninstall".
194 This removes all the files that "make install" installed. However, it does not
195 remove any directories, because these are often shared with other programs.
196
197
198 Retrieving configuration information on Unix-like systems
199 ---------------------------------------------------------
200
201 Running "make install" also installs the command pcre-config, which can be used
202 to recall information about the PCRE configuration and installation. For
203 example:
204
205 pcre-config --version
206
207 prints the version number, and
208
209 pcre-config --libs
210
211 outputs information about where the library is installed. This command can be
212 included in makefiles for programs that use PCRE, saving the programmer from
213 having to remember too many details.
214
215 The pkg-config command is another system for saving and retrieving information
216 about installed libraries. Instead of separate commands for each library, a
217 single command is used. For example:
218
219 pkg-config --cflags pcre
220
221 The data is held in *.pc files that are installed in a directory called
222 pkgconfig.
223
224
225 Shared libraries on Unix-like systems
226 -------------------------------------
227
228 The default distribution builds PCRE as shared libraries and static libraries,
229 as long as the operating system supports shared libraries. Shared library
230 support relies on the "libtool" script which is built as part of the
231 "configure" process.
232
233 The libtool script is used to compile and link both shared and static
234 libraries. They are placed in a subdirectory called .libs when they are newly
235 built. The programs pcretest and pcregrep are built to use these uninstalled
236 libraries (by means of wrapper scripts in the case of shared libraries). When
237 you use "make install" to install shared libraries, pcregrep and pcretest are
238 automatically re-built to use the newly installed shared libraries before being
239 installed themselves. However, the versions left in the source directory still
240 use the uninstalled libraries.
241
242 To build PCRE using static libraries only you must use --disable-shared when
243 configuring it. For example:
244
245 ./configure --prefix=/usr/gnu --disable-shared
246
247 Then run "make" in the usual way. Similarly, you can use --disable-static to
248 build only shared libraries.
249
250
251 Cross-compiling on a Unix-like system
252 -------------------------------------
253
254 You can specify CC and CFLAGS in the normal way to the "configure" command, in
255 order to cross-compile PCRE for some other host. However, during the building
256 process, the dftables.c source file is compiled *and run* on the local host, in
257 order to generate the default character tables (the chartables.c file). It
258 therefore needs to be compiled with the local compiler, not the cross compiler.
259 You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD;
260 there are also CXX_FOR_BUILD and CXXFLAGS_FOR_BUILD for the C++ wrapper)
261 when calling the "configure" command. If they are not specified, they default
262 to the values of CC and CFLAGS.
263
264
265 Building on non-Unix systems
266 ----------------------------
267
268 For a non-Unix system, read the comments in the file NON-UNIX-USE, though if
269 the system supports the use of "configure" and "make" you may be able to build
270 PCRE in the same way as for Unix systems.
271
272 PCRE has been compiled on Windows systems and on Macintoshes, but I don't know
273 the details because I don't use those systems. It should be straightforward to
274 build PCRE on any system that has a Standard C compiler, because it uses only
275 Standard C functions.
276
277
278 Testing PCRE
279 ------------
280
281 To test PCRE on a Unix system, run the RunTest script that is created by the
282 configuring process. There is also a script called RunGrepTest that tests the
283 options of the pcregrep command. If the C++ wrapper library is built, three
284 test programs called pcrecpp_unittest, pcre_scanner_unittest, and
285 pcre_stringpiece_unittest are provided.
286
287 Both the scripts and all the program tests are run if you invoke "make runtest",
288 "make check", or "make test". For other systems, see the instructions in
289 NON-UNIX-USE.
290
291 The RunTest script runs the pcretest test program (which is documented in its
292 own man page) on each of the testinput files (in the testdata directory) in
293 turn, and compares the output with the contents of the corresponding testoutput
294 file. A file called testtry is used to hold the main output from pcretest
295 (testsavedregex is also used as a working file). To run pcretest on just one of
296 the test files, give its number as an argument to RunTest, for example:
297
298 RunTest 2
299
300 The first file can also be fed directly into the perltest script to check that
301 Perl gives the same results. The only difference you should see is in the first
302 few lines, where the Perl version is given instead of the PCRE version.
303
304 The second set of tests checks pcre_fullinfo(), pcre_info(), pcre_study(),
305 pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
306 detection, and run-time flags that are specific to PCRE, as well as the POSIX
307 wrapper API. It also uses the debugging flag to check some of the internals of
308 pcre_compile().
309
310 If you build PCRE with a locale setting that is not the standard C locale, the
311 character tables may be different (see next paragraph). In some cases, this may
312 cause failures in the second set of tests. For example, in a locale where the
313 isprint() function yields TRUE for characters in the range 128-255, the use of
314 [:isascii:] inside a character class defines a different set of characters, and
315 this shows up in this test as a difference in the compiled code, which is being
316 listed for checking. Where the comparison test output contains [\x00-\x7f] the
317 test will contain [\x00-\xff], and similarly in some other cases. This is not a
318 bug in PCRE.
319
320 The third set of tests checks pcre_maketables(), the facility for building a
321 set of character tables for a specific locale and using them instead of the
322 default tables. The tests make use of the "fr_FR" (French) locale. Before
323 running the test, the script checks for the presence of this locale by running
324 the "locale" command. If that command fails, or if it doesn't include "fr_FR"
325 in the list of available locales, the third test cannot be run, and a comment
326 is output to say why. If running this test produces instances of the error
327
328 ** Failed to set locale "fr_FR"
329
330 in the comparison output, it means that locale is not available on your system,
331 despite being listed by "locale". This does not mean that PCRE is broken.
332
333 The fourth test checks the UTF-8 support. It is not run automatically unless
334 PCRE is built with UTF-8 support. To do this you must set --enable-utf8 when
335 running "configure". This file can also be fed directly to the perltest script,
336 provided you are running Perl 5.8 or higher. (For Perl 5.6, a small patch,
337 commented in the script, can be used.)
338
339 The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
340 features of PCRE that are not relevant to Perl.
341
342 The sixth test checks the support for Unicode character properties. It is
343 not run automatically unless PCRE is built with Unicode property support. To do
344 this you must set --enable-unicode-properties when running "configure".
345
346 The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative
347 matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode
348 property support, respectively. The eighth and ninth tests are not run
349 automatically unless PCRE is built with the relevant support.
350
351
352 Character tables
353 ----------------
354
355 PCRE uses four tables for manipulating and identifying characters whose values
356 are less than 256. The final argument of the pcre_compile() function is a
357 pointer to a block of memory containing the concatenated tables. A call to
358 pcre_maketables() can be used to generate a set of tables in the current
359 locale. If the final argument for pcre_compile() is passed as NULL, a set of
360 default tables that is built into the binary is used.
361
362 The source file called chartables.c contains the default set of tables. This is
363 not supplied in the distribution, but is built by the program dftables
364 (compiled from dftables.c), which uses the ANSI C character handling functions
365 such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table
366 sources. This means that the default C locale which is set for your system will
367 control the contents of these default tables. You can change the default tables
368 by editing chartables.c and then re-building PCRE. If you do this, you should
369 probably also edit Makefile to ensure that the file doesn't ever get
370 re-generated.
371
372 The first two 256-byte tables provide lower casing and case flipping functions,
373 respectively. The next table consists of three 32-byte bit maps which identify
374 digits, "word" characters, and white space, respectively. These are used when
375 building 32-byte bit maps that represent character classes.
376
377 The final 256-byte table has bits indicating various character types, as
378 follows:
379
380       1   white space character
381       2   letter
382       4   decimal digit
383       8   hexadecimal digit
384      16   alphanumeric or '_'
385     128   regular expression metacharacter or binary zero
386
387 You should not alter the set of characters that contain the 128 bit, as that
388 will cause PCRE to malfunction.
389
390
391 Manifest
392 --------
393
394 The distribution should contain the following files:
395
396 (A) The actual source files of the PCRE library functions and their
397 headers:
398
399 dftables.c auxiliary program for building chartables.c
400
401 pcreposix.c )
402 pcre_compile.c )
403 pcre_config.c )
404 pcre_dfa_exec.c )
405 pcre_exec.c )
406 pcre_fullinfo.c )
407 pcre_get.c ) sources for the functions in the library,
408 pcre_globals.c ) and some internal functions that they use
409 pcre_info.c )
410 pcre_maketables.c )
411 pcre_ord2utf8.c )
412 pcre_printint.c )
413 pcre_study.c )
414 pcre_tables.c )
415 pcre_try_flipped.c )
416 pcre_ucp_findchar.c )
417 pcre_valid_utf8.c )
418 pcre_version.c )
419 pcre_xclass.c )
420
421 ucp_findchar.c )
422 ucp.h ) source for the code that is used for
423 ucpinternal.h ) Unicode property handling
424 ucptable.c )
425 ucptypetable.c )
426
427 pcre.in "source" for the header for the external API; pcre.h
428 is built from this by "configure"
429 pcreposix.h header for the external POSIX wrapper API
430 pcre_internal.h header for internal use
431 config.in template for config.h, which is built by configure
432
433 pcrecpp.h.in "source" for the header file for the C++ wrapper
434 pcrecpp.cc )
435 pcre_scanner.cc ) source for the C++ wrapper library
436
437 pcre_stringpiece.h.in "source" for pcre_stringpiece.h, the header for the
438 C++ stringpiece functions
439 pcre_stringpiece.cc source for the C++ stringpiece functions
440
441 (B) Auxiliary files:
442
443 AUTHORS information about the author of PCRE
444 ChangeLog log of changes to the code
445 INSTALL generic installation instructions
446 LICENCE conditions for the use of PCRE
447 COPYING the same, using GNU's standard name
448 Makefile.in template for Unix Makefile, which is built by configure
449 NEWS important changes in this release
450 NON-UNIX-USE notes on building PCRE on non-Unix systems
451 README this file
452 RunTest.in template for a Unix shell script for running tests
453 RunGrepTest.in template for a Unix shell script for pcregrep tests
454 config.guess ) files used by libtool,
455 config.sub ) used only when building a shared library
456 configure a configuring shell script (built by autoconf)
457 configure.in the autoconf input used to build configure
458 doc/Tech.Notes notes on the encoding
459 doc/*.3 man page sources for the PCRE functions
460 doc/*.1 man page sources for pcregrep and pcretest
461 doc/html/* HTML documentation
462 doc/pcre.txt plain text version of the man pages
463 doc/pcretest.txt plain text documentation of test program
464 doc/perltest.txt plain text documentation of Perl test program
465 install-sh a shell script for installing files
466 libpcre.pc.in "source" for libpcre.pc for pkg-config
467 ltmain.sh file used to build a libtool script
468 mkinstalldirs script for making install directories
469 pcretest.c comprehensive test program
470 pcredemo.c simple demonstration of coding calls to PCRE
471 perltest Perl test program
472 pcregrep.c source of a grep utility that uses PCRE
473 pcre-config.in source of script which retains PCRE information
474 pcrecpp_unittest.cc )
475 pcre_scanner_unittest.cc ) test programs for the C++ wrapper
476 pcre_stringpiece_unittest.cc )
477 testdata/testinput* test data for main library tests
478 testdata/testoutput* expected test results
479 testdata/grep* input and output for pcregrep tests
480
481 (C) Auxiliary files for Win32 DLL
482
483 libpcre.def
484 libpcreposix.def
485 pcre.def
486
487 (D) Auxiliary file for VPASCAL
488
489 makevp.bat
490
491 Philip Hazel
492 Email local part: ph10
493 Email domain: cam.ac.uk
494 August 2005
