Revision 101
Tue Mar 6 15:19:44 2007 UTC, by ph10
File size: 24494 byte(s)
Updated the support (such as it is) for Virtual Pascal, thanks to Stefan 
Weber: (1) pcre_internal.h was missing some function renames; (2) updated 
makevp.bat for the current PCRE, using the additional files !compile.txt, 
!linklib.txt, and pcregexp.pas.

1 README file for PCRE (Perl-compatible regular expression library)
2 -----------------------------------------------------------------
3
4 The latest release of PCRE is always available from
5
6 ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz
7
8 Please read the NEWS file if you are upgrading from a previous release.
9
10
11 The PCRE APIs
12 -------------
13
14 PCRE is written in C, and it has its own API. The distribution now includes a
15 set of C++ wrapper functions, courtesy of Google Inc. (see the pcrecpp man page
16 for details).
17
18 Also included are a set of C wrapper functions that are based on the POSIX
19 API. These end up in the library called libpcreposix. Note that this just
20 provides a POSIX calling interface to PCRE: the regular expressions themselves
21 still follow Perl syntax and semantics. The header file for the POSIX-style
22 functions is called pcreposix.h. The official POSIX name is regex.h, but I
23 didn't want to risk possible problems with existing files of that name by
24 distributing it that way. To use it with an existing program that uses the
25 POSIX API, it will have to be renamed or pointed at by a link.
26
27 If you are using the POSIX interface to PCRE and there is already a POSIX regex
28 library installed on your system, you must take care when linking programs to
29 ensure that they link with PCRE's libpcreposix library. Otherwise they may pick
30 up the "real" POSIX functions of the same name.
31
32
33 Documentation for PCRE
34 ----------------------
35
36 If you install PCRE in the normal way, you will end up with an installed set of
37 man pages whose names all start with "pcre". The one that is just called "pcre"
38 lists all the others. In addition to these man pages, the PCRE documentation is
39 supplied in two other forms; however, as there is no standard place to install
40 them, they are left in the doc directory of the unpacked source distribution.
41 These forms are:
42
43 1. Files called doc/pcre.txt, doc/pcregrep.txt, and doc/pcretest.txt. The
44 first of these is a concatenation of the text forms of all the section 3
45 man pages except those that summarize individual functions. The other two
46 are the text forms of the section 1 man pages for the pcregrep and
47 pcretest commands. Text forms are provided for ease of scanning with text
48 editors or similar tools.
49
50 2. A subdirectory called doc/html contains all the documentation in HTML
51 form, hyperlinked in various ways, and rooted in a file called
52 doc/index.html.
53
54
55 Contributions by users of PCRE
56 ------------------------------
57
58 You can find contributions from PCRE users in the directory
59
60 ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib
61
62 where there is also a README file giving brief descriptions of what they are.
63 Several of them provide support for compiling PCRE on various flavours of
64 Windows systems (I myself do not use Windows). Some are complete in themselves;
65 others are pointers to URLs containing relevant files.
66
67
68 Building on non-Unix systems
69 ----------------------------
70
71 For a non-Unix system, read the comments in the file NON-UNIX-USE, though if
72 the system supports the use of "configure" and "make" you may be able to build
73 PCRE in the same way as for Unix systems.
74
75 PCRE has been compiled on Windows systems and on Macintoshes, but I don't know
76 the details because I don't use those systems. It should be straightforward to
77 build PCRE on any system that has a Standard C compiler and library, because it
78 uses only Standard C functions.
79
80
81 Building PCRE on a Unix-like system
82 -----------------------------------
83
84 If you are using HP's ANSI C++ compiler (aCC), please see the special note
85 in the section entitled "Using HP's ANSI C++ compiler (aCC)" below.
86
87 To build PCRE on a Unix-like system, first run the "configure" command from the
88 PCRE distribution directory, with your current directory set to the directory
89 where you want the files to be created. This command is a standard GNU
90 "autoconf" configuration script, for which generic instructions are supplied in
91 INSTALL.
92
93 Most commonly, people build PCRE within its own distribution directory, and in
94 this case, on many systems, just running "./configure" is sufficient, but the
95 usual methods of changing standard defaults are available. For example:
96
97 CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
98
99 specifies that the C compiler should be run with the flags '-O2 -Wall' instead
100 of the default, and that "make install" should install PCRE under /opt/local
101 instead of the default /usr/local.
102
103 If you want to build in a different directory, just run "configure" with that
104 directory as current. For example, suppose you have unpacked the PCRE source
105 into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:
106
107 cd /build/pcre/pcre-xxx
108 /source/pcre/pcre-xxx/configure
109
110 PCRE is written in C and is normally compiled as a C library. However, it is
111 possible to build it as a C++ library, though the provided building apparatus
112 does not have any features to support this.
113
114 There are some optional features that can be included or omitted from the PCRE
115 library. You can read more about them in the pcrebuild man page.
116
117 . If you want to suppress the building of the C++ wrapper library, you can add
118 --disable-cpp to the "configure" command. Otherwise, when "configure" is run,
119 it will try to find a C++ compiler and C++ header files, and if it succeeds, it
120 will try to build the C++ wrapper.
121
122 . If you want to make use of the support for UTF-8 character strings in PCRE,
123 you must add --enable-utf8 to the "configure" command. Without it, the code
124 for handling UTF-8 is not included in the library. (Even when included, it
125 still has to be enabled by an option at run time.)
126
127 . If, in addition to support for UTF-8 character strings, you want to include
128 support for the \P, \p, and \X sequences that recognize Unicode character
129 properties, you must add --enable-unicode-properties to the "configure"
130 command. This adds about 30K to the size of the library (in the form of a
131 property table); only the basic two-letter properties such as Lu are
132 supported.
133
134 . You can build PCRE to recognize CR, LF, the sequence CRLF, or any of the
135 Unicode newline sequences as indicating the end of a line. Whatever
136 you specify at build time is the default; the caller of PCRE can change the
137 selection at run time. The default newline indicator is a single LF character
138 (the Unix standard). You can specify the default newline indicator by adding
139 --enable-newline-is-cr, --enable-newline-is-lf, --enable-newline-is-crlf, or
140 --enable-newline-is-any to the "configure" command, respectively.
141
142 If you specify --enable-newline-is-cr or --enable-newline-is-crlf, some of the
143 standard tests will fail, because the lines in the test files end with LF. Even
144 if the files are edited to change the line endings, there are likely to be some
145 failures. With --enable-newline-is-any, many tests should succeed, but there may
146 be some failures.
147
148 . When called via the POSIX interface, PCRE uses malloc() to get additional
149 storage for processing capturing parentheses if there are more than 10 of
150 them. You can increase this threshold by setting, for example,
151
152 --with-posix-malloc-threshold=20
153
154 on the "configure" command.
155
156 . PCRE has a counter that can be set to limit the amount of resources it uses.
157 If the limit is exceeded during a match, the match fails. The default is ten
158 million. You can change the default by setting, for example,
159
160 --with-match-limit=500000
161
162 on the "configure" command. This is just the default; individual calls to
163 pcre_exec() can supply their own value. There is discussion on the pcreapi
164 man page.
165
166 . There is a separate counter that limits the depth of recursive function calls
167 during a matching process. This also has a default of ten million, which is
168 essentially "unlimited". You can change the default by setting, for example,
169
170 --with-match-limit-recursion=500000
171
172 Recursive function calls use up the runtime stack; running out of stack can
173 cause programs to crash in strange ways. There is a discussion about stack
174 sizes in the pcrestack man page.
175
176 . The default maximum compiled pattern size is around 64K. You can increase
177 this by adding --with-link-size=3 to the "configure" command. You can
178 increase it even more by setting --with-link-size=4, but this is unlikely
179 ever to be necessary. If you build PCRE with an increased link size, test 2
180 (and 5 if you are using UTF-8) will fail. Part of the output of these tests
181 is a representation of the compiled pattern, and this changes with the link
182 size.
183
184 . You can build PCRE so that its internal match() function that is called from
185 pcre_exec() does not call itself recursively. Instead, it uses blocks of data
186 from the heap via special functions pcre_stack_malloc() and pcre_stack_free()
187 to save data that would otherwise be saved on the stack. To build PCRE like
188 this, use
189
190 --disable-stack-for-recursion
191
192 on the "configure" command. PCRE runs more slowly in this mode, but it may be
193 necessary in environments with limited stack sizes. This applies only to the
194 pcre_exec() function; it does not apply to pcre_dfa_exec(), which does not
195 use deeply nested recursion.
196
197 The "configure" script builds the following files for the basic C library:
198
199 . Makefile is the makefile that builds the library
200 . config.h contains build-time configuration options for the library
201 . pcre-config is a script that shows the settings of "configure" options
202 . libpcre.pc is data for the pkg-config command
203 . libtool is a script that builds shared and/or static libraries
204 . RunTest is a script for running tests on the library
205 . RunGrepTest is a script for running tests on the pcregrep command
206
207 In addition, if a C++ compiler is found, the following are also built:
208
209 . pcrecpp.h is the header file for programs that call PCRE via the C++ wrapper
210 . pcre_stringpiece.h is the header for the C++ "stringpiece" functions
211
212 The "configure" script also creates config.status, which is an executable
213 script that can be run to recreate the configuration, and config.log, which
214 contains compiler output from tests that "configure" runs.
215
216 Once "configure" has run, you can run "make". It builds two libraries, called
217 libpcre and libpcreposix, a test program called pcretest, and the pcregrep
218 command. If a C++ compiler was found on your system, it also builds the C++
219 wrapper library, which is called libpcrecpp, and some test programs called
220 pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest.
221
222 The command "make test" runs all the appropriate tests. Details of the PCRE
223 tests are given in a separate section of this document, below.
224
225 You can use "make install" to copy the libraries, the public header files
226 pcre.h, pcreposix.h, pcrecpp.h, and pcre_stringpiece.h (the last two only if
227 the C++ wrapper was built), and the man pages to appropriate live directories
228 on your system, in the normal way.
229
230 If you want to remove PCRE from your system, you can run "make uninstall".
231 This removes all the files that "make install" installed. However, it does not
232 remove any directories, because these are often shared with other programs.
233
234
235 Retrieving configuration information on Unix-like systems
236 ---------------------------------------------------------
237
238 Running "make install" also installs the command pcre-config, which can be used
239 to recall information about the PCRE configuration and installation. For
240 example:
241
242 pcre-config --version
243
244 prints the version number, and
245
246 pcre-config --libs
247
248 outputs information about where the library is installed. This command can be
249 included in makefiles for programs that use PCRE, saving the programmer from
250 having to remember too many details.
251
252 The pkg-config command is another system for saving and retrieving information
253 about installed libraries. Instead of separate commands for each library, a
254 single command is used. For example:
255
256 pkg-config --cflags pcre
257
258 The data is held in *.pc files that are installed in a directory called
259 pkgconfig.
260
261
262 Shared libraries on Unix-like systems
263 -------------------------------------
264
265 The default distribution builds PCRE as shared libraries and static libraries,
266 as long as the operating system supports shared libraries. Shared library
267 support relies on the "libtool" script which is built as part of the
268 "configure" process.
269
270 The libtool script is used to compile and link both shared and static
271 libraries. They are placed in a subdirectory called .libs when they are newly
272 built. The programs pcretest and pcregrep are built to use these uninstalled
273 libraries (by means of wrapper scripts in the case of shared libraries). When
274 you use "make install" to install shared libraries, pcregrep and pcretest are
275 automatically re-built to use the newly installed shared libraries before being
276 installed themselves. However, the versions left in the source directory still
277 use the uninstalled libraries.
278
279 To build PCRE using static libraries only, you must use --disable-shared when
280 configuring it. For example:
281
282 ./configure --prefix=/usr/gnu --disable-shared
283
284 Then run "make" in the usual way. Similarly, you can use --disable-static to
285 build only shared libraries.
286
287
288 Cross-compiling on a Unix-like system
289 -------------------------------------
290
291 You can specify CC and CFLAGS in the normal way to the "configure" command, in
292 order to cross-compile PCRE for some other host. However, during the building
293 process, the dftables.c source file is compiled *and run* on the local host, in
294 order to generate the default character tables (the chartables.c file). It
295 therefore needs to be compiled with the local compiler, not the cross compiler.
296 You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD;
297 there are also CXX_FOR_BUILD and CXXFLAGS_FOR_BUILD for the C++ wrapper)
298 when calling the "configure" command. If they are not specified, they default
299 to the values of CC and CFLAGS.
300
301
302 Using HP's ANSI C++ compiler (aCC)
303 ----------------------------------
304
305 Unless C++ support is disabled by specifying the "--disable-cpp" option of the
306 "configure" script, you *must* include the "-AA" option in the CXXFLAGS
307 environment variable in order for the C++ components to compile correctly.
308
309 Also, note that the aCC compiler on PA-RISC platforms may have a defect whereby
310 needed libraries fail to get included when specifying the "-AA" compiler
311 option. If you experience unresolved symbols when linking the C++ programs,
312 use the workaround of specifying the following environment variable prior to
313 running the "configure" script:
314
315 CXXLDFLAGS="-lstd_v2 -lCsup_v2"
316
317
318 Testing PCRE
319 ------------
320
321 To test PCRE on a Unix system, run the RunTest script that is created by the
322 configuring process. There is also a script called RunGrepTest that tests the
323 options of the pcregrep command. If the C++ wrapper library is built, three
324 test programs called pcrecpp_unittest, pcre_scanner_unittest, and
325 pcre_stringpiece_unittest are provided.
326
327 Both the scripts and all the program tests are run when you invoke "make runtest",
328 "make check", or "make test". For other systems, see the instructions in
329 NON-UNIX-USE.
330
331 The RunTest script runs the pcretest test program (which is documented in its
332 own man page) on each of the testinput files (in the testdata directory) in
333 turn, and compares the output with the contents of the corresponding testoutput
334 files. A file called testtry is used to hold the main output from pcretest
335 (testsavedregex is also used as a working file). To run pcretest on just one of
336 the test files, give its number as an argument to RunTest, for example:
337
338 RunTest 2
339
340 The first test file can also be fed directly into the perltest script to check
341 that Perl gives the same results. The only difference you should see is in the
342 first few lines, where the Perl version is given instead of the PCRE version.
343
344 The second set of tests checks pcre_fullinfo(), pcre_info(), pcre_study(),
345 pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
346 detection, and run-time flags that are specific to PCRE, as well as the POSIX
347 wrapper API. It also uses the debugging flag to check some of the internals of
348 pcre_compile().
349
350 If you build PCRE with a locale setting that is not the standard C locale, the
351 character tables may be different (see next paragraph). In some cases, this may
352 cause failures in the second set of tests. For example, in a locale where the
353 isprint() function yields TRUE for characters in the range 128-255, the use of
354 [:isascii:] inside a character class defines a different set of characters, and
355 this shows up in this test as a difference in the compiled code, which is being
356 listed for checking. Where the comparison test output contains [\x00-\x7f] the
357 test will contain [\x00-\xff], and similarly in some other cases. This is not a
358 bug in PCRE.
359
360 The third set of tests checks pcre_maketables(), the facility for building a
361 set of character tables for a specific locale and using them instead of the
362 default tables. The tests make use of the "fr_FR" (French) locale. Before
363 running the test, the script checks for the presence of this locale by running
364 the "locale" command. If that command fails, or if it doesn't include "fr_FR"
365 in the list of available locales, the third test cannot be run, and a comment
366 is output to say why. If running this test produces instances of the error
367
368 ** Failed to set locale "fr_FR"
369
370 in the comparison output, it means that locale is not available on your system,
371 despite being listed by "locale". This does not mean that PCRE is broken.
372
373 The fourth test checks the UTF-8 support. It is not run automatically unless
374 PCRE is built with UTF-8 support. To do this you must set --enable-utf8 when
375 running "configure". This file can also be fed directly to the perltest script,
376 provided you are running Perl 5.8 or higher. (For Perl 5.6, a small patch,
377 commented in the script, can be used.)
378
379 The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
380 features of PCRE that are not relevant to Perl.
381
382 The sixth test checks the support for Unicode character properties. It is
383 not run automatically unless PCRE is built with Unicode property support. To do
384 this you must set --enable-unicode-properties when running "configure".
385
386 The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative
387 matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode
388 property support, respectively. The eighth and ninth tests are not run
389 automatically unless PCRE is built with the relevant support.
390
391
392 Character tables
393 ----------------
394
395 PCRE uses four tables for manipulating and identifying characters whose values
396 are less than 256. The final argument of the pcre_compile() function is a
397 pointer to a block of memory containing the concatenated tables. A call to
398 pcre_maketables() can be used to generate a set of tables in the current
399 locale. If the final argument for pcre_compile() is passed as NULL, a set of
400 default tables that is built into the binary is used.
401
402 The source file called chartables.c contains the default set of tables. This is
403 not supplied in the distribution, but is built by the program dftables
404 (compiled from dftables.c), which uses the ANSI C character handling functions
405 such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table
406 sources. This means that the default C locale which is set for your system will
407 control the contents of these default tables. You can change the default tables
408 by editing chartables.c and then re-building PCRE. If you do this, you should
409 probably also edit Makefile to ensure that the file doesn't ever get
410 re-generated.
411
412 The first two 256-byte tables provide lower casing and case flipping functions,
413 respectively. The next table consists of three 32-byte bit maps which identify
414 digits, "word" characters, and white space, respectively. These are used when
415 building 32-byte bit maps that represent character classes.
416
417 The final 256-byte table has bits indicating various character types, as
418 follows:
419
420 1 white space character
421 2 letter
422 4 decimal digit
423 8 hexadecimal digit
424 16 alphanumeric or '_'
425 128 regular expression metacharacter or binary zero
426
427 You should not alter the set of characters that contain the 128 bit, as that
428 will cause PCRE to malfunction.
429
430
431 Manifest
432 --------
433
434 The distribution should contain the following files:
435
436 (A) The actual source files of the PCRE library functions and their
437 headers:
438
439 dftables.c auxiliary program for building chartables.c
440
441 pcreposix.c )
442 pcre_compile.c )
443 pcre_config.c )
444 pcre_dfa_exec.c )
445 pcre_exec.c )
446 pcre_fullinfo.c )
447 pcre_get.c ) sources for the functions in the library,
448 pcre_globals.c ) and some internal functions that they use
449 pcre_info.c )
450 pcre_maketables.c )
451 pcre_newline.c )
452 pcre_ord2utf8.c )
453 pcre_refcount.c )
454 pcre_study.c )
455 pcre_tables.c )
456 pcre_try_flipped.c )
457 pcre_ucp_searchfuncs.c)
458 pcre_valid_utf8.c )
459 pcre_version.c )
460 pcre_xclass.c )
461
462 pcre_printint.src ) debugging function that is #included in pcretest, and
463 ) can also be #included in pcre_compile()
464
465 pcre.h the public PCRE header file
466 pcreposix.h header for the external POSIX wrapper API
467 pcre_internal.h header for internal use
468 ucp.h ) headers concerned with
469 ucpinternal.h ) Unicode property handling
470 ucptable.h ) (this one is the data table)
471 config.in template for config.h, which is built by configure
472
473 pcrecpp.h the header file for the C++ wrapper
474 pcrecpparg.h.in "source" for another C++ header file
475 pcrecpp.cc )
476 pcre_scanner.cc ) source for the C++ wrapper library
477
478 pcre_stringpiece.h.in "source" for pcre_stringpiece.h, the header for the
479 C++ stringpiece functions
480 pcre_stringpiece.cc source for the C++ stringpiece functions
481
482 (B) Auxiliary files:
483
484 AUTHORS information about the author of PCRE
485 ChangeLog log of changes to the code
486 INSTALL generic installation instructions
487 LICENCE conditions for the use of PCRE
488 COPYING the same, using GNU's standard name
489 Makefile.in template for Unix Makefile, which is built by configure
490 NEWS important changes in this release
491 NON-UNIX-USE notes on building PCRE on non-Unix systems
492 README this file
493 RunTest.in template for a Unix shell script for running tests
494 RunGrepTest.in template for a Unix shell script for pcregrep tests
495 config.guess ) files used by libtool,
496 config.sub ) used only when building a shared library
497 config.h.in "source" for the config.h header file
498 configure a configuring shell script (built by autoconf)
499 configure.ac the autoconf input used to build configure
500 doc/Tech.Notes notes on the encoding
501 doc/*.3 man page sources for the PCRE functions
502 doc/*.1 man page sources for pcregrep and pcretest
503 doc/html/* HTML documentation
504 doc/pcre.txt plain text version of the man pages
505 doc/pcretest.txt plain text documentation of test program
506 doc/perltest.txt plain text documentation of Perl test program
507 install-sh a shell script for installing files
508 libpcre.pc.in "source" for libpcre.pc for pkg-config
509 ltmain.sh file used to build a libtool script
510 mkinstalldirs script for making install directories
511 pcretest.c comprehensive test program
512 pcredemo.c simple demonstration of coding calls to PCRE
513 perltest.pl Perl test program
514 pcregrep.c source of a grep utility that uses PCRE
515 pcre-config.in source of script which retains PCRE information
516 pcrecpp_unittest.cc )
517 pcre_scanner_unittest.cc ) test programs for the C++ wrapper
518 pcre_stringpiece_unittest.cc )
519 testdata/testinput* test data for main library tests
520 testdata/testoutput* expected test results
521 testdata/grep* input and output for pcregrep tests
522
523 (C) Auxiliary files for Win32 DLL
524
525 libpcre.def
526 libpcreposix.def
527
528 (D) Auxiliary file for VPASCAL
529
530 makevp.bat
531
532 Philip Hazel
533 Email local part: ph10
534 Email domain: cam.ac.uk
535 March 2007
