| 7 |
|
|
| 8 |
Please read the NEWS file if you are upgrading from a previous release. |
Please read the NEWS file if you are upgrading from a previous release. |
| 9 |
|
|
| 10 |
PCRE has its own native API, but a set of "wrapper" functions that are based on |
|
| 11 |
the POSIX API are also supplied in the library libpcreposix. Note that this |
The PCRE APIs |
| 12 |
just provides a POSIX calling interface to PCRE: the regular expressions |
------------- |
| 13 |
themselves still follow Perl syntax and semantics. The header file |
|
| 14 |
for the POSIX-style functions is called pcreposix.h. The official POSIX name is |
PCRE is written in C, and it has its own API. The distribution now includes a |
| 15 |
regex.h, but I didn't want to risk possible problems with existing files of |
set of C++ wrapper functions, courtesy of Google Inc. (see the pcrecpp man page |
| 16 |
that name by distributing it that way. To use it with an existing program that |
for details). |
| 17 |
uses the POSIX API, it will have to be renamed or pointed at by a link. |
|
| 18 |
|
Also included are a set of C wrapper functions that are based on the POSIX |
| 19 |
|
API. These end up in the library called libpcreposix. Note that this just |
| 20 |
|
provides a POSIX calling interface to PCRE: the regular expressions themselves |
| 21 |
|
still follow Perl syntax and semantics. The header file for the POSIX-style |
| 22 |
|
functions is called pcreposix.h. The official POSIX name is regex.h, but I |
| 23 |
|
didn't want to risk possible problems with existing files of that name by |
| 24 |
|
distributing it that way. To use it with an existing program that uses the |
| 25 |
|
POSIX API, it will have to be renamed or pointed at by a link. |
| 26 |
|
|
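As a rough illustration of the POSIX calling interface described above, the
following sketch wraps regcomp(), regexec(), and regfree() in one helper. It is
written against the system's regex.h so that it builds without PCRE. The helper
name posix_matches is hypothetical. To try it with the PCRE wrapper, the
assumption is that you would include pcreposix.h instead and link with
libpcreposix. In that case the pattern follows Perl syntax, and the flag names
used here should be checked against pcreposix.h.

```c
#include <assert.h>
#include <stddef.h>
#include <regex.h>   /* assumption: swap for <pcreposix.h> to use the PCRE wrapper */

/* Hypothetical helper (not part of PCRE): returns 1 if subject matches
   pattern, 0 if it does not, and -1 if the pattern fails to compile. */
int posix_matches(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && subject != NULL);

    /* REG_NOSUB: only a yes/no answer is wanted, no substring offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);               /* always release the compiled pattern */
    return rc == 0 ? 1 : 0;
}
```

A caller would use it as posix_matches("^ab+c$", "abbbc"), treating a negative
result as a pattern error rather than a failed match.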
| 27 |
If you are using the POSIX interface to PCRE and there is already a POSIX regex |
If you are using the POSIX interface to PCRE and there is already a POSIX regex |
| 28 |
library installed on your system, you must take care when linking programs to |
library installed on your system, you must take care when linking programs to |
| 120 |
|
|
| 121 |
on the "configure" command. |
on the "configure" command. |
| 122 |
|
|
| 123 |
. PCRE has a counter which can be set to limit the amount of resources it uses. |
. PCRE has a counter that can be set to limit the amount of resources it uses. |
| 124 |
If the limit is exceeded during a match, the match fails. The default is ten |
If the limit is exceeded during a match, the match fails. The default is ten |
| 125 |
million. You can change the default by setting, for example, |
million. You can change the default by setting, for example, |
| 126 |
|
|
| 138 |
is a representation of the compiled pattern, and this changes with the link |
is a representation of the compiled pattern, and this changes with the link |
| 139 |
size. |
size. |
| 140 |
|
|
| 141 |
. You can build PCRE so that its match() function does not call itself |
. You can build PCRE so that its internal match() function that is called from |
| 142 |
recursively. Instead, it uses blocks of data from the heap via special |
pcre_exec() does not call itself recursively. Instead, it uses blocks of data |
| 143 |
functions pcre_stack_malloc() and pcre_stack_free() to save data that would |
from the heap via special functions pcre_stack_malloc() and pcre_stack_free() |
| 144 |
otherwise be saved on the stack. To build PCRE like this, use |
to save data that would otherwise be saved on the stack. To build PCRE like |
| 145 |
|
this, use |
| 146 |
|
|
| 147 |
--disable-stack-for-recursion |
--disable-stack-for-recursion |
| 148 |
|
|
| 149 |
on the "configure" command. PCRE runs more slowly in this mode, but it may be |
on the "configure" command. PCRE runs more slowly in this mode, but it may be |
| 150 |
necessary in environments with limited stack sizes. |
necessary in environments with limited stack sizes. This applies only to the |
| 151 |
|
pcre_exec() function; it does not apply to pcre_dfa_exec(), which does not |
| 152 |
|
use deeply nested recursion. |
| 153 |
|
|
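The general idea of replacing recursive calls with blocks of heap memory can be
sketched as follows. This is not PCRE's match() code and it does not use
pcre_stack_malloc() or pcre_stack_free(). It is a small glob-style matcher
('*' matches any run of characters, '?' matches any one character) whose
backtracking points are kept in a heap-allocated array instead of on the call
stack. All names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One saved backtracking point: offsets into the pattern and the subject. */
typedef struct { size_t p, s; } frame;

/* Returns 1 on a match, 0 on no match, -1 if heap memory runs out. */
int heap_glob_match(const char *pat, const char *sub)
{
    size_t cap = 16, top = 0;
    int result = 0;
    frame *stack;

    assert(pat != NULL && sub != NULL);
    stack = malloc(cap * sizeof *stack);
    if (stack == NULL) return -1;
    stack[top++] = (frame){0, 0};

    while (top > 0) {
        frame f = stack[--top];          /* resume a saved alternative */
        size_t p = f.p, s = f.s;
        for (;;) {
            if (pat[p] == '\0') {
                if (sub[s] == '\0') { result = 1; goto done; }
                break;                   /* subject left over: backtrack */
            }
            if (pat[p] == '*') {
                if (sub[s] != '\0') {
                    /* Save "let '*' absorb one more character" on the heap,
                       where a recursive matcher would make a nested call. */
                    if (top == cap) {
                        frame *bigger = realloc(stack, 2 * cap * sizeof *stack);
                        if (bigger == NULL) { result = -1; goto done; }
                        stack = bigger;
                        cap *= 2;
                    }
                    stack[top++] = (frame){p, s + 1};
                }
                p++;                     /* first try '*' matching nothing */
                continue;
            }
            if (sub[s] != '\0' && (pat[p] == '?' || pat[p] == sub[s])) {
                p++; s++;
                continue;
            }
            break;                       /* mismatch: backtrack */
        }
    }
done:
    free(stack);
    return result;
}
```

The trade-off is the one noted above: each saved state costs heap bookkeeping
that a plain recursive call would not, but the depth of backtracking is then
limited by available heap memory rather than by the size of the stack.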
| 154 |
|
The "configure" script builds eight files for the basic C library: |
| 155 |
|
|
| 156 |
|
. pcre.h is the header file for C programs that call PCRE |
| 157 |
|
. Makefile is the makefile that builds the library |
| 158 |
|
. config.h contains build-time configuration options for the library |
| 159 |
|
. pcre-config is a script that shows the settings of "configure" options |
| 160 |
|
. libpcre.pc is data for the pkg-config command |
| 161 |
|
. libtool is a script that builds shared and/or static libraries |
| 162 |
|
. RunTest is a script for running tests on the library |
| 163 |
|
. RunGrepTest is a script for running tests on the pcregrep command |
| 164 |
|
|
| 165 |
The "configure" script builds seven files: |
In addition, if a C++ compiler is found, the following are also built: |
| 166 |
|
|
| 167 |
. pcre.h is built by copying pcre.in and making substitutions |
. pcrecpp.h is the header file for programs that call PCRE via the C++ wrapper |
| 168 |
. Makefile is built by copying Makefile.in and making substitutions. |
. pcre_stringpiece.h is the header for the C++ "stringpiece" functions |
|
. config.h is built by copying config.in and making substitutions. |
|
|
. pcre-config is built by copying pcre-config.in and making substitutions. |
|
|
. libpcre.pc is data for the pkg-config command, built from libpcre.pc.in |
|
|
. libtool is a script that builds shared and/or static libraries |
|
|
. RunTest is a script for running tests |
|
| 169 |
|
|
| 170 |
Once "configure" has run, you can run "make". It builds two libraries called |
The "configure" script also creates config.status, which is an executable |
| 171 |
|
script that can be run to recreate the configuration, and config.log, which |
| 172 |
|
contains compiler output from tests that "configure" runs. |
| 173 |
|
|
| 174 |
|
Once "configure" has run, you can run "make". It builds two libraries, called |
| 175 |
libpcre and libpcreposix, a test program called pcretest, and the pcregrep |
libpcre and libpcreposix, a test program called pcretest, and the pcregrep |
| 176 |
command. You can use "make install" to copy these, the public header files |
command. If a C++ compiler was found on your system, it also builds the C++ |
| 177 |
pcre.h and pcreposix.h, and the man pages to appropriate live directories on |
wrapper library, which is called libpcrecpp, and some test programs called |
| 178 |
your system, in the normal way. |
pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest. |
| 179 |
|
|
| 180 |
|
The command "make test" runs all the appropriate tests. Details of the PCRE |
| 181 |
|
tests are given in a separate section of this document, below. |
| 182 |
|
|
| 183 |
|
You can use "make install" to copy the libraries, the public header files |
| 184 |
|
pcre.h, pcreposix.h, pcrecpp.h, and pcre_stringpiece.h (the last two only if |
| 185 |
|
the C++ wrapper was built), and the man pages to appropriate live directories |
| 186 |
|
on your system, in the normal way. |
| 187 |
|
|
| 188 |
|
If you want to remove PCRE from your system, you can run "make uninstall". |
| 189 |
|
This removes all the files that "make install" installed. However, it does not |
| 190 |
|
remove any directories, because these are often shared with other programs. |
| 191 |
|
|
| 192 |
|
|
| 193 |
Retrieving configuration information on Unix-like systems |
Retrieving configuration information on Unix-like systems |
| 220 |
Shared libraries on Unix-like systems |
Shared libraries on Unix-like systems |
| 221 |
------------------------------------- |
------------------------------------- |
| 222 |
|
|
| 223 |
The default distribution builds PCRE as two shared libraries and two static |
The default distribution builds PCRE as shared libraries and static libraries, |
| 224 |
libraries, as long as the operating system supports shared libraries. Shared |
as long as the operating system supports shared libraries. Shared library |
| 225 |
library support relies on the "libtool" script which is built as part of the |
support relies on the "libtool" script which is built as part of the |
| 226 |
"configure" process. |
"configure" process. |
| 227 |
|
|
| 228 |
The libtool script is used to compile and link both shared and static |
The libtool script is used to compile and link both shared and static |
| 251 |
process, the dftables.c source file is compiled *and run* on the local host, in |
process, the dftables.c source file is compiled *and run* on the local host, in |
| 252 |
order to generate the default character tables (the chartables.c file). It |
order to generate the default character tables (the chartables.c file). It |
| 253 |
therefore needs to be compiled with the local compiler, not the cross compiler. |
therefore needs to be compiled with the local compiler, not the cross compiler. |
| 254 |
You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD) |
You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD; |
| 255 |
|
there are also CXX_FOR_BUILD and CXXFLAGS_FOR_BUILD for the C++ wrapper) |
| 256 |
when calling the "configure" command. If they are not specified, they default |
when calling the "configure" command. If they are not specified, they default |
| 257 |
to the values of CC and CFLAGS. |
to the values of CC and CFLAGS. |
| 258 |
|
|
| 274 |
------------ |
------------ |
| 275 |
|
|
| 276 |
To test PCRE on a Unix system, run the RunTest script that is created by the |
To test PCRE on a Unix system, run the RunTest script that is created by the |
| 277 |
configuring process. (This can also be run by "make runtest", "make check", or |
configuring process. There is also a script called RunGrepTest that tests the |
| 278 |
"make test".) For other systems, see the instructions in NON-UNIX-USE. |
options of the pcregrep command. If the C++ wrapper library is built, three |
| 279 |
|
test programs called pcrecpp_unittest, pcre_scanner_unittest, and |
| 280 |
The script runs the pcretest test program (which is documented in its own man |
pcre_stringpiece_unittest are provided. |
| 281 |
page) on each of the testinput files (in the testdata directory) in turn, |
|
| 282 |
and compares the output with the contents of the corresponding testoutput file. |
Both the scripts and all the program tests are run if you obey "make runtest", |
| 283 |
A file called testtry is used to hold the main output from pcretest |
"make check", or "make test". For other systems, see the instructions in |
| 284 |
|
NON-UNIX-USE. |
| 285 |
|
|
| 286 |
|
The RunTest script runs the pcretest test program (which is documented in its |
| 287 |
|
own man page) on each of the testinput files (in the testdata directory) in |
| 288 |
|
turn, and compares the output with the contents of the corresponding testoutput |
| 289 |
|
file. A file called testtry is used to hold the main output from pcretest |
| 290 |
(testsavedregex is also used as a working file). To run pcretest on just one of |
(testsavedregex is also used as a working file). To run pcretest on just one of |
| 291 |
the test files, give its number as an argument to RunTest, for example: |
the test files, give its number as an argument to RunTest, for example: |
| 292 |
|
|
| 334 |
The fifth test checks error handling with UTF-8 encoding, and internal UTF-8 |
The fifth test checks error handling with UTF-8 encoding, and internal UTF-8 |
| 335 |
features of PCRE that are not relevant to Perl. |
features of PCRE that are not relevant to Perl. |
| 336 |
|
|
| 337 |
The sixth and final test checks the support for Unicode character properties. |
The sixth test checks the support for Unicode character properties. It is |
| 338 |
It is not run automatically unless PCRE is built with Unicode property support. |
not run automatically unless PCRE is built with Unicode property support. To do |
| 339 |
To do this you must set --enable-unicode-properties when running "configure". |
this you must set --enable-unicode-properties when running "configure". |
| 340 |
|
|
| 341 |
|
The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative |
| 342 |
|
matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode |
| 343 |
|
property support, respectively. The eighth and ninth tests are not run |
| 344 |
|
automatically unless PCRE is built with the relevant support. |
| 345 |
|
|
| 346 |
|
|
| 347 |
Character tables |
Character tables |
| 393 |
|
|
| 394 |
dftables.c auxiliary program for building chartables.c |
dftables.c auxiliary program for building chartables.c |
| 395 |
|
|
|
get.c ) |
|
|
maketables.c ) |
|
|
study.c ) source of the functions |
|
|
pcre.c ) in the library |
|
| 396 |
pcreposix.c ) |
pcreposix.c ) |
| 397 |
printint.c ) |
pcre_compile.c ) |
| 398 |
|
pcre_config.c ) |
| 399 |
|
pcre_dfa_exec.c ) |
| 400 |
|
pcre_exec.c ) |
| 401 |
|
pcre_fullinfo.c ) |
| 402 |
|
pcre_get.c ) sources for the functions in the library, |
| 403 |
|
pcre_globals.c ) and some internal functions that they use |
| 404 |
|
pcre_info.c ) |
| 405 |
|
pcre_maketables.c ) |
| 406 |
|
pcre_ord2utf8.c ) |
| 407 |
|
pcre_printint.c ) |
| 408 |
|
pcre_study.c ) |
| 409 |
|
pcre_tables.c ) |
| 410 |
|
pcre_try_flipped.c ) |
| 411 |
|
pcre_ucp_findchar.c ) |
| 412 |
|
pcre_valid_utf8.c ) |
| 413 |
|
pcre_version.c ) |
| 414 |
|
pcre_xclass.c ) |
| 415 |
|
|
| 416 |
ucp.c ) |
ucp_findchar.c ) |
| 417 |
ucp.h ) source for the code that is used for |
ucp.h ) source for the code that is used for |
| 418 |
ucpinternal.h ) Unicode property handling |
ucpinternal.h ) Unicode property handling |
| 419 |
ucptable.c ) |
ucptable.c ) |
| 422 |
pcre.in "source" for the header for the external API; pcre.h |
pcre.in "source" for the header for the external API; pcre.h |
| 423 |
is built from this by "configure" |
is built from this by "configure" |
| 424 |
pcreposix.h header for the external POSIX wrapper API |
pcreposix.h header for the external POSIX wrapper API |
| 425 |
internal.h header for internal use |
pcre_internal.h header for internal use |
| 426 |
config.in template for config.h, which is built by configure |
config.in template for config.h, which is built by configure |
| 427 |
|
|
| 428 |
|
pcrecpp.h.in "source" for the header file for the C++ wrapper |
| 429 |
|
pcrecpp.cc ) |
| 430 |
|
pcre_scanner.cc ) source for the C++ wrapper library |
| 431 |
|
|
| 432 |
|
pcre_stringpiece.h.in "source" for pcre_stringpiece.h, the header for the |
| 433 |
|
C++ stringpiece functions |
| 434 |
|
pcre_stringpiece.cc source for the C++ stringpiece functions |
| 435 |
|
|
| 436 |
(B) Auxiliary files: |
(B) Auxiliary files: |
| 437 |
|
|
| 438 |
AUTHORS information about the author of PCRE |
AUTHORS information about the author of PCRE |
| 445 |
NON-UNIX-USE notes on building PCRE on non-Unix systems |
NON-UNIX-USE notes on building PCRE on non-Unix systems |
| 446 |
README this file |
README this file |
| 447 |
RunTest.in template for a Unix shell script for running tests |
RunTest.in template for a Unix shell script for running tests |
| 448 |
|
RunGrepTest.in template for a Unix shell script for pcregrep tests |
| 449 |
config.guess ) files used by libtool, |
config.guess ) files used by libtool, |
| 450 |
config.sub ) used only when building a shared library |
config.sub ) used only when building a shared library |
| 451 |
configure a configuring shell script (built by autoconf) |
configure a configuring shell script (built by autoconf) |
| 466 |
perltest Perl test program |
perltest Perl test program |
| 467 |
pcregrep.c source of a grep utility that uses PCRE |
pcregrep.c source of a grep utility that uses PCRE |
| 468 |
pcre-config.in source of script which retains PCRE information |
pcre-config.in source of script which retains PCRE information |
| 469 |
testdata/testinput1 test data, compatible with Perl |
pcrecpp_unittest.c ) |
| 470 |
testdata/testinput2 test data for error messages and non-Perl things |
pcre_scanner_unittest.c ) test programs for the C++ wrapper |
| 471 |
testdata/testinput3 test data for locale-specific tests |
pcre_stringpiece_unittest.c ) |
| 472 |
testdata/testinput4 test data for UTF-8 tests compatible with Perl |
testdata/testinput* test data for main library tests |
| 473 |
testdata/testinput5 test data for other UTF-8 tests |
testdata/testoutput* expected test results |
| 474 |
testdata/testinput6 test data for Unicode property support tests |
testdata/grep* input and output for pcregrep tests |
|
testdata/testoutput1 test results corresponding to testinput1 |
|
|
testdata/testoutput2 test results corresponding to testinput2 |
|
|
testdata/testoutput3 test results corresponding to testinput3 |
|
|
testdata/testoutput4 test results corresponding to testinput4 |
|
|
testdata/testoutput5 test results corresponding to testinput5 |
|
|
testdata/testoutput6 test results corresponding to testinput6 |
|
| 475 |
|
|
| 476 |
(C) Auxiliary files for Win32 DLL |
(C) Auxiliary files for Win32 DLL |
| 477 |
|
|
|
dll.mk |
|
| 478 |
libpcre.def |
libpcre.def |
| 479 |
libpcreposix.def |
libpcreposix.def |
| 480 |
pcre.def |
pcre.def |
| 483 |
|
|
| 484 |
makevp.bat |
makevp.bat |
| 485 |
|
|
| 486 |
Philip Hazel <ph10@cam.ac.uk> |
Philip Hazel |
| 487 |
September 2004 |
Email local part: ph10 |
| 488 |
|
Email domain: cam.ac.uk |
| 489 |
|
June 2005 |