| 34 |
---------------------- |
---------------------- |
| 35 |
|
|
| 36 |
If you install PCRE in the normal way, you will end up with an installed set of |
If you install PCRE in the normal way, you will end up with an installed set of |
| 37 |
man pages whose names all start with "pcre". The one that is called "pcre" |
man pages whose names all start with "pcre". The one that is just called "pcre" |
| 38 |
lists all the others. In addition to these man pages, the PCRE documentation is |
lists all the others. In addition to these man pages, the PCRE documentation is |
| 39 |
supplied in two other forms; however, as there is no standard place to install |
supplied in two other forms; however, as there is no standard place to install |
| 40 |
them, they are left in the doc directory of the unpacked source distribution. |
them, they are left in the doc directory of the unpacked source distribution. |
| 68 |
Building PCRE on a Unix-like system |
Building PCRE on a Unix-like system |
| 69 |
----------------------------------- |
----------------------------------- |
| 70 |
|
|
| 71 |
|
If you are using HP's ANSI C++ compiler (aCC), please see the special note |
| 72 |
|
in the section entitled "Using HP's ANSI C++ compiler (aCC)" below. |
| 73 |
|
|
| 74 |
To build PCRE on a Unix-like system, first run the "configure" command from the |
To build PCRE on a Unix-like system, first run the "configure" command from the |
| 75 |
PCRE distribution directory, with your current directory set to the directory |
PCRE distribution directory, with your current directory set to the directory |
| 76 |
where you want the files to be created. This command is a standard GNU |
where you want the files to be created. This command is a standard GNU |
| 94 |
cd /build/pcre/pcre-xxx |
cd /build/pcre/pcre-xxx |
| 95 |
/source/pcre/pcre-xxx/configure |
/source/pcre/pcre-xxx/configure |
| 96 |
|
|
| 97 |
|
PCRE is written in C and is normally compiled as a C library. However, it is |
| 98 |
|
possible to build it as a C++ library, though the provided building apparatus |
| 99 |
|
does not have any features to support this. |
| 100 |
|
|
| 101 |
There are some optional features that can be included or omitted from the PCRE |
There are some optional features that can be included or omitted from the PCRE |
| 102 |
library. You can read more about them in the pcrebuild man page. |
library. You can read more about them in the pcrebuild man page. |
| 103 |
|
|
| 104 |
|
. If you want to suppress the building of the C++ wrapper library, you can add |
| 105 |
|
--disable-cpp to the "configure" command. Otherwise, when "configure" is run, |
| 106 |
|
it will try to find a C++ compiler and C++ header files, and if it succeeds, it |
| 107 |
|
will try to build the C++ wrapper. |
| 108 |
|
|
| 109 |
. If you want to make use of the support for UTF-8 character strings in PCRE, |
. If you want to make use of the support for UTF-8 character strings in PCRE, |
| 110 |
you must add --enable-utf8 to the "configure" command. Without it, the code |
you must add --enable-utf8 to the "configure" command. Without it, the code |
| 111 |
for handling UTF-8 is not included in the library. (Even when included, it |
for handling UTF-8 is not included in the library. (Even when included, it |
| 114 |
. If, in addition to support for UTF-8 character strings, you want to include |
. If, in addition to support for UTF-8 character strings, you want to include |
| 115 |
support for the \P, \p, and \X sequences that recognize Unicode character |
support for the \P, \p, and \X sequences that recognize Unicode character |
| 116 |
properties, you must add --enable-unicode-properties to the "configure" |
properties, you must add --enable-unicode-properties to the "configure" |
| 117 |
command. This adds about 90K to the size of the library (in the form of a |
command. This adds about 30K to the size of the library (in the form of a |
| 118 |
property table); only the basic two-letter properties such as Lu are |
property table); only the basic two-letter properties such as Lu are |
| 119 |
supported. |
supported. |
| 120 |
|
|
| 121 |
. You can build PCRE to recognized CR or NL as the newline character, instead |
. You can build PCRE to recognize either CR or LF or the sequence CRLF or any |
| 122 |
of whatever your compiler uses for "\n", by adding --newline-is-cr or |
of the Unicode newline sequences as indicating the end of a line. Whatever |
| 123 |
--newline-is-nl to the "configure" command, respectively. Only do this if you |
you specify at build time is the default; the caller of PCRE can change the |
| 124 |
really understand what you are doing. On traditional Unix-like systems, the |
selection at run time. The default newline indicator is a single LF character |
| 125 |
newline character is NL. |
(the Unix standard). You can specify the default newline indicator by adding |
| 126 |
|
--newline-is-cr or --newline-is-lf or --newline-is-crlf or --newline-is-any |
| 127 |
|
to the "configure" command, respectively. |
| 128 |
|
|
| 129 |
. When called via the POSIX interface, PCRE uses malloc() to get additional |
. When called via the POSIX interface, PCRE uses malloc() to get additional |
| 130 |
storage for processing capturing parentheses if there are more than 10 of |
storage for processing capturing parentheses if there are more than 10 of |
| 144 |
pcre_exec() can supply their own value. There is discussion on the pcreapi |
pcre_exec() can supply their own value. There is discussion on the pcreapi |
| 145 |
man page. |
man page. |
| 146 |
|
|
| 147 |
|
. There is a separate counter that limits the depth of recursive function calls |
| 148 |
|
during a matching process. This also has a default of ten million, which is |
| 149 |
|
essentially "unlimited". You can change the default by setting, for example, |
| 150 |
|
|
| 151 |
|
--with-match-limit-recursion=500000 |
| 152 |
|
|
| 153 |
|
Recursive function calls use up the runtime stack; running out of stack can |
| 154 |
|
cause programs to crash in strange ways. There is a discussion about stack |
| 155 |
|
sizes in the pcrestack man page. |
| 156 |
|
|
| 157 |
. The default maximum compiled pattern size is around 64K. You can increase |
. The default maximum compiled pattern size is around 64K. You can increase |
| 158 |
this by adding --with-link-size=3 to the "configure" command. You can |
this by adding --with-link-size=3 to the "configure" command. You can |
| 159 |
increase it even more by setting --with-link-size=4, but this is unlikely |
increase it even more by setting --with-link-size=4, but this is unlikely |
| 177 |
|
|
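As an illustration only, several of the options described above can be
combined in a single "configure" call. The sketch below assumes a
Bourne-compatible shell and reuses the placeholder path from the earlier
example (pcre-xxx is not a real version name). It prints the command instead
of running it, so that it can be checked before use. The particular selection
of options is an assumption for the example, not a recommendation.

```shell
# Placeholder source directory, taken from the example earlier in this file.
SRC=/source/pcre/pcre-xxx

# Options chosen from the list above; add or drop entries as required.
set -- \
    --enable-utf8 \
    --enable-unicode-properties \
    --newline-is-lf \
    --with-match-limit-recursion=500000 \
    --with-link-size=3

# Dry run: show the command rather than executing it.
CONFIGURE_CMD="$SRC/configure $*"
echo "$CONFIGURE_CMD"
```

To run it for real, change to the build directory and execute the printed
command there.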
| 178 |
The "configure" script builds eight files for the basic C library: |
The "configure" script builds eight files for the basic C library: |
| 179 |
|
|
|
. pcre.h is the header file for C programs that call PCRE |
|
| 180 |
. Makefile is the makefile that builds the library |
. Makefile is the makefile that builds the library |
| 181 |
. config.h contains build-time configuration options for the library |
. config.h contains build-time configuration options for the library |
| 182 |
. pcre-config is a script that shows the settings of "configure" options |
. pcre-config is a script that shows the settings of "configure" options |
| 280 |
to the values of CC and CFLAGS. |
to the values of CC and CFLAGS. |
| 281 |
|
|
| 282 |
|
|
| 283 |
|
Using HP's ANSI C++ compiler (aCC) |
| 284 |
|
---------------------------------- |
| 285 |
|
|
| 286 |
|
Unless C++ support is disabled by specifying the "--disable-cpp" option of the |
| 287 |
|
"configure" script, you *must* include the "-AA" option in the CXXFLAGS |
| 288 |
|
environment variable in order for the C++ components to compile correctly. |
| 289 |
|
|
| 290 |
|
Also, note that the aCC compiler on PA-RISC platforms may have a defect whereby |
| 291 |
|
needed libraries fail to get included when specifying the "-AA" compiler |
| 292 |
|
option. If you experience unresolved symbols when linking the C++ programs, |
| 293 |
|
use the workaround of specifying the following environment variable prior to |
| 294 |
|
running the "configure" script: |
| 295 |
|
|
| 296 |
|
CXXLDFLAGS="-lstd_v2 -lCsup_v2" |
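A minimal sketch of the settings described in this section, assuming a
Bourne-compatible shell. Both variable names and values come from the text
above; whether the second one is needed depends on whether unresolved symbols
are seen when linking.

```shell
# Required for aCC unless C++ support is disabled with --disable-cpp.
CXXFLAGS="-AA"

# Workaround only: set this if linking the C++ programs reports
# unresolved symbols.
CXXLDFLAGS="-lstd_v2 -lCsup_v2"

export CXXFLAGS CXXLDFLAGS

# Then run "configure" from the build directory as usual, for example:
#   /source/pcre/pcre-xxx/configure
```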
| 297 |
|
|
| 298 |
|
|
| 299 |
Building on non-Unix systems |
Building on non-Unix systems |
| 300 |
---------------------------- |
---------------------------- |
| 301 |
|
|
| 305 |
|
|
| 306 |
PCRE has been compiled on Windows systems and on Macintoshes, but I don't know |
PCRE has been compiled on Windows systems and on Macintoshes, but I don't know |
| 307 |
the details because I don't use those systems. It should be straightforward to |
the details because I don't use those systems. It should be straightforward to |
| 308 |
build PCRE on any system that has a Standard C compiler, because it uses only |
build PCRE on any system that has a Standard C compiler and library, because it |
| 309 |
Standard C functions. |
uses only Standard C functions. |
| 310 |
|
|
| 311 |
|
|
| 312 |
Testing PCRE |
Testing PCRE |
| 325 |
The RunTest script runs the pcretest test program (which is documented in its |
The RunTest script runs the pcretest test program (which is documented in its |
| 326 |
own man page) on each of the testinput files (in the testdata directory) in |
own man page) on each of the testinput files (in the testdata directory) in |
| 327 |
turn, and compares the output with the contents of the corresponding testoutput |
turn, and compares the output with the contents of the corresponding testoutput |
| 328 |
file. A file called testtry is used to hold the main output from pcretest |
files. A file called testtry is used to hold the main output from pcretest |
| 329 |
(testsavedregex is also used as a working file). To run pcretest on just one of |
(testsavedregex is also used as a working file). To run pcretest on just one of |
| 330 |
the test files, give its number as an argument to RunTest, for example: |
the test files, give its number as an argument to RunTest, for example: |
| 331 |
|
|
| 332 |
RunTest 2 |
RunTest 2 |
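The run-and-compare step described above can be sketched as a small shell
function. This is not the RunTest script itself: the function name run_one is
hypothetical, and the real script may pass extra options to pcretest for some
tests. The file names in the usage line assume that the test files are called
testinput2 and testoutput2.

```shell
# run_one PROGRAM INPUT EXPECTED
# Runs PROGRAM on INPUT, saves the output in the working file testtry,
# and compares it with EXPECTED. Returns 0 if the two are identical.
run_one() {
    "$1" "$2" > testtry 2>&1
    diff testtry "$3" > /dev/null
}

# Possible usage (assumed file names):
#   run_one ./pcretest testdata/testinput2 testdata/testoutput2 && echo OK
```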
| 333 |
|
|
| 334 |
The first file can also be fed directly into the perltest script to check that |
The first test file can also be fed directly into the perltest script to check |
| 335 |
Perl gives the same results. The only difference you should see is in the first |
that Perl gives the same results. The only difference you should see is in the |
| 336 |
few lines, where the Perl version is given instead of the PCRE version. |
first few lines, where the Perl version is given instead of the PCRE version. |
| 337 |
|
|
| 338 |
The second set of tests checks pcre_fullinfo(), pcre_info(), pcre_study(), |
The second set of tests checks pcre_fullinfo(), pcre_info(), pcre_study(), |
| 339 |
pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error |
pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error |
| 442 |
pcre_globals.c ) and some internal functions that they use |
pcre_globals.c ) and some internal functions that they use |
| 443 |
pcre_info.c ) |
pcre_info.c ) |
| 444 |
pcre_maketables.c ) |
pcre_maketables.c ) |
| 445 |
|
pcre_newline.c ) |
| 446 |
pcre_ord2utf8.c ) |
pcre_ord2utf8.c ) |
| 447 |
pcre_printint.c ) |
pcre_refcount.c ) |
| 448 |
pcre_study.c ) |
pcre_study.c ) |
| 449 |
pcre_tables.c ) |
pcre_tables.c ) |
| 450 |
pcre_try_flipped.c ) |
pcre_try_flipped.c ) |
| 451 |
pcre_ucp_findchar.c ) |
pcre_ucp_searchfuncs.c) |
| 452 |
pcre_valid_utf8.c ) |
pcre_valid_utf8.c ) |
| 453 |
pcre_version.c ) |
pcre_version.c ) |
| 454 |
pcre_xclass.c ) |
pcre_xclass.c ) |
|
|
|
|
ucp_findchar.c ) |
|
|
ucp.h ) source for the code that is used for |
|
|
ucpinternal.h ) Unicode property handling |
|
| 455 |
ucptable.c ) |
ucptable.c ) |
|
ucptypetable.c ) |
|
| 456 |
|
|
| 457 |
pcre.in "source" for the header for the external API; pcre.h |
pcre_printint.src ) debugging function that is #included in pcretest, and |
| 458 |
is built from this by "configure" |
) can also be #included in pcre_compile() |
| 459 |
|
|
| 460 |
|
pcre.h the public PCRE header file |
| 461 |
pcreposix.h header for the external POSIX wrapper API |
pcreposix.h header for the external POSIX wrapper API |
| 462 |
pcre_internal.h header for internal use |
pcre_internal.h header for internal use |
| 463 |
|
ucp.h ) headers concerned with |
| 464 |
|
ucpinternal.h ) Unicode property handling |
| 465 |
config.in template for config.h, which is built by configure |
config.in template for config.h, which is built by configure |
| 466 |
|
|
| 467 |
pcrecpp.h.in "source" for the header file for the C++ wrapper |
pcrecpp.h the header file for the C++ wrapper |
| 468 |
|
pcrecpparg.h.in "source" for another C++ header file |
| 469 |
pcrecpp.cc ) |
pcrecpp.cc ) |
| 470 |
pcre_scanner.cc ) source for the C++ wrapper library |
pcre_scanner.cc ) source for the C++ wrapper library |
| 471 |
|
|
| 488 |
RunGrepTest.in template for a Unix shell script for pcregrep tests |
RunGrepTest.in template for a Unix shell script for pcregrep tests |
| 489 |
config.guess ) files used by libtool, |
config.guess ) files used by libtool, |
| 490 |
config.sub ) used only when building a shared library |
config.sub ) used only when building a shared library |
| 491 |
|
config.h.in "source" for the config.h header file |
| 492 |
configure a configuring shell script (built by autoconf) |
configure a configuring shell script (built by autoconf) |
| 493 |
configure.in the autoconf input used to build configure |
configure.ac the autoconf input used to build configure |
| 494 |
doc/Tech.Notes notes on the encoding |
doc/Tech.Notes notes on the encoding |
| 495 |
doc/*.3 man page sources for the PCRE functions |
doc/*.3 man page sources for the PCRE functions |
| 496 |
doc/*.1 man page sources for pcregrep and pcretest |
doc/*.1 man page sources for pcregrep and pcretest |
| 518 |
|
|
| 519 |
libpcre.def |
libpcre.def |
| 520 |
libpcreposix.def |
libpcreposix.def |
|
pcre.def |
|
| 521 |
|
|
| 522 |
(D) Auxiliary file for VPASCAL |
(D) Auxiliary file for VPASCAL |
| 523 |
|
|
| 526 |
Philip Hazel |
Philip Hazel |
| 527 |
Email local part: ph10 |
Email local part: ph10 |
| 528 |
Email domain: cam.ac.uk |
Email domain: cam.ac.uk |
| 529 |
June 2005 |
November 2006 |