| 30 |
The PCRE APIs |
The PCRE APIs |
| 31 |
------------- |
------------- |
| 32 |
|
|
| 33 |
PCRE is written in C, and it has its own API. The distribution now includes a |
PCRE is written in C, and it has its own API. The distribution also includes a |
| 34 |
set of C++ wrapper functions, courtesy of Google Inc. (see the pcrecpp man page |
set of C++ wrapper functions (see the pcrecpp man page for details), courtesy |
| 35 |
for details). |
of Google Inc. |
| 36 |
|
|
| 37 |
Also included in the distribution are a set of C wrapper functions that are |
In addition, there is a set of C wrapper functions that are based on the POSIX |
| 38 |
based on the POSIX API. These end up in the library called libpcreposix. Note |
regular expression API (see the pcreposix man page). These end up in the |
| 39 |
that this just provides a POSIX calling interface to PCRE; the regular |
library called libpcreposix. Note that this just provides a POSIX calling |
| 40 |
expressions themselves still follow Perl syntax and semantics. The POSIX API is |
interface to PCRE; the regular expressions themselves still follow Perl syntax |
| 41 |
restricted, and does not give full access to all of PCRE's facilities. |
and semantics. The POSIX API is restricted, and does not give full access to |
| 42 |
|
all of PCRE's facilities. |
| 43 |
|
|
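As a rough illustration of the POSIX calling interface described above, the sketch below compiles a pattern, runs it against a subject, and copies out the first captured substring. It is written against the system regex.h so that it builds without PCRE installed. The assumption, to be checked against the pcreposix man page, is that the same calls work when pcreposix.h is included instead and the program is linked with libpcreposix, in which case the pattern is interpreted with Perl syntax and semantics. The helper name first_group is hypothetical and is not part of PCRE.

```c
#include <assert.h>
#include <regex.h>    /* stand-in; with PCRE the assumption is <pcreposix.h> */
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: match `pattern` against `subject` and copy the text of
 * the first capturing group into `out` (truncated to fit, always terminated).
 * Returns 0 on success, or a non-zero POSIX error code otherwise.
 */
int first_group(const char *pattern, const char *subject,
                char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m[2];   /* m[0] is the whole match, m[1] the first group */
    int rc;

    assert(pattern != NULL && subject != NULL && out != NULL && outsize > 0);

    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;     /* compile error; nothing to free */

    rc = regexec(&re, subject, 2, m, 0);
    if (rc == 0) {
        if (m[1].rm_so < 0) {
            rc = REG_NOMATCH;   /* the group did not take part in the match */
        } else {
            size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
            if (len >= outsize)
                len = outsize - 1;
            memcpy(out, subject + m[1].rm_so, len);
            out[len] = '\0';
        }
    }

    regfree(&re);
    return rc;
}
```

The pattern used by a caller should stay within syntax that both POSIX extended expressions and Perl accept (for example "version ([0-9]+)") if the same source is meant to build against either header.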
| 44 |
The header file for the POSIX-style functions is called pcreposix.h. The |
The header file for the POSIX-style functions is called pcreposix.h. The |
| 45 |
official POSIX name is regex.h, but I did not want to risk possible problems |
official POSIX name is regex.h, but I did not want to risk possible problems |
| 92 |
|
|
| 93 |
There is a README file giving brief descriptions of what they are. Some are |
There is a README file giving brief descriptions of what they are. Some are |
| 94 |
complete in themselves; others are pointers to URLs containing relevant files. |
complete in themselves; others are pointers to URLs containing relevant files. |
| 95 |
Some of this material is likely to be well out-of-date. In particular, several |
Some of this material is likely to be well out-of-date. Several of the earlier |
| 96 |
of the contributions provide support for compiling PCRE on various flavours of |
contributions provided support for compiling PCRE on various flavours of |
| 97 |
Windows (I myself do not use Windows), but nowadays there is more Windows |
Windows (I myself do not use Windows). Nowadays there is more Windows support |
| 98 |
support in the standard distribution. |
in the standard distribution, so these contributions have been archived. |
| 99 |
|
|
| 100 |
|
|
| 101 |
Building PCRE on non-Unix systems |
Building PCRE on non-Unix systems |
| 148 |
|
|
| 149 |
. If you want to suppress the building of the C++ wrapper library, you can add |
. If you want to suppress the building of the C++ wrapper library, you can add |
| 150 |
--disable-cpp to the "configure" command. Otherwise, when "configure" is run, |
--disable-cpp to the "configure" command. Otherwise, when "configure" is run, |
| 151 |
will try to find a C++ compiler and C++ header files, and if it succeeds, it |
it will try to find a C++ compiler and C++ header files, and if it succeeds, |
| 152 |
will try to build the C++ wrapper. |
it will try to build the C++ wrapper. |
| 153 |
|
|
| 154 |
. If you want to make use of the support for UTF-8 character strings in PCRE, |
. If you want to make use of the support for UTF-8 character strings in PCRE, |
| 155 |
you must add --enable-utf8 to the "configure" command. Without it, the code |
you must add --enable-utf8 to the "configure" command. Without it, the code |
| 179 |
|
|
| 180 |
. When called via the POSIX interface, PCRE uses malloc() to get additional |
. When called via the POSIX interface, PCRE uses malloc() to get additional |
| 181 |
storage for processing capturing parentheses if there are more than 10 of |
storage for processing capturing parentheses if there are more than 10 of |
| 182 |
them. You can increase this threshold by setting, for example, |
them in a pattern. You can increase this threshold by setting, for example, |
| 183 |
|
|
| 184 |
--with-posix-malloc-threshold=20 |
--with-posix-malloc-threshold=20 |
| 185 |
|
|
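The threshold described above follows a common pattern: small requests are served from a fixed-size array, and malloc() is called only when the request exceeds the threshold. The sketch below shows that pattern in isolation. It is not PCRE's code; the macro SKETCH_MALLOC_THRESHOLD and the function sum_with_scratch are hypothetical names chosen for illustration, and the value 10 simply mirrors the default mentioned in the text.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical threshold, mirroring the default of 10 given in the text. */
#define SKETCH_MALLOC_THRESHOLD 10

/*
 * Hypothetical routine that needs `nslots` items of scratch storage.
 * Up to the threshold it uses an array on the stack; above it, the
 * storage comes from malloc(). *used_heap reports which path was taken.
 * Returns the sum 1 + 2 + ... + nslots, or -1 if malloc() fails.
 */
long sum_with_scratch(size_t nslots, int *used_heap)
{
    long small[SKETCH_MALLOC_THRESHOLD];
    long *buf = small;
    long total = 0;
    size_t i;

    assert(used_heap != NULL);
    *used_heap = 0;

    if (nslots > SKETCH_MALLOC_THRESHOLD) {
        buf = malloc(nslots * sizeof *buf);
        if (buf == NULL)
            return -1;
        *used_heap = 1;
    }

    for (i = 0; i < nslots; i++)
        buf[i] = (long)(i + 1);
    for (i = 0; i < nslots; i++)
        total += buf[i];

    if (*used_heap)
        free(buf);
    return total;
}
```

Raising the threshold trades a larger stack frame on every call for fewer heap allocations, which is the trade-off the configure option exposes.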
| 208 |
. The default maximum compiled pattern size is around 64K. You can increase |
. The default maximum compiled pattern size is around 64K. You can increase |
| 209 |
this by adding --with-link-size=3 to the "configure" command. You can |
this by adding --with-link-size=3 to the "configure" command. You can |
| 210 |
increase it even more by setting --with-link-size=4, but this is unlikely |
increase it even more by setting --with-link-size=4, but this is unlikely |
| 211 |
ever to be necessary. |
ever to be necessary. Increasing the internal link size will reduce |
| 212 |
|
performance. |
| 213 |
|
|
| 214 |
. You can build PCRE so that its internal match() function that is called from |
. You can build PCRE so that its internal match() function that is called from |
| 215 |
pcre_exec() does not call itself recursively. Instead, it uses memory blocks |
pcre_exec() does not call itself recursively. Instead, it uses memory blocks |
| 225 |
use deeply nested recursion. There is a discussion about stack sizes in the |
use deeply nested recursion. There is a discussion about stack sizes in the |
| 226 |
pcrestack man page. |
pcrestack man page. |
| 227 |
|
|
| 228 |
|
. For speed, PCRE uses four tables for manipulating and identifying characters |
| 229 |
|
whose code point values are less than 256. By default, it uses a set of |
| 230 |
|
tables for ASCII encoding that is part of the distribution. If you specify |
| 231 |
|
|
| 232 |
|
--enable-rebuild-chartables |
| 233 |
|
|
| 234 |
|
a program called dftables is compiled and run in the default C locale when |
| 235 |
|
you obey "make". It builds a source file called pcre_chartables.c. If you do |
| 236 |
|
not specify this option, pcre_chartables.c is created as a copy of |
| 237 |
|
pcre_chartables.c.dist. See "Character tables" below for further information. |
| 238 |
|
|
| 239 |
|
. It is possible to compile PCRE for use on systems that use EBCDIC as their |
| 240 |
|
default character code (as opposed to ASCII) by specifying |
| 241 |
|
|
| 242 |
|
--enable-ebcdic |
| 243 |
|
|
| 244 |
|
This automatically implies --enable-rebuild-chartables (see above). |
| 245 |
|
|
| 246 |
The "configure" script builds the following files for the basic C library: |
The "configure" script builds the following files for the basic C library: |
| 247 |
|
|
| 248 |
. Makefile is the makefile that builds the library |
. Makefile is the makefile that builds the library |
| 391 |
------------------------------------ |
------------------------------------ |
| 392 |
|
|
| 393 |
You can specify CC and CFLAGS in the normal way to the "configure" command, in |
You can specify CC and CFLAGS in the normal way to the "configure" command, in |
| 394 |
order to cross-compile PCRE for some other host. However, during the building |
order to cross-compile PCRE for some other host. However, you should NOT |
| 395 |
process, the dftables.c source file is compiled *and run* on the local host, in |
specify --enable-rebuild-chartables, because if you do, the dftables.c source |
| 396 |
order to generate the default character tables (the chartables.c file). It |
file is compiled and run on the local host, in order to generate the inbuilt |
| 397 |
therefore needs to be compiled with the local compiler, not the cross compiler. |
character tables (the pcre_chartables.c file). This will probably not work, |
| 398 |
You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD; |
because dftables.c needs to be compiled with the local compiler, not the cross |
| 399 |
there are also CXX_FOR_BUILD and CXXFLAGS_FOR_BUILD for the C++ wrapper) |
compiler. |
| 400 |
when calling the "configure" command. If they are not specified, they default |
|
| 401 |
to the values of CC and CFLAGS. |
When --enable-rebuild-chartables is not specified, pcre_chartables.c is created |
| 402 |
|
by making a copy of pcre_chartables.c.dist, which is a default set of tables |
| 403 |
|
that assumes ASCII code. Cross-compiling with the default tables should not be |
| 404 |
|
a problem. |
| 405 |
|
|
| 406 |
|
If you need to modify the character tables when cross-compiling, you should |
| 407 |
|
move pcre_chartables.c.dist out of the way, then compile dftables.c by hand and |
| 408 |
|
run it on the local host to make a new version of pcre_chartables.c.dist. |
| 409 |
|
Then when you cross-compile PCRE this new version of the tables will be used. |
| 410 |
|
|
| 411 |
|
|
| 412 |
Using HP's ANSI C++ compiler (aCC) |
Using HP's ANSI C++ compiler (aCC) |
| 518 |
of tables in the current locale. If the final argument for pcre_compile() is |
of tables in the current locale. If the final argument for pcre_compile() is |
| 519 |
passed as NULL, a set of default tables that is built into the binary is used. |
passed as NULL, a set of default tables that is built into the binary is used. |
| 520 |
|
|
| 521 |
The source file called chartables.c contains the default set of tables. This is |
The source file called pcre_chartables.c contains the default set of tables. By |
| 522 |
not supplied in the distribution, but is built by the program dftables |
default, this is created as a copy of pcre_chartables.c.dist, which contains |
| 523 |
(compiled from dftables.c), which uses the ANSI C character handling functions |
tables for ASCII coding. However, if --enable-rebuild-chartables is specified |
| 524 |
such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table |
for ./configure, a different version of pcre_chartables.c is built by the |
| 525 |
sources. This means that the default C locale which is set for your system will |
program dftables (compiled from dftables.c), which uses the ANSI C character |
| 526 |
control the contents of these default tables. You can change the default tables |
handling functions such as isalnum(), isalpha(), isupper(), islower(), etc. to |
| 527 |
by editing chartables.c and then re-building PCRE. If you do this, you should |
build the table sources. This means that the default C locale which is set for |
| 528 |
take care to ensure that the file does not get automaticaly re-generated. |
your system will control the contents of these default tables. You can change |
| 529 |
|
the default tables by editing pcre_chartables.c and then re-building PCRE. If |
| 530 |
|
you do this, you should take care to ensure that the file does not get |
| 531 |
|
automatically re-generated. The best way to do this is to move |
| 532 |
|
pcre_chartables.c.dist out of the way and replace it with your customized |
| 533 |
|
tables. |
| 534 |
|
|
| 535 |
|
When the dftables program is run as a result of --enable-rebuild-chartables, |
| 536 |
|
it uses the default C locale that is set on your system. It does not pay |
| 537 |
|
attention to the LC_xxx environment variables. In other words, it uses the |
| 538 |
|
system's default locale rather than whatever the compiling user happens to have |
| 539 |
|
set. If you really do want to build a source set of character tables in a |
| 540 |
|
locale that is specified by the LC_xxx variables, you can run the dftables |
| 541 |
|
program by hand with the -L option. For example: |
| 542 |
|
|
| 543 |
|
./dftables -L pcre_chartables.c.special |
| 544 |
|
|
| 545 |
The first two 256-byte tables provide lower casing and case flipping functions, |
The first two 256-byte tables provide lower casing and case flipping functions, |
| 546 |
respectively. The next table consists of three 32-byte bit maps which identify |
respectively. The next table consists of three 32-byte bit maps which identify |
| 569 |
|
|
| 570 |
(A) Source files of the PCRE library functions and their headers: |
(A) Source files of the PCRE library functions and their headers: |
| 571 |
|
|
| 572 |
dftables.c auxiliary program for building chartables.c |
dftables.c auxiliary program for building pcre_chartables.c |
| 573 |
|
when --enable-rebuild-chartables is specified |
| 574 |
|
|
| 575 |
pcreposix.c ) |
pcre_chartables.c.dist a default set of character tables that assume ASCII |
| 576 |
pcre_compile.c ) |
coding; used, unless --enable-rebuild-chartables is |
| 577 |
pcre_config.c ) |
specified, by copying to pcre_chartables.c |
| 578 |
pcre_dfa_exec.c ) |
|
| 579 |
pcre_exec.c ) |
pcreposix.c ) |
| 580 |
pcre_fullinfo.c ) |
pcre_compile.c ) |
| 581 |
pcre_get.c ) sources for the functions in the library, |
pcre_config.c ) |
| 582 |
pcre_globals.c ) and some internal functions that they use |
pcre_dfa_exec.c ) |
| 583 |
pcre_info.c ) |
pcre_exec.c ) |
| 584 |
pcre_maketables.c ) |
pcre_fullinfo.c ) |
| 585 |
pcre_newline.c ) |
pcre_get.c ) sources for the functions in the library, |
| 586 |
pcre_ord2utf8.c ) |
pcre_globals.c ) and some internal functions that they use |
| 587 |
pcre_refcount.c ) |
pcre_info.c ) |
| 588 |
pcre_study.c ) |
pcre_maketables.c ) |
| 589 |
pcre_tables.c ) |
pcre_newline.c ) |
| 590 |
pcre_try_flipped.c ) |
pcre_ord2utf8.c ) |
| 591 |
pcre_ucp_searchfuncs.c ) |
pcre_refcount.c ) |
| 592 |
pcre_valid_utf8.c ) |
pcre_study.c ) |
| 593 |
pcre_version.c ) |
pcre_tables.c ) |
| 594 |
pcre_xclass.c ) |
pcre_try_flipped.c ) |
| 595 |
pcre_printint.src ) debugging function that is #included in pcretest, |
pcre_ucp_searchfuncs.c ) |
| 596 |
) and can also be #included in pcre_compile() |
pcre_valid_utf8.c ) |
| 597 |
pcre.h.in template for pcre.h when built by "configure" |
pcre_version.c ) |
| 598 |
pcreposix.h header for the external POSIX wrapper API |
pcre_xclass.c ) |
| 599 |
pcre_internal.h header for internal use |
pcre_printint.src ) debugging function that is #included in pcretest, |
| 600 |
ucp.h ) headers concerned with |
) and can also be #included in pcre_compile() |
| 601 |
ucpinternal.h ) Unicode property handling |
pcre.h.in template for pcre.h when built by "configure" |
| 602 |
ucptable.h ) (this one is the data table) |
pcreposix.h header for the external POSIX wrapper API |
| 603 |
|
pcre_internal.h header for internal use |
| 604 |
config.h.in template for config.h, which is built by "configure" |
ucp.h ) headers concerned with |
| 605 |
|
ucpinternal.h ) Unicode property handling |
| 606 |
pcrecpp.h public header file for the C++ wrapper |
ucptable.h ) (this one is the data table) |
| 607 |
pcrecpparg.h.in template for another C++ header file |
|
| 608 |
pcre_scanner.h public header file for C++ scanner functions |
config.h.in template for config.h, which is built by "configure" |
| 609 |
pcrecpp.cc ) |
|
| 610 |
pcre_scanner.cc ) source for the C++ wrapper library |
pcrecpp.h public header file for the C++ wrapper |
| 611 |
|
pcrecpparg.h.in template for another C++ header file |
| 612 |
pcre_stringpiece.h.in template for pcre_stringpiece.h, the header for the |
pcre_scanner.h public header file for C++ scanner functions |
| 613 |
C++ stringpiece functions |
pcrecpp.cc ) |
| 614 |
pcre_stringpiece.cc source for the C++ stringpiece functions |
pcre_scanner.cc ) source for the C++ wrapper library |
| 615 |
|
|
| 616 |
|
pcre_stringpiece.h.in template for pcre_stringpiece.h, the header for the |
| 617 |
|
C++ stringpiece functions |
| 618 |
|
pcre_stringpiece.cc source for the C++ stringpiece functions |
| 619 |
|
|
| 620 |
(B) Source files for programs that use PCRE: |
(B) Source files for programs that use PCRE: |
| 621 |
|
|
| 622 |
pcredemo.c simple demonstration of coding calls to PCRE |
pcredemo.c simple demonstration of coding calls to PCRE |
| 623 |
pcregrep.c source of a grep utility that uses PCRE |
pcregrep.c source of a grep utility that uses PCRE |
| 624 |
pcretest.c comprehensive test program |
pcretest.c comprehensive test program |
| 625 |
|
|
| 626 |
(C) Auxiliary files: |
(C) Auxiliary files: |
| 627 |
|
|
| 628 |
132html script to turn "man" pages into HTML |
132html script to turn "man" pages into HTML |
| 629 |
AUTHORS information about the author of PCRE |
AUTHORS information about the author of PCRE |
| 630 |
ChangeLog log of changes to the code |
ChangeLog log of changes to the code |
| 631 |
CleanTxt script to clean nroff output for txt man pages |
CleanTxt script to clean nroff output for txt man pages |
| 632 |
Detrail script to remove trailing spaces |
Detrail script to remove trailing spaces |
| 633 |
HACKING some notes about the internals of PCRE |
HACKING some notes about the internals of PCRE |
| 634 |
INSTALL generic installation instructions |
INSTALL generic installation instructions |
| 635 |
LICENCE conditions for the use of PCRE |
LICENCE conditions for the use of PCRE |
| 636 |
COPYING the same, using GNU's standard name |
COPYING the same, using GNU's standard name |
| 637 |
Makefile.in ) template for Unix Makefile, which is built by |
Makefile.in ) template for Unix Makefile, which is built by |
| 638 |
) "configure" |
) "configure" |
| 639 |
Makefile.am ) the automake input that was used to create |
Makefile.am ) the automake input that was used to create |
| 640 |
) Makefile.in |
) Makefile.in |
| 641 |
NEWS important changes in this release |
NEWS important changes in this release |
| 642 |
NON-UNIX-USE notes on building PCRE on non-Unix systems |
NON-UNIX-USE notes on building PCRE on non-Unix systems |
| 643 |
PrepareRelease script to make preparations for "make dist" |
PrepareRelease script to make preparations for "make dist" |
| 644 |
README this file |
README this file |
| 645 |
RunTest.in template for a Unix shell script for running tests |
RunTest.in template for a Unix shell script for running tests |
| 646 |
RunGrepTest.in template for a Unix shell script for pcregrep tests |
RunGrepTest.in template for a Unix shell script for pcregrep tests |
| 647 |
aclocal.m4 m4 macros (generated by "aclocal") |
aclocal.m4 m4 macros (generated by "aclocal") |
| 648 |
config.guess ) files used by libtool, |
config.guess ) files used by libtool, |
| 649 |
config.sub ) used only when building a shared library |
config.sub ) used only when building a shared library |
| 650 |
configure a configuring shell script (built by autoconf) |
configure a configuring shell script (built by autoconf) |
| 651 |
configure.ac ) the autoconf input that was used to build |
configure.ac ) the autoconf input that was used to build |
| 652 |
) "configure" and config.h |
) "configure" and config.h |
| 653 |
depcomp ) script to find program dependencies, generated by |
depcomp ) script to find program dependencies, generated by |
| 654 |
) automake |
) automake |
| 655 |
doc/*.3 man page sources for the PCRE functions |
doc/*.3 man page sources for the PCRE functions |
| 656 |
doc/*.1 man page sources for pcregrep and pcretest |
doc/*.1 man page sources for pcregrep and pcretest |
| 657 |
doc/index.html.src the base HTML page |
doc/index.html.src the base HTML page |
| 658 |
doc/html/* HTML documentation |
doc/html/* HTML documentation |
| 659 |
doc/pcre.txt plain text version of the man pages |
doc/pcre.txt plain text version of the man pages |
| 660 |
doc/pcretest.txt plain text documentation of test program |
doc/pcretest.txt plain text documentation of test program |
| 661 |
doc/perltest.txt plain text documentation of Perl test program |
doc/perltest.txt plain text documentation of Perl test program |
| 662 |
install-sh a shell script for installing files |
install-sh a shell script for installing files |
| 663 |
libpcre.pc.in template for libpcre.pc for pkg-config |
libpcre.pc.in template for libpcre.pc for pkg-config |
| 664 |
libpcrecpp.pc.in template for libpcrecpp.pc for pkg-config |
libpcrecpp.pc.in template for libpcrecpp.pc for pkg-config |
| 665 |
ltmain.sh file used to build a libtool script |
ltmain.sh file used to build a libtool script |
| 666 |
missing ) common stub for a few missing GNU programs while |
missing ) common stub for a few missing GNU programs while |
| 667 |
) installing, generated by automake |
) installing, generated by automake |
| 668 |
mkinstalldirs script for making install directories |
mkinstalldirs script for making install directories |
| 669 |
perltest.pl Perl test program |
perltest.pl Perl test program |
| 670 |
pcre-config.in source of script which retains PCRE information |
pcre-config.in source of script which retains PCRE information |
| 671 |
pcrecpp_unittest.cc ) |
pcrecpp_unittest.cc ) |
| 672 |
pcre_scanner_unittest.cc ) test programs for the C++ wrapper |
pcre_scanner_unittest.cc ) test programs for the C++ wrapper |
| 673 |
pcre_stringpiece_unittest.cc ) |
pcre_stringpiece_unittest.cc ) |
| 674 |
testdata/testinput* test data for main library tests |
testdata/testinput* test data for main library tests |
| 675 |
testdata/testoutput* expected test results |
testdata/testoutput* expected test results |
| 676 |
testdata/grep* input and output for pcregrep tests |
testdata/grep* input and output for pcregrep tests |
| 677 |
|
|
| 678 |
(D) Auxiliary files for cmake support |
(D) Auxiliary files for cmake support |
| 679 |
|
|
| 689 |
|
|
| 690 |
(F) Auxiliary files for building PCRE "by hand" |
(F) Auxiliary files for building PCRE "by hand" |
| 691 |
|
|
| 692 |
pcre.h.generic ) a version of the public PCRE header file |
pcre.h.generic ) a version of the public PCRE header file |
| 693 |
) for use in non-"configure" environments |
) for use in non-"configure" environments |
| 694 |
config.h.generic ) a version of config.h for use in non-"configure" |
config.h.generic ) a version of config.h for use in non-"configure" |
| 695 |
) environments |
) environments |
| 696 |
|
|
| 697 |
(G) Miscellaneous |
(G) Miscellaneous |
| 698 |
|
|
| 701 |
Philip Hazel |
Philip Hazel |
| 702 |
Email local part: ph10 |
Email local part: ph10 |
| 703 |
Email domain: cam.ac.uk |
Email domain: cam.ac.uk |
| 704 |
Last updated: March 2007 |
Last updated: 20 March 2007 |