| 197 |
8. Similarly, characters that match the POSIX named character classes |
8. Similarly, characters that match the POSIX named character classes |
| 198 |
are all low-valued characters. |
are all low-valued characters. |
| 199 |
|
|
| 200 |
9. Case-insensitive matching applies only to characters whose values |
9. However, the Perl 5.10 horizontal and vertical whitespace matching |
| 201 |
|
escapes (\h, \H, \v, and \V) do match all the appropriate Unicode char- |
| 202 |
|
acters. |
| 203 |
|
|
| 204 |
|
10. Case-insensitive matching applies only to characters whose values |
| 205 |
are less than 128, unless PCRE is built with Unicode property support. |
are less than 128, unless PCRE is built with Unicode property support. |
| 206 |
Even when Unicode property support is available, PCRE still uses its |
Even when Unicode property support is available, PCRE still uses its |
| 207 |
own character tables when checking the case of low-valued characters, |
own character tables when checking the case of low-valued characters, |
| 226 |
|
|
| 227 |
REVISION |
REVISION |
| 228 |
|
|
| 229 |
Last updated: 18 April 2007 |
Last updated: 13 June 2007 |
| 230 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |
| 231 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
| 232 |
|
|
| 394 |
|
|
| 395 |
to the configure command. With this configuration, PCRE will use the |
to the configure command. With this configuration, PCRE will use the |
| 396 |
pcre_stack_malloc and pcre_stack_free variables to call memory manage- |
pcre_stack_malloc and pcre_stack_free variables to call memory manage- |
| 397 |
ment functions. Separate functions are provided because the usage is |
ment functions. By default these point to malloc() and free(), but you |
| 398 |
very predictable: the block sizes requested are always the same, and |
can replace the pointers so that your own functions are used. |
| 399 |
the blocks are always freed in reverse order. A calling program might |
|
| 400 |
be able to implement optimized functions that perform better than the |
Separate functions are provided rather than using pcre_malloc and |
| 401 |
standard malloc() and free() functions. PCRE runs noticeably more |
pcre_free because the usage is very predictable: the block sizes |
| 402 |
slowly when built in this way. This option affects only the pcre_exec() |
requested are always the same, and the blocks are always freed in |
| 403 |
function; it is not relevant for the pcre_dfa_exec() function. |
reverse order. A calling program might be able to implement optimized |
| 404 |
|
functions that perform better than malloc() and free(). PCRE runs |
| 405 |
|
noticeably more slowly when built in this way. This option affects only |
| 406 |
|
the pcre_exec() function; it is not relevant for the |
| 407 |
|
pcre_dfa_exec() function. |
| 408 |
|
|
| 409 |
|
|
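The predictable allocation pattern described above (same-sized blocks, freed in reverse order) can be exploited by a very small caching allocator. The following is a minimal sketch, not part of PCRE: the names cached_stack_malloc and cached_stack_free are hypothetical, the code is not thread-safe, and it never returns cached blocks to the system.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header stored in front of every block. The union keeps the payload
   that follows it aligned for any object type (C11 max_align_t). */
typedef union block_header {
    struct {
        union block_header *next;  /* link used while the block is cached */
        size_t size;               /* payload size originally requested */
    } info;
    max_align_t align;
} block_header;

/* Most recently freed block; because frees happen in reverse order of
   allocation, the next request can usually be served from here. */
static block_header *cache = NULL;

/* Hypothetical replacement for malloc(): reuse a cached block if the
   top one is large enough, otherwise fall back to malloc(). */
void *cached_stack_malloc(size_t size)
{
    block_header *h = cache;
    if (h != NULL && h->info.size >= size) {
        cache = h->info.next;              /* pop the cached block */
    } else {
        h = malloc(sizeof(block_header) + size);
        if (h == NULL) return NULL;
        h->info.size = size;
    }
    return h + 1;                          /* payload follows the header */
}

/* Hypothetical replacement for free(): push the block onto the cache
   instead of releasing it. */
void cached_stack_free(void *block)
{
    if (block == NULL) return;
    block_header *h = (block_header *)block - 1;
    h->info.next = cache;
    cache = h;
}
```

In a program linked against a PCRE library built without stack recursion, functions like these would be installed by assigning them to the pcre_stack_malloc and pcre_stack_free variables before calling pcre_exec().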
| 410 |
LIMITING PCRE RESOURCE USAGE |
LIMITING PCRE RESOURCE USAGE |
| 482 |
|
|
| 483 |
REVISION |
REVISION |
| 484 |
|
|
| 485 |
Last updated: 16 April 2007 |
Last updated: 05 June 2007 |
| 486 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |
| 487 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
| 488 |
|
|
| 1267 |
26 malformed number or name after (?( |
26 malformed number or name after (?( |
| 1268 |
27 conditional group contains more than two branches |
27 conditional group contains more than two branches |
| 1269 |
28 assertion expected after (?( |
28 assertion expected after (?( |
| 1270 |
29 (?R or (?digits must be followed by ) |
29 (?R or (?[+-]digits must be followed by ) |
| 1271 |
30 unknown POSIX class name |
30 unknown POSIX class name |
| 1272 |
31 POSIX collating elements are not supported |
31 POSIX collating elements are not supported |
| 1273 |
32 this version of PCRE is not compiled with PCRE_UTF8 support |
32 this version of PCRE is not compiled with PCRE_UTF8 support |
| 1296 |
54 DEFINE group contains more than one branch |
54 DEFINE group contains more than one branch |
| 1297 |
55 repeating a DEFINE group is not allowed |
55 repeating a DEFINE group is not allowed |
| 1298 |
56 inconsistent NEWLINE options |
56 inconsistent NEWLINE options |
| 1299 |
|
57 \g is not followed by a braced name or an optionally braced |
| 1300 |
|
non-zero number |
| 1301 |
|
58 (?+ or (?- or (?(+ or (?(- must be followed by a non-zero number |
| 1302 |
|
|
| 1303 |
|
|
| 1304 |
STUDYING A PATTERN |
STUDYING A PATTERN |
| 1491 |
|
|
| 1492 |
Return 1 if the (?J) option setting is used in the pattern, otherwise |
Return 1 if the (?J) option setting is used in the pattern, otherwise |
| 1493 |
0. The fourth argument should point to an int variable. The (?J) inter- |
0. The fourth argument should point to an int variable. The (?J) inter- |
| 1494 |
nal option setting changes the local PCRE_DUPNAMES value. |
nal option setting changes the local PCRE_DUPNAMES option. |
| 1495 |
|
|
| 1496 |
PCRE_INFO_LASTLITERAL |
PCRE_INFO_LASTLITERAL |
| 1497 |
|
|
| 2417 |
|
|
| 2418 |
REVISION |
REVISION |
| 2419 |
|
|
| 2420 |
Last updated: 04 June 2007 |
Last updated: 13 June 2007 |
| 2421 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |
| 2422 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
| 2423 |
|
|
| 2604 |
|
|
| 2605 |
This document describes the differences in the ways that PCRE and Perl |
This document describes the differences in the ways that PCRE and Perl |
| 2606 |
handle regular expressions. The differences described here are mainly |
handle regular expressions. The differences described here are mainly |
| 2607 |
with respect to Perl 5.8, though PCRE version 7.0 contains some fea- |
with respect to Perl 5.8, though PCRE versions 7.0 and later contain |
| 2608 |
tures that are expected to be in the forthcoming Perl 5.10. |
some features that are expected to be in the forthcoming Perl 5.10. |
| 2609 |
|
|
| 2610 |
1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details |
1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details |
| 2611 |
of what it does have are given in the section on UTF-8 support in the |
of what it does have are given in the section on UTF-8 support in the |
| 2683 |
meta-character matches only at the very end of the string. |
meta-character matches only at the very end of the string. |
| 2684 |
|
|
| 2685 |
(c) If PCRE_EXTRA is set, a backslash followed by a letter with no spe- |
(c) If PCRE_EXTRA is set, a backslash followed by a letter with no spe- |
| 2686 |
cial meaning is faulted. Otherwise, like Perl, the backslash is |
cial meaning is faulted. Otherwise, like Perl, the backslash is quietly |
| 2687 |
ignored. (Perl can be made to issue a warning.) |
ignored. (Perl can be made to issue a warning.) |
| 2688 |
|
|
| 2689 |
(d) If PCRE_UNGREEDY is set, the greediness of the repetition quanti- |
(d) If PCRE_UNGREEDY is set, the greediness of the repetition quanti- |
| 2690 |
fiers is inverted, that is, by default they are not greedy, but if fol- |
fiers is inverted, that is, by default they are not greedy, but if fol- |
| 2716 |
|
|
| 2717 |
REVISION |
REVISION |
| 2718 |
|
|
| 2719 |
Last updated: 06 March 2007 |
Last updated: 13 June 2007 |
| 2720 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |
| 2721 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
| 2722 |
|
|
| 2949 |
|
|
| 2950 |
\d any decimal digit |
\d any decimal digit |
| 2951 |
\D any character that is not a decimal digit |
\D any character that is not a decimal digit |
| 2952 |
|
\h any horizontal whitespace character |
| 2953 |
|
\H any character that is not a horizontal whitespace character |
| 2954 |
\s any whitespace character |
\s any whitespace character |
| 2955 |
\S any character that is not a whitespace character |
\S any character that is not a whitespace character |
| 2956 |
|
\v any vertical whitespace character |
| 2957 |
|
\V any character that is not a vertical whitespace character |
| 2958 |
\w any "word" character |
\w any "word" character |
| 2959 |
\W any "non-word" character |
\W any "non-word" character |
| 2960 |
|
|
| 2969 |
|
|
| 2970 |
For compatibility with Perl, \s does not match the VT character (code |
For compatibility with Perl, \s does not match the VT character (code |
| 2971 |
11). This makes it different from the POSIX "space" class. The \s |
11). This makes it different from the POSIX "space" class. The \s |
| 2972 |
characters are HT (9), LF (10), FF (12), CR (13), and space (32). (If |
characters are HT (9), LF (10), FF (12), CR (13), and space (32). If |
| 2973 |
"use locale;" is included in a Perl script, \s may match the VT charac- |
"use locale;" is included in a Perl script, \s may match the VT charac- |
| 2974 |
ter. In PCRE, it never does.) |
ter. In PCRE, it never does. |
| 2975 |
|
|
| 2976 |
|
In UTF-8 mode, characters with values greater than 128 never match \d, |
| 2977 |
|
\s, or \w, and always match \D, \S, and \W. This is true even when Uni- |
| 2978 |
|
code character property support is available. These sequences retain |
| 2979 |
|
their original meanings from before UTF-8 support was available, mainly |
| 2980 |
|
for efficiency reasons. |
| 2981 |
|
|
| 2982 |
|
The sequences \h, \H, \v, and \V are Perl 5.10 features. In contrast to |
| 2983 |
|
the other sequences, these do match certain high-valued codepoints in |
| 2984 |
|
UTF-8 mode. The horizontal space characters are: |
| 2985 |
|
|
| 2986 |
|
U+0009 Horizontal tab |
| 2987 |
|
U+0020 Space |
| 2988 |
|
U+00A0 Non-break space |
| 2989 |
|
U+1680 Ogham space mark |
| 2990 |
|
U+180E Mongolian vowel separator |
| 2991 |
|
U+2000 En quad |
| 2992 |
|
U+2001 Em quad |
| 2993 |
|
U+2002 En space |
| 2994 |
|
U+2003 Em space |
| 2995 |
|
U+2004 Three-per-em space |
| 2996 |
|
U+2005 Four-per-em space |
| 2997 |
|
U+2006 Six-per-em space |
| 2998 |
|
U+2007 Figure space |
| 2999 |
|
U+2008 Punctuation space |
| 3000 |
|
U+2009 Thin space |
| 3001 |
|
U+200A Hair space |
| 3002 |
|
U+202F Narrow no-break space |
| 3003 |
|
U+205F Medium mathematical space |
| 3004 |
|
U+3000 Ideographic space |
| 3005 |
|
|
| 3006 |
|
The vertical space characters are: |
| 3007 |
|
|
| 3008 |
|
U+000A Linefeed |
| 3009 |
|
U+000B Vertical tab |
| 3010 |
|
U+000C Formfeed |
| 3011 |
|
U+000D Carriage return |
| 3012 |
|
U+0085 Next line |
| 3013 |
|
U+2028 Line separator |
| 3014 |
|
U+2029 Paragraph separator |
| 3015 |
|
|
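The two lists above translate directly into code point tests. The following sketch is only an illustration of which characters \h and \v are documented to match; the function names are hypothetical and this is not how PCRE itself implements the escapes.

```c
/* Returns 1 if the code point is in the horizontal space list (\h). */
int is_horizontal_space(unsigned int c)
{
    switch (c) {
    case 0x0009: case 0x0020: case 0x00A0: case 0x1680: case 0x180E:
    case 0x202F: case 0x205F: case 0x3000:
        return 1;
    default:
        /* U+2000 (en quad) through U+200A (hair space) are contiguous */
        return c >= 0x2000 && c <= 0x200A;
    }
}

/* Returns 1 if the code point is in the vertical space list (\v). */
int is_vertical_space(unsigned int c)
{
    switch (c) {
    case 0x000A: case 0x000B: case 0x000C: case 0x000D:
    case 0x0085: case 0x2028: case 0x2029:
        return 1;
    default:
        return 0;
    }
}
```

\H and \V are simply the negations of these two tests.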
| 3016 |
A "word" character is an underscore or any character less than 256 that |
A "word" character is an underscore or any character less than 256 that |
| 3017 |
is a letter or digit. The definition of letters and digits is con- |
is a letter or digit. The definition of letters and digits is con- |
| 3019 |
specific matching is taking place (see "Locale support" in the pcreapi |
specific matching is taking place (see "Locale support" in the pcreapi |
| 3020 |
page). For example, in a French locale such as "fr_FR" in Unix-like |
page). For example, in a French locale such as "fr_FR" in Unix-like |
| 3021 |
systems, or "french" in Windows, some character codes greater than 128 |
systems, or "french" in Windows, some character codes greater than 128 |
| 3022 |
are used for accented letters, and these are matched by \w. |
are used for accented letters, and these are matched by \w. The use of |
| 3023 |
|
locales with Unicode is discouraged. |
|
In UTF-8 mode, characters with values greater than 128 never match \d, |
|
|
\s, or \w, and always match \D, \S, and \W. This is true even when Uni- |
|
|
code character property support is available. The use of locales with |
|
|
Unicode is discouraged. |
|
| 3024 |
|
|
| 3025 |
Newline sequences |
Newline sequences |
| 3026 |
|
|
| 3027 |
Outside a character class, the escape sequence \R matches any Unicode |
Outside a character class, the escape sequence \R matches any Unicode |
| 3028 |
newline sequence. This is an extension to Perl. In non-UTF-8 mode \R is |
newline sequence. This is a Perl 5.10 feature. In non-UTF-8 mode \R is |
| 3029 |
equivalent to the following: |
equivalent to the following: |
| 3030 |
|
|
| 3031 |
(?>\r\n|\n|\x0b|\f|\r|\x85) |
(?>\r\n|\n|\x0b|\f|\r|\x85) |
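As an illustration of what the atomic group above accepts, here is a minimal sketch of an equivalent hand-written test for non-UTF-8 (single-byte) subjects. The function name newline_length is hypothetical. Like the atomic group, it prefers the two-character CRLF sequence and never splits it.

```c
#include <stddef.h>

/* Returns the length (0, 1 or 2) of the newline sequence that starts
   at s, mirroring (?>\r\n|\n|\x0b|\f|\r|\x85) for single-byte data. */
size_t newline_length(const unsigned char *s, size_t len)
{
    if (len == 0) return 0;
    if (s[0] == '\r')                      /* CRLF is tried before CR */
        return (len > 1 && s[1] == '\n') ? 2 : 1;
    switch (s[0]) {
    case '\n': case 0x0b: case '\f': case 0x85:
        return 1;
    default:
        return 0;
    }
}
```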
| 3588 |
"Saturday". |
"Saturday". |
| 3589 |
|
|
| 3590 |
|
|
| 3591 |
|
DUPLICATE SUBPATTERN NUMBERS |
| 3592 |
|
|
| 3593 |
|
Perl 5.10 introduced a feature whereby each alternative in a subpattern |
| 3594 |
|
uses the same numbers for its capturing parentheses. Such a subpattern |
| 3595 |
|
starts with (?| and is itself a non-capturing subpattern. For example, |
| 3596 |
|
consider this pattern: |
| 3597 |
|
|
| 3598 |
|
(?|(Sat)ur|(Sun))day |
| 3599 |
|
|
| 3600 |
|
Because the two alternatives are inside a (?| group, both sets of cap- |
| 3601 |
|
turing parentheses are numbered one. Thus, when the pattern matches, |
| 3602 |
|
you can look at captured substring number one, whichever alternative |
| 3603 |
|
matched. This construct is useful when you want to capture part, but |
| 3604 |
|
not all, of one of a number of alternatives. Inside a (?| group, paren- |
| 3605 |
|
theses are numbered as usual, but the number is reset at the start of |
| 3606 |
|
each branch. The numbers of any capturing buffers that follow the sub- |
| 3607 |
|
pattern start after the highest number used in any branch. The follow- |
| 3608 |
|
ing example is taken from the Perl documentation. The numbers under- |
| 3609 |
|
neath show in which buffer the captured content will be stored. |
| 3610 |
|
|
| 3611 |
|
# before ---------------branch-reset----------- after |
| 3612 |
|
/ ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x |
| 3613 |
|
# 1 2 2 3 2 3 4 |
| 3614 |
|
|
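The numbering rule can be made concrete with a small simulation. The sketch below is not PCRE code: number_groups is a hypothetical helper that handles only plain ( ) groups, (?| groups, and other (? groups (all treated as non-capturing, so named groups are not supported), and it ignores character classes. It records the number given to each capturing parenthesis in order of appearance.

```c
#include <stddef.h>

#define MAX_DEPTH 64

typedef struct {
    int is_reset;   /* 1 for a (?| group */
    int start;      /* counter value when the group opened */
    int highest;    /* highest counter reached in any branch so far */
} frame;

/* Writes up to max_out group numbers into out[] and returns how many
   capturing parentheses were seen, or -1 if nesting is too deep. */
int number_groups(const char *pat, int *out, int max_out)
{
    frame stack[MAX_DEPTH];
    int depth = 0, counter = 0, found = 0;

    for (size_t i = 0; pat[i] != '\0'; i++) {
        char c = pat[i];
        if (c == '\\' && pat[i + 1] != '\0') {
            i++;                               /* skip escaped character */
        } else if (c == '(') {
            if (depth == MAX_DEPTH) return -1;
            frame f = {0, counter, counter};
            if (pat[i + 1] == '?') {
                f.is_reset = (pat[i + 2] == '|');
                i += f.is_reset ? 2 : 1;       /* skip "?|" or "?" */
            } else {
                counter++;                     /* a capturing group */
                if (found < max_out) out[found] = counter;
                found++;
            }
            stack[depth++] = f;
        } else if (c == '|' && depth > 0 && stack[depth - 1].is_reset) {
            frame *f = &stack[depth - 1];
            if (counter > f->highest) f->highest = counter;
            counter = f->start;                /* reset for the next branch */
        } else if (c == ')' && depth > 0) {
            frame f = stack[--depth];
            /* after a (?| group, continue from the highest number used */
            if (f.is_reset && f.highest > counter) counter = f.highest;
        }
    }
    return found;
}
```

Applied to the Perl example above with the whitespace removed, (a)(?|x(y)z|(p(q)r)|(t)u(v))(z), this yields the numbers 1 2 2 3 2 3 4, matching the annotation line.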
| 3615 |
|
A backreference or a recursive call to a numbered subpattern always |
| 3616 |
|
refers to the first one in the pattern with the given number. |
| 3617 |
|
|
| 3618 |
|
An alternative approach to using this "branch reset" feature is to use |
| 3619 |
|
duplicate named subpatterns, as described in the next section. |
| 3620 |
|
|
| 3621 |
|
|
| 3622 |
NAMED SUBPATTERNS |
NAMED SUBPATTERNS |
| 3623 |
|
|
| 3624 |
Identifying capturing parentheses by number is simple, but it can be |
Identifying capturing parentheses by number is simple, but it can be |
| 3658 |
(?<DN>Sat)(?:urday)? |
(?<DN>Sat)(?:urday)? |
| 3659 |
|
|
| 3660 |
There are five capturing substrings, but only one is ever set after a |
There are five capturing substrings, but only one is ever set after a |
| 3661 |
match. The convenience function for extracting the data by name |
match. (An alternative way of solving this problem is to use a "branch |
| 3662 |
returns the substring for the first (and in this example, the only) |
reset" subpattern, as described in the previous section.) |
| 3663 |
subpattern of that name that matched. This saves searching to find |
|
| 3664 |
which numbered subpattern it was. If you make a reference to a non- |
The convenience function for extracting the data by name returns the |
| 3665 |
unique named subpattern from elsewhere in the pattern, the one that |
substring for the first (and in this example, the only) subpattern of |
| 3666 |
corresponds to the lowest number is used. For further details of the |
that name that matched. This saves searching to find which numbered |
| 3667 |
interfaces for handling named subpatterns, see the pcreapi documenta- |
subpattern it was. If you make a reference to a non-unique named sub- |
| 3668 |
tion. |
pattern from elsewhere in the pattern, the one that corresponds to the |
| 3669 |
|
lowest number is used. For further details of the interfaces for han- |
| 3670 |
|
dling named subpatterns, see the pcreapi documentation. |
| 3671 |
|
|
| 3672 |
|
|
| 3673 |
REPETITION |
REPETITION |
| 4539 |
|
|
| 4540 |
REVISION |
REVISION |
| 4541 |
|
|
| 4542 |
Last updated: 29 May 2007 |
Last updated: 13 June 2007 |
| 4543 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |
| 4544 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
| 4545 |
|
|
| 4870 |
|
|
| 4871 |
COMPATIBILITY WITH DIFFERENT PCRE RELEASES |
COMPATIBILITY WITH DIFFERENT PCRE RELEASES |
| 4872 |
|
|
| 4873 |
The layout of the control block that is at the start of the data that |
In general, it is safest to recompile all saved patterns when you |
| 4874 |
makes up a compiled pattern was changed for release 5.0. If you have |
update to a new PCRE release, though not all updates actually require |
| 4875 |
any saved patterns that were compiled with previous releases (not a |
this. Recompiling is definitely needed for release 7.2. |
|
facility that was previously advertised), you will have to recompile |
|
|
them for release 5.0 and above. |
|
|
|
|
|
If you have any saved patterns in UTF-8 mode that use \p or \P that |
|
|
were compiled with any release up to and including 6.4, you will have |
|
|
to recompile them for release 6.5 and above. |
|
|
|
|
|
All saved patterns from earlier releases must be recompiled for release |
|
|
7.0 or higher, because there was an internal reorganization at that |
|
|
release. |
|
| 4876 |
|
|
| 4877 |
|
|
| 4878 |
AUTHOR |
AUTHOR |
| 4884 |
|
|
| 4885 |
REVISION |
REVISION |
| 4886 |
|
|
| 4887 |
Last updated: 24 April 2007 |
Last updated: 13 June 2007 |
| 4888 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |
| 4889 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
| 4890 |
|
|
| 5619 |
bility of matching an empty string. Comments in the code explain what |
bility of matching an empty string. Comments in the code explain what |
| 5620 |
is going on. |
is going on. |
| 5621 |
|
|
| 5622 |
If PCRE is installed in the standard include and library directories |
The demonstration program is automatically built if you use "./config- |
| 5623 |
for your system, you should be able to compile the demonstration pro- |
ure;make" to build PCRE. Otherwise, if PCRE is installed in the stan- |
| 5624 |
gram using this command: |
dard include and library directories for your system, you should be |
| 5625 |
|
able to compile the demonstration program using this command: |
| 5626 |
|
|
| 5627 |
gcc -o pcredemo pcredemo.c -lpcre |
gcc -o pcredemo pcredemo.c -lpcre |
| 5628 |
|
|
| 5629 |
If PCRE is installed elsewhere, you may need to add additional options |
If PCRE is installed elsewhere, you may need to add additional options |
| 5630 |
to the command line. For example, on a Unix-like system that has PCRE |
to the command line. For example, on a Unix-like system that has PCRE |
| 5631 |
installed in /usr/local, you can compile the demonstration program |
installed in /usr/local, you can compile the demonstration program |
| 5632 |
using a command like this: |
using a command like this: |
| 5633 |
|
|
| 5634 |
gcc -o pcredemo -I/usr/local/include pcredemo.c \ |
gcc -o pcredemo -I/usr/local/include pcredemo.c \ |
| 5635 |
-L/usr/local/lib -lpcre |
-L/usr/local/lib -lpcre |
| 5636 |
|
|
| 5637 |
Once you have compiled the demonstration program, you can run simple |
Once you have compiled the demonstration program, you can run simple |
| 5638 |
tests like this: |
tests like this: |
| 5639 |
|
|
| 5640 |
./pcredemo 'cat|dog' 'the cat sat on the mat' |
./pcredemo 'cat|dog' 'the cat sat on the mat' |
| 5641 |
./pcredemo -g 'cat|dog' 'the dog sat on the cat' |
./pcredemo -g 'cat|dog' 'the dog sat on the cat' |
| 5642 |
|
|
| 5643 |
Note that there is a much more comprehensive test program, called |
Note that there is a much more comprehensive test program, called |
| 5644 |
pcretest, which supports many more facilities for testing regular |
pcretest, which supports many more facilities for testing regular |
| 5645 |
expressions and the PCRE library. The pcredemo program is provided as a |
expressions and the PCRE library. The pcredemo program is provided as a |
| 5646 |
simple coding example. |
simple coding example. |
| 5647 |
|
|
| 5649 |
the standard library directory, you may get an error like this when you |
the standard library directory, you may get an error like this when you |
| 5650 |
try to run pcredemo: |
try to run pcredemo: |
| 5651 |
|
|
| 5652 |
ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or |
ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or |
| 5653 |
directory |
directory |
| 5654 |
|
|
| 5655 |
This is caused by the way shared library support works on those sys- |
This is caused by the way shared library support works on those sys- |
| 5656 |
tems. You need to add |
tems. You need to add |
| 5657 |
|
|
| 5658 |
-R/usr/local/lib |
-R/usr/local/lib |
| 5669 |
|
|
| 5670 |
REVISION |
REVISION |
| 5671 |
|
|
| 5672 |
Last updated: 06 March 2007 |
Last updated: 13 June 2007 |
| 5673 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |
| 5674 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
| 5675 |
PCRESTACK(3) PCRESTACK(3) |
PCRESTACK(3) PCRESTACK(3) |
| 5739 |
In environments where stack memory is constrained, you might want to |
In environments where stack memory is constrained, you might want to |
| 5740 |
compile PCRE to use heap memory instead of stack for remembering back- |
compile PCRE to use heap memory instead of stack for remembering back- |
| 5741 |
up points. This makes it run a lot more slowly, however. Details of how |
up points. This makes it run a lot more slowly, however. Details of how |
| 5742 |
to do this are given in the pcrebuild documentation. |
to do this are given in the pcrebuild documentation. When built in this |
| 5743 |
|
way, instead of using the stack, PCRE obtains and frees memory by call- |
| 5744 |
In Unix-like environments, there is not often a problem with the stack |
ing the functions that are pointed to by the pcre_stack_malloc and |
| 5745 |
unless very long strings are involved, though the default limit on |
pcre_stack_free variables. By default, these point to malloc() and |
| 5746 |
stack size varies from system to system. Values from 8Mb to 64Mb are |
free(), but you can replace the pointers to cause PCRE to use your own |
| 5747 |
|
functions. Since the block sizes are always the same, and are always |
| 5748 |
|
freed in reverse order, it may be possible to implement customized mem- |
| 5749 |
|
ory handlers that are more efficient than the standard functions. |
| 5750 |
|
|
| 5751 |
|
In Unix-like environments, there is not often a problem with the stack |
| 5752 |
|
unless very long strings are involved, though the default limit on |
| 5753 |
|
stack size varies from system to system. Values from 8Mb to 64Mb are |
| 5754 |
common. You can find your default limit by running the command: |
common. You can find your default limit by running the command: |
| 5755 |
|
|
| 5756 |
ulimit -s |
ulimit -s |
| 5757 |
|
|
| 5758 |
Unfortunately, the effect of running out of stack is often SIGSEGV, |
Unfortunately, the effect of running out of stack is often SIGSEGV, |
| 5759 |
though sometimes a more explicit error message is given. You can nor- |
though sometimes a more explicit error message is given. You can nor- |
| 5760 |
mally increase the limit on stack size by code such as this: |
mally increase the limit on stack size by code such as this: |
| 5761 |
|
|
| 5762 |
struct rlimit rlim; |
struct rlimit rlim; |
| 5764 |
rlim.rlim_cur = 100*1024*1024; |
rlim.rlim_cur = 100*1024*1024; |
| 5765 |
setrlimit(RLIMIT_STACK, &rlim); |
setrlimit(RLIMIT_STACK, &rlim); |
| 5766 |
|
|
| 5767 |
This reads the current limits (soft and hard) using getrlimit(), then |
This reads the current limits (soft and hard) using getrlimit(), then |
| 5768 |
attempts to increase the soft limit to 100Mb using setrlimit(). You |
attempts to increase the soft limit to 100Mb using setrlimit(). You |
| 5769 |
must do this before calling pcre_exec(). |
must do this before calling pcre_exec(). |
| 5770 |
|
|
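A slightly more defensive version of the same idea is sketched below. It is an illustration only: the helper name raise_stack_soft_limit is hypothetical, and it assumes a POSIX system that provides getrlimit() and setrlimit(). It checks return values, never lowers an existing limit, and clamps the request to the hard limit.

```c
#include <sys/resource.h>

/* Tries to raise the soft stack limit to at least `wanted` bytes.
   Returns 0 on success (or if no change was needed), -1 on failure. */
int raise_stack_soft_limit(rlim_t wanted)
{
    struct rlimit rlim;

    if (getrlimit(RLIMIT_STACK, &rlim) != 0)
        return -1;
    if (rlim.rlim_cur == RLIM_INFINITY)
        return 0;                      /* already unlimited */
    if (rlim.rlim_max != RLIM_INFINITY && wanted > rlim.rlim_max)
        wanted = rlim.rlim_max;        /* cannot exceed the hard limit */
    if (wanted <= rlim.rlim_cur)
        return 0;                      /* never lower the current limit */
    rlim.rlim_cur = wanted;
    return setrlimit(RLIMIT_STACK, &rlim);
}
```

As with the fragment above, a call such as raise_stack_soft_limit(100*1024*1024) would have to be made before calling pcre_exec().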
| 5771 |
PCRE has an internal counter that can be used to limit the depth of |
PCRE has an internal counter that can be used to limit the depth of |
| 5772 |
recursion, and thus cause pcre_exec() to give an error code before it |
recursion, and thus cause pcre_exec() to give an error code before it |
| 5773 |
runs out of stack. By default, the limit is very large, and unlikely |
runs out of stack. By default, the limit is very large, and unlikely |
| 5774 |
ever to operate. It can be changed when PCRE is built, and it can also |
ever to operate. It can be changed when PCRE is built, and it can also |
| 5775 |
be set when pcre_exec() is called. For details of these interfaces, see |
be set when pcre_exec() is called. For details of these interfaces, see |
| 5776 |
the pcrebuild and pcreapi documentation. |
the pcrebuild and pcreapi documentation. |
| 5777 |
|
|
| 5778 |
As a very rough rule of thumb, you should reckon on about 500 bytes per |
As a very rough rule of thumb, you should reckon on about 500 bytes per |
| 5779 |
recursion. Thus, if you want to limit your stack usage to 8Mb, you |
recursion. Thus, if you want to limit your stack usage to 8Mb, you |
| 5780 |
should set the limit at 16000 recursions. A 64Mb stack, on the other |
should set the limit at 16000 recursions. A 64Mb stack, on the other |
| 5781 |
hand, can support around 128000 recursions. The pcretest test program |
hand, can support around 128000 recursions. The pcretest test program |
| 5782 |
has a command line option (-S) that can be used to increase the size of |
has a command line option (-S) that can be used to increase the size of |
| 5783 |
its stack. |
its stack. |
| 5784 |
|
|
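The rule of thumb above is simple arithmetic, sketched here for clarity. The 500-byte figure is the rough estimate from the text, not a measured value, and the function name is hypothetical.

```c
/* Rough per-recursion stack cost, taken from the rule of thumb above. */
#define BYTES_PER_RECURSION 500UL

/* Returns an approximate recursion limit for a given stack budget. */
unsigned long recursion_limit_for_stack(unsigned long stack_bytes)
{
    return stack_bytes / BYTES_PER_RECURSION;
}
```

With a budget of 8000000 bytes this gives 16000, and with 64000000 bytes it gives 128000, in line with the figures quoted above. The result could then be supplied as the recursion limit when pcre_exec() is called, as described in the pcreapi documentation.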
| 5792 |
|
|
| 5793 |
REVISION |
REVISION |
| 5794 |
|
|
| 5795 |
Last updated: 12 March 2007 |
Last updated: 05 June 2007 |
| 5796 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |
| 5797 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
| 5798 |
|
|