/[pcre]/code/trunk/doc/pcrestack.3
ViewVC logotype

Contents of /code/trunk/doc/pcrestack.3

Parent Directory Parent Directory | Revision Log Revision Log


Revision 654 - (hide annotations) (download)
Tue Aug 2 11:00:40 2011 UTC (21 months, 3 weeks ago) by ph10
File size: 7207 byte(s)
Documentation and general text tidies in preparation for test release.

1 nigel 91 .TH PCRESTACK 3
2     .SH NAME
3     PCRE - Perl-compatible regular expressions
4     .SH "PCRE DISCUSSION OF STACK USAGE"
5     .rs
6     .sp
7     When you call \fBpcre_exec()\fP, it makes use of an internal function called
8     \fBmatch()\fP. This calls itself recursively at branch points in the pattern,
9     in order to remember the state of the match so that it can back up and try a
10     different alternative if the first one fails. As matching proceeds deeper and
11 ph10 631 deeper into the tree of possibilities, the recursion depth increases. The
12 ph10 654 \fBmatch()\fP function is also called in other circumstances, for example,
13 ph10 631 whenever a parenthesized sub-pattern is entered, and in certain cases of
14     repetition.
15 nigel 91 .P
16     Not all calls of \fBmatch()\fP increase the recursion depth; for an item such
17     as a* it may be called several times at the same level, after matching
18     different numbers of a's. Furthermore, in a number of cases where the result of
19     the recursive call would immediately be passed back as the result of the
20     current call (a "tail recursion"), the function is just restarted instead.
21     .P
22     The \fBpcre_dfa_exec()\fP function operates in an entirely different way, and
23 ph10 480 uses recursion only when there is a regular expression recursion or subroutine
24     call in the pattern. This includes the processing of assertion and "once-only"
25     subpatterns, which are handled like subroutine calls. Normally, these are never
26     very deep, and the limit on the complexity of \fBpcre_dfa_exec()\fP is
27     controlled by the amount of workspace it is given. However, it is possible to
28     write patterns with runaway infinite recursions; such patterns will cause
29     \fBpcre_dfa_exec()\fP to run out of stack. At present, there is no protection
30     against this.
31 nigel 91 .P
32 ph10 480 The comments that follow do NOT apply to \fBpcre_dfa_exec()\fP; they are
33     relevant only for \fBpcre_exec()\fP.
34     .
35     .
36     .SS "Reducing \fBpcre_exec()\fP's stack usage"
37     .rs
38     .sp
39 nigel 91 Each time that \fBmatch()\fP is actually called recursively, it uses memory
40     from the process stack. For certain kinds of pattern and data, very large
41     amounts of stack may be needed, despite the recognition of "tail recursion".
42     You can often reduce the amount of recursion, and therefore the amount of stack
43     used, by modifying the pattern that is being matched. Consider, for example,
44     this pattern:
45     .sp
46     ([^<]|<(?!inet))+
47     .sp
48     It matches from wherever it starts until it encounters "<inet" or the end of
49     the data, and is the kind of pattern that might be used when processing an XML
50     file. Each iteration of the outer parentheses matches either one character that
51     is not "<" or a "<" that is not followed by "inet". However, each time a
52     parenthesis is processed, a recursion occurs, so this formulation uses a stack
53     frame for each matched character. For a long string, a lot of stack is
54     required. Consider now this rewritten pattern, which matches exactly the same
55     strings:
56     .sp
57 ph10 122 ([^<]++|<(?!inet))+
58 nigel 91 .sp
59     This uses very much less stack, because runs of characters that do not contain
60     "<" are "swallowed" in one item inside the parentheses. Recursion happens only
61     when a "<" character that is not followed by "inet" is encountered (and we
62     assume this is relatively rare). A possessive quantifier is used to stop any
63     backtracking into the runs of non-"<" characters, but that is not related to
64     stack usage.
65     .P
66 nigel 93 This example shows that one way of avoiding stack problems when matching long
67     subject strings is to write repeated parenthesized subpatterns to match more
68     than one character whenever possible.
69 ph10 358 .
70 ph10 480 .
71     .SS "Compiling PCRE to use heap instead of stack for \fBpcre_exec()\fP"
72 ph10 358 .rs
73     .sp
74 nigel 91 In environments where stack memory is constrained, you might want to compile
75 ph10 480 PCRE to use heap memory instead of stack for remembering back-up points when
76     \fBpcre_exec()\fP is running. This makes it run a lot more slowly, however.
77     Details of how to do this are given in the
78 nigel 91 .\" HREF
79     \fBpcrebuild\fP
80     .\"
81 ph10 174 documentation. When built in this way, instead of using the stack, PCRE obtains
82     and frees memory by calling the functions that are pointed to by the
83     \fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP variables. By default, these
84     point to \fBmalloc()\fP and \fBfree()\fP, but you can replace the pointers to
85 ph10 182 cause PCRE to use your own functions. Since the block sizes are always the
86     same, and are always freed in reverse order, it may be possible to implement
87 ph10 174 customized memory handlers that are more efficient than the standard functions.
88 ph10 358 .
89 ph10 480 .
90     .SS "Limiting \fBpcre_exec()\fP's stack usage"
91 ph10 358 .rs
92     .sp
93 ph10 480 You can set limits on the number of times that \fBmatch()\fP is called, both in
94     total and recursively. If a limit is exceeded, \fBpcre_exec()\fP returns an
95     error code. Setting suitable limits should prevent it from running out of
96     stack. The default values of the limits are very large, and unlikely ever to
97     operate. They can be changed when PCRE is built, and they can also be set when
98 ph10 358 \fBpcre_exec()\fP is called. For details of these interfaces, see the
99     .\" HREF
100     \fBpcrebuild\fP
101     .\"
102 ph10 480 documentation and the
103     .\" HTML <a href="pcreapi.html#extradata">
104     .\" </a>
105     section on extra data for \fBpcre_exec()\fP
106     .\"
107     in the
108 ph10 358 .\" HREF
109     \fBpcreapi\fP
110     .\"
111     documentation.
112 nigel 91 .P
113 ph10 358 As a very rough rule of thumb, you should reckon on about 500 bytes per
114     recursion. Thus, if you want to limit your stack usage to 8Mb, you
115     should set the limit at 16000 recursions. A 64Mb stack, on the other hand, can
116 ph10 487 support around 128000 recursions.
117 ph10 480 .P
118     In Unix-like environments, the \fBpcretest\fP test program has a command line
119     option (\fB-S\fP) that can be used to increase the size of its stack. As long
120     as the stack is large enough, another option (\fB-M\fP) can be used to find the
121     smallest limits that allow a particular pattern to match a given subject
122     string. This is done by calling \fBpcre_exec()\fP repeatedly with different
123     limits.
124 ph10 358 .
125 ph10 480 .
126 ph10 358 .SS "Changing stack size in Unix-like systems"
127     .rs
128     .sp
129 nigel 93 In Unix-like environments, there is not often a problem with the stack unless
130     very long strings are involved, though the default limit on stack size varies
131     from system to system. Values from 8Mb to 64Mb are common. You can find your
132     default limit by running the command:
133 nigel 91 .sp
134     ulimit -s
135     .sp
136 nigel 93 Unfortunately, the effect of running out of stack is often SIGSEGV, though
137     sometimes a more explicit error message is given. You can normally increase the
138     limit on stack size by code such as this:
139 nigel 91 .sp
140     struct rlimit rlim;
141     getrlimit(RLIMIT_STACK, &rlim);
142     rlim.rlim_cur = 100*1024*1024;
143     setrlimit(RLIMIT_STACK, &rlim);
144     .sp
145     This reads the current limits (soft and hard) using \fBgetrlimit()\fP, then
146     attempts to increase the soft limit to 100Mb using \fBsetrlimit()\fP. You must
147     do this before calling \fBpcre_exec()\fP.
148 ph10 358 .
149 ph10 480 .
150 ph10 358 .SS "Changing stack size in Mac OS X"
151     .rs
152     .sp
153     Using \fBsetrlimit()\fP, as described above, should also work on Mac OS X. It
154     is also possible to set a stack size when linking a program. There is a
155     discussion about stack sizes in Mac OS X at this web site:
156     .\" HTML <a href="http://developer.apple.com/qa/qa2005/qa1419.html">
157     .\" </a>
158     http://developer.apple.com/qa/qa2005/qa1419.html.
159 nigel 91 .\"
160 ph10 99 .
161     .
162     .SH AUTHOR
163     .rs
164     .sp
165     .nf
166     Philip Hazel
167     University Computing Service
168     Cambridge CB2 3QH, England.
169     .fi
170     .
171     .
172     .SH REVISION
173     .rs
174     .sp
175     .nf
176 ph10 631 Last updated: 22 July 2011
177     Copyright (c) 1997-2011 University of Cambridge.
178 ph10 99 .fi

Properties

Name Value
svn:eol-style native
svn:keywords "Author Date Id Revision Url"

webmaster@exim.org
ViewVC Help
Powered by ViewVC 1.1.12