Contents of /code/trunk/doc/html/pcrecallout.html


Revision 869 - (hide annotations) (download) (as text)
Sat Jan 14 11:16:23 2012 UTC (16 months, 1 week ago) by ph10
File MIME type: text/html
File size: 9766 byte(s)
Bring HTML docs up to date.

1 nigel 63 <html>
2     <head>
3     <title>pcrecallout specification</title>
4     </head>
5     <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
6 nigel 75 <h1>pcrecallout man page</h1>
7     <p>
8     Return to the <a href="index.html">PCRE index page</a>.
9     </p>
10 ph10 111 <p>
11 nigel 75 This page is part of the PCRE HTML documentation. It was generated automatically
12     from the original man page. If there is any nonsense in it, please consult the
13     man page, in case the conversion went wrong.
14 ph10 111 <br>
15 nigel 63 <ul>
16     <li><a name="TOC1" href="#SEC1">PCRE CALLOUTS</a>
17 nigel 75 <li><a name="TOC2" href="#SEC2">MISSING CALLOUTS</a>
18     <li><a name="TOC3" href="#SEC3">THE CALLOUT INTERFACE</a>
19     <li><a name="TOC4" href="#SEC4">RETURN VALUES</a>
20 ph10 99 <li><a name="TOC5" href="#SEC5">AUTHOR</a>
21     <li><a name="TOC6" href="#SEC6">REVISION</a>
22 nigel 63 </ul>
23     <br><a name="SEC1" href="#TOC1">PCRE CALLOUTS</a><br>
24     <P>
25     <b>int (*pcre_callout)(pcre_callout_block *);</b>
26     </P>
27     <P>
28 ph10 869 <b>int (*pcre16_callout)(pcre16_callout_block *);</b>
29     </P>
30     <P>
31 nigel 63 PCRE provides a feature called "callout", which is a means of temporarily
32     passing control to the caller of PCRE in the middle of pattern matching. The
33     caller of PCRE provides an external function by putting its entry point in the
34 ph10 869 global variable <i>pcre_callout</i> (<i>pcre16_callout</i> for the 16-bit
35     library). By default, this variable contains NULL, which disables all calling
36     out.
37 nigel 63 </P>
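<P>
The hook mechanism described above can be modelled in a few lines. This is only a sketch: <i>demo_callout</i>, <i>demo_callout_block</i>, and <i>demo_reach_callout_point</i> are hypothetical stand-ins invented for illustration so that the code compiles without the PCRE headers. They are not part of the PCRE API.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed stand-in for the library's callout block. */
typedef struct demo_callout_block {
    int callout_number;
} demo_callout_block;

/* Stand-in for the global hook: NULL by default, so callouts are disabled. */
static int (*demo_callout)(demo_callout_block *) = NULL;

/* What a matcher could do on reaching a callout point such as (?C2). */
static int demo_reach_callout_point(int number)
{
    demo_callout_block block;

    assert(number >= 0 && number < 256);   /* callout numbers are below 256 */
    if (demo_callout == NULL)
        return 0;                          /* nothing installed: keep matching */
    block.callout_number = number;
    return demo_callout(&block);
}

/* Example caller-supplied function: remembers the last callout number seen. */
static int demo_last_seen = -1;

static int demo_record_callout(demo_callout_block *block)
{
    demo_last_seen = block->callout_number;
    return 0;                              /* zero: matching proceeds as normal */
}
```

<P>
With the real 8-bit library the equivalent step is a plain assignment to the global variable, for example <i>pcre_callout = my_function;</i>, made before the matching function is called.
</P>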
38     <P>
39     Within a regular expression, (?C) indicates the points at which the external
40     function is to be called. Different callout points can be identified by putting
41     a number less than 256 after the letter C. The default value is zero.
42     For example, this pattern has two callout points:
43 nigel 75 <pre>
44 ph10 155 (?C1)abc(?C2)def
45 nigel 75 </pre>
46 ph10 869 If the PCRE_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE
47     automatically inserts callouts, all with number 255, before each item in the
48     pattern. For example, if PCRE_AUTO_CALLOUT is used with the pattern
49 nigel 75 <pre>
50     A(\d{2}|--)
51     </pre>
52     it is processed as if it were
53     <br>
54     <br>
55     (?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
56     <br>
57     <br>
58     Notice that there is a callout before and after each parenthesis and
59     alternation bar. Automatic callouts can be used for tracking the progress of
60     pattern matching. The
61     <a href="pcretest.html"><b>pcretest</b></a>
62     command has an option that sets automatic callouts; when it is used, the output
63     indicates how the pattern is matched. This is useful information when you are
64     trying to optimize the performance of a particular pattern.
65 nigel 63 </P>
66 ph10 691 <P>
67     The use of callouts in a pattern makes it ineligible for optimization by the
68     just-in-time compiler. Studying such a pattern with the PCRE_STUDY_JIT_COMPILE
69     option always fails.
70     </P>
71 nigel 75 <br><a name="SEC2" href="#TOC1">MISSING CALLOUTS</a><br>
72 nigel 63 <P>
73 nigel 75 You should be aware that, because of optimizations in the way PCRE matches
74 ph10 392 patterns by default, callouts sometimes do not happen. For example, if the
75     pattern is
76 nigel 63 <pre>
77 nigel 75 ab(?C4)cd
78     </pre>
79     PCRE knows that any matching string must contain the letter "d". If the subject
80     string is "abyz", the lack of "d" means that matching doesn't ever start, and
81     the callout is never reached. However, with "abyd", though the result is still
82     no match, the callout is obeyed.
83 nigel 63 </P>
84 ph10 392 <P>
85 ph10 461 If the pattern is studied, PCRE knows the minimum length of a matching string,
86     and will immediately give a "no match" return without actually running a match
87     if the subject is not long enough, or, for unanchored patterns, once the
88     starting point has moved so far along the subject that too little of it remains.
89     </P>
90     <P>
91 ph10 392 You can disable these optimizations by passing the PCRE_NO_START_OPTIMIZE
92 ph10 869 option to the matching function, or by starting the pattern with
93     (*NO_START_OPT). This slows down the matching process, but does ensure that
94     callouts such as the example above are obeyed.
95 ph10 392 </P>
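<P>
The effect of these start-of-match checks on callouts can be illustrated with a deliberately simplified model. The function below is not PCRE's code. <i>demo_match_would_start</i> and its parameters are hypothetical names, and the real library applies further checks. It only shows why "abyz" never reaches the callout in ab(?C4)cd while "abyd" does.
</P>

```c
#include <assert.h>
#include <string.h>

/*
 * Returns 1 if a match attempt would be started, 0 if it is skipped up front.
 *   required_char     - a character every match must contain ('d' for ab(?C4)cd)
 *   min_length        - minimum match length (only known when the pattern is studied)
 *   no_start_optimize - non-zero models PCRE_NO_START_OPTIMIZE or (*NO_START_OPT)
 */
static int demo_match_would_start(const char *subject, size_t length,
                                  char required_char, size_t min_length,
                                  int no_start_optimize)
{
    assert(subject != NULL);
    if (no_start_optimize)
        return 1;                  /* optimizations off: always run the matcher */
    if (length < min_length)
        return 0;                  /* subject too short: immediate "no match" */
    if (memchr(subject, required_char, length) == NULL)
        return 0;                  /* required character absent: never starts */
    return 1;
}
```

<P>
When the function returns 0 no matching is attempted, so no callout in the pattern can be reached. Passing a non-zero <i>no_start_optimize</i> restores the callouts at the cost of running the matcher on subjects that cannot match.
</P>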
96 nigel 75 <br><a name="SEC3" href="#TOC1">THE CALLOUT INTERFACE</a><br>
97 nigel 63 <P>
98 nigel 75 During matching, when PCRE reaches a callout point, the external function
99 ph10 869 defined by <i>pcre_callout</i> or <i>pcre16_callout</i> is called (if it is set).
100     This applies to both normal and DFA matching. The only argument to the callout
101     function is a pointer to a <b>pcre_callout_block</b> or <b>pcre16_callout_block</b> structure.
102     These structures contain the following fields:
103 nigel 63 <pre>
104 ph10 869 int <i>version</i>;
105     int <i>callout_number</i>;
106     int *<i>offset_vector</i>;
107     const char *<i>subject</i>; (8-bit version)
108     PCRE_SPTR16 <i>subject</i>; (16-bit version)
109     int <i>subject_length</i>;
110     int <i>start_match</i>;
111     int <i>current_position</i>;
112     int <i>capture_top</i>;
113     int <i>capture_last</i>;
114     void *<i>callout_data</i>;
115     int <i>pattern_position</i>;
116     int <i>next_item_length</i>;
117     const unsigned char *<i>mark</i>; (8-bit version)
118     const PCRE_UCHAR16 *<i>mark</i>; (16-bit version)
119 nigel 75 </pre>
120 nigel 63 The <i>version</i> field is an integer containing the version number of the
121 ph10 654 block format. The initial version was 0; the current version is 2. The version
122 nigel 75 number will change again in future if additional fields are added, but the
123     intention is never to remove any of the existing fields.
124 nigel 63 </P>
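<P>
Because fields are only ever added, a callout function can test <i>version</i> before reading a field that older block formats lack. The sketch below assumes a trimmed stand-in structure, <i>demo_callout_block</i>, which is hypothetical and not layout-compatible with the real block. The helper names are likewise invented for illustration.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in holding only the fields this sketch reads. */
typedef struct demo_callout_block {
    int version;
    int callout_number;
    int pattern_position;          /* present from version 1 */
    int next_item_length;          /* present from version 1 */
    const unsigned char *mark;     /* present from version 2 */
} demo_callout_block;

/* Returns pattern_position, or fallback when the block is too old to have it. */
static int demo_pattern_position(const demo_callout_block *cb, int fallback)
{
    assert(cb != NULL);
    return (cb->version >= 1) ? cb->pattern_position : fallback;
}

/* Returns the mark name, or NULL when the block is too old to carry one. */
static const unsigned char *demo_mark(const demo_callout_block *cb)
{
    assert(cb != NULL);
    return (cb->version >= 2) ? cb->mark : NULL;
}
```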
125     <P>
126     The <i>callout_number</i> field contains the number of the callout, as compiled
127 nigel 75 into the pattern (that is, the number after ?C for manual callouts, and 255 for
128     automatically generated callouts).
129 nigel 63 </P>
130     <P>
131     The <i>offset_vector</i> field is a pointer to the vector of offsets that was
132 ph10 869 passed by the caller to the matching function. When <b>pcre_exec()</b> or
133     <b>pcre16_exec()</b> is used, the contents can be inspected, in order to extract
134 nigel 77 substrings that have been matched so far, in the same way as for extracting
135 ph10 869 substrings after a match has completed. For the DFA matching functions, this
136     field is not useful.
137 nigel 63 </P>
138     <P>
139 nigel 75 The <i>subject</i> and <i>subject_length</i> fields contain copies of the values
140 ph10 869 that were passed to the matching function.
141 nigel 63 </P>
142     <P>
143 ph10 172 The <i>start_match</i> field normally contains the offset within the subject at
144     which the current match attempt started. However, if the escape sequence \K
145     has been encountered, this value is changed to reflect the modified starting
146     point. If the pattern is not anchored, the callout function may be called
147     several times from the same point in the pattern for different starting points
148     in the subject.
149 nigel 63 </P>
150     <P>
151     The <i>current_position</i> field contains the offset within the subject of the
152     current match pointer.
153     </P>
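<P>
Taken together, <i>start_match</i> and <i>current_position</i> delimit the part of the subject consumed by the current match attempt. A minimal sketch, assuming a hypothetical trimmed stand-in structure (<i>demo_callout_block</i>) and an invented helper name:
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in holding only the fields this sketch reads. */
typedef struct demo_callout_block {
    const char *subject;
    int subject_length;
    int start_match;
    int current_position;
} demo_callout_block;

/*
 * Points *text at the start of the current match attempt and returns the
 * number of subject units consumed so far. The text is not zero-terminated
 * at that length, so callers must use the returned count.
 */
static int demo_consumed_so_far(const demo_callout_block *cb, const char **text)
{
    assert(cb != NULL && text != NULL);
    *text = cb->subject + cb->start_match;
    if (cb->current_position < cb->start_match)
        return 0;                  /* defensive: report nothing consumed */
    return cb->current_position - cb->start_match;
}
```

<P>
For the pattern ab(?C4)cd and the subject "xxabyd", the callout would see a start of 2 and a current position of 4, so the consumed text is "ab".
</P>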
154     <P>
155 ph10 869 When <b>pcre_exec()</b> or <b>pcre16_exec()</b> is used, the
156     <i>capture_top</i> field contains one more than the number of the highest
157     numbered captured substring so far. If no substrings have been captured, the
158     value of <i>capture_top</i> is one. This is always the case when the DFA
159     functions are used, because they do not support captured substrings.
160 nigel 63 </P>
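<P>
The <i>offset_vector</i> and <i>capture_top</i> fields can be combined to read a substring captured so far. The sketch below assumes the usual layout of the offsets vector, in which group <i>n</i> occupies elements 2n and 2n+1 and unset groups hold -1. It handles numbered groups only (n of 1 or more), on the assumption that the pair for the overall match is not meaningful until matching has finished. <i>demo_callout_block</i> and <i>demo_copy_capture</i> are hypothetical names.
</P>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in holding only the fields this sketch reads. */
typedef struct demo_callout_block {
    int *offset_vector;
    const char *subject;
    int capture_top;
} demo_callout_block;

/*
 * Copies captured group n into buf as a zero-terminated string.
 * Returns its length, or -1 if the group is not set yet or does not fit.
 */
static int demo_copy_capture(const demo_callout_block *cb, int n,
                             char *buf, int bufsize)
{
    int start, end, len;

    assert(cb != NULL && buf != NULL && bufsize > 0);
    if (n < 1 || n >= cb->capture_top)
        return -1;                 /* beyond the highest group captured so far */
    start = cb->offset_vector[2 * n];
    end = cb->offset_vector[2 * n + 1];
    if (start < 0 || end < start)
        return -1;                 /* group is unset */
    len = end - start;
    if (len >= bufsize)
        return -1;                 /* no room for the text plus terminator */
    memcpy(buf, cb->subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```

<P>
This applies only to callouts from <b>pcre_exec()</b> or <b>pcre16_exec()</b>. With the DFA functions <i>capture_top</i> is always one, so the helper reports every group as unavailable.
</P>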
161     <P>
162     The <i>capture_last</i> field contains the number of the most recently captured
163 nigel 77 substring. If no substrings have been captured, its value is -1. This is always
164 ph10 869 the case for the DFA matching functions.
165 nigel 63 </P>
166     <P>
167 ph10 869 The <i>callout_data</i> field contains a value that is passed to a matching
168     function specifically so that it can be passed back in callouts. It is passed
169     in the <i>callout_data</i> field of a <b>pcre_extra</b> or <b>pcre16_extra</b>
170     data structure. If no such data was passed, the value of <i>callout_data</i> in
171     a callout block is NULL. There is a description of the <b>pcre_extra</b>
172     structure in the
173 nigel 75 <a href="pcreapi.html"><b>pcreapi</b></a>
174     documentation.
175 nigel 63 </P>
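<P>
A common use of <i>callout_data</i> is to give the callout function access to per-match state without global variables. The following sketch counts callouts. <i>demo_callout_block</i> is a hypothetical trimmed stand-in for the real block, and <i>demo_stats</i> and <i>demo_counting_callout</i> are names invented for illustration.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in holding only the fields this sketch reads. */
typedef struct demo_callout_block {
    int callout_number;
    void *callout_data;
} demo_callout_block;

/* Caller-owned state handed to the matcher and passed back in each callout. */
typedef struct demo_stats {
    int total;
    int per_number[256];           /* callout numbers are below 256 */
} demo_stats;

static int demo_counting_callout(demo_callout_block *cb)
{
    demo_stats *stats;

    assert(cb != NULL);
    stats = (demo_stats *)cb->callout_data;
    if (stats == NULL)
        return 0;                  /* no data was passed: nothing to record */
    stats->total++;
    if (cb->callout_number >= 0 && cb->callout_number < 256)
        stats->per_number[cb->callout_number]++;
    return 0;                      /* always let matching proceed */
}
```

<P>
With the real library the pointer would be placed in the <i>callout_data</i> field of a <b>pcre_extra</b> block, with the corresponding PCRE_EXTRA_CALLOUT_DATA bit set in its <i>flags</i> field as described in the <b>pcreapi</b> documentation.
</P>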
176     <P>
177 ph10 869 The <i>pattern_position</i> field is present from version 1 of the callout
178     structure. It contains the offset to the next item to be matched in the pattern
179     string.
180 nigel 63 </P>
181     <P>
182 ph10 869 The <i>next_item_length</i> field is present from version 1 of the callout
183     structure. It contains the length of the next item to be matched in the pattern
184     string. When the callout immediately precedes an alternation bar, a closing
185     parenthesis, or the end of the pattern, the length is zero. When the callout
186     precedes an opening parenthesis, the length is that of the entire subpattern.
187 nigel 75 </P>
188     <P>
189     The <i>pattern_position</i> and <i>next_item_length</i> fields are intended to
190     help in distinguishing between different automatic callouts, which all have the
191     same callout number. However, they are set for all callouts.
192     </P>
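<P>
One way to tell automatic callouts apart is to print the pattern item that each one precedes. The callout block does not carry the pattern text itself, so this sketch assumes the caller has kept a copy (for example, reachable through <i>callout_data</i>). <i>demo_copy_next_item</i> is a hypothetical helper.
</P>

```c
#include <assert.h>
#include <string.h>

/*
 * Copies the next pattern item, as identified by pattern_position and
 * next_item_length, into buf as a zero-terminated string.
 * Returns the item length, or -1 if the values are out of range or the
 * item does not fit.
 */
static int demo_copy_next_item(const char *pattern, int pattern_position,
                               int next_item_length, char *buf, int bufsize)
{
    int pattern_length;

    assert(pattern != NULL && buf != NULL && bufsize > 0);
    pattern_length = (int)strlen(pattern);
    if (pattern_position < 0 || next_item_length < 0 ||
        pattern_position > pattern_length ||
        next_item_length > pattern_length - pattern_position ||
        next_item_length >= bufsize)
        return -1;
    memcpy(buf, pattern + pattern_position, (size_t)next_item_length);
    buf[next_item_length] = '\0';
    return next_item_length;
}
```

<P>
For the pattern A(\d{2}|--), a position of 0 with length 1 yields "A", a position of 1 with length 10 yields the whole parenthesized subpattern, and a zero length (before an alternation bar, a closing parenthesis, or the end of the pattern) yields an empty string.
</P>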
193 ph10 654 <P>
194 ph10 869 The <i>mark</i> field is present from version 2 of the callout structure. In
195     callouts from <b>pcre_exec()</b> or <b>pcre16_exec()</b> it contains a pointer to
196     the zero-terminated name of the most recently passed (*MARK), (*PRUNE), or
197     (*THEN) item in the match, or NULL if no such items have been passed. Instances
198     of (*PRUNE) or (*THEN) without a name do not obliterate a previous (*MARK). In
199     callouts from the DFA matching functions this field always contains NULL.
200 ph10 654 </P>
201 nigel 75 <br><a name="SEC4" href="#TOC1">RETURN VALUES</a><br>
202     <P>
203     The external callout function returns an integer to PCRE. If the value is zero,
204     matching proceeds as normal. If the value is greater than zero, matching fails
205 nigel 77 at the current point, but the testing of other matching possibilities goes
206     ahead, just as if a lookahead assertion had failed. If the value is less than
207 ph10 869 zero, the match is abandoned, and the matching function returns the negative value.
208 nigel 75 </P>
209     <P>
210 nigel 63 Negative values should normally be chosen from the set of PCRE_ERROR_xxx
211     values. In particular, PCRE_ERROR_NOMATCH forces a standard "no match" failure.
212     The error number PCRE_ERROR_CALLOUT is reserved for use by callout functions;
213     it will never be used by PCRE itself.
214     </P>
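<P>
The three kinds of return value can be seen together in one callout function. This sketch uses a hypothetical trimmed stand-in structure (<i>demo_callout_block</i>) and an invented policy structure. The constant <i>DEMO_ERROR_CALLOUT</i> stands in for PCRE_ERROR_CALLOUT, and its numeric value here is an assumption. Real code should use the symbolic name from pcre.h.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PCRE_ERROR_CALLOUT; the numeric value is an assumption. */
#define DEMO_ERROR_CALLOUT (-9)

/* Hypothetical stand-in holding only the fields this sketch reads. */
typedef struct demo_callout_block {
    int callout_number;
    int current_position;
    void *callout_data;
} demo_callout_block;

/* Illustrative policy passed in through callout_data. */
typedef struct demo_policy {
    int calls_left;        /* abandon the match once this budget is used up */
    int min_position;      /* callout 2 rejects positions below this offset */
} demo_policy;

static int demo_policy_callout(demo_callout_block *cb)
{
    demo_policy *policy;

    assert(cb != NULL);
    policy = (demo_policy *)cb->callout_data;
    if (policy == NULL)
        return 0;                      /* zero: matching proceeds as normal */
    if (policy->calls_left <= 0)
        return DEMO_ERROR_CALLOUT;     /* negative: abandon the whole match */
    policy->calls_left--;
    if (cb->callout_number == 2 &&
        cb->current_position < policy->min_position)
        return 1;                      /* positive: fail here, try alternatives */
    return 0;
}
```

<P>
A positive return affects only the current path through the pattern, so other alternatives and starting points are still tried. A negative return ends matching immediately and becomes the result of the matching function.
</P>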
215 ph10 99 <br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
216 nigel 63 <P>
217 ph10 99 Philip Hazel
218 nigel 63 <br>
219 ph10 99 University Computing Service
220     <br>
221     Cambridge CB2 3QH, England.
222     <br>
223     </P>
224     <br><a name="SEC6" href="#TOC1">REVISION</a><br>
225     <P>
226 ph10 869 Last updated: 08 January 2012
227 ph10 99 <br>
228 ph10 869 Copyright &copy; 1997-2012 University of Cambridge.
229 ph10 99 <br>
230 nigel 75 <p>
231     Return to the <a href="index.html">PCRE index page</a>.
232     </p>
