Contents of /code/trunk/doc/pcrecpp.3

Revision 87
Sat Feb 24 21:41:21 2007 UTC by nigel
File size: 11381 byte(s)
Load pcre-6.5 into code/trunk.

.TH PCRECPP 3
.SH NAME
PCRE - Perl-compatible regular expressions.
.SH "SYNOPSIS OF C++ WRAPPER"
.rs
.sp
.B #include <pcrecpp.h>
.PP
.SM
.br
.SH DESCRIPTION
.rs
.sp
The C++ wrapper for PCRE was provided by Google Inc. Some additional
functionality was added by Giuseppe Maxia. This brief man page was constructed
from the notes in the \fIpcrecpp.h\fP file, which should be consulted for
further details.
.
.
.SH "MATCHING INTERFACE"
.rs
.sp
The "FullMatch" operation checks that supplied text matches a supplied pattern
exactly. If pointer arguments are supplied, it copies matched sub-strings that
match sub-patterns into them.
.sp
  Example: successful match
     pcrecpp::RE re("h.*o");
     re.FullMatch("hello");
.sp
  Example: unsuccessful match (requires full match):
     pcrecpp::RE re("e");
     !re.FullMatch("hello");
.sp
  Example: creating a temporary RE object:
     pcrecpp::RE("h.*o").FullMatch("hello");
.sp
You can pass in a "const char*" or a "string" for "text". The examples below
tend to use a const char*. You can, as in the different examples above, store
the RE object explicitly in a variable or use a temporary RE object. The
examples below use one mode or the other arbitrarily. Either could correctly be
used for any of these examples.
.P
You must supply extra pointer arguments to extract matched subpieces.
.sp
  Example: extracts "ruby" into "s" and 1234 into "i"
     int i;
     string s;
     pcrecpp::RE re("(\e\ew+):(\e\ed+)");
     re.FullMatch("ruby:1234", &s, &i);
.sp
  Example: does not try to extract any extra sub-patterns
     re.FullMatch("ruby:1234", &s);
.sp
  Example: does not try to extract into NULL
     re.FullMatch("ruby:1234", NULL, &i);
.sp
  Example: integer overflow causes failure
     !re.FullMatch("ruby:1234567891234", NULL, &i);
.sp
  Example: fails because there aren't enough sub-patterns:
     !pcrecpp::RE("\e\ew+:\e\ed+").FullMatch("ruby:1234", &s);
.sp
  Example: fails because string cannot be stored in integer
     !pcrecpp::RE("(.*)").FullMatch("ruby", &i);
.sp
The provided pointer arguments can be pointers to any scalar numeric
type, or one of:
.sp
   string        (matched piece is copied to string)
   StringPiece   (StringPiece is mutated to point to matched piece)
   T             (where "bool T::ParseFrom(const char*, int)" exists)
   NULL          (the corresponding matched sub-pattern is not copied)
.sp
The function returns true iff all of the following conditions are satisfied:
.sp
  a. "text" matches "pattern" exactly;
.sp
  b. The number of matched sub-patterns is >= number of supplied
     pointers;
.sp
  c. The "i"th argument has a suitable type for holding the
     string captured as the "i"th sub-pattern. If you pass in
     NULL for the "i"th argument, or pass fewer arguments than
     the number of sub-patterns, the "i"th captured sub-pattern
     is ignored.
.sp
The matching interface supports at most 16 arguments per call.
If you need more, consider using the more general interface
\fBpcrecpp::RE::DoMatch\fP. See \fBpcrecpp.h\fP for the signature for
\fBDoMatch\fP.
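The "suitable type" condition and the integer overflow example suggest that a
captured piece of text is converted with range checking before it is stored.
The sketch below shows one way such a check could look using only the standard
library. The helper name parse_int is hypothetical, and this is an assumption
about the kind of check involved, not the wrapper's actual code.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper, not part of pcrecpp: converts a captured piece of
// text to an int, reporting failure instead of silently truncating.
bool parse_int(const std::string& text, int* out) {
  if (text.empty()) return false;
  errno = 0;
  char* end = nullptr;
  const long value = std::strtol(text.c_str(), &end, 10);
  if (end != text.c_str() + text.size()) return false;   // not all digits, e.g. "ruby"
  if (errno == ERANGE) return false;                     // does not fit in a long
  if (value < INT_MIN || value > INT_MAX) return false;  // does not fit in an int
  if (out != nullptr) *out = static_cast<int>(value);    // a null pointer means "do not store"
  return true;
}
```

With this helper, "1234" converts successfully, while "1234567891234" and
"ruby" are both rejected, which mirrors the failing examples above.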
.
.SH "PARTIAL MATCHES"
.rs
.sp
You can use the "PartialMatch" operation when you want the pattern
to match any substring of the text.
.sp
  Example: simple search for a string:
     pcrecpp::RE("ell").PartialMatch("hello");
.sp
  Example: find first number in a string:
     int number;
     pcrecpp::RE re("(\e\ed+)");
     re.PartialMatch("x*100 + 20", &number);
     assert(number == 100);
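For readers who know the standard library better than PCRE, the difference
between a full and a partial match can be sketched with std::regex. This is an
analogy only: it uses std::regex (ECMAScript syntax) instead of pcrecpp, and
the function names full_match, partial_match, and first_number are
hypothetical.

```cpp
#include <regex>
#include <string>

// Analogue of FullMatch: the pattern must account for the whole text.
bool full_match(const std::string& pattern, const std::string& text) {
  return std::regex_match(text, std::regex(pattern));
}

// Analogue of PartialMatch: the pattern may match any substring.
bool partial_match(const std::string& pattern, const std::string& text) {
  return std::regex_search(text, std::regex(pattern));
}

// Analogue of the "find first number" example: stores the first run of
// digits found in text and returns true, or returns false if there is none.
bool first_number(const std::string& text, int* number) {
  std::smatch m;
  if (!std::regex_search(text, m, std::regex("(\\d+)"))) return false;
  *number = std::stoi(m[1].str());
  return true;
}
```

Here full_match("e", "hello") is false while partial_match("ell", "hello") is
true, matching the behaviour described for the two operations.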
.
.
.SH "UTF-8 AND THE MATCHING INTERFACE"
.rs
.sp
By default, pattern and text are plain text, one byte per character. The UTF8
flag, passed to the constructor, causes both pattern and string to be treated
as UTF-8 text, still a byte stream but potentially multiple bytes per
character. In practice, the text is likelier to be UTF-8 than the pattern, but
the match returned may depend on the UTF8 flag, so always use it when matching
UTF-8 text. For example, "." will match one byte normally but with UTF8 set may
match all the bytes of a multi-byte character.
.sp
  Example:
     pcrecpp::RE_Options options;
     options.set_utf8(true);
     pcrecpp::RE re(utf8_pattern, options);
     re.FullMatch(utf8_string);
.sp
  Example: using the convenience function UTF8():
     pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
     re.FullMatch(utf8_string);
.sp
NOTE: The UTF8 flag is ignored if PCRE was not configured with the
--enable-utf8 flag.
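To make "multiple bytes per character" concrete, the sketch below reads the
length of a UTF-8 sequence from its first byte. Under RFC 3629 a sequence is
one to four bytes long. The helper name utf8_sequence_length is hypothetical,
and the function only inspects the lead byte; it does not validate the
continuation bytes and is not how PCRE itself decodes text.

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes in the UTF-8 sequence that starts
// with lead byte b, or 0 if b cannot start a sequence. Structure only:
// overlong and out-of-range encodings are not rejected here.
std::size_t utf8_sequence_length(unsigned char b) {
  if (b < 0x80) return 1;            // 0xxxxxxx: ASCII, one byte
  if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;                          // continuation byte or invalid lead byte
}
```

In byte mode "." consumes a single byte regardless of this length; in UTF-8
mode it consumes the whole sequence.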
.
.
.SH "PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE"
.rs
.sp
PCRE defines some modifiers to change the behavior of the regular expression
engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to
pass such modifiers to a RE class. Currently, the following modifiers are
supported:
.sp
   modifier              description                Perl equivalent
.sp
   PCRE_CASELESS         case insensitive match     /i
   PCRE_MULTILINE        multiple lines match       /m
   PCRE_DOTALL           dot matches newlines       /s
   PCRE_DOLLAR_ENDONLY   $ matches only at end      N/A
   PCRE_EXTRA            strict escape parsing      N/A
   PCRE_EXTENDED         ignore whitespace          /x
   PCRE_UTF8             handles UTF-8 chars        built-in
   PCRE_UNGREEDY         reverses * and *?          N/A
   PCRE_NO_AUTO_CAPTURE  disables capturing parens  N/A (*)
.sp
(*) Both Perl and PCRE allow non-capturing parentheses by means of the
"?:" modifier within the pattern itself, e.g. (?:ab|cd) does not
capture, while (ab|cd) does.
.P
For a full account of how each modifier works, please check the
PCRE API reference page.
.P
For each modifier, there are two member functions whose names are derived
from the modifier name in lowercase, without the "PCRE_" prefix. For
instance, PCRE_CASELESS is handled by
.sp
  bool caseless()
.sp
which returns true if the modifier is set, and
.sp
  RE_Options & set_caseless(bool)
.sp
which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be
accessed through the \fBset_match_limit()\fR and \fBmatch_limit()\fR member
functions. Setting \fImatch_limit\fR to a non-zero value will limit the
execution of PCRE to keep it from doing bad things like blowing the stack or
taking an eternity to return a result. A value of 5000 is good enough to stop
stack blowup in a 2MB thread stack. Setting \fImatch_limit\fR to zero disables
match limiting. Alternatively, you can call \fBset_match_limit_recursion()\fP,
which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE
recurses. The match limit caps the number of times PCRE's internal match()
function is called during a match; the recursion limit caps the depth of
internal recursion, and therefore the amount of stack that is used.
.P
Normally, to pass one or more modifiers to a RE class, you declare
a \fIRE_Options\fR object, set the appropriate options, and pass this
object to a RE constructor. Example:
.sp
   RE_Options opt;
   opt.set_caseless(true);
   if (RE("HELLO", opt).PartialMatch("hello world")) ...
.sp
RE_Options has two constructors. The default constructor takes no arguments and
creates a set of flags that are off by default. The second takes an
\fIoption_flags\fR parameter, which is there to facilitate transfer of legacy
code from C programs. This lets you do
.sp
   RE(pattern,
      RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
.sp
However, new code is better off doing
.sp
   RE(pattern,
      RE_Options().set_caseless(true).set_multiline(true))
        .PartialMatch(str);
.sp
If you are going to pass one of the most used modifiers, there are some
convenience functions that return a RE_Options object with the
appropriate modifier already set: \fBCASELESS()\fR, \fBUTF8()\fR,
\fBMULTILINE()\fR, \fBDOTALL()\fR, and \fBEXTENDED()\fR.
.P
If you need to set several options at once, and you don't want to go through
the pains of declaring a RE_Options object and setting several options, there
is a parallel method that gives you such ability on the fly. You can chain
several \fBset_xxxxx()\fR member functions, since each of them returns a
reference to its class object. For example, to pass PCRE_CASELESS,
PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write:
.sp
   RE(" ^ xyz \e\es+ .* blah$",
      RE_Options()
        .set_caseless(true)
        .set_extended(true)
        .set_multiline(true)).PartialMatch(sometext);
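Chaining works because each setter returns a reference to the object it was
called on. The sketch below shows that pattern in isolation. The class
Options, its flags() accessor, and the flag values are hypothetical stand-ins
chosen for illustration; they are not the real RE_Options class or the
constants from pcre.h.

```cpp
// Hypothetical stand-in for RE_Options; the flag values are illustrative
// and are not taken from pcre.h.
class Options {
 public:
  enum Flag { kCaseless = 1, kMultiline = 2, kExtended = 4 };

  Options() : flags_(0) {}                         // all flags off by default
  explicit Options(int flags) : flags_(flags) {}   // C-style bit mask

  bool caseless() const { return (flags_ & kCaseless) != 0; }
  bool multiline() const { return (flags_ & kMultiline) != 0; }
  bool extended() const { return (flags_ & kExtended) != 0; }

  Options& set_caseless(bool on) { return set(kCaseless, on); }
  Options& set_multiline(bool on) { return set(kMultiline, on); }
  Options& set_extended(bool on) { return set(kExtended, on); }

  int flags() const { return flags_; }

 private:
  Options& set(int flag, bool on) {
    if (on) {
      flags_ |= flag;
    } else {
      flags_ &= ~flag;
    }
    return *this;  // returning *this is what makes chaining possible
  }

  int flags_;
};
```

With this sketch, Options().set_caseless(true).set_multiline(true) and
Options(Options::kCaseless | Options::kMultiline) end up with the same flags,
which is the equivalence between the two constructor styles described above.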
.sp
.
.
.SH "SCANNING TEXT INCREMENTALLY"
.rs
.sp
The "Consume" operation may be useful if you want to repeatedly
match regular expressions at the front of a string and skip over
them as they match. This requires use of the "StringPiece" type,
which represents a sub-range of a real string. Like RE, StringPiece
is defined in the pcrecpp namespace.
.sp
  Example: read lines of the form "var = value" from a string.
     string contents = ...;                 // Fill string somehow
     pcrecpp::StringPiece input(contents);  // Wrap in a StringPiece

     string var;
     int value;
     pcrecpp::RE re("(\e\ew+) = (\e\ed+)\en");
     while (re.Consume(&input, &var, &value)) {
       ...;
     }
.sp
Each successful call to "Consume" will set "var" and "value", and also
advance "input" so it points past the matched text.
.P
The "FindAndConsume" operation is similar to "Consume" but does not
anchor your match at the beginning of the string. For example, you
could extract all words from a string by repeatedly calling
.sp
  pcrecpp::RE("(\e\ew+)").FindAndConsume(&input, &word)
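The consume-and-advance idea can be sketched without PCRE by keeping a cursor
into the string and requiring each match to start exactly at that cursor. The
version below uses std::regex with the match_continuous flag as a stand-in for
"Consume". The function name read_assignments and its signature are
hypothetical, and this is an illustration of the idea, not the wrapper's
implementation.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Sketch of a Consume-style loop using std::regex instead of pcrecpp.
// Reads "var = value" lines from the front of contents and reports the
// offset of the first unconsumed character through *rest.
std::vector<std::pair<std::string, int> > read_assignments(
    const std::string& contents, std::size_t* rest) {
  static const std::regex line("(\\w+) = (\\d+)\\n");
  std::vector<std::pair<std::string, int> > result;
  std::string::const_iterator cursor = contents.begin();
  std::smatch m;
  // match_continuous anchors each match at cursor, in the same way that
  // Consume only matches at the front of its input.
  while (std::regex_search(cursor, contents.end(), m, line,
                           std::regex_constants::match_continuous)) {
    result.push_back(std::make_pair(m[1].str(), std::stoi(m[2].str())));
    cursor = m[0].second;  // advance past the matched text
  }
  *rest = static_cast<std::size_t>(cursor - contents.begin());
  return result;
}
```

Dropping the match_continuous flag would give the unanchored behaviour that
"FindAndConsume" describes, where each search may skip ahead to the next
match.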
.
.
.SH "PARSING HEX/OCTAL/C-RADIX NUMBERS"
.rs
.sp
By default, if you pass a pointer to a numeric value, the
corresponding text is interpreted as a base-10 number. You can
instead wrap the pointer with a call to one of the operators Hex(),
Octal(), or CRadix() to interpret the text in another base. The
CRadix operator interprets C-style "0" (base-8) and "0x" (base-16)
prefixes, but defaults to base-10.
.sp
  Example:
     int a, b, c, d;
     pcrecpp::RE re("(.*) (.*) (.*) (.*)");
     re.FullMatch("100 40 0100 0x40",
                  pcrecpp::Octal(&a), pcrecpp::Hex(&b),
                  pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
.sp
will leave 64 in a, b, c, and d.
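The three radix rules correspond closely to the base argument of the standard
strtol function, where base 0 selects the C-style prefix rules. The sketch
below reproduces the arithmetic of the example with strtol alone. The helper
names are hypothetical, and whether the wrapper uses strtol internally is an
assumption not confirmed by this page.

```cpp
#include <cstdlib>

// Hypothetical helpers mirroring the three radix choices.

// Always base 8: "100" is 1*64 + 0*8 + 0 = 64.
long parse_octal(const char* text) { return std::strtol(text, nullptr, 8); }

// Always base 16: "40" is 4*16 + 0 = 64.
long parse_hex(const char* text) { return std::strtol(text, nullptr, 16); }

// Base 0 asks strtol to honour C-style prefixes: "0x" means base 16, a
// leading "0" means base 8, and anything else is read as base 10.
long parse_cradix(const char* text) { return std::strtol(text, nullptr, 0); }
```

Applied to the four fields of "100 40 0100 0x40" in order (octal, hex, C-radix,
C-radix), each call yields 64.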
.
.
.SH "REPLACING PARTS OF STRINGS"
.rs
.sp
You can replace the first match of "pattern" in "str" with "rewrite".
Within "rewrite", backslash-escaped digits (\e1 to \e9) can be
used to insert text matching corresponding parenthesized group
from the pattern. \e0 in "rewrite" refers to the entire matching
text. For example:
.sp
  string s = "yabba dabba doo";
  pcrecpp::RE("b+").Replace("d", &s);
.sp
will leave "s" containing "yada dabba doo". The result is true if the pattern
matches and a replacement occurs, false otherwise.
.P
\fBGlobalReplace\fP is like \fBReplace\fP except that it replaces all
occurrences of the pattern in the string with the rewrite. Replacements are
not subject to re-matching. For example:
.sp
  string s = "yabba dabba doo";
  pcrecpp::RE("b+").GlobalReplace("d", &s);
.sp
will leave "s" containing "yada dada doo". It returns the number of
replacements made.
.P
\fBExtract\fP is like \fBReplace\fP, except that if the pattern matches,
"rewrite" is copied into "out" (an additional argument) with substitutions.
The non-matching portions of "text" are ignored. Returns true iff a match
occurred and the extraction happened successfully; if no match occurs, the
string is left unaffected.
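The substitution step shared by Replace, GlobalReplace, and Extract can be
sketched as a small function that expands backslash-digit references in a
rewrite string, given the text of each captured group. The name
expand_rewrite is hypothetical. Treating a backslash before a non-digit as
"copy the next character literally" and failing on a reference to a missing
group are assumptions made for this sketch, not documented behaviour.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: expands backslash-digit references in rewrite.
// groups[0] is the entire match; groups[1] to groups[9] are the
// parenthesized groups. Returns false if rewrite refers to a group that
// was not supplied.
bool expand_rewrite(const std::string& rewrite,
                    const std::vector<std::string>& groups,
                    std::string* out) {
  out->clear();
  for (std::size_t i = 0; i < rewrite.size(); ++i) {
    const char c = rewrite[i];
    if (c == '\\' && i + 1 < rewrite.size()) {
      const char next = rewrite[++i];
      if (next >= '0' && next <= '9') {
        const std::size_t n = static_cast<std::size_t>(next - '0');
        if (n >= groups.size()) return false;  // no such group
        out->append(groups[n]);
      } else {
        out->push_back(next);  // assumption: escaped non-digit is literal
      }
    } else {
      out->push_back(c);  // ordinary character, or a trailing backslash
    }
  }
  return true;
}
```

For a match of "ruby:1234" with groups "ruby" and "1234", a rewrite that names
group 2, a hyphen, and then group 1 expands to "1234-ruby".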
.
.
.SH AUTHOR
.rs
.sp
The C++ wrapper was contributed by Google Inc.
.br
Copyright (c) 2005 Google Inc.
