.\" Revision 81, Sat Feb 24 21:40:59 2007 UTC, by nigel: Load pcre-6.2 into code/trunk.
.TH PCRECPP 3
.SH NAME
PCRE - Perl-compatible regular expressions.
.SH "SYNOPSIS OF C++ WRAPPER"
.rs
.sp
.B #include <pcrecpp.h>
.PP
.SM
.br
.SH DESCRIPTION
.rs
.sp
The C++ wrapper for PCRE was provided by Google Inc. Some additional
functionality was added by Giuseppe Maxia. This brief man page was constructed
from the notes in the \fIpcrecpp.h\fP file, which should be consulted for
further details.
.
.
.SH "MATCHING INTERFACE"
.rs
.sp
The "FullMatch" operation checks that supplied text matches a supplied pattern
exactly. If pointer arguments are supplied, it copies matched sub-strings that
match sub-patterns into them.
.sp
  Example: successful match
     pcrecpp::RE re("h.*o");
     re.FullMatch("hello");
.sp
  Example: unsuccessful match (requires full match):
     pcrecpp::RE re("e");
     !re.FullMatch("hello");
.sp
  Example: creating a temporary RE object:
     pcrecpp::RE("h.*o").FullMatch("hello");
.sp
You can pass in a "const char*" or a "string" for "text". The examples below
tend to use a const char*. You can, as in the different examples above, store
the RE object explicitly in a variable or use a temporary RE object. The
examples below use one mode or the other arbitrarily. Either could correctly be
used for any of these examples.
.P
You must supply extra pointer arguments to extract matched subpieces.
.sp
  Example: extracts "ruby" into "s" and 1234 into "i"
     int i;
     string s;
     pcrecpp::RE re("(\e\ew+):(\e\ed+)");
     re.FullMatch("ruby:1234", &s, &i);
.sp
  Example: does not try to extract any extra sub-patterns
     re.FullMatch("ruby:1234", &s);
.sp
  Example: does not try to extract into NULL
     re.FullMatch("ruby:1234", NULL, &i);
.sp
  Example: integer overflow causes failure
     !re.FullMatch("ruby:1234567891234", NULL, &i);
.sp
  Example: fails because there aren't enough sub-patterns:
     !pcrecpp::RE("\e\ew+:\e\ed+").FullMatch("ruby:1234", &s);
.sp
  Example: fails because string cannot be stored in integer
     !pcrecpp::RE("(.*)").FullMatch("ruby", &i);
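.P
The overflow and type-mismatch failures in the examples above come down to
whether the captured text can be converted to the destination type. The
following is a minimal standard-library sketch of that kind of strict
conversion. The helper name ParseIntStrict is hypothetical; it is not part of
pcrecpp, and the wrapper's own conversion code may differ in detail.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper (not a pcrecpp function): parse all of `text` as a
// base-10 int. Returns false if nothing could be parsed, if characters are
// left over, or if the value does not fit in an int.
bool ParseIntStrict(const char* text, int* out) {
  errno = 0;
  char* end = nullptr;
  const long value = std::strtol(text, &end, 10);
  if (end == text || *end != '\0') return false;          // not a number
  if (errno == ERANGE) return false;                      // overflows long
  if (value < INT_MIN || value > INT_MAX) return false;   // overflows int
  *out = static_cast<int>(value);
  return true;
}
```

With this helper, "1234" converts successfully, while "1234567891234" and
"ruby" are both rejected, mirroring the failing FullMatch calls above.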
.sp
The provided pointer arguments can be pointers to any scalar numeric
type, or one of:
.sp
   string        (matched piece is copied to string)
   StringPiece   (StringPiece is mutated to point to matched piece)
   T             (where "bool T::ParseFrom(const char*, int)" exists)
   NULL          (the corresponding matched sub-pattern is not copied)
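.P
The "T" entry in the list above means that a user-defined type can receive a
captured sub-pattern if it provides a suitable ParseFrom member. A sketch of
such a type follows. The type name Percent and its validation rules are
invented for illustration, and the sketch assumes that the captured text is
passed as a pointer plus a length and is not NUL-terminated.

```cpp
// Hypothetical user-defined type: a percentage in the range 0..100.
struct Percent {
  int value = 0;

  // Matches the signature named in the list above. `str` points at the
  // captured text and `n` is its length in bytes; the text is assumed
  // not to be NUL-terminated, so only str[0..n-1] is read.
  bool ParseFrom(const char* str, int n) {
    if (n <= 0 || n > 3) return false;
    int parsed = 0;
    for (int i = 0; i < n; ++i) {
      if (str[i] < '0' || str[i] > '9') return false;  // digits only
      parsed = parsed * 10 + (str[i] - '0');
    }
    if (parsed > 100) return false;  // out of range for a percentage
    value = parsed;                  // commit only on success
    return true;
  }
};

// Intended use with the wrapper (shown as a comment; not compiled here):
//   Percent p;
//   pcrecpp::RE("(\\d+)%").FullMatch("42%", &p);
```

Returning false from ParseFrom is how such a type would signal that the
captured text is unsuitable, in the same way that a non-numeric capture is
unsuitable for an int.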
.sp
The function returns true iff all of the following conditions are satisfied:
.sp
  a. "text" matches "pattern" exactly;
.sp
  b. The number of matched sub-patterns is >= the number of supplied
     pointers;
.sp
  c. The "i"th argument has a suitable type for holding the
     string captured as the "i"th sub-pattern. If you pass in
     NULL for the "i"th argument, or pass fewer arguments than
     the number of sub-patterns, the "i"th captured sub-pattern
     is ignored.
.sp
The matching interface supports at most 16 arguments per call.
If you need more, consider using the more general interface
\fBpcrecpp::RE::DoMatch\fP. See \fBpcrecpp.h\fP for the signature for
\fBDoMatch\fP.
.
.SH "PARTIAL MATCHES"
.rs
.sp
You can use the "PartialMatch" operation when you want the pattern
to match any substring of the text.
.sp
  Example: simple search for a string:
     pcrecpp::RE("ell").PartialMatch("hello");
.sp
  Example: find first number in a string:
     int number;
     pcrecpp::RE re("(\e\ed+)");
     re.PartialMatch("x*100 + 20", &number);
     assert(number == 100);
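.P
The difference between matching the whole text and matching any substring can
also be seen with the C++ standard library alone. The sketch below is an
analogy, not pcrecpp code: it uses std::regex, whose default ECMAScript syntax
differs from PCRE in places, and the function names are made up for this
example.

```cpp
#include <regex>
#include <string>

// Whole-text match, analogous in spirit to FullMatch.
bool MatchesWhole(const std::string& text, const std::string& pattern) {
  return std::regex_match(text, std::regex(pattern));
}

// Substring match, analogous in spirit to PartialMatch.
bool MatchesSomewhere(const std::string& text, const std::string& pattern) {
  return std::regex_search(text, std::regex(pattern));
}

// Returns the first run of decimal digits in `text` as an int, or -1 if
// there is none. std::stoi throws if the digits do not fit in an int.
int FirstNumber(const std::string& text) {
  std::smatch m;
  if (!std::regex_search(text, m, std::regex("([0-9]+)"))) return -1;
  return std::stoi(m[1].str());
}
```

Here MatchesWhole("hello", "e") is false while MatchesSomewhere("hello", "ell")
is true, which is the same distinction the two pcrecpp operations draw.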
.
.
.SH "UTF-8 AND THE MATCHING INTERFACE"
.rs
.sp
By default, pattern and text are plain text, one byte per character. The UTF8
flag, passed to the constructor, causes both pattern and string to be treated
as UTF-8 text, still a byte stream but potentially multiple bytes per
character. In practice, the text is likelier to be UTF-8 than the pattern, but
the match returned may depend on the UTF8 flag, so always use it when matching
UTF-8 text. For example, "." will match one byte normally but with UTF8 set may
match all the bytes of a multi-byte character.
.sp
  Example:
     pcrecpp::RE_Options options;
     options.set_utf8();
     pcrecpp::RE re(utf8_pattern, options);
     re.FullMatch(utf8_string);
.sp
  Example: using the convenience function UTF8():
     pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
     re.FullMatch(utf8_string);
.sp
NOTE: The UTF8 flag is ignored if PCRE was not configured with the
--enable-utf8 flag.
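.P
To see why byte-oriented and character-oriented matching can disagree, it
helps to look at how UTF-8 marks character boundaries. The sketch below is
independent of PCRE; the function names are invented, and it classifies bytes
by bit pattern only, without validating the input.

```cpp
#include <cstddef>
#include <string>

// Length in bytes of the UTF-8 sequence introduced by `lead`, judged from
// its high bits alone. Returns 0 for a continuation byte (10xxxxxx) or any
// other byte that cannot start a sequence of one to four bytes.
int Utf8SequenceLength(unsigned char lead) {
  if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;
}

// Counts characters by counting every byte that is not a continuation
// byte. Assumes `s` holds well-formed UTF-8.
std::size_t CountUtf8Chars(const std::string& s) {
  std::size_t count = 0;
  for (unsigned char c : s) {
    if ((c & 0xC0) != 0x80) ++count;
  }
  return count;
}
```

A string such as "h\exC3\exA9llo" is six bytes long but holds five characters,
so a pattern that counts "." per byte and one that counts "." per character
will not agree on it.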
.
.
.SH "PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE"
.rs
.sp
PCRE defines some modifiers to change the behavior of the regular expression
engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to
pass such modifiers to a RE class. Currently, the following modifiers are
supported:
.sp
   modifier              description                Perl equivalent
.sp
   PCRE_CASELESS         case-insensitive match     /i
   PCRE_MULTILINE        multiple lines match       /m
   PCRE_DOTALL           dot matches newlines       /s
   PCRE_DOLLAR_ENDONLY   $ matches only at end      N/A
   PCRE_EXTRA            strict escape parsing      N/A
   PCRE_EXTENDED         ignore whitespace          /x
   PCRE_UTF8             handles UTF-8 chars        built-in
   PCRE_UNGREEDY         reverses * and *?          N/A
   PCRE_NO_AUTO_CAPTURE  disables capturing parens  N/A (*)
.sp
(*) Both Perl and PCRE allow non-capturing parentheses by means of the
"?:" modifier within the pattern itself. For example, (?:ab|cd) does not
capture, while (ab|cd) does.
.P
For a full account of how each modifier works, please check the
PCRE API reference page.
.P
For each modifier, there are two member functions whose names are derived
from the modifier name in lowercase, without the "PCRE_" prefix. For
instance, PCRE_CASELESS is handled by
.sp
  bool caseless()
.sp
which returns true if the modifier is set, and
.sp
  RE_Options & set_caseless(bool)
.sp
which sets or unsets the modifier. Moreover, PCRE_CONFIG_MATCH_LIMIT can be
accessed through the \fBset_match_limit()\fR and \fBmatch_limit()\fR member
functions. Setting \fImatch_limit\fR to a non-zero value will limit the
execution of PCRE to keep it from doing bad things like blowing the stack or
taking an eternity to return a result. A value of 5000 is good enough to stop
stack blowup in a 2MB thread stack. Setting \fImatch_limit\fR to zero disables
match limiting.
.P
Normally, to pass one or more modifiers to a RE class, you declare
a \fIRE_Options\fR object, set the appropriate options, and pass this
object to a RE constructor. Example:
.sp
   RE_Options opt;
   opt.set_caseless(true);
   if (RE("HELLO", opt).PartialMatch("hello world")) ...
.sp
RE_Options has two constructors. The default constructor takes no arguments and
creates a set of flags that are all off. The other takes an
\fIoption_flags\fR argument, which eases the transfer of legacy code from C
programs. This lets you do
.sp
   RE(pattern,
      RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
.sp
However, new code is better off doing
.sp
   RE(pattern,
      RE_Options().set_caseless(true).set_multiline(true))
        .PartialMatch(str);
.sp
If you are going to pass one of the most used modifiers, there are some
convenience functions that return a RE_Options object with the
appropriate modifier already set: \fBCASELESS()\fR, \fBUTF8()\fR,
\fBMULTILINE()\fR, \fBDOTALL()\fR, and \fBEXTENDED()\fR.
.P
If you need to set several options at once, and you don't want to go through
the pains of declaring a RE_Options object and setting each option separately,
you can do so on the fly by chaining several \fBset_xxxxx()\fR member
functions, since each of them returns a reference to its class object. For
example, to pass PCRE_CASELESS, PCRE_EXTENDED, and PCRE_MULTILINE to a RE with
one statement, you may write:
.sp
   RE(" ^ xyz \e\es+ .* blah$",
     RE_Options()
       .set_caseless(true)
       .set_extended(true)
       .set_multiline(true)).PartialMatch(sometext);
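.P
Chaining works because every setter returns a reference to the object it was
called on. The sketch below shows that pattern in isolation. The class
Options is a hypothetical stand-in written for this example, not the real
RE_Options, and its flag values are made up rather than taken from pcre.h.

```cpp
// Hypothetical stand-in for an options class. The flag values are
// invented for this sketch and are not the PCRE_* constants.
class Options {
 public:
  enum Flag { kCaseless = 1 << 0, kMultiline = 1 << 1, kExtended = 1 << 2 };

  Options() : flags_(0) {}                        // all flags off
  explicit Options(int flags) : flags_(flags) {}  // start from a bit mask

  bool caseless() const { return (flags_ & kCaseless) != 0; }
  bool multiline() const { return (flags_ & kMultiline) != 0; }
  bool extended() const { return (flags_ & kExtended) != 0; }

  // Each setter returns *this, which is what makes chaining possible.
  Options& set_caseless(bool on) { return Set(kCaseless, on); }
  Options& set_multiline(bool on) { return Set(kMultiline, on); }
  Options& set_extended(bool on) { return Set(kExtended, on); }

  int flags() const { return flags_; }

 private:
  Options& Set(int flag, bool on) {
    if (on) {
      flags_ |= flag;
    } else {
      flags_ &= ~flag;
    }
    return *this;
  }

  int flags_;
};
```

With this class, Options().set_caseless(true).set_multiline(true) yields the
same flags as Options(Options::kCaseless | Options::kMultiline), which mirrors
the two construction styles described above.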
.sp
.
.
.SH "SCANNING TEXT INCREMENTALLY"
.rs
.sp
The "Consume" operation may be useful if you want to repeatedly
match regular expressions at the front of a string and skip over
them as they match. This requires use of the "StringPiece" type,
which represents a sub-range of a real string. Like RE, StringPiece
is defined in the pcrecpp namespace.
.sp
  Example: read lines of the form "var = value" from a string.
     string contents = ...;                 // Fill string somehow
     pcrecpp::StringPiece input(contents);  // Wrap in a StringPiece

     string var;
     int value;
     pcrecpp::RE re("(\e\ew+) = (\e\ed+)\en");
     while (re.Consume(&input, &var, &value)) {
       ...;
     }
.sp
Each successful call to "Consume" will set "var/value", and also
advance "input" so it points past the matched text.
.P
The "FindAndConsume" operation is similar to "Consume" but does not
anchor your match at the beginning of the string. For example, you
could extract all words from a string by repeatedly calling
.sp
  pcrecpp::RE("(\e\ew+)").FindAndConsume(&input, &word)
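.P
The consume-and-advance loop can be approximated with the standard library,
which may help when reasoning about where each match is anchored. This is a
sketch using std::regex rather than pcrecpp; the function name
ReadAssignments is hypothetical, and a pair of pointers plays the role that
StringPiece plays in the wrapper.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Reads consecutive "var = value\n" lines from the front of `contents`
// and stops at the first position where the pattern does not match.
// std::stoi throws if a value does not fit in an int.
std::vector<std::pair<std::string, int>> ReadAssignments(
    const std::string& contents) {
  static const std::regex line("([A-Za-z0-9_]+) = ([0-9]+)\n");
  std::vector<std::pair<std::string, int>> result;

  const char* pos = contents.data();
  const char* const end = pos + contents.size();
  std::cmatch m;
  // match_continuous requires the match to start exactly at `pos`,
  // which is the anchoring behavior described for Consume.
  while (std::regex_search(pos, end, m, line,
                           std::regex_constants::match_continuous)) {
    result.emplace_back(m[1].str(), std::stoi(m[2].str()));
    pos = m[0].second;  // advance past the matched text
  }
  return result;
}
```

Dropping the match_continuous flag would let each search start anywhere in
the remaining text, which corresponds to the unanchored behavior described
for FindAndConsume.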
.
.
.SH "PARSING HEX/OCTAL/C-RADIX NUMBERS"
.rs
.sp
By default, if you pass a pointer to a numeric value, the
corresponding text is interpreted as a base-10 number. You can
instead wrap the pointer with a call to one of the operators Hex(),
Octal(), or CRadix() to interpret the text in another base. The
CRadix operator interprets C-style "0" (base-8) and "0x" (base-16)
prefixes, but defaults to base-10.
.sp
  Example:
     int a, b, c, d;
     pcrecpp::RE re("(.*) (.*) (.*) (.*)");
     re.FullMatch("100 40 0100 0x40",
                  pcrecpp::Octal(&a), pcrecpp::Hex(&b),
                  pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
.sp
will leave 64 in a, b, c, and d.
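.P
The arithmetic behind that result can be checked with the C library function
strtol, whose base argument of 0 applies the same C-style prefix rules that
are described for CRadix. The wrapper function ParseWithBase below is
hypothetical and exists only for this illustration.

```cpp
#include <cstdlib>

// Hypothetical helper: converts `text` using the given base. A base of 0
// asks strtol to infer the base from a leading "0x" (16) or "0" (8),
// falling back to base 10 otherwise.
long ParseWithBase(const char* text, int base) {
  return std::strtol(text, nullptr, base);
}
```

ParseWithBase("100", 8), ParseWithBase("40", 16), ParseWithBase("0100", 0),
and ParseWithBase("0x40", 0) all evaluate to 64, matching the four values in
the example above.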
.
.
.SH "REPLACING PARTS OF STRINGS"
.rs
.sp
You can replace the first match of "pattern" in "str" with "rewrite".
Within "rewrite", backslash-escaped digits (\e1 to \e9) can be
used to insert text matching corresponding parenthesized group
from the pattern. \e0 in "rewrite" refers to the entire matching
text. For example:
.sp
  string s = "yabba dabba doo";
  pcrecpp::RE("b+").Replace("d", &s);
.sp
will leave "s" containing "yada dabba doo". The result is true if the pattern
matches and a replacement occurs, false otherwise.
.P
\fBGlobalReplace\fP is like \fBReplace\fP except that it replaces all
occurrences of the pattern in the string with the rewrite. Replacements are
not subject to re-matching. For example:
.sp
  string s = "yabba dabba doo";
  pcrecpp::RE("b+").GlobalReplace("d", &s);
.sp
will leave "s" containing "yada dada doo". It returns the number of
replacements made.
.P
\fBExtract\fP is like \fBReplace\fP, except that if the pattern matches,
"rewrite" is copied into "out" (an additional argument) with substitutions.
The non-matching portions of "text" are ignored. Returns true iff a match
occurred and the extraction happened successfully; if no match occurs, the
string is left unaffected.
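.P
For comparison, the same three behaviors can be sketched with std::regex from
the C++ standard library. This is an analogy rather than pcrecpp code: the
function names are made up, and std::regex refers to groups in the rewrite
string as $1, $2 and so on instead of backslash-escaped digits.

```cpp
#include <regex>
#include <string>

// Replaces only the first match, in the manner of Replace.
std::string ReplaceFirst(const std::string& text, const std::string& pattern,
                         const std::string& rewrite) {
  return std::regex_replace(text, std::regex(pattern), rewrite,
                            std::regex_constants::format_first_only);
}

// Replaces every non-overlapping match, in the manner of GlobalReplace.
std::string ReplaceAll(const std::string& text, const std::string& pattern,
                       const std::string& rewrite) {
  return std::regex_replace(text, std::regex(pattern), rewrite);
}

// In the manner of Extract: on a match, writes the expanded rewrite
// string to *out and returns true; otherwise leaves *out untouched.
bool ExtractFirst(const std::string& text, const std::string& pattern,
                  const std::string& rewrite, std::string* out) {
  const std::regex re(pattern);
  std::smatch m;
  if (!std::regex_search(text, m, re)) return false;
  *out = m.format(rewrite);  // non-matching parts of text are dropped
  return true;
}
```

Applied to "yabba dabba doo" with the pattern "b+" and the rewrite "d",
ReplaceFirst gives "yada dabba doo" and ReplaceAll gives "yada dada doo",
the same strings as in the pcrecpp examples above.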
.
.
.SH AUTHOR
.rs
.sp
The C++ wrapper was contributed by Google Inc.
.br
Copyright (c) 2005 Google Inc.
