Contents of /code/trunk/doc/html/pcrecpp.html

Revision 111
Thu Mar 8 16:53:09 2007 UTC by ph10
Create the PrepareRelease script to process the documentation and create the 
.generic files for distribution, also to remove trailing spaces. Update a lot 
more of the build-time documentation. Arrange for PrepareRelease and its 
sub-scripts to be distributed.

1 nigel 77 <html>
2     <head>
3     <title>pcrecpp specification</title>
4     </head>
5     <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
6     <h1>pcrecpp man page</h1>
7     <p>
8     Return to the <a href="index.html">PCRE index page</a>.
9     </p>
10 ph10 111 <p>
11 nigel 77 This page is part of the PCRE HTML documentation. It was generated automatically
12     from the original man page. If there is any nonsense in it, please consult the
13     man page, in case the conversion went wrong.
14 ph10 111 <br>
15 nigel 77 <ul>
16     <li><a name="TOC1" href="#SEC1">SYNOPSIS OF C++ WRAPPER</a>
17     <li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
18     <li><a name="TOC3" href="#SEC3">MATCHING INTERFACE</a>
19 nigel 93 <li><a name="TOC4" href="#SEC4">QUOTING METACHARACTERS</a>
20     <li><a name="TOC5" href="#SEC5">PARTIAL MATCHES</a>
21     <li><a name="TOC6" href="#SEC6">UTF-8 AND THE MATCHING INTERFACE</a>
<li><a name="TOC7" href="#SEC7">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a>
23     <li><a name="TOC8" href="#SEC8">SCANNING TEXT INCREMENTALLY</a>
24     <li><a name="TOC9" href="#SEC9">PARSING HEX/OCTAL/C-RADIX NUMBERS</a>
25     <li><a name="TOC10" href="#SEC10">REPLACING PARTS OF STRINGS</a>
26     <li><a name="TOC11" href="#SEC11">AUTHOR</a>
27 ph10 99 <li><a name="TOC12" href="#SEC12">REVISION</a>
28 nigel 77 </ul>
29     <br><a name="SEC1" href="#TOC1">SYNOPSIS OF C++ WRAPPER</a><br>
30     <P>
31     <b>#include &#60;pcrecpp.h&#62;</b>
32     </P>
33     <br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
34     <P>
35 nigel 81 The C++ wrapper for PCRE was provided by Google Inc. Some additional
36     functionality was added by Giuseppe Maxia. This brief man page was constructed
37     from the notes in the <i>pcrecpp.h</i> file, which should be consulted for
38     further details.
39 nigel 77 </P>
40     <br><a name="SEC3" href="#TOC1">MATCHING INTERFACE</a><br>
41     <P>
42     The "FullMatch" operation checks that supplied text matches a supplied pattern
43     exactly. If pointer arguments are supplied, it copies matched sub-strings that
44     match sub-patterns into them.
<pre>
  Example: successful match
     pcrecpp::RE re("h.*o");
     re.FullMatch("hello");

  Example: unsuccessful match (requires full match):
     pcrecpp::RE re("e");
     !re.FullMatch("hello");

  Example: creating a temporary RE object:
     pcrecpp::RE("h.*o").FullMatch("hello");
</pre>
57     You can pass in a "const char*" or a "string" for "text". The examples below
58     tend to use a const char*. You can, as in the different examples above, store
59     the RE object explicitly in a variable or use a temporary RE object. The
60     examples below use one mode or the other arbitrarily. Either could correctly be
61     used for any of these examples.
62     </P>
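<P>
As a rough illustration of the "exact match" rule, the sketch below uses
std::regex from the C++ standard library instead of pcrecpp, so that it
compiles without PCRE installed. The helper name full_match is hypothetical,
and std::regex uses ECMAScript syntax rather than PCRE syntax; only the
whole-text requirement is meant to carry over.
</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of pcrecpp): returns true only when the
// whole of `text` matches `pattern`, similar in spirit to FullMatch.
bool full_match(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}
```

<P>
With this helper, full_match("h.*o", "hello") succeeds, while
full_match("e", "hello") fails because "e" covers only part of the text.
</P>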
63     <P>
64     You must supply extra pointer arguments to extract matched subpieces.
<pre>
  Example: extracts "ruby" into "s" and 1234 into "i"
     int i;
     string s;
     pcrecpp::RE re("(\\w+):(\\d+)");
     re.FullMatch("ruby:1234", &s, &i);

  Example: does not try to extract any extra sub-patterns
     re.FullMatch("ruby:1234", &s);

  Example: does not try to extract into NULL
     re.FullMatch("ruby:1234", NULL, &i);

  Example: integer overflow causes failure
     !re.FullMatch("ruby:1234567891234", NULL, &i);

  Example: fails because there aren't enough sub-patterns:
     !pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);

  Example: fails because string cannot be stored in integer
     !pcrecpp::RE("(.*)").FullMatch("ruby", &i);
</pre>
87     The provided pointer arguments can be pointers to any scalar numeric
88     type, or one of:
<pre>
  string        (matched piece is copied to string)
  StringPiece   (StringPiece is mutated to point to matched piece)
  T             (where "bool T::ParseFrom(const char*, int)" exists)
  NULL          (the corresponding matched sub-pattern is not copied)
</pre>
95     The function returns true iff all of the following conditions are satisfied:
<pre>
  a. "text" matches "pattern" exactly;

  b. The number of matched sub-patterns is &#62;= the number of supplied
     pointers;

  c. The "i"th argument has a suitable type for holding the
     string captured as the "i"th sub-pattern. If you pass in
     NULL for the "i"th argument, or pass fewer arguments than
     the number of sub-patterns, the "i"th captured sub-pattern is
     ignored.
</pre>
108 nigel 93 CAVEAT: An optional sub-pattern that does not exist in the matched
109     string is assigned the empty string. Therefore, the following will
110     return false (because the empty string is not a valid number):
<pre>
   int number;
   pcrecpp::RE("[a-z]+(\\d+)?").FullMatch("abc", &number);
</pre>
115 nigel 77 The matching interface supports at most 16 arguments per call.
116     If you need more, consider using the more general interface
117     <b>pcrecpp::RE::DoMatch</b>. See <b>pcrecpp.h</b> for the signature for
118     <b>DoMatch</b>.
119     </P>
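<P>
The failure cases above (integer overflow, non-numeric text, and an empty
optional sub-pattern) can be pictured with a small conversion helper. This is
only a sketch built on std::strtol; the name parse_int is hypothetical and it
is not claimed to be how pcrecpp performs the conversion internally.
</P>

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of pcrecpp): convert captured text to int.
// Fails for empty input, trailing non-numeric characters, and values that
// do not fit in an int.
bool parse_int(const std::string& captured, int* out) {
    if (captured.empty()) return false;      // empty capture is not a number
    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(captured.c_str(), &end, 10);
    if (*end != '\0') return false;          // e.g. "ruby"
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) return false;
    *out = static_cast<int>(value);
    return true;
}
```

<P>
Under these assumptions "1234" converts, whereas "1234567891234", "ruby", and
the empty string are all rejected, which mirrors the outcomes listed above.
</P>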
120 nigel 93 <br><a name="SEC4" href="#TOC1">QUOTING METACHARACTERS</a><br>
121 nigel 77 <P>
122 nigel 93 You can use the "QuoteMeta" operation to insert backslashes before all
123     potentially meaningful characters in a string. The returned string, used as a
124     regular expression, will exactly match the original string.
125     <pre>
126     Example:
127     string quoted = RE::QuoteMeta(unquoted);
128     </pre>
129     Note that it's legal to escape a character even if it has no special meaning in
130     a regular expression -- so this function does that. (This also makes it
identical to the Perl function of the same name; see "perldoc -f quotemeta".)
132     For example, "1.5-2.0?" becomes "1\.5\-2\.0\?".
133     </P>
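<P>
The quoting rule can be sketched in a few lines. The function below is a
hypothetical re-implementation for illustration, not the library source: it
assumes that every byte other than an ASCII letter, digit, or underscore gets
a backslash, and it makes no attempt to cover how the real function treats
non-ASCII bytes.
</P>

```cpp
#include <string>

// Hypothetical sketch of the quoting rule (not the pcrecpp implementation):
// put a backslash before every byte that is not an ASCII letter, an ASCII
// digit, or '_'.
std::string quote_meta(const std::string& unquoted) {
    std::string quoted;
    quoted.reserve(unquoted.size() * 2);
    for (char c : unquoted) {
        const bool plain = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                           (c >= '0' && c <= '9') || c == '_';
        if (!plain) quoted += '\\';
        quoted += c;
    }
    return quoted;
}
```

<P>
Applied to "1.5-2.0?", this sketch yields "1\.5\-2\.0\?", the same result as
the example above.
</P>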
134     <br><a name="SEC5" href="#TOC1">PARTIAL MATCHES</a><br>
135     <P>
136 nigel 77 You can use the "PartialMatch" operation when you want the pattern
137     to match any substring of the text.
<pre>
  Example: simple search for a string:
     pcrecpp::RE("ell").PartialMatch("hello");

  Example: find first number in a string:
     int number;
     pcrecpp::RE re("(\\d+)");
     re.PartialMatch("x*100 + 20", &number);
     assert(number == 100);
</PRE>
148     </P>
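<P>
For comparison, the same "find the first number" search can be written with
std::regex_search from the standard library. This is an analogy only: the
helper name first_number is hypothetical, and std::regex is a different
engine from PCRE.
</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of pcrecpp): find the first run of digits
// anywhere in `text`. Returns false when the text contains no digits.
// Note: std::stoi throws if the digits do not fit in an int.
bool first_number(const std::string& text, int* number) {
    static const std::regex digits("(\\d+)");
    std::smatch match;
    if (!std::regex_search(text, match, digits)) return false;
    *number = std::stoi(match[1].str());
    return true;
}
```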
149 nigel 93 <br><a name="SEC6" href="#TOC1">UTF-8 AND THE MATCHING INTERFACE</a><br>
150 nigel 77 <P>
151     By default, pattern and text are plain text, one byte per character. The UTF8
152     flag, passed to the constructor, causes both pattern and string to be treated
153     as UTF-8 text, still a byte stream but potentially multiple bytes per
154     character. In practice, the text is likelier to be UTF-8 than the pattern, but
the match returned may depend on the UTF8 flag, so always use it when matching
UTF-8 text. For example, "." will match one byte normally but with UTF8 set may
match all the bytes of a multi-byte character.
<pre>
  Example:
    pcrecpp::RE_Options options;
    options.set_utf8();
    pcrecpp::RE re(utf8_pattern, options);
    re.FullMatch(utf8_string);

  Example: using the convenience function UTF8():
    pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
    re.FullMatch(utf8_string);
</pre>
169     NOTE: The UTF8 flag is ignored if pcre was not configured with the
170     <pre>
171     --enable-utf8 flag.
172     </PRE>
173     </P>
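<P>
To see why the byte count and the character count can differ, the following
standard-library sketch counts UTF-8 characters by skipping continuation
bytes. The function name utf8_length is hypothetical, the input is assumed to
be valid UTF-8, and none of this is pcrecpp code.
</P>

```cpp
#include <cstddef>
#include <string>

// Count UTF-8 characters by ignoring continuation bytes (bit pattern
// 10xxxxxx). Assumes `text` is valid UTF-8; no validation is attempted.
std::size_t utf8_length(const std::string& text) {
    std::size_t count = 0;
    for (unsigned char byte : text) {
        if ((byte & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

<P>
For the string "h\xC3\xA9llo" (the word "héllo" encoded as UTF-8), size()
reports 6 bytes while utf8_length reports 5 characters.
</P>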
174 nigel 93 <br><a name="SEC7" href="#TOC1">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a><br>
175 nigel 77 <P>
176 nigel 81 PCRE defines some modifiers to change the behavior of the regular expression
177     engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to
178     pass such modifiers to a RE class. Currently, the following modifiers are
179     supported:
<pre>
   modifier              description               Perl equivalent

   PCRE_CASELESS         case insensitive match      /i
   PCRE_MULTILINE        multiple lines match        /m
   PCRE_DOTALL           dot matches newlines        /s
   PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
   PCRE_EXTRA            strict escape parsing       N/A
   PCRE_EXTENDED         ignore whitespace           /x
   PCRE_UTF8             handles UTF8 chars          built-in
   PCRE_UNGREEDY         reverses * and *?           N/A
   PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)
</pre>
(*) Both Perl and PCRE allow non-capturing parentheses by means of the
"?:" modifier within the pattern itself. For example, (?:ab|cd) does not
capture, while (ab|cd) does.
196     </P>
197     <P>
For a full account of how each modifier works, please check the
PCRE API reference page.
200     </P>
201     <P>
202     For each modifier, there are two member functions whose name is made
203     out of the modifier in lowercase, without the "PCRE_" prefix. For
204     instance, PCRE_CASELESS is handled by
205     <pre>
206     bool caseless()
207     </pre>
208     which returns true if the modifier is set, and
209     <pre>
210     RE_Options & set_caseless(bool)
211     </pre>
212 nigel 87 which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be
213 nigel 81 accessed through the <b>set_match_limit()</b> and <b>match_limit()</b> member
214     functions. Setting <i>match_limit</i> to a non-zero value will limit the
215     execution of pcre to keep it from doing bad things like blowing the stack or
216     taking an eternity to return a result. A value of 5000 is good enough to stop
217     stack blowup in a 2MB thread stack. Setting <i>match_limit</i> to zero disables
match limiting. Alternatively, you can call <b>set_match_limit_recursion()</b>
which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE
recurses. <b>match_limit()</b> limits the number of times PCRE calls its
internal matching function; <b>match_limit_recursion()</b> limits the depth of
internal recursion, and therefore the amount of stack that is used.
223 nigel 81 </P>
224     <P>
225     Normally, to pass one or more modifiers to a RE class, you declare
226     a <i>RE_Options</i> object, set the appropriate options, and pass this
227     object to a RE constructor. Example:
<pre>
   RE_Options opt;
   opt.set_caseless(true);
   if (RE("HELLO", opt).PartialMatch("hello world")) ...
</pre>
RE_Options has two constructors. The default constructor takes no arguments and
creates a set of flags that are off by default. The other constructor takes an
<i>option_flags</i> parameter, which eases the transfer of legacy code from C
programs. This lets you do
237     <pre>
238     RE(pattern,
239     RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
240     </pre>
241     However, new code is better off doing
242     <pre>
243     RE(pattern,
244     RE_Options().set_caseless(true).set_multiline(true))
245     .PartialMatch(str);
246     </pre>
247     If you are going to pass one of the most used modifiers, there are some
248     convenience functions that return a RE_Options class with the
249     appropriate modifier already set: <b>CASELESS()</b>, <b>UTF8()</b>,
250     <b>MULTILINE()</b>, <b>DOTALL</b>(), and <b>EXTENDED()</b>.
251     </P>
252     <P>
253     If you need to set several options at once, and you don't want to go through
254     the pains of declaring a RE_Options object and setting several options, there
255     is a parallel method that give you such ability on the fly. You can concatenate
256     several <b>set_xxxxx()</b> member functions, since each of them returns a
257     reference to its class object. For example, to pass PCRE_CASELESS,
258     PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write:
<pre>
   RE(" ^ xyz \\s+ .* blah$",
     RE_Options()
       .set_caseless(true)
       .set_extended(true)
       .set_multiline(true)).PartialMatch(sometext);
</PRE>
267     </P>
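<P>
The chaining works because each setter returns a reference to the object it
was called on. The class below is a minimal, hypothetical stand-in that shows
the pattern; the name Options, the accessor all_flags, and the flag values
are invented for this sketch and are not the PCRE_* constants or the real
RE_Options class.
</P>

```cpp
// Hypothetical flag values for illustration only; NOT the PCRE_* constants.
enum : unsigned { kCaseless = 1u << 0, kMultiline = 1u << 1, kExtended = 1u << 2 };

// Hypothetical stand-in for an options class whose setters return *this,
// which is what makes calls chainable.
class Options {
public:
    Options() : flags_(0) {}
    explicit Options(unsigned flags) : flags_(flags) {}

    bool caseless() const { return (flags_ & kCaseless) != 0; }
    bool multiline() const { return (flags_ & kMultiline) != 0; }
    bool extended() const { return (flags_ & kExtended) != 0; }

    Options& set_caseless(bool on) { return set(kCaseless, on); }
    Options& set_multiline(bool on) { return set(kMultiline, on); }
    Options& set_extended(bool on) { return set(kExtended, on); }

    unsigned all_flags() const { return flags_; }

private:
    // Set or clear one bit, then hand back the same object for chaining.
    Options& set(unsigned bit, bool on) {
        if (on) flags_ |= bit; else flags_ &= ~bit;
        return *this;
    }
    unsigned flags_;
};
```

<P>
With this sketch, Options().set_caseless(true).set_extended(true) builds the
same flag set as Options(kCaseless | kExtended), which is the equivalence the
two constructor styles described above rely on.
</P>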
268 nigel 93 <br><a name="SEC8" href="#TOC1">SCANNING TEXT INCREMENTALLY</a><br>
269 nigel 81 <P>
270 nigel 77 The "Consume" operation may be useful if you want to repeatedly
271     match regular expressions at the front of a string and skip over
272     them as they match. This requires use of the "StringPiece" type,
273     which represents a sub-range of a real string. Like RE, StringPiece
274     is defined in the pcrecpp namespace.
<pre>
  Example: read lines of the form "var = value" from a string.
     string contents = ...;                 // Fill string somehow
     pcrecpp::StringPiece input(contents);  // Wrap in a StringPiece

     string var;
     int value;
     pcrecpp::RE re("(\\w+) = (\\d+)\n");
     while (re.Consume(&input, &var, &value)) {
       ...;
     }
</pre>
290     Each successful call to "Consume" will set "var/value", and also
291     advance "input" so it points past the matched text.
292     </P>
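<P>
The consume-and-advance loop can be approximated with the standard library by
anchoring each search at the current position. This is a sketch under stated
assumptions: read_assignments is a hypothetical name, std::regex stands in
for PCRE, and a plain iterator plays the role of the StringPiece.
</P>

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of pcrecpp): read "var = value" lines from
// the front of `contents`, stopping at the first line that does not fit.
std::vector<std::pair<std::string, int>> read_assignments(const std::string& contents) {
    static const std::regex line("(\\w+) = (\\d+)\n");
    std::vector<std::pair<std::string, int>> result;
    std::string::const_iterator pos = contents.begin();
    std::smatch match;
    // match_continuous requires the match to start exactly at `pos`,
    // which imitates matching at the front of the remaining input.
    while (std::regex_search(pos, contents.end(), match, line,
                             std::regex_constants::match_continuous)) {
        result.emplace_back(match[1].str(), std::stoi(match[2].str()));
        pos = match[0].second;  // advance past the matched text
    }
    return result;
}
```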
293     <P>
294     The "FindAndConsume" operation is similar to "Consume" but does not
295     anchor your match at the beginning of the string. For example, you
296     could extract all words from a string by repeatedly calling
297     <pre>
298     pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
299     </PRE>
300     </P>
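<P>
An unanchored scan of this kind resembles iterating over all matches with
std::sregex_iterator. The sketch below is an analogy using the standard
library, with the hypothetical helper name all_words; it is not pcrecpp code.
</P>

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of pcrecpp): collect every run of word
// characters in `text`, in order of appearance.
std::vector<std::string> all_words(const std::string& text) {
    static const std::regex word("(\\w+)");
    std::vector<std::string> words;
    for (std::sregex_iterator it(text.begin(), text.end(), word), end;
         it != end; ++it) {
        words.push_back((*it)[1].str());
    }
    return words;
}
```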
301 nigel 93 <br><a name="SEC9" href="#TOC1">PARSING HEX/OCTAL/C-RADIX NUMBERS</a><br>
302 nigel 77 <P>
303     By default, if you pass a pointer to a numeric value, the
304     corresponding text is interpreted as a base-10 number. You can
305     instead wrap the pointer with a call to one of the operators Hex(),
306     Octal(), or CRadix() to interpret the text in another base. The
307     CRadix operator interprets C-style "0" (base-8) and "0x" (base-16)
308     prefixes, but defaults to base-10.
<pre>
  Example:
    int a, b, c, d;
    pcrecpp::RE re("(.*) (.*) (.*) (.*)");
    re.FullMatch("100 40 0100 0x40",
                 pcrecpp::Octal(&a), pcrecpp::Hex(&b),
                 pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
</pre>
317     will leave 64 in a, b, c, and d.
318     </P>
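<P>
The arithmetic behind that result can be checked with std::strtol, whose base
argument behaves in a comparable way: 8 and 16 force octal and hexadecimal,
and 0 honours C-style prefixes. The three wrapper names below are
hypothetical, and this is not a statement about how pcrecpp implements its
operators.
</P>

```cpp
#include <cstdlib>

// Hypothetical wrappers around std::strtol (not part of pcrecpp).
// Base 0 makes strtol honour C-style prefixes: "0x" selects hexadecimal,
// a leading "0" selects octal, and anything else is read as decimal.
long parse_octal(const char* text)  { return std::strtol(text, nullptr, 8); }
long parse_hex(const char* text)    { return std::strtol(text, nullptr, 16); }
long parse_cradix(const char* text) { return std::strtol(text, nullptr, 0); }
```

<P>
Here parse_octal("100"), parse_hex("40"), parse_cradix("0100"), and
parse_cradix("0x40") all evaluate to 64.
</P>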
319 nigel 93 <br><a name="SEC10" href="#TOC1">REPLACING PARTS OF STRINGS</a><br>
320 nigel 77 <P>
321     You can replace the first match of "pattern" in "str" with "rewrite".
322     Within "rewrite", backslash-escaped digits (\1 to \9) can be
323     used to insert text matching corresponding parenthesized group
324     from the pattern. \0 in "rewrite" refers to the entire matching
325     text. For example:
326     <pre>
327     string s = "yabba dabba doo";
328     pcrecpp::RE("b+").Replace("d", &s);
329     </pre>
330     will leave "s" containing "yada dabba doo". The result is true if the pattern
331     matches and a replacement occurs, false otherwise.
332     </P>
333     <P>
334     <b>GlobalReplace</b> is like <b>Replace</b> except that it replaces all
335     occurrences of the pattern in the string with the rewrite. Replacements are
336     not subject to re-matching. For example:
337     <pre>
338     string s = "yabba dabba doo";
339     pcrecpp::RE("b+").GlobalReplace("d", &s);
340     </pre>
341     will leave "s" containing "yada dada doo". It returns the number of
342     replacements made.
343     </P>
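<P>
The difference between replacing the first match and replacing every match
can also be seen with std::regex_replace. In this sketch the helper names
replace_first and replace_all are hypothetical, the functions return a new
string instead of modifying one in place, and std::regex writes group
references as $1 rather than \1.
</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helpers (not part of pcrecpp), built on std::regex_replace.

// Replace only the first match of `pattern` in `text`.
std::string replace_first(const std::string& text, const std::string& pattern,
                          const std::string& rewrite) {
    return std::regex_replace(text, std::regex(pattern), rewrite,
                              std::regex_constants::format_first_only);
}

// Replace every non-overlapping match of `pattern` in `text`.
std::string replace_all(const std::string& text, const std::string& pattern,
                        const std::string& rewrite) {
    return std::regex_replace(text, std::regex(pattern), rewrite);
}
```

<P>
With "yabba dabba doo", pattern "b+", and rewrite "d", replace_first gives
"yada dabba doo" and replace_all gives "yada dada doo".
</P>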
344     <P>
345     <b>Extract</b> is like <b>Replace</b>, except that if the pattern matches,
346     "rewrite" is copied into "out" (an additional argument) with substitutions.
347     The non-matching portions of "text" are ignored. Returns true iff a match
348     occurred and the extraction happened successfully; if no match occurs, the
349     string is left unaffected.
350     </P>
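<P>
A comparable operation can be sketched with std::regex_search followed by
match_results::format. The helper name extract and the sample address below
are hypothetical, and the rewrite string uses the $1 notation of std::regex
rather than the \1 notation described above.
</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of pcrecpp): if `pattern` matches somewhere
// in `text`, expand `rewrite` into *out and return true. On failure *out is
// left untouched.
bool extract(const std::string& text, const std::string& pattern,
             const std::string& rewrite, std::string* out) {
    std::smatch match;
    if (!std::regex_search(text, match, std::regex(pattern))) return false;
    *out = match.format(rewrite);
    return true;
}
```

<P>
For instance, extracting from "boris@kremvax.ru" with the pattern
"(.*)@([^.]*)" and the rewrite "$2!$1" produces "kremvax!boris"; the
unmatched ".ru" does not appear in the output.
</P>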
351 nigel 93 <br><a name="SEC11" href="#TOC1">AUTHOR</a><br>
352 nigel 77 <P>
353     The C++ wrapper was contributed by Google Inc.
354     <br>
355 nigel 93 Copyright &copy; 2006 Google Inc.
356 ph10 99 <br>
357     </P>
358     <br><a name="SEC12" href="#TOC1">REVISION</a><br>
359     <P>
360     Last updated: 06 March 2007
361     <br>
362 nigel 77 <p>
363     Return to the <a href="index.html">PCRE index page</a>.
364     </p>


