| 16 |
<li><a name="TOC1" href="#SEC1">SYNOPSIS OF C++ WRAPPER</a> |
<li><a name="TOC1" href="#SEC1">SYNOPSIS OF C++ WRAPPER</a> |
| 17 |
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a> |
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a> |
| 18 |
<li><a name="TOC3" href="#SEC3">MATCHING INTERFACE</a> |
<li><a name="TOC3" href="#SEC3">MATCHING INTERFACE</a> |
| 19 |
<li><a name="TOC4" href="#SEC4">PARTIAL MATCHES</a> |
<li><a name="TOC4" href="#SEC4">QUOTING METACHARACTERS</a> |
| 20 |
<li><a name="TOC5" href="#SEC5">UTF-8 AND THE MATCHING INTERFACE</a> |
<li><a name="TOC5" href="#SEC5">PARTIAL MATCHES</a> |
| 21 |
<li><a name="TOC6" href="#SEC6">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a> |
<li><a name="TOC6" href="#SEC6">UTF-8 AND THE MATCHING INTERFACE</a> |
| 22 |
<li><a name="TOC7" href="#SEC7">SCANNING TEXT INCREMENTALLY</a> |
<li><a name="TOC7" href="#SEC7">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a> |
| 23 |
<li><a name="TOC8" href="#SEC8">PARSING HEX/OCTAL/C-RADIX NUMBERS</a> |
<li><a name="TOC8" href="#SEC8">SCANNING TEXT INCREMENTALLY</a> |
| 24 |
<li><a name="TOC9" href="#SEC9">REPLACING PARTS OF STRINGS</a> |
<li><a name="TOC9" href="#SEC9">PARSING HEX/OCTAL/C-RADIX NUMBERS</a> |
| 25 |
<li><a name="TOC10" href="#SEC10">AUTHOR</a> |
<li><a name="TOC10" href="#SEC10">REPLACING PARTS OF STRINGS</a> |
| 26 |
|
<li><a name="TOC11" href="#SEC11">AUTHOR</a> |
| 27 |
|
<li><a name="TOC12" href="#SEC12">REVISION</a> |
| 28 |
</ul> |
</ul> |
| 29 |
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF C++ WRAPPER</a><br> |
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF C++ WRAPPER</a><br> |
| 30 |
<P> |
<P> |
| 31 |
<b>#include &lt;pcrecpp.h&gt;</b> |
<b>#include &lt;pcrecpp.h&gt;</b> |
| 32 |
</P> |
</P> |
|
<P> |
|
|
</P> |
|
| 33 |
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br> |
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br> |
| 34 |
<P> |
<P> |
| 35 |
The C++ wrapper for PCRE was provided by Google Inc. Some additional |
The C++ wrapper for PCRE was provided by Google Inc. Some additional |
| 36 |
functionality was added by Giuseppe Maxia. This brief man page was constructed |
functionality was added by Giuseppe Maxia. This brief man page was constructed |
| 37 |
from the notes in the <i>pcrecpp.h</i> file, which should be consulted for |
from the notes in the <i>pcrecpp.h</i> file, which should be consulted for |
| 38 |
further details. |
further details. Note that the C++ wrapper supports only the original 8-bit |
| 39 |
|
PCRE library. There is no 16-bit support at present. |
| 40 |
</P> |
</P> |
| 41 |
<br><a name="SEC3" href="#TOC1">MATCHING INTERFACE</a><br> |
<br><a name="SEC3" href="#TOC1">MATCHING INTERFACE</a><br> |
| 42 |
<P> |
<P> |
| 102 |
|
|
| 103 |
c. The "i"th argument has a suitable type for holding the |
c. The "i"th argument has a suitable type for holding the |
| 104 |
string captured as the "i"th sub-pattern. If you pass in |
string captured as the "i"th sub-pattern. If you pass in |
| 105 |
NULL for the "i"th argument, or pass fewer arguments than |
void * NULL for the "i"th argument, or a non-void * NULL |
| 106 |
|
of the correct type, or pass fewer arguments than the |
| 107 |
number of sub-patterns, the "i"th captured sub-pattern is |
number of sub-patterns, the "i"th captured sub-pattern is |
| 108 |
ignored. |
ignored. |
| 109 |
</pre> |
</pre> |
| 110 |
|
CAVEAT: An optional sub-pattern that does not exist in the matched |
| 111 |
|
string is assigned the empty string. Therefore, the following will |
| 112 |
|
return false (because the empty string is not a valid number): |
| 113 |
|
<pre> |
| 114 |
|
int number; |
| 115 |
|
pcrecpp::RE("[a-z]+(\\d+)?").FullMatch("abc", &number); |
| 116 |
|
</pre> |
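The reason an empty capture fails numeric conversion can be sketched with the standard library alone. The helper name parse_int_strict below is hypothetical and is not part of the wrapper; it only assumes that a conversion must consume at least one digit and the whole string to count as a valid number.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper (an assumption for illustration, not pcrecpp code):
// returns true only if all of `text` converts to a base-10 int.
bool parse_int_strict(const std::string& text, int* out) {
    if (text.empty()) return false;          // an empty capture has no digits
    errno = 0;
    char* end = nullptr;
    long value = std::strtol(text.c_str(), &end, 10);
    if (end == text.c_str()) return false;   // no digits were consumed
    if (errno != 0 || *end != '\0') return false;  // overflow or trailing text
    if (value < INT_MIN || value > INT_MAX) return false;
    *out = static_cast<int>(value);
    return true;
}
```

Under these assumptions, parse_int_strict("", &n) fails in the same way the CAVEAT describes, while parse_int_strict("100", &n) succeeds and stores 100.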
| 117 |
The matching interface supports at most 16 arguments per call. |
The matching interface supports at most 16 arguments per call. |
| 118 |
If you need more, consider using the more general interface |
If you need more, consider using the more general interface |
| 119 |
<b>pcrecpp::RE::DoMatch</b>. See <b>pcrecpp.h</b> for the signature for |
<b>pcrecpp::RE::DoMatch</b>. See <b>pcrecpp.h</b> for the signature for |
| 120 |
<b>DoMatch</b>. |
<b>DoMatch</b>. |
| 121 |
</P> |
</P> |
| 122 |
<br><a name="SEC4" href="#TOC1">PARTIAL MATCHES</a><br> |
<P> |
| 123 |
|
NOTE: Do not use <b>no_arg</b>, which is used internally to mark the end of a |
| 124 |
|
list of optional arguments, as a placeholder for missing arguments, as this can |
| 125 |
|
lead to segfaults. |
| 126 |
|
</P> |
| 127 |
|
<br><a name="SEC4" href="#TOC1">QUOTING METACHARACTERS</a><br> |
| 128 |
|
<P> |
| 129 |
|
You can use the "QuoteMeta" operation to insert backslashes before all |
| 130 |
|
potentially meaningful characters in a string. The returned string, used as a |
| 131 |
|
regular expression, will exactly match the original string. |
| 132 |
|
<pre> |
| 133 |
|
Example: |
| 134 |
|
string quoted = RE::QuoteMeta(unquoted); |
| 135 |
|
</pre> |
| 136 |
|
Note that it's legal to escape a character even if it has no special meaning in |
| 137 |
|
a regular expression -- so this function does that. (This also makes it |
| 138 |
|
identical to the Perl function of the same name; see "perldoc -f quotemeta".) |
| 139 |
|
For example, "1.5-2.0?" becomes "1\.5\-2\.0\?". |
| 140 |
|
</P> |
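<P>
The escaping rule described above can be approximated in a few lines. This is a minimal sketch, not the wrapper's implementation: the name quote_meta_sketch is hypothetical, and it assumes that "potentially meaningful" means every byte other than an ASCII letter, digit, or underscore. The real function may treat NUL bytes and non-ASCII bytes differently.
</P>

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for illustration only (not pcrecpp::RE::QuoteMeta).
// Assumption: every byte that is not [A-Za-z0-9_] gets a backslash prefix.
std::string quote_meta_sketch(const std::string& unquoted) {
    std::string quoted;
    quoted.reserve(unquoted.size() * 2);  // worst case: every byte escaped
    for (unsigned char c : unquoted) {
        const bool is_word_char = (c >= 'a' && c <= 'z') ||
                                  (c >= 'A' && c <= 'Z') ||
                                  (c >= '0' && c <= '9') || c == '_';
        if (!is_word_char) quoted += '\\';
        quoted += static_cast<char>(c);
    }
    return quoted;
}
```

With this sketch, quote_meta_sketch("1.5-2.0?") yields the same escaped string as the example in the text.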
| 141 |
|
<br><a name="SEC5" href="#TOC1">PARTIAL MATCHES</a><br> |
| 142 |
<P> |
<P> |
| 143 |
You can use the "PartialMatch" operation when you want the pattern |
You can use the "PartialMatch" operation when you want the pattern |
| 144 |
to match any substring of the text. |
to match any substring of the text. |
| 153 |
assert(number == 100); |
assert(number == 100); |
| 154 |
</PRE> |
</PRE> |
| 155 |
</P> |
</P> |
| 156 |
<br><a name="SEC5" href="#TOC1">UTF-8 AND THE MATCHING INTERFACE</a><br> |
<br><a name="SEC6" href="#TOC1">UTF-8 AND THE MATCHING INTERFACE</a><br> |
| 157 |
<P> |
<P> |
| 158 |
By default, pattern and text are plain text, one byte per character. The UTF8 |
By default, pattern and text are plain text, one byte per character. The UTF8 |
| 159 |
flag, passed to the constructor, causes both pattern and string to be treated |
flag, passed to the constructor, causes both pattern and string to be treated |
| 178 |
--enable-utf8 flag. |
--enable-utf8 flag. |
| 179 |
</PRE> |
</PRE> |
| 180 |
</P> |
</P> |
| 181 |
<br><a name="SEC6" href="#TOC1">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a><br> |
<br><a name="SEC7" href="#TOC1">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a><br> |
| 182 |
<P> |
<P> |
| 183 |
PCRE defines some modifiers to change the behavior of the regular expression |
PCRE defines some modifiers to change the behavior of the regular expression |
| 184 |
engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to |
engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to |
| 216 |
<pre> |
<pre> |
| 217 |
RE_Options & set_caseless(bool) |
RE_Options & set_caseless(bool) |
| 218 |
</pre> |
</pre> |
| 219 |
which sets or unsets the modifier. Moreover, PCRE_CONFIG_MATCH_LIMIT can be |
which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be |
| 220 |
accessed through the <b>set_match_limit()</b> and <b>match_limit()</b> member |
accessed through the <b>set_match_limit()</b> and <b>match_limit()</b> member |
| 221 |
functions. Setting <i>match_limit</i> to a non-zero value will limit the |
functions. Setting <i>match_limit</i> to a non-zero value will limit the |
| 222 |
execution of pcre to keep it from doing bad things like blowing the stack or |
execution of pcre to keep it from doing bad things like blowing the stack or |
| 223 |
taking an eternity to return a result. A value of 5000 is good enough to stop |
taking an eternity to return a result. A value of 5000 is good enough to stop |
| 224 |
stack blowup in a 2MB thread stack. Setting <i>match_limit</i> to zero disables |
stack blowup in a 2MB thread stack. Setting <i>match_limit</i> to zero disables |
| 225 |
match limiting. |
match limiting. Alternatively, you can call <b>match_limit_recursion()</b> |
| 226 |
|
which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE |
| 227 |
|
recurses. <b>match_limit()</b> limits the number of matches PCRE does; |
| 228 |
|
<b>match_limit_recursion()</b> limits the depth of internal recursion, and |
| 229 |
|
therefore the amount of stack that is used. |
| 230 |
</P> |
</P> |
| 231 |
<P> |
<P> |
| 232 |
Normally, to pass one or more modifiers to an RE class, you declare |
Normally, to pass one or more modifiers to an RE class, you declare |
| 233 |
an <i>RE_Options</i> object, set the appropriate options, and pass this |
an <i>RE_Options</i> object, set the appropriate options, and pass this |
| 234 |
object to an RE constructor. Example: |
object to an RE constructor. Example: |
| 235 |
<pre> |
<pre> |
| 236 |
RE_options opt; |
RE_Options opt; |
| 237 |
opt.set_caseless(true); |
opt.set_caseless(true); |
| 238 |
if (RE("HELLO", opt).PartialMatch("hello world")) ... |
if (RE("HELLO", opt).PartialMatch("hello world")) ... |
| 239 |
</pre> |
</pre> |
| 272 |
|
|
| 273 |
</PRE> |
</PRE> |
| 274 |
</P> |
</P> |
| 275 |
<br><a name="SEC7" href="#TOC1">SCANNING TEXT INCREMENTALLY</a><br> |
<br><a name="SEC8" href="#TOC1">SCANNING TEXT INCREMENTALLY</a><br> |
| 276 |
<P> |
<P> |
| 277 |
The "Consume" operation may be useful if you want to repeatedly |
The "Consume" operation may be useful if you want to repeatedly |
| 278 |
match regular expressions at the front of a string and skip over |
match regular expressions at the front of a string and skip over |
| 283 |
Example: read lines of the form "var = value" from a string. |
Example: read lines of the form "var = value" from a string. |
| 284 |
string contents = ...; // Fill string somehow |
string contents = ...; // Fill string somehow |
| 285 |
pcrecpp::StringPiece input(contents); // Wrap in a StringPiece |
pcrecpp::StringPiece input(contents); // Wrap in a StringPiece |
| 286 |
</PRE> |
|
|
</P> |
|
|
<P> |
|
|
<pre> |
|
| 287 |
string var; |
string var; |
| 288 |
int value; |
int value; |
| 289 |
pcrecpp::RE re("(\\w+) = (\\d+)\n"); |
pcrecpp::RE re("(\\w+) = (\\d+)\n"); |
| 302 |
pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word) |
pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word) |
| 303 |
</PRE> |
</PRE> |
| 304 |
</P> |
</P> |
| 305 |
<br><a name="SEC8" href="#TOC1">PARSING HEX/OCTAL/C-RADIX NUMBERS</a><br> |
<br><a name="SEC9" href="#TOC1">PARSING HEX/OCTAL/C-RADIX NUMBERS</a><br> |
| 306 |
<P> |
<P> |
| 307 |
By default, if you pass a pointer to a numeric value, the |
By default, if you pass a pointer to a numeric value, the |
| 308 |
corresponding text is interpreted as a base-10 number. You can |
corresponding text is interpreted as a base-10 number. You can |
| 320 |
</pre> |
</pre> |
| 321 |
will leave 64 in a, b, c, and d. |
will leave 64 in a, b, c, and d. |
| 322 |
</P> |
</P> |
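<P>
The radix conventions themselves can be checked with the standard library. The sketch below does not use the wrapper; parse_with_base is a hypothetical helper around std::strtol, and the input strings are chosen here for illustration. Base 0 asks strtol to infer the radix from a C-style prefix ("0x" for hex, a leading "0" for octal).
</P>

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper for illustration (not part of pcrecpp):
// converts `text` using the given base; base 0 means "detect a C-style prefix".
long parse_with_base(const char* text, int base) {
    return std::strtol(text, nullptr, base);
}
```

Under these assumptions, "100" read as octal, "40" read as hex, and "0100" and "0x40" read with prefix detection all convert to 64.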
| 323 |
<br><a name="SEC9" href="#TOC1">REPLACING PARTS OF STRINGS</a><br> |
<br><a name="SEC10" href="#TOC1">REPLACING PARTS OF STRINGS</a><br> |
| 324 |
<P> |
<P> |
| 325 |
You can replace the first match of "pattern" in "str" with "rewrite". |
You can replace the first match of "pattern" in "str" with "rewrite". |
| 326 |
Within "rewrite", backslash-escaped digits (\1 to \9) can be |
Within "rewrite", backslash-escaped digits (\1 to \9) can be |
| 352 |
occurred and the extraction happened successfully; if no match occurs, the |
occurred and the extraction happened successfully; if no match occurs, the |
| 353 |
string is left unaffected. |
string is left unaffected. |
| 354 |
</P> |
</P> |
| 355 |
<br><a name="SEC10" href="#TOC1">AUTHOR</a><br> |
<br><a name="SEC11" href="#TOC1">AUTHOR</a><br> |
| 356 |
<P> |
<P> |
| 357 |
The C++ wrapper was contributed by Google Inc. |
The C++ wrapper was contributed by Google Inc. |
| 358 |
<br> |
<br> |
| 359 |
Copyright © 2005 Google Inc. |
Copyright © 2007 Google Inc. |
| 360 |
|
<br> |
| 361 |
|
</P> |
| 362 |
|
<br><a name="SEC12" href="#TOC1">REVISION</a><br> |
| 363 |
|
<P> |
| 364 |
|
Last updated: 08 January 2012 |
| 365 |
|
<br> |
| 366 |
<p> |
<p> |
| 367 |
Return to the <a href="index.html">PCRE index page</a>. |
Return to the <a href="index.html">PCRE index page</a>. |
| 368 |
</p> |
</p> |