| 18 |
<li><a name="TOC3" href="#SEC3">MATCHING INTERFACE</a> |
<li><a name="TOC3" href="#SEC3">MATCHING INTERFACE</a> |
| 19 |
<li><a name="TOC4" href="#SEC4">PARTIAL MATCHES</a> |
<li><a name="TOC4" href="#SEC4">PARTIAL MATCHES</a> |
| 20 |
<li><a name="TOC5" href="#SEC5">UTF-8 AND THE MATCHING INTERFACE</a> |
<li><a name="TOC5" href="#SEC5">UTF-8 AND THE MATCHING INTERFACE</a> |
| 21 |
<li><a name="TOC6" href="#SEC6">SCANNING TEXT INCREMENTALLY</a> |
<li><a name="TOC6" href="#SEC6">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a> |
| 22 |
<li><a name="TOC7" href="#SEC7">PARSING HEX/OCTAL/C-RADIX NUMBERS</a> |
<li><a name="TOC7" href="#SEC7">SCANNING TEXT INCREMENTALLY</a> |
| 23 |
<li><a name="TOC8" href="#SEC8">REPLACING PARTS OF STRINGS</a> |
<li><a name="TOC8" href="#SEC8">PARSING HEX/OCTAL/C-RADIX NUMBERS</a> |
| 24 |
<li><a name="TOC9" href="#SEC9">AUTHOR</a> |
<li><a name="TOC9" href="#SEC9">REPLACING PARTS OF STRINGS</a> |
| 25 |
|
<li><a name="TOC10" href="#SEC10">AUTHOR</a> |
| 26 |
</ul> |
</ul> |
| 27 |
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF C++ WRAPPER</a><br> |
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF C++ WRAPPER</a><br> |
| 28 |
<P> |
<P> |
| 32 |
</P> |
</P> |
| 33 |
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br> |
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br> |
| 34 |
<P> |
<P> |
| 35 |
The C++ wrapper for PCRE was provided by Google Inc. This brief man page was |
The C++ wrapper for PCRE was provided by Google Inc. Some additional |
| 36 |
constructed from the notes in the <i>pcrecpp.h</i> file, which should be |
functionality was added by Giuseppe Maxia. This brief man page was constructed |
| 37 |
consulted for further details. |
from the notes in the <i>pcrecpp.h</i> file, which should be consulted for |
| 38 |
|
further details. |
| 39 |
</P> |
</P> |
| 40 |
<br><a name="SEC3" href="#TOC1">MATCHING INTERFACE</a><br> |
<br><a name="SEC3" href="#TOC1">MATCHING INTERFACE</a><br> |
| 41 |
<P> |
<P> |
| 150 |
--enable-utf8 flag. |
--enable-utf8 flag. |
| 151 |
</PRE> |
</PRE> |
| 152 |
</P> |
</P> |
| 153 |
<br><a name="SEC6" href="#TOC1">SCANNING TEXT INCREMENTALLY</a><br> |
<br><a name="SEC6" href="#TOC1">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a><br> |
| 154 |
|
<P> |
| 155 |
|
PCRE defines some modifiers to change the behavior of the regular expression |
| 156 |
|
engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to |
| 157 |
|
pass such modifiers to a RE class. Currently, the following modifiers are |
| 158 |
|
supported: |
| 159 |
|
<pre> |
| 160 |
|
modifier              description                Perl equivalent |
| 161 |

|
| 162 |

PCRE_CASELESS         case insensitive match     /i |
| 163 |

PCRE_MULTILINE        multiple lines match       /m |
| 164 |

PCRE_DOTALL           dot matches newlines       /s |
| 165 |

PCRE_DOLLAR_ENDONLY   $ matches only at end      N/A |
| 166 |

PCRE_EXTRA            strict escape parsing      N/A |
| 167 |

PCRE_EXTENDED         ignore whitespace          /x |
| 168 |

PCRE_UTF8             handles UTF8 chars         built-in |
| 169 |

PCRE_UNGREEDY         reverses * and *?          N/A |
| 170 |

PCRE_NO_AUTO_CAPTURE  disables capturing parens  N/A (*) |
| 171 |
|
</pre> |
| 172 |
|
(*) Both Perl and PCRE allow non-capturing parentheses by means of the |
| 173 |

"?:" modifier within the pattern itself. For example, (?:ab|cd) does not |
| 174 |
|
capture, while (ab|cd) does. |
| 175 |
|
</P> |
| 176 |
|
<P> |
| 177 |
|
For a full account of how each modifier works, please check the |
| 178 |
|
PCRE API reference page. |
| 179 |
|
</P> |
| 180 |
|
<P> |
| 181 |
|
For each modifier, there are two member functions whose name is made |
| 182 |
|
out of the modifier in lowercase, without the "PCRE_" prefix. For |
| 183 |
|
instance, PCRE_CASELESS is handled by |
| 184 |
|
<pre> |
| 185 |
|
bool caseless() |
| 186 |
|
</pre> |
| 187 |
|
which returns true if the modifier is set, and |
| 188 |
|
<pre> |
| 189 |
|
RE_Options & set_caseless(bool) |
| 190 |
|
</pre> |
| 191 |
|
which sets or unsets the modifier. Moreover, PCRE_CONFIG_MATCH_LIMIT can be |
| 192 |
|
accessed through the <b>set_match_limit()</b> and <b>match_limit()</b> member |
| 193 |
|
functions. Setting <i>match_limit</i> to a non-zero value will limit the |
| 194 |
|
execution of pcre to keep it from doing bad things like blowing the stack or |
| 195 |
|
taking an eternity to return a result. A value of 5000 is good enough to stop |
| 196 |
|
stack blowup in a 2MB thread stack. Setting <i>match_limit</i> to zero disables |
| 197 |
|
match limiting. |
| 198 |
|
</P> |
| 199 |
|
<P> |
| 200 |
|
Normally, to pass one or more modifiers to a RE class, you declare |
| 201 |
|
a <i>RE_Options</i> object, set the appropriate options, and pass this |
| 202 |
|
object to a RE constructor. Example: |
| 203 |
|
<pre> |
| 204 |
|
RE_Options opt; |
| 205 |
|
opt.set_caseless(true); |
| 206 |
|
if (RE("HELLO", opt).PartialMatch("hello world")) ... |
| 207 |
|
</pre> |
| 208 |
|
RE_Options has two constructors. The default constructor takes no arguments and |
| 209 |
|
creates a set of flags that are off by default. The optional parameter |
| 210 |
|
<i>option_flags</i> is to facilitate transfer of legacy code from C programs. |
| 211 |
|
This lets you do |
| 212 |
|
<pre> |
| 213 |
|
RE(pattern, |
| 214 |
|
RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str); |
| 215 |
|
</pre> |
| 216 |
|
However, new code is better off doing |
| 217 |
|
<pre> |
| 218 |
|
RE(pattern, |
| 219 |
|
RE_Options().set_caseless(true).set_multiline(true)) |
| 220 |
|
.PartialMatch(str); |
| 221 |
|
</pre> |
| 222 |
|
If you are going to pass one of the most used modifiers, there are some |
| 223 |
|
convenience functions that return a RE_Options class with the |
| 224 |
|
appropriate modifier already set: <b>CASELESS()</b>, <b>UTF8()</b>, |
| 225 |
|
<b>MULTILINE()</b>, <b>DOTALL()</b>, and <b>EXTENDED()</b>. |
| 226 |
|
</P> |
| 227 |
|
<P> |
| 228 |
|
If you need to set several options at once, and you don't want to go through |
| 229 |
|
the pains of declaring a RE_Options object and setting several options, there |
| 230 |
|
is a parallel method that gives you this ability on the fly. You can chain |
| 231 |
|
several <b>set_xxxxx()</b> member functions, since each of them returns a |
| 232 |
|
reference to its class object. For example, to pass PCRE_CASELESS, |
| 233 |
|
PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write: |
| 234 |
|
<pre> |
| 235 |
|
RE(" ^ xyz \\s+ .* blah$", |
| 236 |
|
RE_Options() |
| 237 |
|
.set_caseless(true) |
| 238 |
|
.set_extended(true) |
| 239 |
|
.set_multiline(true)).PartialMatch(sometext); |
| 240 |
|
|
| 241 |
|
</PRE> |
| 242 |
|
</P> |
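The chaining described above works because each setter returns a reference to the object it was called on. The following is a minimal sketch of that pattern, not the wrapper's actual implementation: the class name Options, the k* flag values, and the all_options() accessor are hypothetical stand-ins chosen for illustration, and the real RE_Options in pcrecpp.h should be consulted for its actual members.

```cpp
// Hypothetical flag bits, made up for this sketch (not the PCRE_* values).
enum {
  kCaseless  = 1 << 0,
  kMultiline = 1 << 1,
  kExtended  = 1 << 2
};

// Hypothetical stand-in for an options class; NOT pcrecpp::RE_Options.
class Options {
 public:
  // Default constructor: every flag starts off.
  Options() : flags_(0) {}
  // Legacy-style constructor: accepts a C-style bitmask of flags.
  explicit Options(int option_flags) : flags_(option_flags) {}

  // One getter/setter pair per modifier, named after the modifier.
  bool caseless() const { return (flags_ & kCaseless) != 0; }
  Options& set_caseless(bool on) { return set_flag(kCaseless, on); }

  bool multiline() const { return (flags_ & kMultiline) != 0; }
  Options& set_multiline(bool on) { return set_flag(kMultiline, on); }

  bool extended() const { return (flags_ & kExtended) != 0; }
  Options& set_extended(bool on) { return set_flag(kExtended, on); }

  // Raw bitmask, exposed here only so the two styles can be compared.
  int all_options() const { return flags_; }

 private:
  // Sets or clears one bit, then returns *this so calls can be chained.
  Options& set_flag(int bit, bool on) {
    if (on) {
      flags_ |= bit;
    } else {
      flags_ &= ~bit;
    }
    return *this;
  }

  int flags_;
};

// Builds an option set in a single expression by chaining setters.
inline Options chained_example() {
  return Options().set_caseless(true).set_extended(true).set_multiline(true);
}
```

In this sketch, chained_example() yields the same flag set as Options(kCaseless | kExtended | kMultiline), which mirrors the relationship between the setter style and the legacy bitmask constructor described earlier in this section.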
| 243 |
|
<br><a name="SEC7" href="#TOC1">SCANNING TEXT INCREMENTALLY</a><br> |
| 244 |
<P> |
<P> |
| 245 |
The "Consume" operation may be useful if you want to repeatedly |
The "Consume" operation may be useful if you want to repeatedly |
| 246 |
match regular expressions at the front of a string and skip over |
match regular expressions at the front of a string and skip over |
| 273 |
pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word) |
pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word) |
| 274 |
</PRE> |
</PRE> |
| 275 |
</P> |
</P> |
| 276 |
<br><a name="SEC7" href="#TOC1">PARSING HEX/OCTAL/C-RADIX NUMBERS</a><br> |
<br><a name="SEC8" href="#TOC1">PARSING HEX/OCTAL/C-RADIX NUMBERS</a><br> |
| 277 |
<P> |
<P> |
| 278 |
By default, if you pass a pointer to a numeric value, the |
By default, if you pass a pointer to a numeric value, the |
| 279 |
corresponding text is interpreted as a base-10 number. You can |
corresponding text is interpreted as a base-10 number. You can |
| 291 |
</pre> |
</pre> |
| 292 |
will leave 64 in a, b, c, and d. |
will leave 64 in a, b, c, and d. |
| 293 |
</P> |
</P> |
| 294 |
<br><a name="SEC8" href="#TOC1">REPLACING PARTS OF STRINGS</a><br> |
<br><a name="SEC9" href="#TOC1">REPLACING PARTS OF STRINGS</a><br> |
| 295 |
<P> |
<P> |
| 296 |
You can replace the first match of "pattern" in "str" with "rewrite". |
You can replace the first match of "pattern" in "str" with "rewrite". |
| 297 |
Within "rewrite", backslash-escaped digits (\1 to \9) can be |
Within "rewrite", backslash-escaped digits (\1 to \9) can be |
| 323 |
occurred and the extraction happened successfully; if no match occurs, the |
occurred and the extraction happened successfully; if no match occurs, the |
| 324 |
string is left unaffected. |
string is left unaffected. |
| 325 |
</P> |
</P> |
| 326 |
<br><a name="SEC9" href="#TOC1">AUTHOR</a><br> |
<br><a name="SEC10" href="#TOC1">AUTHOR</a><br> |
| 327 |
<P> |
<P> |
| 328 |
The C++ wrapper was contributed by Google Inc. |
The C++ wrapper was contributed by Google Inc. |
| 329 |
<br> |
<br> |