| 11 |
.SH DESCRIPTION |
.SH DESCRIPTION |
| 12 |
.rs |
.rs |
| 13 |
.sp |
.sp |
| 14 |
The C++ wrapper for PCRE was provided by Google Inc. This brief man page was |
The C++ wrapper for PCRE was provided by Google Inc. Some additional |
| 15 |
constructed from the notes in the \fIpcrecpp.h\fP file, which should be |
functionality was added by Giuseppe Maxia. This brief man page was constructed |
| 16 |
consulted for further details. |
from the notes in the \fIpcrecpp.h\fP file, which should be consulted for |
| 17 |
|
further details. |
| 18 |
. |
. |
| 19 |
. |
. |
| 20 |
.SH "MATCHING INTERFACE" |
.SH "MATCHING INTERFACE" |
| 131 |
--enable-utf8 flag. |
--enable-utf8 flag. |
| 132 |
. |
. |
| 133 |
. |
. |
| 134 |
|
.SH "PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE" |
| 135 |
|
.rs |
| 136 |
|
.sp |
| 137 |
|
PCRE defines some modifiers to change the behavior of the regular expression |
| 138 |
|
engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to |
| 139 |
|
pass such modifiers to a RE class. Currently, the following modifiers are |
| 140 |
|
supported: |
| 141 |
|
.sp |
| 142 |
|
modifier              description                Perl equivalent |
| 143 |

.sp |
| 144 |

PCRE_CASELESS         case insensitive match     /i |
| 145 |

PCRE_MULTILINE        multiple lines match       /m |
| 146 |

PCRE_DOTALL           dot matches newlines       /s |
| 147 |

PCRE_DOLLAR_ENDONLY   $ matches only at end      N/A |
| 148 |

PCRE_EXTRA            strict escape parsing      N/A |
| 149 |

PCRE_EXTENDED         ignore whitespace          /x |
| 150 |

PCRE_UTF8             handles UTF8 chars         built-in |
| 151 |

PCRE_UNGREEDY         reverses * and *?          N/A |
| 152 |

PCRE_NO_AUTO_CAPTURE  disables capturing parens  N/A (*) |
| 153 |
|
.sp |
| 154 |
|
(*) Both Perl and PCRE allow non-capturing parentheses by means of the |
| 155 |

"?:" modifier within the pattern itself. For example, (?:ab|cd) does not |
| 156 |
|
capture, while (ab|cd) does. |
| 157 |
|
.P |
| 158 |
|
For a full account of how each modifier works, please check the |
| 159 |
|
PCRE API reference page. |
| 160 |
|
.P |
| 161 |
|
For each modifier, there are two member functions whose names are made |
| 162 |
|
out of the modifier in lowercase, without the "PCRE_" prefix. For |
| 163 |
|
instance, PCRE_CASELESS is handled by |
| 164 |
|
.sp |
| 165 |
|
bool caseless() |
| 166 |
|
.sp |
| 167 |
|
which returns true if the modifier is set, and |
| 168 |
|
.sp |
| 169 |
|
RE_Options & set_caseless(bool) |
| 170 |
|
.sp |
| 171 |
|
which sets or unsets the modifier. Moreover, PCRE_CONFIG_MATCH_LIMIT can be |
| 172 |
|
accessed through the \fBset_match_limit()\fR and \fBmatch_limit()\fR member |
| 173 |
|
functions. Setting \fImatch_limit\fR to a non-zero value will limit the |
| 174 |
|
execution of pcre to keep it from doing bad things like blowing the stack or |
| 175 |
|
taking an eternity to return a result. A value of 5000 is good enough to stop |
| 176 |
|
stack blowup in a 2MB thread stack. Setting \fImatch_limit\fR to zero disables |
| 177 |
|
match limiting. |
| 178 |
|
.P |
| 179 |
|
Normally, to pass one or more modifiers to a RE class, you declare |
| 180 |
|
a \fIRE_Options\fR object, set the appropriate options, and pass this |
| 181 |
|
object to a RE constructor. Example: |
| 182 |
|
.sp |
| 183 |
|
RE_Options opt; |
| 184 |
|
opt.set_caseless(true); |
| 185 |
|
if (RE("HELLO", opt).PartialMatch("hello world")) ... |
| 186 |
|
.sp |
| 187 |
|
RE_Options has two constructors. The default constructor takes no arguments and |
| 188 |

creates a set of flags that are all off. The other constructor takes an |
| 189 |

\fIoption_flags\fR parameter, which eases the transfer of legacy code from C programs. |
| 190 |
|
This lets you do |
| 191 |
|
.sp |
| 192 |
|
RE(pattern, |
| 193 |
|
RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str); |
| 194 |
|
.sp |
| 195 |
|
However, new code is better off doing |
| 196 |
|
.sp |
| 197 |
|
RE(pattern, |
| 198 |
|
RE_Options().set_caseless(true).set_multiline(true)) |
| 199 |
|
.PartialMatch(str); |
| 200 |
|
.sp |
| 201 |
|
If you are going to pass one of the most used modifiers, there are some |
| 202 |
|
convenience functions that return a RE_Options object with the |
| 203 |
|
appropriate modifier already set: \fBCASELESS()\fR, \fBUTF8()\fR, |
| 204 |
|
\fBMULTILINE()\fR, \fBDOTALL()\fR, and \fBEXTENDED()\fR. |
| 205 |
|
.P |
| 206 |
|
If you need to set several options at once, and you don't want to go through |
| 207 |
|
the pains of declaring a RE_Options object and setting several options, there |
| 208 |
|
is a parallel method that gives you this ability on the fly. You can chain |
| 209 |
|
several \fBset_xxxxx()\fR member functions, since each of them returns a |
| 210 |
|
reference to its class object. For example, to pass PCRE_CASELESS, |
| 211 |
|
PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write: |
| 212 |
|
.sp |
| 213 |
|
RE(" ^ xyz \e\es+ .* blah$", |
| 214 |
|
RE_Options() |
| 215 |
|
.set_caseless(true) |
| 216 |
|
.set_extended(true) |
| 217 |
|
.set_multiline(true)).PartialMatch(sometext); |
| 218 |
|
.sp |
| 219 |
|
. |
| 220 |
|
. |
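The chaining described above works because every setter returns a reference to the object it was called on. The following is a minimal sketch of that idiom only. It is not the pcrecpp implementation: the class name SketchOptions, the SKETCH_* constants and their bit values, and the all_options() accessor are hypothetical stand-ins invented for illustration, so the sketch compiles without the PCRE headers.

```cpp
// Hypothetical stand-in for illustration; NOT the real pcrecpp::RE_Options.
// The bit values below are placeholders chosen for this sketch.
const unsigned SKETCH_CASELESS  = 1u << 0;
const unsigned SKETCH_MULTILINE = 1u << 1;
const unsigned SKETCH_EXTENDED  = 1u << 2;

class SketchOptions {
 public:
  // Default constructor: every flag starts off.
  SketchOptions() : flags_(0) {}

  // Second constructor: accept an OR-ed flag word, as legacy C code would.
  explicit SketchOptions(unsigned option_flags) : flags_(option_flags) {}

  // One getter/setter pair per modifier, named after the modifier.
  bool caseless() const { return (flags_ & SKETCH_CASELESS) != 0; }
  SketchOptions& set_caseless(bool on) { return set(SKETCH_CASELESS, on); }

  bool multiline() const { return (flags_ & SKETCH_MULTILINE) != 0; }
  SketchOptions& set_multiline(bool on) { return set(SKETCH_MULTILINE, on); }

  bool extended() const { return (flags_ & SKETCH_EXTENDED) != 0; }
  SketchOptions& set_extended(bool on) { return set(SKETCH_EXTENDED, on); }

  // Hypothetical accessor exposing the combined flag word.
  unsigned all_options() const { return flags_; }

 private:
  // Set or clear one bit, then return *this so that calls can be chained.
  SketchOptions& set(unsigned bit, bool on) {
    if (on) {
      flags_ |= bit;
    } else {
      flags_ &= ~bit;
    }
    return *this;
  }

  unsigned flags_;
};
```

With this sketch, SketchOptions().set_caseless(true).set_extended(true) builds a temporary and configures it in a single expression, which mirrors how the RE_Options() examples above are written.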
| 221 |
.SH "SCANNING TEXT INCREMENTALLY" |
.SH "SCANNING TEXT INCREMENTALLY" |
| 222 |
.rs |
.rs |
| 223 |
.sp |
.sp |