--- code/trunk/doc/html/pcrecpp.html 2007/02/24 21:40:45 77 +++ code/trunk/doc/html/pcrecpp.html 2007/02/24 21:40:59 81 @@ -18,10 +18,11 @@
@@ -31,9 +32,10 @@
-The C++ wrapper for PCRE was provided by Google Inc. This brief man page was -constructed from the notes in the pcrecpp.h file, which should be -consulted for further details. +The C++ wrapper for PCRE was provided by Google Inc. Some additional +functionality was added by Giuseppe Maxia. This brief man page was constructed +from the notes in the pcrecpp.h file, which should be consulted for +further details.
@@ -148,7 +150,97 @@ --enable-utf8 flag.
-+PCRE defines some modifiers to change the behavior of the regular expression +engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to +pass such modifiers to a RE class. Currently, the following modifiers are +supported: +
+  modifier              description               Perl corresponding
+
+  PCRE_CASELESS         case insensitive match    /i
+  PCRE_MULTILINE        multiple lines match      /m
+  PCRE_DOTALL           dot matches newlines      /s
+  PCRE_DOLLAR_ENDONLY   $ matches only at end     N/A
+  PCRE_EXTRA            strict escape parsing     N/A
+  PCRE_EXTENDED         ignore whitespaces        /x
+  PCRE_UTF8             handles UTF8 chars        built-in
+  PCRE_UNGREEDY         reverses * and *?         N/A
+  PCRE_NO_AUTO_CAPTURE  disables capturing parens N/A (*)
+
+(*) Both Perl and PCRE allow non capturing parentheses by means of the
+"?:" modifier within the pattern itself. e.g. (?:ab|cd) does not
+capture, while (ab|cd) does.
+
+For a full account of how each modifier works, please check the
+PCRE API reference page.
+
+
+For each modifier, there are two member functions whose name is made
+out of the modifier in lowercase, without the "PCRE_" prefix. For
+instance, PCRE_CASELESS is handled by
+
+  bool caseless()
+
+which returns true if the modifier is set, and
+
+  RE_Options & set_caseless(bool)
+
+which sets or unsets the modifier. Moreover, PCRE_CONFIG_MATCH_LIMIT can be
+accessed through the set_match_limit() and match_limit() member
+functions. Setting match_limit to a non-zero value will limit the
+execution of pcre to keep it from doing bad things like blowing the stack or
+taking an eternity to return a result. A value of 5000 is good enough to stop
+stack blowup in a 2MB thread stack. Setting match_limit to zero disables
+match limiting.
+
+Normally, to pass one or more modifiers to a RE class, you declare +a RE_Options object, set the appropriate options, and pass this +object to a RE constructor. Example: +
+ RE_Options opt;
+ opt.set_caseless(true);
+ if (RE("HELLO", opt).PartialMatch("hello world")) ...
+
+RE_Options has two constructors. The default constructor takes no arguments and
+creates a set of flags that are all off. The other takes an option_flags
+parameter, a bitwise OR of PCRE_* values, which is there to facilitate the
+transfer of legacy code from C programs. This lets you do
+
+  RE(pattern,
+     RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
+
+However, new code is better off doing
+
+  RE(pattern,
+     RE_Options().set_caseless(true).set_multiline(true))
+       .PartialMatch(str);
+
+If you are going to pass one of the most used modifiers, there are some
+convenience functions that return a RE_Options class with the
+appropriate modifier already set: CASELESS(), UTF8(),
+MULTILINE(), DOTALL(), and EXTENDED().
+
+If you need to set several options at once, and you don't want to go through
+the trouble of declaring a RE_Options object and setting each option in turn,
+you can do it on the fly. The set_xxxxx() member functions can be chained,
+since each of them returns a reference to its class object. For example, to
+pass PCRE_CASELESS, PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one
+statement, you may write:
+
+ RE(" ^ xyz \\s+ .* blah$",
+ RE_Options()
+ .set_caseless(true)
+ .set_extended(true)
+ .set_multiline(true)).PartialMatch(sometext);
+
+
+
+The "Consume" operation may be useful if you want to repeatedly match regular expressions at the front of a string and skip over @@ -181,7 +273,7 @@ pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
-By default, if you pass a pointer to a numeric value, the corresponding text is interpreted as a base-10 number. You can @@ -199,7 +291,7 @@ will leave 64 in a, b, c, and d.
-You can replace the first match of "pattern" in "str" with "rewrite". Within "rewrite", backslash-escaped digits (\1 to \9) can be @@ -231,7 +323,7 @@ occurred and the extraction happened successfully; if no match occurs, the string is left unaffected.
-
The C++ wrapper was contributed by Google Inc.