--- code/trunk/doc/html/pcrecpp.html 2007/02/24 21:40:45 77
+++ code/trunk/doc/html/pcrecpp.html 2007/02/24 21:40:59 81
@@ -18,10 +18,11 @@
  • MATCHING INTERFACE
  • PARTIAL MATCHES
  • UTF-8 AND THE MATCHING INTERFACE
- • SCANNING TEXT INCREMENTALLY
- • PARSING HEX/OCTAL/C-RADIX NUMBERS
- • REPLACING PARTS OF STRINGS
- • AUTHOR
+ • PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE
+ • SCANNING TEXT INCREMENTALLY
+ • PARSING HEX/OCTAL/C-RADIX NUMBERS
+ • REPLACING PARTS OF STRINGS
+ • AUTHOR
    SYNOPSIS OF C++ WRAPPER

    @@ -31,9 +32,10 @@


    DESCRIPTION

    -The C++ wrapper for PCRE was provided by Google Inc. This brief man page was
    -constructed from the notes in the pcrecpp.h file, which should be
    -consulted for further details.
    +The C++ wrapper for PCRE was provided by Google Inc. Some additional
    +functionality was added by Giuseppe Maxia. This brief man page was constructed
    +from the notes in the pcrecpp.h file, which should be consulted for
    +further details.


    MATCHING INTERFACE

    @@ -148,7 +150,97 @@
    --enable-utf8 flag.

    -SCANNING TEXT INCREMENTALLY
    +PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE

    +PCRE defines some modifiers to change the behavior of the regular expression
    +engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to
    +pass such modifiers to a RE class. Currently, the following modifiers are
    +supported:

    +   modifier              description                 Perl equivalent
    +
    +   PCRE_CASELESS         case-insensitive match      /i
    +   PCRE_MULTILINE        multiple lines match        /m
    +   PCRE_DOTALL           dot matches newlines        /s
    +   PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
    +   PCRE_EXTRA            strict escape parsing       N/A
    +   PCRE_EXTENDED         ignore whitespace           /x
    +   PCRE_UTF8             handles UTF-8 chars         built-in
    +   PCRE_UNGREEDY         reverses * and *?           N/A
    +   PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)
    +
    +(*) Both Perl and PCRE allow non-capturing parentheses by means of the
    +"?:" modifier within the pattern itself. For example, (?:ab|cd) does not
    +capture, while (ab|cd) does.
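The capturing difference described in the note can be checked with a short sketch. It uses std::regex from the C++ standard library as a stand-in for PCRE, an assumption made so that the example has no external dependency; the ECMAScript grammar used by std::regex also accepts the "?:" form. The helper names capture_group_count and matches_somewhere are hypothetical and are not part of pcrecpp.

```cpp
#include <regex>
#include <string>

// Stand-in sketch using std::regex, NOT the pcrecpp API.
// Returns how many capturing groups the pattern defines.
inline unsigned capture_group_count(const std::string& pattern) {
    return std::regex(pattern).mark_count();
}

// Returns true if the pattern matches anywhere inside the text.
inline bool matches_somewhere(const std::string& pattern,
                              const std::string& text) {
    return std::regex_search(text, std::regex(pattern));
}
```

With these helpers, "(ab|cd)" reports one capturing group and "(?:ab|cd)" reports none, while both patterns still match the same inputs.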

    +

    +For a full account of how each modifier works, please check the
    +PCRE API reference page.

    +

    +For each modifier, there are two member functions whose names are derived
    +from the modifier name in lowercase, without the "PCRE_" prefix. For
    +instance, PCRE_CASELESS is handled by

    +  bool caseless()
    +
    +which returns true if the modifier is set, and
    +  RE_Options & set_caseless(bool)
    +
    +which sets or unsets the modifier. Moreover, PCRE_CONFIG_MATCH_LIMIT can be
    +accessed through the set_match_limit() and match_limit() member
    +functions. Setting match_limit to a non-zero value limits the execution of
    +PCRE, keeping it from overflowing the stack or taking an excessively long
    +time to return a result. A value of 5000 is good enough to stop stack
    +overflow in a 2MB thread stack. Setting match_limit to zero disables
    +match limiting.
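The accessor pair described above can be pictured with a toy model. This is only a sketch of the pattern, not the code in pcrecpp.h: the class name OptionsSketch, the constant kCaselessBit, and its value are all hypothetical.

```cpp
// Toy model of one getter/setter pair over a bitmask of flags.
// Nothing here is taken from pcrecpp.h or pcre.h; names and values
// are illustrative assumptions.
class OptionsSketch {
 public:
  enum { kCaselessBit = 0x1 };  // hypothetical flag value

  OptionsSketch() : flags_(0), match_limit_(0) {}

  // Getter: true if the caseless bit is currently set.
  bool caseless() const { return (flags_ & kCaselessBit) != 0; }

  // Setter: sets the bit when on is true, clears it otherwise.
  OptionsSketch& set_caseless(bool on) {
    if (on) {
      flags_ |= kCaselessBit;
    } else {
      flags_ &= ~kCaselessBit;
    }
    return *this;
  }

  // The match limit is a plain integer; zero stands for "no limit".
  int match_limit() const { return match_limit_; }
  OptionsSketch& set_match_limit(int limit) {
    match_limit_ = limit;
    return *this;
  }

 private:
  int flags_;
  int match_limit_;
};
```

In this model a default-constructed object reports caseless() as false and match_limit() as zero, and each setter changes only its own field.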

    +

    +Normally, to pass one or more modifiers to a RE class, you declare
    +a RE_Options object, set the appropriate options, and pass this
    +object to a RE constructor. Example:

    +   RE_Options opt;
    +   opt.set_caseless(true);
    +   if (RE("HELLO", opt).PartialMatch("hello world")) ...
    +
    +RE_Options has two constructors. The default constructor takes no arguments and
    +creates a set of flags that are off by default. The other constructor takes an
    +option_flags parameter, which eases the transfer of legacy code from C programs.
    +This lets you do
    +   RE(pattern,
    +     RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
    +
    +However, new code is better off doing
    +   RE(pattern,
    +     RE_Options().set_caseless(true).set_multiline(true))
    +       .PartialMatch(str);
    +
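The two forms above are meant to produce the same set of options. A toy model makes the equivalence concrete. It is a sketch under stated assumptions rather than the library's implementation: the class name FlagSetSketch, the flag constants and their values, and the all_flags() accessor are hypothetical.

```cpp
// Toy model: a flags constructor for legacy-style code next to
// chainable setters for new code. All names and bit values are
// illustrative assumptions, not taken from pcrecpp.h or pcre.h.
class FlagSetSketch {
 public:
  enum { kCaseless = 0x1, kMultiline = 0x2 };  // hypothetical values

  // Default constructor: every flag starts off.
  FlagSetSketch() : flags_(0) {}

  // Legacy-style constructor: accepts flags already OR'ed together.
  explicit FlagSetSketch(int option_flags) : flags_(option_flags) {}

  // Each setter returns *this, so calls can be chained on a temporary.
  FlagSetSketch& set_caseless(bool on) { return set_bit(kCaseless, on); }
  FlagSetSketch& set_multiline(bool on) { return set_bit(kMultiline, on); }

  int all_flags() const { return flags_; }

 private:
  FlagSetSketch& set_bit(int bit, bool on) {
    if (on) {
      flags_ |= bit;
    } else {
      flags_ &= ~bit;
    }
    return *this;
  }

  int flags_;
};
```

Here FlagSetSketch(FlagSetSketch::kCaseless | FlagSetSketch::kMultiline) and FlagSetSketch().set_caseless(true).set_multiline(true) end up holding the same flag value. The chained form has the advantage that the caller never handles raw bit constants.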
    +If you are going to pass one of the most used modifiers, there are some
    +convenience functions that return a RE_Options object with the
    +appropriate modifier already set: CASELESS(), UTF8(),
    +MULTILINE(), DOTALL(), and EXTENDED().

    +

    +If you need to set several options at once, and you don't want to go to the
    +trouble of declaring a RE_Options object and setting each option separately,
    +you can do so on the fly: several set_xxxxx() member functions can be chained,
    +since each of them returns a reference to its class object. For example, to pass
    +PCRE_CASELESS, PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement,
    +you may write:

    +   RE(" ^ xyz \\s+ .* blah$",
    +     RE_Options()
    +       .set_caseless(true)
    +       .set_extended(true)
    +       .set_multiline(true)).PartialMatch(sometext);
    +
    +
    +

    +SCANNING TEXT INCREMENTALLY

    The "Consume" operation may be useful if you want to repeatedly match regular expressions at the front of a string and skip over
    @@ -181,7 +273,7 @@
    pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)

    -PARSING HEX/OCTAL/C-RADIX NUMBERS
    +PARSING HEX/OCTAL/C-RADIX NUMBERS

    By default, if you pass a pointer to a numeric value, the corresponding text is interpreted as a base-10 number. You can
    @@ -199,7 +291,7 @@
    will leave 64 in a, b, c, and d.

    -REPLACING PARTS OF STRINGS
    +REPLACING PARTS OF STRINGS

    You can replace the first match of "pattern" in "str" with "rewrite". Within "rewrite", backslash-escaped digits (\1 to \9) can be
    @@ -231,7 +323,7 @@
    occurred and the extraction happened successfully; if no match occurs, the string is left unaffected.

    -AUTHOR
    +AUTHOR

    The C++ wrapper was contributed by Google Inc.