Diff of /code/trunk/doc/html/pcrepattern.html

revision 974 by ph10, Sat Apr 14 16:16:58 2012 UTC  revision 975 by ph10, Sat Jun 2 11:03:06 2012 UTC
# Line 227  backslash. All other characters (in part Line 227  backslash. All other characters (in part
227  greater than 127) are treated as literals.  greater than 127) are treated as literals.
228  </P>  </P>
229  <P>  <P>
230  If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the  If a pattern is compiled with the PCRE_EXTENDED option, white space in the
231  pattern (other than in a character class) and characters between a # outside  pattern (other than in a character class) and characters between a # outside
232  a character class and the next newline are ignored. An escaping backslash can  a character class and the next newline are ignored. An escaping backslash can
233  be used to include a whitespace or # character as part of the pattern.  be used to include a white space or # character as part of the pattern.
234  </P>  </P>
235  <P>  <P>
236  If you want to remove the special meaning from a sequence of characters, you  If you want to remove the special meaning from a sequence of characters, you
# Line 264  one of the following escape sequences th Line 264  one of the following escape sequences th
264    \a        alarm, that is, the BEL character (hex 07)    \a        alarm, that is, the BEL character (hex 07)
265    \cx       "control-x", where x is any ASCII character    \cx       "control-x", where x is any ASCII character
266    \e        escape (hex 1B)    \e        escape (hex 1B)
267    \f        formfeed (hex 0C)    \f        form feed (hex 0C)
268    \n        linefeed (hex 0A)    \n        linefeed (hex 0A)
269    \r        carriage return (hex 0D)    \r        carriage return (hex 0D)
270    \t        tab (hex 09)    \t        tab (hex 09)
# Line 406  Another use of backslash is for specifyi Line 406  Another use of backslash is for specifyi
406  <pre>  <pre>
407    \d     any decimal digit    \d     any decimal digit
408    \D     any character that is not a decimal digit    \D     any character that is not a decimal digit
409    \h     any horizontal whitespace character    \h     any horizontal white space character
410    \H     any character that is not a horizontal whitespace character    \H     any character that is not a horizontal white space character
411    \s     any whitespace character    \s     any white space character
412    \S     any character that is not a whitespace character    \S     any character that is not a white space character
413    \v     any vertical whitespace character    \v     any vertical white space character
414    \V     any character that is not a vertical whitespace character    \V     any character that is not a vertical white space character
415    \w     any "word" character    \w     any "word" character
416    \W     any "non-word" character    \W     any "non-word" character
417  </pre>  </pre>
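
The horizontal and vertical white space types in the table above can be approximated in other regex engines with explicit character classes. The following is a minimal sketch using Python's `re` module rather than PCRE. The names `HSPACE`, `VSPACE`, `split_fields` and `count_line_breaks` are hypothetical, and only ASCII members of each class are listed, so this is narrower than what PCRE matches.

```python
import re

# Hypothetical stand-ins for \h and \v, which Python's re module does not
# provide as character types. Only ASCII members are listed here.
HSPACE = r"[ \t]"
VSPACE = r"[\n\x0b\f\r]"


def split_fields(line):
    """Split a line on runs of horizontal white space."""
    return re.split(HSPACE + "+", line.strip(" \t"))


def count_line_breaks(text):
    """Count individual vertical white space characters."""
    return len(re.findall(VSPACE, text))


print(split_fields("alpha \t beta  gamma"))
print(count_line_breaks("one\ntwo\x0bthree\ffour"))
```

In PCRE itself the pattern would simply use `\h+` or `\v`; the explicit classes are needed only because the engine used for this sketch lacks those escapes.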
# Line 497  The vertical space characters are: Line 497  The vertical space characters are:
497  <pre>  <pre>
498    U+000A     Linefeed    U+000A     Linefeed
499    U+000B     Vertical tab    U+000B     Vertical tab
500    U+000C     Formfeed    U+000C     Form feed
501    U+000D     Carriage return    U+000D     Carriage return
502    U+0085     Next line    U+0085     Next line
503    U+2028     Line separator    U+2028     Line separator
# Line 520  This is an example of an "atomic group", Line 520  This is an example of an "atomic group",
520  <a href="#atomicgroup">below.</a>  <a href="#atomicgroup">below.</a>
521  This particular group matches either the two-character sequence CR followed by  This particular group matches either the two-character sequence CR followed by
522  LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab,  LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab,
523  U+000B), FF (formfeed, U+000C), CR (carriage return, U+000D), or NEL (next  U+000B), FF (form feed, U+000C), CR (carriage return, U+000D), or NEL (next
524  line, U+0085). The two-character sequence is treated as a single unit that  line, U+0085). The two-character sequence is treated as a single unit that
525  cannot be split.  cannot be split.
526  </P>  </P>
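
The group described in this hunk can be sketched outside PCRE as an ordered alternation. The example below uses Python's `re` module and an ordinary non-capturing group, not an atomic group, so backtracking could still split a CR LF pair in a larger pattern. Listing CR LF first only makes it the preferred alternative. The names `NEWLINE_SEQ` and `split_lines` are hypothetical.

```python
import re

# Non-atomic approximation of the newline group described above.
# CR LF is listed first so that it is preferred over a lone CR.
NEWLINE_SEQ = re.compile(r"(?:\r\n|\n|\x0b|\f|\r|\x85)")


def split_lines(text):
    """Split text at any of the newline sequences in NEWLINE_SEQ."""
    return NEWLINE_SEQ.split(text)


print(split_lines("a\r\nb\nc\x85d\re"))
```

Because the alternation is tried left to right, the CR LF pair in the sample string produces one split point rather than two.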
# Line 822  PCRE_UCP is set. They are: Line 822  PCRE_UCP is set. They are:
822    Xwd   Any Perl "word" character    Xwd   Any Perl "word" character
823  </pre>  </pre>
824  Xan matches characters that have either the L (letter) or the N (number)  Xan matches characters that have either the L (letter) or the N (number)
825  property. Xps matches the characters tab, linefeed, vertical tab, formfeed, or  property. Xps matches the characters tab, linefeed, vertical tab, form feed, or
826  carriage return, and any other character that has the Z (separator) property.  carriage return, and any other character that has the Z (separator) property.
827  Xsp is the same as Xps, except that vertical tab is excluded. Xwd matches the  Xsp is the same as Xps, except that vertical tab is excluded. Xwd matches the
828  same characters as Xan, plus underscore.  same characters as Xan, plus underscore.
# Line 1829  Because there may be many capturing pare Line 1829  Because there may be many capturing pare
1829  following a backslash are taken as part of a potential back reference number.  following a backslash are taken as part of a potential back reference number.
1830  If the pattern continues with a digit character, some delimiter must be used to  If the pattern continues with a digit character, some delimiter must be used to
1831  terminate the back reference. If the PCRE_EXTENDED option is set, this can be  terminate the back reference. If the PCRE_EXTENDED option is set, this can be
1832  whitespace. Otherwise, the \g{ syntax or an empty comment (see  white space. Otherwise, the \g{ syntax or an empty comment (see
1833  <a href="#comments">"Comments"</a>  <a href="#comments">"Comments"</a>
1834  below) can be used.  below) can be used.
1835  </P>  </P>
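
The delimiter problem described above also arises in other engines. As a rough analogy, the sketch below uses Python's `re` module, where `re.VERBOSE` plays the role of PCRE_EXTENDED and `(?#)` is an empty comment. Python does not accept the `\g{` syntax in patterns, so that form is not shown. The names `WITH_WHITE_SPACE`, `WITH_EMPTY_COMMENT` and `matches_doubled_then_one` are hypothetical.

```python
import re

# Two ways to keep a literal "1" from being read as part of the back
# reference number. This is Python re behaviour, shown as an analogy.
WITH_WHITE_SPACE = re.compile(r"(a) \1 1", re.VERBOSE)
WITH_EMPTY_COMMENT = re.compile(r"(a)\1(?#)1")


def matches_doubled_then_one(pattern, subject):
    """Return True if the whole subject matches the compiled pattern."""
    return pattern.fullmatch(subject) is not None


print(matches_doubled_then_one(WITH_WHITE_SPACE, "aa1"))
print(matches_doubled_then_one(WITH_EMPTY_COMMENT, "aa1"))
```

Both patterns read as "group 1, back reference to group 1, literal 1"; without the delimiter the digits after the backslash would be taken together.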
# Line 2171  point in the pattern; the idea of DEFINE Line 2171  point in the pattern; the idea of DEFINE
2171  subroutines that can be referenced from elsewhere. (The use of  subroutines that can be referenced from elsewhere. (The use of
2172  <a href="#subpatternsassubroutines">subroutines</a>  <a href="#subpatternsassubroutines">subroutines</a>
2173  is described below.) For example, a pattern to match an IPv4 address such as  is described below.) For example, a pattern to match an IPv4 address such as
2174  "192.168.23.245" could be written like this (ignore whitespace and line  "192.168.23.245" could be written like this (ignore white space and line
2175  breaks):  breaks):
2176  <pre>  <pre>
2177    (?(DEFINE) (?&#60;byte&#62; 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )    (?(DEFINE) (?&#60;byte&#62; 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
# Line 2565  exception: the name from a *(MARK), (*PR Line 2565  exception: the name from a *(MARK), (*PR
2565  a successful positive assertion <i>is</i> passed back when a match succeeds  a successful positive assertion <i>is</i> passed back when a match succeeds
2566  (compare capturing parentheses in assertions). Note that such subpatterns are  (compare capturing parentheses in assertions). Note that such subpatterns are
2567  processed as anchored at the point where they are tested. Note also that Perl's  processed as anchored at the point where they are tested. Note also that Perl's
2568  treatment of subroutines is different in some cases.  treatment of subroutines and assertions is different in some cases.
2569  </P>  </P>
2570  <P>  <P>
2571  The new verbs make use of what was previously invalid syntax: an opening  The new verbs make use of what was previously invalid syntax: an opening
2572  parenthesis followed by an asterisk. They are generally of the form  parenthesis followed by an asterisk. They are generally of the form
2573  (*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour,  (*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour,
2574  depending on whether or not an argument is present. A name is any sequence of  depending on whether or not an argument is present. A name is any sequence of
2575  characters that does not include a closing parenthesis. If the name is empty,  characters that does not include a closing parenthesis. The maximum length of
2576  that is, if the closing parenthesis immediately follows the colon, the effect  name is 255 in the 8-bit library and 65535 in the 16-bit library. If the name
2577  is as if the colon were not there. Any number of these verbs may occur in a  is empty, that is, if the closing parenthesis immediately follows the colon,
2578  pattern.  the effect is as if the colon were not there. Any number of these verbs may
2579    occur in a pattern.
2580  <a name="nooptimize"></a></P>  <a name="nooptimize"></a></P>
2581  <br><b>  <br><b>
2582  Optimizations that affect backtracking verbs  Optimizations that affect backtracking verbs
# Line 2593  section entitled Line 2594  section entitled
2594  <a href="pcreapi.html#execoptions">"Option bits for <b>pcre_exec()</b>"</a>  <a href="pcreapi.html#execoptions">"Option bits for <b>pcre_exec()</b>"</a>
2595  in the  in the
2596  <a href="pcreapi.html"><b>pcreapi</b></a>  <a href="pcreapi.html"><b>pcreapi</b></a>
2597  documentation.  documentation.
2598  </P>  </P>
2599  <P>  <P>
2600  Experiments with Perl suggest that it too has similar optimizations, sometimes  Experiments with Perl suggest that it too has similar optimizations, sometimes
# Line 2687  attempts starting at "P" and then with a Line 2688  attempts starting at "P" and then with a
2688  </P>  </P>
2689  <P>  <P>
2690  If you are interested in (*MARK) values after failed matches, you should  If you are interested in (*MARK) values after failed matches, you should
2691  probably set the PCRE_NO_START_OPTIMIZE option  probably set the PCRE_NO_START_OPTIMIZE option
2692  <a href="#nooptimize">(see above)</a>  <a href="#nooptimize">(see above)</a>
2693  to ensure that the match is always attempted.  to ensure that the match is always attempted.
2694  </P>  </P>
# Line 2868  Cambridge CB2 3QH, England. Line 2869  Cambridge CB2 3QH, England.
2869  </P>  </P>
2870  <br><a name="SEC28" href="#TOC1">REVISION</a><br>  <br><a name="SEC28" href="#TOC1">REVISION</a><br>
2871  <P>  <P>
2872  Last updated: 14 April 2012  Last updated: 01 June 2012
2873  <br>  <br>
2874  Copyright &copy; 1997-2012 University of Cambridge.  Copyright &copy; 1997-2012 University of Cambridge.
2875  <br>  <br>
