
Diff of /code/trunk/doc/pcre.txt

revision 974 by ph10, Sat Apr 14 16:16:58 2012 UTC revision 975 by ph10, Sat Jun 2 11:03:06 2012 UTC
# Line 138  REVISION Line 138  REVISION
138         Last updated: 10 January 2012         Last updated: 10 January 2012
139         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
140  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
141    
142    
143  PCRE(3)                                                                PCRE(3)  PCRE(3)                                                                PCRE(3)
144    
145    
# Line 464  REVISION Line 464  REVISION
464         Last updated: 14 April 2012         Last updated: 14 April 2012
465         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
466  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
467    
468    
469  PCREBUILD(3)                                                      PCREBUILD(3)  PCREBUILD(3)                                                      PCREBUILD(3)
470    
471    
# Line 568  UTF-8 and UTF-16 SUPPORT Line 568  UTF-8 and UTF-16 SUPPORT
568         tern compiling functions.         tern compiling functions.
569    
570         If you set --enable-utf when compiling in an EBCDIC  environment,  PCRE         If you set --enable-utf when compiling in an EBCDIC  environment,  PCRE
571         expects its input to be either ASCII or UTF-8 (depending on the runtime         expects  its  input  to be either ASCII or UTF-8 (depending on the run-
572         option). It is not possible to support both EBCDIC and UTF-8  codes  in         time option). It is not possible to support both EBCDIC and UTF-8 codes
573         the  same  version  of  the  library.  Consequently,  --enable-utf  and         in  the  same  version  of  the library. Consequently, --enable-utf and
574         --enable-ebcdic are mutually exclusive.         --enable-ebcdic are mutually exclusive.
575    
576    
# Line 761  CREATING CHARACTER TABLES AT BUILD TIME Line 761  CREATING CHARACTER TABLES AT BUILD TIME
761         to the configure command, the distributed tables are  no  longer  used.         to the configure command, the distributed tables are  no  longer  used.
762         Instead,  a  program  called dftables is compiled and run. This outputs         Instead,  a  program  called dftables is compiled and run. This outputs
763         the source for new set of tables, created in the default locale of your         the source for new set of tables, created in the default locale of your
764         C runtime system. (This method of replacing the tables does not work if         C  run-time  system. (This method of replacing the tables does not work
765         you are cross compiling, because dftables is run on the local host.  If         if you are cross compiling, because dftables is run on the local  host.
766         you  need  to  create alternative tables when cross compiling, you will         If you need to create alternative tables when cross compiling, you will
767         have to do so "by hand".)         have to do so "by hand".)
768    
769    
# Line 860  REVISION Line 860  REVISION
860         Last updated: 07 January 2012         Last updated: 07 January 2012
861         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
862  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
863    
864    
865  PCREMATCHING(3)                                                PCREMATCHING(3)  PCREMATCHING(3)                                                PCREMATCHING(3)
866    
867    
# Line 1067  REVISION Line 1067  REVISION
1067         Last updated: 08 January 2012         Last updated: 08 January 2012
1068         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
1069  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
1070    
1071    
1072  PCREAPI(3)                                                          PCREAPI(3)  PCREAPI(3)                                                          PCREAPI(3)
1073    
1074    
# Line 1311  NEWLINES Line 1311  NEWLINES
1311         feed) character, the two-character sequence CRLF, any of the three pre-         feed) character, the two-character sequence CRLF, any of the three pre-
1312         ceding,  or any Unicode newline sequence. The Unicode newline sequences         ceding,  or any Unicode newline sequence. The Unicode newline sequences
1313         are the three just mentioned, plus the single characters  VT  (vertical         are the three just mentioned, plus the single characters  VT  (vertical
1314         tab,  U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line         tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
1315         separator, U+2028), and PS (paragraph separator, U+2029).         separator, U+2028), and PS (paragraph separator, U+2029).
1316    
1317         Each of the first three conventions is used by at least  one  operating         Each of the first three conventions is used by at least  one  operating
# Line 1625  COMPILING A PATTERN Line 1625  COMPILING A PATTERN
1625    
1626           PCRE_EXTENDED           PCRE_EXTENDED
1627    
1628         If  this  bit  is  set,  whitespace  data characters in the pattern are         If  this  bit  is  set,  white space data characters in the pattern are
1629         totally ignored except when escaped or inside a character class. White-         totally ignored except when escaped or inside a character class.  White
1630         space does not include the VT character (code 11). In addition, charac-         space does not include the VT character (code 11). In addition, charac-
1631         ters between an unescaped # outside a character class and the next new-         ters between an unescaped # outside a character class and the next new-
1632         line,  inclusive,  are  also  ignored.  This is equivalent to Perl's /x         line,  inclusive,  are  also  ignored.  This is equivalent to Perl's /x
# Line 1642  COMPILING A PATTERN Line 1642  COMPILING A PATTERN
1642    
1643         This option makes it possible to include  comments  inside  complicated         This option makes it possible to include  comments  inside  complicated
1644         patterns.   Note,  however,  that this applies only to data characters.         patterns.   Note,  however,  that this applies only to data characters.
1645         Whitespace  characters  may  never  appear  within  special   character         White space  characters  may  never  appear  within  special  character
1646         sequences in a pattern, for example within the sequence (?( that intro-         sequences in a pattern, for example within the sequence (?( that intro-
1647         duces a conditional subpattern.         duces a conditional subpattern.
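The sketch below shows which characters PCRE_EXTENDED discards. It is simplified and not how PCRE works internally, because PCRE skips these characters while compiling rather than rewriting the pattern. The function name strip_extended is hypothetical. The sketch assumes LF is the newline that ends a # comment, and it does not handle \Q...\E quoting or a literal ] at the start of a class.

```c
#include <string.h>

/* Characters ignored under PCRE_EXTENDED in this sketch: space, HT, LF,
   FF, CR.  VT (code 11) is deliberately excluded, as the text says. */
static int is_extended_space(char c)
{
  return c != '\0' && strchr(" \t\n\f\r", c) != NULL;
}

/* Hypothetical helper: copies pattern `in` to `out` without the white
   space and #-comments that PCRE_EXTENDED ignores.  `out` must be at
   least as large as `in`.  Assumes LF ends a comment. */
static void strip_extended(const char *in, char *out)
{
  int in_class = 0;
  while (*in != '\0')
    {
    char c = *in;
    if (c == '\\' && in[1] != '\0')   /* escaped character: keep both bytes */
      {
      *out++ = *in++;
      *out++ = *in++;
      continue;
      }
    if (in_class)                     /* inside [...] everything is kept */
      {
      if (c == ']') in_class = 0;
      *out++ = *in++;
      continue;
      }
    if (c == '[') in_class = 1;
    if (is_extended_space(c)) { in++; continue; }
    if (c == '#')                     /* comment up to and including LF */
      {
      while (*in != '\0' && *in != '\n') in++;
      if (*in == '\n') in++;
      continue;
      }
    *out++ = *in++;
    }
  *out = '\0';
}
```

For example, strip_extended("a b # c\nd", buf) leaves "abd" in buf. An escaped space, or a space inside [...], is kept.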
1648    
# Line 1727  COMPILING A PATTERN Line 1727  COMPILING A PATTERN
1727         that any of the three preceding sequences should be recognized. Setting         that any of the three preceding sequences should be recognized. Setting
1728         PCRE_NEWLINE_ANY  specifies that any Unicode newline sequence should be         PCRE_NEWLINE_ANY  specifies that any Unicode newline sequence should be
1729         recognized. The Unicode newline sequences are the three just mentioned,         recognized. The Unicode newline sequences are the three just mentioned,
1730         plus  the  single  characters  VT (vertical tab, U+000B), FF (formfeed,         plus  the  single  characters VT (vertical tab, U+000B), FF (form feed,
1731         U+000C), NEL (next line, U+0085), LS (line separator, U+2028),  and  PS         U+000C), NEL (next line, U+0085), LS (line separator, U+2028),  and  PS
1732         (paragraph  separator, U+2029). For the 8-bit library, the last two are         (paragraph  separator, U+2029). For the 8-bit library, the last two are
1733         recognized only in UTF-8 mode.         recognized only in UTF-8 mode.
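The convention-dependent recognition described above can be sketched as a function that reports how many code points a newline occupies at a given position. The enum and function names are hypothetical stand-ins for the PCRE_NEWLINE_* settings. The subject is assumed to be already decoded into code points.

```c
#include <stddef.h>

/* Hypothetical names mirroring, but not identical to, PCRE_NEWLINE_*. */
enum newline_conv { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY };

/* Returns the length of the newline starting at s[i], or 0 if none.
   ls_ps_allowed: nonzero for the 16-bit library or 8-bit UTF-8 mode,
   where LS and PS count as newlines under NL_ANY. */
static size_t newline_len(const unsigned int *s, size_t len, size_t i,
                          enum newline_conv conv, int ls_ps_allowed)
{
  if (i >= len) return 0;
  unsigned int c = s[i];
  int crlf = (c == 0x0d && i + 1 < len && s[i + 1] == 0x0a);

  switch (conv)
    {
    case NL_CR:   return c == 0x0d;
    case NL_LF:   return c == 0x0a;
    case NL_CRLF: return crlf ? 2 : 0;
    case NL_ANYCRLF:
      if (crlf) return 2;
      return c == 0x0d || c == 0x0a;
    case NL_ANY:
      if (crlf) return 2;
      if (c >= 0x0a && c <= 0x0d) return 1;   /* LF, VT, FF, CR */
      if (c == 0x85) return 1;                /* NEL */
      if ((c == 0x2028 || c == 0x2029) && ls_ps_allowed)
        return 1;                             /* LS, PS */
      return 0;
    }
  return 0;
}
```

The function checks for CRLF before it checks the single characters. As a result, a CR LF pair is always reported as one two-character newline under the CRLF, ANYCRLF, and ANY conventions.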
# Line 1741  COMPILING A PATTERN Line 1741  COMPILING A PATTERN
1741         cause an error.         cause an error.
1742    
1743         The only time that a line break in a pattern  is  specially  recognized         The only time that a line break in a pattern  is  specially  recognized
1744         when  compiling  is when PCRE_EXTENDED is set. CR and LF are whitespace         when  compiling is when PCRE_EXTENDED is set. CR and LF are white space
1745         characters, and so are ignored in this mode. Also, an unescaped #  out-         characters, and so are ignored in this mode. Also, an unescaped #  out-
1746         side  a  character class indicates a comment that lasts until after the         side  a  character class indicates a comment that lasts until after the
1747         next line break sequence. In other circumstances, line break  sequences         next line break sequence. In other circumstances, line break  sequences
# Line 1894  COMPILATION ERROR CODES Line 1894  COMPILATION ERROR CODES
1894           72  too many forward references           72  too many forward references
1895           73  disallowed Unicode code point (>= 0xd800 && <= 0xdfff)           73  disallowed Unicode code point (>= 0xd800 && <= 0xdfff)
1896           74  invalid UTF-16 string (specifically UTF-16)           74  invalid UTF-16 string (specifically UTF-16)
1897             75  name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
1898    
1899         The numbers 32 and 10000 in errors 48 and 49  are  defaults;  different         The numbers 32 and 10000 in errors 48 and 49  are  defaults;  different
1900         values may be used if the limits were changed when PCRE was built.         values may be used if the limits were changed when PCRE was built.
# Line 2993  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2994  MATCHING A PATTERN: THE TRADITIONAL FUNC
2994         for the just-in-time processing stack is  not  large  enough.  See  the         for the just-in-time processing stack is  not  large  enough.  See  the
2995         pcrejit documentation for more details.         pcrejit documentation for more details.
2996    
2997           PCRE_ERROR_BADMODE (-28)           PCRE_ERROR_BADMODE        (-28)
2998    
2999         This error is given if a pattern that was compiled by the 8-bit library         This error is given if a pattern that was compiled by the 8-bit library
3000         is passed to a 16-bit library function, or vice versa.         is passed to a 16-bit library function, or vice versa.
3001    
3002           PCRE_ERROR_BADENDIANNESS (-29)           PCRE_ERROR_BADENDIANNESS  (-29)
3003    
3004         This error is given if  a  pattern  that  was  compiled  and  saved  is         This error is given if  a  pattern  that  was  compiled  and  saved  is
3005         reloaded  on  a  host  with  different endianness. The utility function         reloaded  on  a  host  with  different endianness. The utility function
3006         pcre_pattern_to_host_byte_order() can be used to convert such a pattern         pcre_pattern_to_host_byte_order() can be used to convert such a pattern
3007         so that it runs on the new host.         so that it runs on the new host.
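A byte-order mismatch is detectable because of how a stored magic value behaves. When a host of the other endianness reads it, the value appears byte-reversed rather than random, so a loader can tell a byte-swapped pattern apart from a corrupt one. This is a hypothetical sketch: the header layout and the SKETCH_MAGIC constant are assumptions, not PCRE's documented format. With real saved patterns, call pcre_pattern_to_host_byte_order() as described above.

```c
#include <stdint.h>

/* Assumed for illustration: a saved header begins with this 32-bit value
   ("PCRE" in ASCII), written in the compiling host's byte order. */
#define SKETCH_MAGIC 0x50435245u

enum load_status { LOAD_OK, LOAD_WRONG_ENDIAN, LOAD_BAD_MAGIC };

static uint32_t byte_swap32(uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
         ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Classifies a magic value read from a saved pattern on this host. */
static enum load_status check_magic(uint32_t stored)
{
  if (stored == SKETCH_MAGIC) return LOAD_OK;
  if (stored == byte_swap32(SKETCH_MAGIC)) return LOAD_WRONG_ENDIAN;
  return LOAD_BAD_MAGIC;
}
```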
3008    
3009         Error numbers -16 to -20 and -22 are not used by pcre_exec().         Error numbers -16 to -20, -22, and -30 are not used by pcre_exec().
3010    
3011     Reason codes for invalid UTF-8 strings     Reason codes for invalid UTF-8 strings
3012    
# Line 3468  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 3469  MATCHING A PATTERN: THE ALTERNATIVE FUNC
3469         This error is given if the output vector  is  not  large  enough.  This         This error is given if the output vector  is  not  large  enough.  This
3470         should be extremely rare, as a vector of size 1000 is used.         should be extremely rare, as a vector of size 1000 is used.
3471    
3472             PCRE_ERROR_DFA_BADRESTART (-30)
3473    
3474           When  pcre_dfa_exec()  is called with the PCRE_DFA_RESTART option, some
3475           plausibility checks are made on the contents of  the  workspace,  which
3476           should  contain  data about the previous partial match. If any of these
3477           checks fail, this error is given.
3478    
3479    
3480  SEE ALSO  SEE ALSO
3481    
3482         pcre16(3),   pcrebuild(3),  pcrecallout(3),  pcrecpp(3)(3),  pcrematch-         pcre16(3),  pcrebuild(3),  pcrecallout(3),  pcrecpp(3)(3),   pcrematch-
3483         ing(3), pcrepartial(3), pcreposix(3), pcreprecompile(3), pcresample(3),         ing(3), pcrepartial(3), pcreposix(3), pcreprecompile(3), pcresample(3),
3484         pcrestack(3).         pcrestack(3).
3485    
# Line 3485  AUTHOR Line 3493  AUTHOR
3493    
3494  REVISION  REVISION
3495    
3496         Last updated: 14 April 2012         Last updated: 04 May 2012
3497         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
3498  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
3499    
3500    
3501  PCRECALLOUT(3)                                                  PCRECALLOUT(3)  PCRECALLOUT(3)                                                  PCRECALLOUT(3)
3502    
3503    
# Line 3687  REVISION Line 3695  REVISION
3695         Last updated: 08 Janurary 2012         Last updated: 08 Janurary 2012
3696         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
3697  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
3698    
3699    
3700  PCRECOMPAT(3)                                                    PCRECOMPAT(3)  PCRECOMPAT(3)                                                    PCRECOMPAT(3)
3701    
3702    
# Line 3777  DIFFERENCES BETWEEN PCRE AND PERL Line 3785  DIFFERENCES BETWEEN PCRE AND PERL
3785         There is a discussion that explains these differences in more detail in         There is a discussion that explains these differences in more detail in
3786         the section on recursion differences from Perl in the pcrepattern page.         the section on recursion differences from Perl in the pcrepattern page.
3787    
3788         11.  If  (*THEN)  is present in a group that is called as a subroutine,         11.  If  any of the backtracking control verbs are used in an assertion
3789         its action is limited to that group, even if the group does not contain         or in a subpattern that is called  as  a  subroutine  (whether  or  not
3790         any | characters.         recursively),  their effect is confined to that subpattern; it does not
3791           extend to the surrounding pattern. This is not always the case in Perl.
3792           In  particular,  if  (*THEN)  is present in a group that is called as a
3793           subroutine, its action is limited to that group, even if the group does
3794           not  contain any | characters. There is one exception to this: the name
3795           from a *(MARK), (*PRUNE), or (*THEN) that is encountered in a  success-
3796           ful  positive  assertion  is passed back when a match succeeds (compare
3797           capturing parentheses in assertions). Note that  such  subpatterns  are
3798           processed as anchored at the point where they are tested.
3799    
3800         12.  There are some differences that are concerned with the settings of         12.  There are some differences that are concerned with the settings of
3801         captured strings when part of  a  pattern  is  repeated.  For  example,         captured strings when part of  a  pattern  is  repeated.  For  example,
# Line 3799  DIFFERENCES BETWEEN PCRE AND PERL Line 3815  DIFFERENCES BETWEEN PCRE AND PERL
3815    
3816         14.  Perl  recognizes  comments  in some places that PCRE does not, for         14.  Perl  recognizes  comments  in some places that PCRE does not, for
3817         example, between the ( and ? at the start of a subpattern.  If  the  /x         example, between the ( and ? at the start of a subpattern.  If  the  /x
3818         modifier  is set, Perl allows whitespace between ( and ? but PCRE never         modifier is set, Perl allows white space between ( and ? but PCRE never
3819         does, even if the PCRE_EXTENDED option is set.         does, even if the PCRE_EXTENDED option is set.
3820    
3821         15. PCRE provides some extensions to the Perl regular expression facil-         15. PCRE provides some extensions to the Perl regular expression facil-
# Line 3859  AUTHOR Line 3875  AUTHOR
3875    
3876  REVISION  REVISION
3877    
3878         Last updated: 08 Januray 2012         Last updated: 01 June 2012
3879         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
3880  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
3881    
3882    
3883  PCREPATTERN(3)                                                  PCREPATTERN(3)  PCREPATTERN(3)                                                  PCREPATTERN(3)
3884    
3885    
# Line 4045  BACKSLASH Line 4061  BACKSLASH
4061         after  a  backslash.  All  other characters (in particular, those whose         after  a  backslash.  All  other characters (in particular, those whose
4062         codepoints are greater than 127) are treated as literals.         codepoints are greater than 127) are treated as literals.
4063    
4064         If a pattern is compiled with the PCRE_EXTENDED option,  whitespace  in         If a pattern is compiled with the PCRE_EXTENDED option, white space  in
4065         the  pattern (other than in a character class) and characters between a         the  pattern (other than in a character class) and characters between a
4066         # outside a character class and the next newline are ignored. An escap-         # outside a character class and the next newline are ignored. An escap-
4067         ing  backslash  can  be  used to include a whitespace or # character as         ing  backslash  can  be used to include a white space or # character as
4068         part of the pattern.         part of the pattern.
4069    
4070         If you want to remove the special meaning from a  sequence  of  charac-         If you want to remove the special meaning from a  sequence  of  charac-
# Line 4083  BACKSLASH Line 4099  BACKSLASH
4099           \a        alarm, that is, the BEL character (hex 07)           \a        alarm, that is, the BEL character (hex 07)
4100           \cx       "control-x", where x is any ASCII character           \cx       "control-x", where x is any ASCII character
4101           \e        escape (hex 1B)           \e        escape (hex 1B)
4102           \f        formfeed (hex 0C)           \f        form feed (hex 0C)
4103           \n        linefeed (hex 0A)           \n        linefeed (hex 0A)
4104           \r        carriage return (hex 0D)           \r        carriage return (hex 0D)
4105           \t        tab (hex 09)           \t        tab (hex 09)
# Line 4212  BACKSLASH Line 4228  BACKSLASH
4228    
4229           \d     any decimal digit           \d     any decimal digit
4230           \D     any character that is not a decimal digit           \D     any character that is not a decimal digit
4231           \h     any horizontal whitespace character           \h     any horizontal white space character
4232           \H     any character that is not a horizontal whitespace character           \H     any character that is not a horizontal white space character
4233           \s     any whitespace character           \s     any white space character
4234           \S     any character that is not a whitespace character           \S     any character that is not a white space character
4235           \v     any vertical whitespace character           \v     any vertical white space character
4236           \V     any character that is not a vertical whitespace character           \V     any character that is not a vertical white space character
4237           \w     any "word" character           \w     any "word" character
4238           \W     any "non-word" character           \W     any "non-word" character
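The sketch below shows how these escapes partition characters by classifying code points for \s, \h, and \v; \S, \H, and \V are their negations. The code-point lists are assumed from the \h and \v tables in the pcrepattern page, of which only part of the \v table appears in this diff. The \s function follows the non-UCP behaviour of this PCRE version, in which VT is excluded. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* \s without PCRE_UCP: HT, LF, FF, CR, space.  VT is not included. */
static bool is_s(uint32_t c)
{
  return c == 0x09 || c == 0x0a || c == 0x0c || c == 0x0d || c == 0x20;
}

/* \h: horizontal white space. */
static bool is_h(uint32_t c)
{
  switch (c)
    {
    case 0x0009: case 0x0020: case 0x00a0: case 0x1680: case 0x180e:
    case 0x202f: case 0x205f: case 0x3000:
      return true;
    default:
      return c >= 0x2000 && c <= 0x200a;   /* en quad .. hair space */
    }
}

/* \v: vertical white space (LF, VT, FF, CR, NEL, LS, PS). */
static bool is_v(uint32_t c)
{
  return (c >= 0x0a && c <= 0x0d) || c == 0x85 || c == 0x2028 || c == 0x2029;
}
```

VT (U+000B) is in \v but not in \s. No character is in both \h and \v.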
4239    
# Line 4297  BACKSLASH Line 4313  BACKSLASH
4313    
4314           U+000A     Linefeed           U+000A     Linefeed
4315           U+000B     Vertical tab           U+000B     Vertical tab
4316           U+000C     Formfeed           U+000C     Form feed
4317           U+000D     Carriage return           U+000D     Carriage return
4318           U+0085     Next line           U+0085     Next line
4319           U+2028     Line separator           U+2028     Line separator
# Line 4317  BACKSLASH Line 4333  BACKSLASH
4333         This  is  an  example  of an "atomic group", details of which are given         This  is  an  example  of an "atomic group", details of which are given
4334         below.  This particular group matches either the two-character sequence         below.  This particular group matches either the two-character sequence
4335         CR  followed  by  LF,  or  one  of  the single characters LF (linefeed,         CR  followed  by  LF,  or  one  of  the single characters LF (linefeed,
4336         U+000A), VT (vertical tab, U+000B), FF (formfeed, U+000C), CR (carriage         U+000A), VT (vertical tab, U+000B), FF (form feed,  U+000C),  CR  (car-
4337         return, U+000D), or NEL (next line, U+0085). The two-character sequence         riage  return,  U+000D),  or NEL (next line, U+0085). The two-character
4338         is treated as a single unit that cannot be split.         sequence is treated as a single unit that cannot be split.
4339    
4340         In other modes, two additional characters whose codepoints are  greater         In other modes, two additional characters whose codepoints are  greater
4341         than 255 are added: LS (line separator, U+2028) and PS (paragraph sepa-         than 255 are added: LS (line separator, U+2028) and PS (paragraph sepa-
# Line 4519  BACKSLASH Line 4535  BACKSLASH
4535    
4536         Xan matches characters that have either the L (letter) or the  N  (num-         Xan matches characters that have either the L (letter) or the  N  (num-
4537         ber)  property. Xps matches the characters tab, linefeed, vertical tab,         ber)  property. Xps matches the characters tab, linefeed, vertical tab,
4538         formfeed, or carriage return, and any other character that  has  the  Z         form feed, or carriage return, and any other character that has  the  Z
4539         (separator) property.  Xsp is the same as Xps, except that vertical tab         (separator) property.  Xsp is the same as Xps, except that vertical tab
4540         is excluded. Xwd matches the same characters as Xan, plus underscore.         is excluded. Xwd matches the same characters as Xan, plus underscore.
4541    
# Line 5484  BACK REFERENCES Line 5500  BACK REFERENCES
5500         its following a backslash are taken as part of a potential back  refer-         its following a backslash are taken as part of a potential back  refer-
5501         ence  number.   If  the  pattern continues with a digit character, some         ence  number.   If  the  pattern continues with a digit character, some
5502         delimiter must  be  used  to  terminate  the  back  reference.  If  the         delimiter must  be  used  to  terminate  the  back  reference.  If  the
5503         PCRE_EXTENDED option is set, this can be whitespace. Otherwise, the \g{         PCRE_EXTENDED  option  is  set, this can be white space. Otherwise, the
5504         syntax or an empty comment (see "Comments" below) can be used.         \g{ syntax or an empty comment (see "Comments" below) can be used.
5505    
5506     Recursive back references     Recursive back references
5507    
# Line 5797  CONDITIONAL SUBPATTERNS Line 5813  CONDITIONAL SUBPATTERNS
5813         DEFINE  is that it can be used to define subroutines that can be refer-         DEFINE  is that it can be used to define subroutines that can be refer-
5814         enced from elsewhere. (The use of subroutines is described below.)  For         enced from elsewhere. (The use of subroutines is described below.)  For
5815         example,  a  pattern  to match an IPv4 address such as "192.168.23.245"         example,  a  pattern  to match an IPv4 address such as "192.168.23.245"
5816         could be written like this (ignore whitespace and line breaks):         could be written like this (ignore white space and line breaks):
5817    
5818           (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )           (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
5819           \b (?&byte) (\.(?&byte)){3} \b           \b (?&byte) (\.(?&byte)){3} \b
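The byte subpattern accepts exactly the decimal values 0 to 255 written without leading zeros. The sketch below performs the same check by hand. Unlike the \b-delimited pattern, it validates a whole string and does not find an address embedded in longer text. The function names are hypothetical.

```c
#include <stdbool.h>

/* One "byte": 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d, i.e. 0-255 with no
   leading zeros.  Advances *p past the digits on success. */
static bool match_byte(const char **p)
{
  const char *s = *p;
  int n = 0, digits = 0;
  while (digits < 3 && s[digits] >= '0' && s[digits] <= '9')
    {
    n = n * 10 + (s[digits] - '0');
    digits++;
    }
  if (digits == 0) return false;
  if (digits > 1 && s[0] == '0') return false;   /* leading zero */
  if (n > 255) return false;
  *p = s + digits;
  return true;
}

/* Whole-string equivalent of  byte (\. byte){3}. */
static bool is_ipv4(const char *s)
{
  for (int i = 0; i < 4; i++)
    {
    if (i > 0 && *s++ != '.') return false;
    if (!match_byte(&s)) return false;
    }
  return *s == '\0';
}
```

For example, is_ipv4("192.168.23.245") is true, while "256.1.1.1" and "1.2.3.04" are rejected.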
# Line 6188  BACKTRACKING CONTROL Line 6204  BACKTRACKING CONTROL
6204         that  is  encountered in a successful positive assertion is passed back         that  is  encountered in a successful positive assertion is passed back
6205         when a match succeeds (compare capturing  parentheses  in  assertions).         when a match succeeds (compare capturing  parentheses  in  assertions).
6206         Note that such subpatterns are processed as anchored at the point where         Note that such subpatterns are processed as anchored at the point where
6207         they are tested. Note also that Perl's treatment of subroutines is dif-         they are tested. Note also that Perl's  treatment  of  subroutines  and
6208         ferent in some cases.         assertions is different in some cases.
6209    
6210         The  new verbs make use of what was previously invalid syntax: an open-         The  new verbs make use of what was previously invalid syntax: an open-
6211         ing parenthesis followed by an asterisk. They are generally of the form         ing parenthesis followed by an asterisk. They are generally of the form
6212         (*VERB)  or (*VERB:NAME). Some may take either form, with differing be-         (*VERB)  or (*VERB:NAME). Some may take either form, with differing be-
6213         haviour, depending on whether or not an argument is present. A name  is         haviour, depending on whether or not an argument is present. A name  is
6214         any sequence of characters that does not include a closing parenthesis.         any sequence of characters that does not include a closing parenthesis.
6215         If the name is empty, that is, if the closing  parenthesis  immediately         The maximum length of name is 255 in the 8-bit library and 65535 in the
6216         follows  the  colon,  the effect is as if the colon were not there. Any         16-bit library. If the name is empty, that is, if the closing parenthe-
6217         number of these verbs may occur in a pattern.         sis immediately follows the colon, the effect is as if the  colon  were
6218           not there. Any number of these verbs may occur in a pattern.
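The name rules just described can be sketched as a small parsing helper. The name ends at the first closing parenthesis, an empty name is equivalent to no colon, and the length limit depends on the library width. The identifiers are hypothetical. The 16-bit case is approximated with char strings rather than 16-bit code units. NAME_TOO_LONG corresponds to compile error 75 (name is too long).

```c
#include <stddef.h>
#include <string.h>

#define MAX_NAME_8BIT   255    /* limit stated for the 8-bit library  */
#define MAX_NAME_16BIT  65535  /* limit stated for the 16-bit library */

enum name_status { NAME_OK, NAME_UNTERMINATED, NAME_TOO_LONG };

/* Hypothetical helper, not PCRE's parser.  p points just after the colon
   in "(*VERB:"; the name is everything up to the next ')'.  On NAME_OK,
   *len receives the length; 0 means the verb acts as if there were no
   colon. */
static enum name_status verb_name(const char *p, int library_bits,
                                  size_t *len)
{
  const char *close = strchr(p, ')');
  if (close == NULL) return NAME_UNTERMINATED;
  size_t n = (size_t)(close - p);
  size_t max = (library_bits == 16) ? MAX_NAME_16BIT : MAX_NAME_8BIT;
  if (n > max) return NAME_TOO_LONG;
  *len = n;
  return NAME_OK;
}
```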
6219    
6220     Optimizations that affect backtracking verbs     Optimizations that affect backtracking verbs
6221    
6222         PCRE contains some optimizations that are used to speed up matching  by         PCRE  contains some optimizations that are used to speed up matching by
6223         running some checks at the start of each match attempt. For example, it         running some checks at the start of each match attempt. For example, it
6224         may know the minimum length of matching subject, or that  a  particular         may  know  the minimum length of matching subject, or that a particular
6225         character  must  be present. When one of these optimizations suppresses         character must be present. When one of these  optimizations  suppresses
6226         the running of a match, any included backtracking verbs  will  not,  of         the  running  of  a match, any included backtracking verbs will not, of
6227         course, be processed. You can suppress the start-of-match optimizations         course, be processed. You can suppress the start-of-match optimizations
6228         by setting the PCRE_NO_START_OPTIMIZE  option  when  calling  pcre_com-         by  setting  the  PCRE_NO_START_OPTIMIZE  option when calling pcre_com-
6229         pile() or pcre_exec(), or by starting the pattern with (*NO_START_OPT).         pile() or pcre_exec(), or by starting the pattern with (*NO_START_OPT).
6230         There is more discussion of this option in the section entitled "Option         There is more discussion of this option in the section entitled "Option
6231         bits for pcre_exec()" in the pcreapi documentation.         bits for pcre_exec()" in the pcreapi documentation.
6232    
6233         Experiments  with  Perl  suggest that it too has similar optimizations,         Experiments with Perl suggest that it too  has  similar  optimizations,
6234         sometimes leading to anomalous results.         sometimes leading to anomalous results.
6235    
6236     Verbs that act immediately     Verbs that act immediately
6237    
6238         The following verbs act as soon as they are encountered. They  may  not         The  following  verbs act as soon as they are encountered. They may not
6239         be followed by a name.         be followed by a name.
6240    
6241            (*ACCEPT)            (*ACCEPT)
6242    
6243         This  verb causes the match to end successfully, skipping the remainder         This verb causes the match to end successfully, skipping the  remainder
6244         of the pattern. However, when it is inside a subpattern that is  called         of  the pattern. However, when it is inside a subpattern that is called
6245         as  a  subroutine, only that subpattern is ended successfully. Matching         as a subroutine, only that subpattern is ended  successfully.  Matching
6246         then continues at the outer level. If  (*ACCEPT)  is  inside  capturing         then  continues  at  the  outer level. If (*ACCEPT) is inside capturing
6247         parentheses, the data so far is captured. For example:         parentheses, the data so far is captured. For example:
6248    
6249           A((?:A|B(*ACCEPT)|C)D)           A((?:A|B(*ACCEPT)|C)D)
6250    
6251         This  matches  "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap-         This matches "AB", "AAD", or "ACD"; when it matches "AB", "B"  is  cap-
6252         tured by the outer parentheses.         tured by the outer parentheses.
6253    
6254           (*FAIL) or (*F)           (*FAIL) or (*F)
6255    
6256         This verb causes a matching failure, forcing backtracking to occur.  It         This  verb causes a matching failure, forcing backtracking to occur. It
6257         is  equivalent to (?!) but easier to read. The Perl documentation notes         is equivalent to (?!) but easier to read. The Perl documentation  notes
6258         that it is probably useful only when combined  with  (?{})  or  (??{}).         that  it  is  probably  useful only when combined with (?{}) or (??{}).
6259         Those  are,  of course, Perl features that are not present in PCRE. The         Those are, of course, Perl features that are not present in  PCRE.  The
6260         nearest equivalent is the callout feature, as for example in this  pat-         nearest  equivalent is the callout feature, as for example in this pat-
6261         tern:         tern:
6262    
6263           a+(?C)(*FAIL)           a+(?C)(*FAIL)
6264    
6265         A  match  with the string "aaaa" always fails, but the callout is taken         A match with the string "aaaa" always fails, but the callout  is  taken
6266         before each backtrack happens (in this example, 10 times).         before each backtrack happens (in this example, 10 times).
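The count of 10 above is 4 + 3 + 2 + 1. At each of the four starting positions, a+ first takes every remaining "a", reaches the callout, and then gives back one "a" per backtrack until it cannot shrink any further. The Python sketch below is a hand simulation of that backtracking, not a call into PCRE. The function name count_callouts is hypothetical, and the sketch assumes that every starting position is actually tried.

```python
def count_callouts(subject: str) -> int:
    """Count how often the (?C) point in  a+(?C)(*FAIL)  is reached.

    A hand simulation, not PCRE: every starting position is tried,
    a+ is greedy, and (*FAIL) forces a backtrack after each callout.
    """
    callouts = 0
    for start in range(len(subject)):
        # Longest run of "a" that a+ can match from this start.
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        # a+ tries run, run-1, ..., 1 characters; each one reaches the callout.
        for _length in range(run, 0, -1):
            callouts += 1
    return callouts


print(count_callouts("aaaa"))  # 10, that is 4 + 3 + 2 + 1
```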
6267    
6268     Recording which path was taken     Recording which path was taken
6269    
6270         There is one verb whose main purpose  is  to  track  how  a  match  was         There  is  one  verb  whose  main  purpose  is to track how a match was
6271         arrived  at,  though  it  also  has a secondary use in conjunction with         arrived at, though it also has a  secondary  use  in  conjunction  with
6272         advancing the match starting point (see (*SKIP) below).         advancing the match starting point (see (*SKIP) below).
6273    
6274           (*MARK:NAME) or (*:NAME)           (*MARK:NAME) or (*:NAME)
6275    
6276         A name is always  required  with  this  verb.  There  may  be  as  many         A  name  is  always  required  with  this  verb.  There  may be as many
6277         instances  of  (*MARK) as you like in a pattern, and their names do not         instances of (*MARK) as you like in a pattern, and their names  do  not
6278         have to be unique.         have to be unique.
6279    
6280         When a match succeeds, the name of the last-encountered (*MARK) on  the         When  a match succeeds, the name of the last-encountered (*MARK) on the
6281         matching  path is passed back to the caller as described in the section         matching path is passed back to the caller as described in the  section
6282         entitled "Extra data for pcre_exec()"  in  the  pcreapi  documentation.         entitled  "Extra  data  for  pcre_exec()" in the pcreapi documentation.
6283         Here  is  an example of pcretest output, where the /K modifier requests         Here is an example of pcretest output, where the /K  modifier  requests
6284         the retrieval and outputting of (*MARK) data:         the retrieval and outputting of (*MARK) data:
6285    
6286             re> /X(*MARK:A)Y|X(*MARK:B)Z/K             re> /X(*MARK:A)Y|X(*MARK:B)Z/K
# Line 6275  BACKTRACKING CONTROL Line 6292  BACKTRACKING CONTROL
6292           MK: B           MK: B
6293    
6294         The (*MARK) name is tagged with "MK:" in this output, and in this exam-         The (*MARK) name is tagged with "MK:" in this output, and in this exam-
6295         ple  it indicates which of the two alternatives matched. This is a more         ple it indicates which of the two alternatives matched. This is a  more
6296         efficient way of obtaining this information than putting each  alterna-         efficient  way of obtaining this information than putting each alterna-
6297         tive in its own capturing parentheses.         tive in its own capturing parentheses.
6298    
6299         If (*MARK) is encountered in a positive assertion, its name is recorded         If (*MARK) is encountered in a positive assertion, its name is recorded
6300         and passed back if it is the last-encountered. This does not happen for         and passed back if it is the last-encountered. This does not happen for
6301         negative assertions.         negative assertions.
6302    
6303         After  a  partial match or a failed match, the name of the last encoun-         After a partial match or a failed match, the name of the  last  encoun-
6304         tered (*MARK) in the entire match process is returned. For example:         tered (*MARK) in the entire match process is returned. For example:
6305    
6306             re> /X(*MARK:A)Y|X(*MARK:B)Z/K             re> /X(*MARK:A)Y|X(*MARK:B)Z/K
6307           data> XP           data> XP
6308           No match, mark = B           No match, mark = B
6309    
6310         Note that in this unanchored example the  mark  is  retained  from  the         Note  that  in  this  unanchored  example the mark is retained from the
6311         match attempt that started at the letter "X" in the subject. Subsequent         match attempt that started at the letter "X" in the subject. Subsequent
6312         match attempts starting at "P" and then with an empty string do not get         match attempts starting at "P" and then with an empty string do not get
6313         as far as the (*MARK) item, but nevertheless do not reset it.         as far as the (*MARK) item, but nevertheless do not reset it.
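The retention rule can be illustrated with a hand simulation of X(*MARK:A)Y|X(*MARK:B)Z in Python. This is neither PCRE nor pcretest, and the names match_with_mark and ALTERNATIVES are hypothetical. The sketch records the name of every (*MARK) it passes and never resets that name between starting positions, so a failed search reports the last name reached anywhere in the process.

```python
from typing import Optional, Tuple

# The two alternatives of  X(*MARK:A)Y|X(*MARK:B)Z  as (mark name, final char).
ALTERNATIVES = [("A", "Y"), ("B", "Z")]


def match_with_mark(subject: str) -> Tuple[Optional[Tuple[int, int]], Optional[str]]:
    """Hand-simulate the pattern, returning (span or None, mark name).

    Not PCRE: the last (*MARK) name reached is remembered across all
    starting positions, including the empty one at the end of the subject.
    """
    last_mark: Optional[str] = None
    for start in range(len(subject) + 1):
        for name, final in ALTERNATIVES:
            if subject[start:start + 1] != "X":
                continue  # this alternative never reaches its (*MARK)
            last_mark = name  # passing (*MARK:name) records the name
            if subject[start + 1:start + 2] == final:
                return (start, start + 2), name
    return None, last_mark


print(match_with_mark("XZ"))  # ((0, 2), 'B')
print(match_with_mark("XP"))  # (None, 'B'), as in the pcretest output above
```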
6314    
6315         If  you  are  interested  in  (*MARK)  values after failed matches, you         If you are interested in  (*MARK)  values  after  failed  matches,  you
6316         should probably set the PCRE_NO_START_OPTIMIZE option  (see  above)  to         should  probably  set  the PCRE_NO_START_OPTIMIZE option (see above) to
6317         ensure that the match is always attempted.         ensure that the match is always attempted.
6318    
6319     Verbs that act after backtracking     Verbs that act after backtracking
6320    
6321         The following verbs do nothing when they are encountered. Matching con-         The following verbs do nothing when they are encountered. Matching con-
6322         tinues with what follows, but if there is no subsequent match,  causing         tinues  with what follows, but if there is no subsequent match, causing
6323         a  backtrack  to  the  verb, a failure is forced. That is, backtracking         a backtrack to the verb, a failure is  forced.  That  is,  backtracking
6324         cannot pass to the left of the verb. However, when one of  these  verbs         cannot  pass  to the left of the verb. However, when one of these verbs
6325         appears  inside  an atomic group, its effect is confined to that group,         appears inside an atomic group, its effect is confined to  that  group,
6326         because once the group has been matched, there is never any  backtrack-         because  once the group has been matched, there is never any backtrack-
6327         ing  into  it.  In  this situation, backtracking can "jump back" to the         ing into it. In this situation, backtracking can  "jump  back"  to  the
6328         left of the entire atomic group. (Remember also, as stated above,  that         left  of the entire atomic group. (Remember also, as stated above, that
6329         this localization also applies in subroutine calls and assertions.)         this localization also applies in subroutine calls and assertions.)
6330    
6331         These  verbs  differ  in exactly what kind of failure occurs when back-         These verbs differ in exactly what kind of failure  occurs  when  back-
6332         tracking reaches them.         tracking reaches them.
6333    
6334           (*COMMIT)           (*COMMIT)
6335    
6336         This verb, which may not be followed by a name, causes the whole  match         This  verb, which may not be followed by a name, causes the whole match
6337         to fail outright if the rest of the pattern does not match. Even if the         to fail outright if the rest of the pattern does not match. Even if the
6338         pattern is unanchored, no further attempts to find a match by advancing         pattern is unanchored, no further attempts to find a match by advancing
6339         the  starting  point  take  place.  Once  (*COMMIT)  has  been  passed,         the  starting  point  take  place.  Once  (*COMMIT)  has  been  passed,
6340         pcre_exec() is committed to finding a match  at  the  current  starting         pcre_exec()  is  committed  to  finding a match at the current starting
6341         point, or not at all. For example:         point, or not at all. For example:
6342    
6343           a+(*COMMIT)b           a+(*COMMIT)b
6344    
6345         This  matches  "xxaab" but not "aacaab". It can be thought of as a kind         This matches "xxaab" but not "aacaab". It can be thought of as  a  kind
6346         of dynamic anchor, or "I've started, so I must finish." The name of the         of dynamic anchor, or "I've started, so I must finish." The name of the
6347         most  recently passed (*MARK) in the path is passed back when (*COMMIT)         most recently passed (*MARK) in the path is passed back when  (*COMMIT)
6348         forces a match failure.         forces a match failure.
6349    
6350         Note that (*COMMIT) at the start of a pattern is not  the  same  as  an         Note  that  (*COMMIT)  at  the start of a pattern is not the same as an
6351         anchor,  unless  PCRE's start-of-match optimizations are turned off, as         anchor, unless PCRE's start-of-match optimizations are turned  off,  as
6352         shown in this pcretest example:         shown in this pcretest example:
6353    
6354             re> /(*COMMIT)abc/             re> /(*COMMIT)abc/
# Line 6340  BACKTRACKING CONTROL Line 6357  BACKTRACKING CONTROL
6357           xyzabc\Y           xyzabc\Y
6358           No match           No match
6359    
6360         PCRE knows that any match must start  with  "a",  so  the  optimization         PCRE  knows  that  any  match  must start with "a", so the optimization
6361         skips  along the subject to "a" before running the first match attempt,         skips along the subject to "a" before running the first match  attempt,
6362         which succeeds. When the optimization is disabled by the \Y  escape  in         which  succeeds.  When the optimization is disabled by the \Y escape in
6363         the second subject, the match starts at "x" and so the (*COMMIT) causes         the second subject, the match starts at "x" and so the (*COMMIT) causes
6364         it to fail without trying any other starting points.         it to fail without trying any other starting points.
6365    
6366           (*PRUNE) or (*PRUNE:NAME)           (*PRUNE) or (*PRUNE:NAME)
6367    
6368         This verb causes the match to fail at the current starting position  in         This  verb causes the match to fail at the current starting position in
6369         the  subject  if the rest of the pattern does not match. If the pattern         the subject if the rest of the pattern does not match. If  the  pattern
6370         is unanchored, the normal "bumpalong"  advance  to  the  next  starting         is  unanchored,  the  normal  "bumpalong"  advance to the next starting
6371         character  then happens. Backtracking can occur as usual to the left of         character then happens. Backtracking can occur as usual to the left  of
6372         (*PRUNE), before it is reached,  or  when  matching  to  the  right  of         (*PRUNE),  before  it  is  reached,  or  when  matching to the right of
6373         (*PRUNE),  but  if  there is no match to the right, backtracking cannot         (*PRUNE), but if there is no match to the  right,  backtracking  cannot
6374         cross (*PRUNE). In simple cases, the use of (*PRUNE) is just an  alter-         cross  (*PRUNE). In simple cases, the use of (*PRUNE) is just an alter-
6375         native  to an atomic group or possessive quantifier, but there are some         native to an atomic group or possessive quantifier, but there are  some
6376         uses of (*PRUNE) that cannot be expressed in any other way.  The behav-         uses of (*PRUNE) that cannot be expressed in any other way.  The behav-
6377         iour  of  (*PRUNE:NAME)  is  the  same  as  (*MARK:NAME)(*PRUNE). In an         iour of (*PRUNE:NAME)  is  the  same  as  (*MARK:NAME)(*PRUNE).  In  an
6378         anchored pattern (*PRUNE) has the same effect as (*COMMIT).         anchored pattern (*PRUNE) has the same effect as (*COMMIT).
6379    
6380           (*SKIP)           (*SKIP)
6381    
6382         This verb, when given without a name, is like (*PRUNE), except that  if         This  verb, when given without a name, is like (*PRUNE), except that if
6383         the  pattern  is unanchored, the "bumpalong" advance is not to the next         the pattern is unanchored, the "bumpalong" advance is not to  the  next
6384         character, but to the position in the subject where (*SKIP) was encoun-         character, but to the position in the subject where (*SKIP) was encoun-
6385         tered.  (*SKIP)  signifies that whatever text was matched leading up to         tered. (*SKIP) signifies that whatever text was matched leading  up  to
6386         it cannot be part of a successful match. Consider:         it cannot be part of a successful match. Consider:
6387    
6388           a+(*SKIP)b           a+(*SKIP)b
6389    
6390         If the subject is "aaaac...",  after  the  first  match  attempt  fails         If  the  subject  is  "aaaac...",  after  the first match attempt fails
6391         (starting  at  the  first  character in the string), the starting point         (starting at the first character in the  string),  the  starting  point
6392         skips on to start the next attempt at "c". Note that a possessive quan-         skips on to start the next attempt at "c". Note that a possessive quan-
6393         tifier  does not have the same effect as this example; although it would         tifier does not have the same effect as this example; although it  would
6394         suppress backtracking  during  the  first  match  attempt,  the  second         suppress  backtracking  during  the  first  match  attempt,  the second
6395         attempt  would  start at the second character instead of skipping on to         attempt would start at the second character instead of skipping  on  to
6396         "c".         "c".
6397    
6398           (*SKIP:NAME)           (*SKIP:NAME)
6399    
6400         When (*SKIP) has an associated name, its behaviour is modified. If  the         When  (*SKIP) has an associated name, its behaviour is modified. If the
6401         following pattern fails to match, the previous path through the pattern         following pattern fails to match, the previous path through the pattern
6402         is searched for the most recent (*MARK) that has the same name. If  one         is  searched for the most recent (*MARK) that has the same name. If one
6403         is  found, the "bumpalong" advance is to the subject position that cor-         is found, the "bumpalong" advance is to the subject position that  cor-
6404         responds to that (*MARK) instead of to where (*SKIP)  was  encountered.         responds  to  that (*MARK) instead of to where (*SKIP) was encountered.
6405         If no (*MARK) with a matching name is found, the (*SKIP) is ignored.         If no (*MARK) with a matching name is found, the (*SKIP) is ignored.
6406    
6407           (*THEN) or (*THEN:NAME)           (*THEN) or (*THEN:NAME)
6408    
6409         This  verb  causes a skip to the next innermost alternative if the rest         This verb causes a skip to the next innermost alternative if  the  rest
6410         of the pattern does not match. That is, it cancels  pending  backtrack-         of  the  pattern does not match. That is, it cancels pending backtrack-
6411         ing,  but  only within the current alternative. Its name comes from the         ing, but only within the current alternative. Its name comes  from  the
6412         observation that it can be used for a pattern-based if-then-else block:         observation that it can be used for a pattern-based if-then-else block:
6413    
6414           ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...           ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...
6415    
6416         If the COND1 pattern matches, FOO is tried (and possibly further  items         If  the COND1 pattern matches, FOO is tried (and possibly further items
6417         after  the  end  of the group if FOO succeeds); on failure, the matcher         after the end of the group if FOO succeeds); on  failure,  the  matcher
6418         skips to the second alternative and tries COND2,  without  backtracking         skips  to  the second alternative and tries COND2, without backtracking
6419         into  COND1.  The  behaviour  of  (*THEN:NAME)  is  exactly the same as         into COND1. The behaviour  of  (*THEN:NAME)  is  exactly  the  same  as
6420         (*MARK:NAME)(*THEN).  If (*THEN) is not inside an alternation, it  acts         (*MARK:NAME)(*THEN).   If (*THEN) is not inside an alternation, it acts
6421         like (*PRUNE).         like (*PRUNE).
6422    
6423         Note  that  a  subpattern that does not contain a | character is just a         Note that a subpattern that does not contain a | character  is  just  a
6424         part of the enclosing alternative; it is not a nested alternation  with         part  of the enclosing alternative; it is not a nested alternation with
6425         only  one alternative. The effect of (*THEN) extends beyond such a sub-         only one alternative. The effect of (*THEN) extends beyond such a  sub-
6426         pattern to the enclosing alternative. Consider this pattern,  where  A,         pattern  to  the enclosing alternative. Consider this pattern, where A,
6427         B, etc. are complex pattern fragments that do not contain any | charac-         B, etc. are complex pattern fragments that do not contain any | charac-
6428         ters at this level:         ters at this level:
6429    
6430           A (B(*THEN)C) | D           A (B(*THEN)C) | D
6431    
6432         If A and B are matched, but there is a failure in C, matching does  not         If  A and B are matched, but there is a failure in C, matching does not
6433         backtrack into A; instead it moves to the next alternative, that is, D.         backtrack into A; instead it moves to the next alternative, that is, D.
6434         However, if the subpattern containing (*THEN) is given an  alternative,         However,  if the subpattern containing (*THEN) is given an alternative,
6435         it behaves differently:         it behaves differently:
6436    
6437           A (B(*THEN)C | (*FAIL)) | D           A (B(*THEN)C | (*FAIL)) | D
6438    
6439         The  effect of (*THEN) is now confined to the inner subpattern. After a         The effect of (*THEN) is now confined to the inner subpattern. After  a
6440         failure in C, matching moves to (*FAIL), which causes the whole subpat-         failure in C, matching moves to (*FAIL), which causes the whole subpat-
6441         tern  to  fail  because  there are no more alternatives to try. In this         tern to fail because there are no more alternatives  to  try.  In  this
6442         case, matching does now backtrack into A.         case, matching does now backtrack into A.
6443    
6444         Note also that a conditional subpattern is not considered as having two         Note also that a conditional subpattern is not considered as having two
6445         alternatives,  because  only  one  is  ever used. In other words, the |         alternatives, because only one is ever used.  In  other  words,  the  |
6446         character in a conditional subpattern has a different meaning. Ignoring         character in a conditional subpattern has a different meaning. Ignoring
6447         white space, consider:         white space, consider:
6448    
6449           ^.*? (?(?=a) a | b(*THEN)c )           ^.*? (?(?=a) a | b(*THEN)c )
6450    
6451         If  the  subject  is  "ba", this pattern does not match. Because .*? is         If the subject is "ba", this pattern does not  match.  Because  .*?  is
6452         ungreedy, it initially matches zero  characters.  The  condition  (?=a)         ungreedy,  it  initially  matches  zero characters. The condition (?=a)
6453         then  fails,  the  character  "b"  is  matched, but "c" is not. At this         then fails, the character "b" is matched,  but  "c"  is  not.  At  this
6454         point, matching does not backtrack to .*? as might perhaps be  expected         point,  matching does not backtrack to .*? as might perhaps be expected
6455         from  the  presence  of  the | character. The conditional subpattern is         from the presence of the | character.  The  conditional  subpattern  is
6456         part of the single alternative that comprises the whole pattern, and so         part of the single alternative that comprises the whole pattern, and so
6457         the  match  fails.  (If  there was a backtrack into .*?, allowing it to         the match fails. (If there was a backtrack into  .*?,  allowing  it  to
6458         match "b", the match would succeed.)         match "b", the match would succeed.)
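The following Python sketch hand-simulates the pattern above, with white space removed, for its single anchored match attempt. It is an illustration, not PCRE, and it assumes subjects without newlines so that .*? can consume any character. The use_then parameter is a hypothetical switch added only for comparison. When it is set to False, the (*THEN) is dropped, so the failure of "c" backtracks into .*? in the ordinary way.

```python
from typing import Optional


def match_cond_then(subject: str, use_then: bool = True) -> Optional[str]:
    """Hand-simulate  ^.*?(?(?=a)a|b(*THEN)c)  and return the match or None.

    Not PCRE. The pattern is anchored by ^, so only one attempt is made.
    use_then=False is a hypothetical switch that drops the (*THEN).
    """
    # .*? is lazy: try consuming 0, 1, 2, ... characters before the group.
    for i in range(len(subject) + 1):
        if subject[i:i + 1] == "a":
            # Condition (?=a) is true, so the "yes" branch matches "a".
            return subject[:i + 1]
        if subject[i:i + 1] == "b":
            # "No" branch: "b" matched, (*THEN) is passed, then try "c".
            if subject[i + 1:i + 2] == "c":
                return subject[:i + 2]
            if use_then:
                # The conditional is not an alternation, so (*THEN) acts
                # like (*PRUNE); with ^ anchoring, the whole match fails.
                return None
        # Otherwise backtrack into .*? and let it take one more character.
    return None


print(match_cond_then("ba"))                  # None
print(match_cond_then("ba", use_then=False))  # ba
```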
6459    
6460         The verbs just described provide four different "strengths" of  control         The  verbs just described provide four different "strengths" of control
6461         when subsequent matching fails. (*THEN) is the weakest, carrying on the         when subsequent matching fails. (*THEN) is the weakest, carrying on the
6462         match at the next alternative. (*PRUNE) comes next, failing  the  match         match  at  the next alternative. (*PRUNE) comes next, failing the match
6463         at  the  current starting position, but allowing an advance to the next         at the current starting position, but allowing an advance to  the  next
6464         character (for an unanchored pattern). (*SKIP) is similar, except  that         character  (for an unanchored pattern). (*SKIP) is similar, except that
6465         the advance may be more than one character. (*COMMIT) is the strongest,         the advance may be more than one character. (*COMMIT) is the strongest,
6466         causing the entire match to fail.         causing the entire match to fail.
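The difference between (*PRUNE), (*SKIP), and (*COMMIT) shows up in which starting positions are tried after a failure. The Python sketch below hand-simulates an unanchored match of a+(*PRUNE)b, a+(*SKIP)b, or a+(*COMMIT)b. For each one, it reports the matched span and the starting positions attempted. This is not PCRE. The sketch models only this one pattern shape and deliberately ignores start-of-match optimizations, and the function name simulate is hypothetical. (*THEN) is not shown separately because, with no alternation in the pattern, it acts like (*PRUNE).

```python
from typing import List, Optional, Tuple

Span = Optional[Tuple[int, int]]


def simulate(subject: str, verb: Optional[str]) -> Tuple[Span, List[int]]:
    """Hand-simulate an unanchored match of  a+ VERB b  (this is not PCRE).

    verb is None (no verb), "PRUNE", "SKIP" or "COMMIT". Returns the
    matched (start, end) span, or None, plus the starting positions tried.
    Start-of-match optimizations are deliberately not modelled.
    """
    tried: List[int] = []
    start = 0
    while start < len(subject):
        tried.append(start)
        # Greedy a+: count the run of "a" characters at this start.
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        if run == 0:
            start += 1  # a+ failed before any verb was reached
            continue

        if verb is None:
            # Ordinary backtracking: a+ gives back one "a" at a time.
            for k in range(run, 0, -1):
                if subject[start + k:start + k + 1] == "b":
                    return (start, start + k + 1), tried
            start += 1
            continue

        # The verb is passed after a+ has taken all `run` characters.
        if subject[start + run:start + run + 1] == "b":
            return (start, start + run + 1), tried

        # "b" failed, and backtracking into a+ would cross the verb.
        if verb == "COMMIT":
            return None, tried  # no other starting point is tried
        if verb == "SKIP":
            start += run  # next attempt starts where (*SKIP) was reached
        else:  # "PRUNE": the normal one-character bumpalong
            start += 1
    return None, tried


for verb in (None, "PRUNE", "SKIP", "COMMIT"):
    print(verb, simulate("aaaac", verb))
# None (None, [0, 1, 2, 3, 4])
# PRUNE (None, [0, 1, 2, 3, 4])
# SKIP (None, [0, 4])
# COMMIT (None, [0])
```

The same function reproduces the (*COMMIT) examples given earlier. For instance, simulate("xxaab", "COMMIT") finds the match at (2, 5) because a+ fails at the two "x" positions before (*COMMIT) is ever reached. By contrast, simulate("aacaab", "COMMIT") gives up after the first attempt.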
6467    
# Line 6454  BACKTRACKING CONTROL Line 6471  BACKTRACKING CONTROL
6471    
6472           (A(*COMMIT)B(*THEN)C|D)           (A(*COMMIT)B(*THEN)C|D)
6473    
6474         Once A has matched, PCRE is committed to this  match,  at  the  current         Once  A  has  matched,  PCRE is committed to this match, at the current
6475         starting  position. If subsequently B matches, but C does not, the nor-         starting position. If subsequently B matches, but C does not, the  nor-
6476         mal (*THEN) action of trying the next alternative (that is, D) does not         mal (*THEN) action of trying the next alternative (that is, D) does not
6477         happen because (*COMMIT) overrides.         happen because (*COMMIT) overrides.
6478    
6479    
6480  SEE ALSO  SEE ALSO
6481    
6482         pcreapi(3),  pcrecallout(3),  pcrematching(3),  pcresyntax(3), pcre(3),         pcreapi(3), pcrecallout(3),  pcrematching(3),  pcresyntax(3),  pcre(3),
6483         pcre16(3).         pcre16(3).
6484    
6485    
# Line 6475  AUTHOR Line 6492  AUTHOR
6492    
6493  REVISION  REVISION
6494    
6495         Last updated: 14 April 2012         Last updated: 01 June 2012
6496         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
6497  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
6498    
6499    
6500  PCRESYNTAX(3)                                                    PCRESYNTAX(3)  PCRESYNTAX(3)                                                    PCRESYNTAX(3)
6501    
6502    
# Line 6505  CHARACTERS Line 6522  CHARACTERS
6522           \a         alarm, that is, the BEL character (hex 07)           \a         alarm, that is, the BEL character (hex 07)
6523           \cx        "control-x", where x is any ASCII character           \cx        "control-x", where x is any ASCII character
6524           \e         escape (hex 1B)           \e         escape (hex 1B)
6525           \f         formfeed (hex 0C)           \f         form feed (hex 0C)
6526           \n         newline (hex 0A)           \n         newline (hex 0A)
6527           \r         carriage return (hex 0D)           \r         carriage return (hex 0D)
6528           \t         tab (hex 09)           \t         tab (hex 09)
# Line 6521  CHARACTER TYPES Line 6538  CHARACTER TYPES
6538           \C         one data unit, even in UTF mode (best avoided)           \C         one data unit, even in UTF mode (best avoided)
6539           \d         a decimal digit           \d         a decimal digit
6540           \D         a character that is not a decimal digit           \D         a character that is not a decimal digit
6541           \h         a horizontal whitespace character           \h         a horizontal white space character
6542           \H         a character that is not a horizontal whitespace character           \H         a character that is not a horizontal white space character
6543           \N         a character that is not a newline           \N         a character that is not a newline
6544           \p{xx}     a character with the xx property           \p{xx}     a character with the xx property
6545           \P{xx}     a character without the xx property           \P{xx}     a character without the xx property
6546           \R         a newline sequence           \R         a newline sequence
6547           \s         a whitespace character           \s         a white space character
6548           \S         a character that is not a whitespace character           \S         a character that is not a white space character
6549           \v         a vertical whitespace character           \v         a vertical white space character
6550           \V         a character that is not a vertical whitespace character           \V         a character that is not a vertical white space character
6551           \w         a "word" character           \w         a "word" character
6552           \W         a "non-word" character           \W         a "non-word" character
6553           \X         an extended Unicode sequence           \X         an extended Unicode sequence
# Line 6634  CHARACTER CLASSES Line 6651  CHARACTER CLASSES
6651           lower       lower case letter           lower       lower case letter
6652           print       printing, including space           print       printing, including space
6653           punct       printing, excluding alphanumeric           punct       printing, excluding alphanumeric
6654           space       whitespace           space       white space
6655           upper       upper case letter           upper       upper case letter
6656           word        same as \w           word        same as \w
6657           xdigit      hexadecimal digit           xdigit      hexadecimal digit
# Line 6856  REVISION Line 6873  REVISION
6873         Last updated: 10 January 2012         Last updated: 10 January 2012
6874         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
6875  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
6876    
6877    
6878  PCREUNICODE(3)                                                  PCREUNICODE(3)  PCREUNICODE(3)                                                  PCREUNICODE(3)
6879    
6880    
# Line 6935  UNICODE PROPERTY SUPPORT Line 6952  UNICODE PROPERTY SUPPORT
6952    
6953         If an invalid UTF-8 string is passed to PCRE, an error return is given.         If an invalid UTF-8 string is passed to PCRE, an error return is given.
6954         At compile time, the only additional information is the offset  to  the         At compile time, the only additional information is the offset  to  the
6955         first  byte of the failing character. The runtime functions pcre_exec()         first byte of the failing character. The run-time functions pcre_exec()
6956         and pcre_dfa_exec() also pass back this information, as well as a  more         and pcre_dfa_exec() also pass back this information, as well as a  more
6957         detailed  reason  code if the caller has provided memory in which to do         detailed  reason  code if the caller has provided memory in which to do
6958         this.         this.
# Line 6976  UNICODE PROPERTY SUPPORT Line 6993  UNICODE PROPERTY SUPPORT
6993    
6994         If an invalid UTF-16 string is passed  to  PCRE,  an  error  return  is         If an invalid UTF-16 string is passed  to  PCRE,  an  error  return  is
6995         given.  At  compile time, the only additional information is the offset         given.  At  compile time, the only additional information is the offset
6996         to the first data unit of the failing character. The runtime  functions         to the first data unit of the failing character. The run-time functions
6997         pcre16_exec() and pcre16_dfa_exec() also pass back this information, as         pcre16_exec() and pcre16_dfa_exec() also pass back this information, as
6998         well as a more detailed reason code if the caller has  provided  memory         well as a more detailed reason code if the caller has  provided  memory
6999         in which to do this.         in which to do this.
# Line 7030  UNICODE PROPERTY SUPPORT Line 7047  UNICODE PROPERTY SUPPORT
7047         7.  Similarly,  characters that match the POSIX named character classes         7.  Similarly,  characters that match the POSIX named character classes
7048         are all low-valued characters, unless the PCRE_UCP option is set.         are all low-valued characters, unless the PCRE_UCP option is set.
7049    
7050         8. However, the horizontal and  vertical  whitespace  matching  escapes         8. However, the horizontal and vertical white  space  matching  escapes
7051         (\h,  \H,  \v, and \V) do match all the appropriate Unicode characters,         (\h,  \H,  \v, and \V) do match all the appropriate Unicode characters,
7052         whether or not PCRE_UCP is set.         whether or not PCRE_UCP is set.
7053    
# Line 7057  REVISION Line 7074  REVISION
7074         Last updated: 14 April 2012         Last updated: 14 April 2012
7075         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
7076  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
7077    
7078    
7079  PCREJIT(3)                                                          PCREJIT(3)  PCREJIT(3)                                                          PCREJIT(3)
7080    
7081    
# Line 7209  UNSUPPORTED OPTIONS AND PATTERN ITEMS Line 7226  UNSUPPORTED OPTIONS AND PATTERN ITEMS
7226    
7227           \C             match a single byte; not supported in UTF-8 mode           \C             match a single byte; not supported in UTF-8 mode
7228           (?Cn)          callouts           (?Cn)          callouts
7229           (*COMMIT)      )           (*PRUNE)       )
7230           (*MARK)        )           (*SKIP)        ) backtracking control verbs
          (*PRUNE)       ) the backtracking control verbs  
          (*SKIP)        )  
7231           (*THEN)        )           (*THEN)        )
7232    
7233         Support for some of these may be added in future.         Support for some of these may be added in future.
# Line 7441  AUTHOR Line 7456  AUTHOR
7456    
7457  REVISION  REVISION
7458    
7459         Last updated: 14 April 2012         Last updated: 04 May 2012
7460         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
7461  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
7462    
7463    
7464  PCREPARTIAL(3)                                                  PCREPARTIAL(3)  PCREPARTIAL(3)                                                  PCREPARTIAL(3)
7465    
7466    
# Line 7894  REVISION Line 7909  REVISION
7909         Last updated: 24 February 2012         Last updated: 24 February 2012
7910         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
7911  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
7912    
7913    
7914  PCREPRECOMPILE(3)                                            PCREPRECOMPILE(3)  PCREPRECOMPILE(3)                                            PCREPRECOMPILE(3)
7915    
7916    
# Line 8029  REVISION Line 8044  REVISION
8044         Last updated: 10 January 2012         Last updated: 10 January 2012
8045         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
8046  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
8047    
8048    
8049  PCREPERFORM(3)                                                  PCREPERFORM(3)  PCREPERFORM(3)                                                  PCREPERFORM(3)
8050    
8051    
# Line 8199  REVISION Line 8214  REVISION
8214         Last updated: 09 January 2012         Last updated: 09 January 2012
8215         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
8216  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
8217    
8218    
8219  PCREPOSIX(3)                                                      PCREPOSIX(3)  PCREPOSIX(3)                                                      PCREPOSIX(3)
8220    
8221    
# Line 8463  REVISION Line 8478  REVISION
8478         Last updated: 09 January 2012         Last updated: 09 January 2012
8479         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
8480  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
8481    
8482    
8483  PCRECPP(3)                                                          PCRECPP(3)  PCRECPP(3)                                                          PCRECPP(3)
8484    
8485    
# Line 8641  PASSING MODIFIERS TO THE REGULAR EXPRESS Line 8656  PASSING MODIFIERS TO THE REGULAR EXPRESS
8656            PCRE_DOTALL           dot matches newlines        /s            PCRE_DOTALL           dot matches newlines        /s
8657            PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A            PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
8658            PCRE_EXTRA            strict escape parsing       N/A            PCRE_EXTRA            strict escape parsing       N/A
8659            PCRE_EXTENDED         ignore whitespaces          /x            PCRE_EXTENDED         ignore white spaces         /x
8660            PCRE_UTF8             handles UTF8 chars          built-in            PCRE_UTF8             handles UTF8 chars          built-in
8661            PCRE_UNGREEDY         reverses * and *?           N/A            PCRE_UNGREEDY         reverses * and *?           N/A
8662            PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)            PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)
# Line 8805  REVISION Line 8820  REVISION
8820    
8821         Last updated: 08 January 2012         Last updated: 08 January 2012
8822  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
8823    
8824    
8825  PCRESAMPLE(3)                                                    PCRESAMPLE(3)  PCRESAMPLE(3)                                                    PCRESAMPLE(3)
8826    
8827    
# Line 8929  SIZE AND OTHER LIMITATIONS Line 8944  SIZE AND OTHER LIMITATIONS
8944         The maximum length of name for a named subpattern is 32 characters, and         The maximum length of name for a named subpattern is 32 characters, and
8945         the maximum number of named subpatterns is 10000.         the maximum number of named subpatterns is 10000.
8946    
8947           The maximum length of a  name  in  a  (*MARK),  (*PRUNE),  (*SKIP),  or
8948           (*THEN)  verb  is  255  for  the 8-bit library and 65535 for the 16-bit
8949           library.
8950    
8951         The maximum length of a subject string is the largest  positive  number         The maximum length of a subject string is the largest  positive  number
8952         that  an integer variable can hold. However, when using the traditional         that  an integer variable can hold. However, when using the traditional
8953         matching function, PCRE uses recursion to handle subpatterns and indef-         matching function, PCRE uses recursion to handle subpatterns and indef-
# Line 8946  AUTHOR Line 8965  AUTHOR
8965    
8966  REVISION  REVISION
8967    
8968         Last updated: 08 January 2012         Last updated: 04 May 2012
8969         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
8970  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
8971    
8972    
8973  PCRESTACK(3)                                                      PCRESTACK(3)  PCRESTACK(3)                                                      PCRESTACK(3)
8974    
8975    
# Line 9134  REVISION Line 9153  REVISION
9153         Last updated: 21 January 2012         Last updated: 21 January 2012
9154         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
9155  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
9156    
9157    
