/[pcre]/code/trunk/doc/pcre.txt
ViewVC logotype

Diff of /code/trunk/doc/pcre.txt

Parent Directory Parent Directory | Revision Log Revision Log | View Patch Patch

revision 469 by ph10, Mon Oct 19 14:38:48 2009 UTC revision 487 by ph10, Wed Jan 6 10:26:55 2010 UTC
# Line 6958  PCRE DISCUSSION OF STACK USAGE Line 6958  PCRE DISCUSSION OF STACK USAGE
6958         restarted instead.         restarted instead.
6959    
6960         The pcre_dfa_exec() function operates in an entirely different way, and         The pcre_dfa_exec() function operates in an entirely different way, and
6961         hardly uses recursion at all. The limit on its complexity is the amount         uses recursion only when there is a  regular  expression  recursion  or
6962         of  workspace  it  is  given.  The comments that follow do NOT apply to         subroutine  call in the pattern. This includes the processing of asser-
6963         pcre_dfa_exec(); they are relevant only for pcre_exec().         tion and "once-only" subpatterns, which  are  handled  like  subroutine
6964           calls.  Normally,  these are never very deep, and the limit on the com-
6965         You can set limits on the number of times that match() is called,  both         plexity of pcre_dfa_exec() is controlled by the amount of workspace  it
6966         in  total  and  recursively. If the limit is exceeded, an error occurs.         is  given. However, it is possible to write patterns with runaway infi-
6967         For details, see the section on  extra  data  for  pcre_exec()  in  the         nite recursions; such patterns will cause pcre_dfa_exec() to run out of
6968         pcreapi documentation.         stack. At present, there is no protection against this.
6969    
6970         Each  time  that match() is actually called recursively, it uses memory         The comments that follow do NOT apply to pcre_dfa_exec(); they are rel-
6971         from the process stack. For certain kinds of  pattern  and  data,  very         evant only for pcre_exec().
6972         large  amounts of stack may be needed, despite the recognition of "tail  
6973         recursion".  You can often reduce the amount of recursion,  and  there-     Reducing pcre_exec()'s stack usage
6974         fore  the  amount of stack used, by modifying the pattern that is being  
6975           Each time that match() is actually called recursively, it  uses  memory
6976           from  the  process  stack.  For certain kinds of pattern and data, very
6977           large amounts of stack may be needed, despite the recognition of  "tail
6978           recursion".   You  can often reduce the amount of recursion, and there-
6979           fore the amount of stack used, by modifying the pattern that  is  being
6980         matched. Consider, for example, this pattern:         matched. Consider, for example, this pattern:
6981    
6982           ([^<]|<(?!inet))+           ([^<]|<(?!inet))+
6983    
6984         It matches from wherever it starts until it encounters "<inet"  or  the         It  matches  from wherever it starts until it encounters "<inet" or the
6985         end  of  the  data,  and is the kind of pattern that might be used when         end of the data, and is the kind of pattern that  might  be  used  when
6986         processing an XML file. Each iteration of the outer parentheses matches         processing an XML file. Each iteration of the outer parentheses matches
6987         either  one  character that is not "<" or a "<" that is not followed by         either one character that is not "<" or a "<" that is not  followed  by
6988         "inet". However, each time a  parenthesis  is  processed,  a  recursion         "inet".  However,  each  time  a  parenthesis is processed, a recursion
6989         occurs, so this formulation uses a stack frame for each matched charac-         occurs, so this formulation uses a stack frame for each matched charac-
6990         ter. For a long string, a lot of stack is required. Consider  now  this         ter.  For  a long string, a lot of stack is required. Consider now this
6991         rewritten pattern, which matches exactly the same strings:         rewritten pattern, which matches exactly the same strings:
6992    
6993           ([^<]++|<(?!inet))+           ([^<]++|<(?!inet))+
6994    
6995         This  uses very much less stack, because runs of characters that do not         This uses very much less stack, because runs of characters that do  not
6996         contain "<" are "swallowed" in one item inside the parentheses.  Recur-         contain  "<" are "swallowed" in one item inside the parentheses. Recur-
6997         sion  happens  only when a "<" character that is not followed by "inet"         sion happens only when a "<" character that is not followed  by  "inet"
6998         is encountered (and we assume this is relatively  rare).  A  possessive         is  encountered  (and  we assume this is relatively rare). A possessive
6999         quantifier  is  used  to stop any backtracking into the runs of non-"<"         quantifier is used to stop any backtracking into the  runs  of  non-"<"
7000         characters, but that is not related to stack usage.         characters, but that is not related to stack usage.
7001    
7002         This example shows that one way of avoiding stack problems when  match-         This  example shows that one way of avoiding stack problems when match-
7003         ing long subject strings is to write repeated parenthesized subpatterns         ing long subject strings is to write repeated parenthesized subpatterns
7004         to match more than one character whenever possible.         to match more than one character whenever possible.
7005    
7006     Compiling PCRE to use heap instead of stack     Compiling PCRE to use heap instead of stack for pcre_exec()
7007    
7008           In  environments  where  stack memory is constrained, you might want to
7009           compile PCRE to use heap memory instead of stack for remembering  back-
7010           up  points  when  pcre_exec()  is running. This makes it run a lot more
7011           slowly, however.  Details of how to do this are given in the  pcrebuild
7012           documentation. When built in this way, instead of using the stack, PCRE
7013           obtains and frees memory by calling the functions that are  pointed  to
7014           by  the  pcre_stack_malloc  and  pcre_stack_free variables. By default,
7015           these point to malloc() and free(), but you can replace the pointers to
7016           cause  PCRE to use your own functions. Since the block sizes are always
7017           the same, and are always freed in reverse order, it may be possible  to
7018           implement  customized  memory handlers that are more efficient than the
7019           standard functions.
7020    
7021         In environments where stack memory is constrained, you  might  want  to     Limiting pcre_exec()'s stack usage
7022         compile  PCRE to use heap memory instead of stack for remembering back-  
7023         up points. This makes it run a lot more slowly, however. Details of how         You can set limits on the number of times that match() is called,  both
7024         to do this are given in the pcrebuild documentation. When built in this         in  total  and recursively. If a limit is exceeded, pcre_exec() returns
7025         way, instead of using the stack, PCRE obtains and frees memory by call-         an error code. Setting suitable limits should prevent it  from  running
7026         ing  the  functions  that  are  pointed to by the pcre_stack_malloc and         out  of  stack.  The  default  values of the limits are very large, and
7027         pcre_stack_free variables. By default,  these  point  to  malloc()  and         unlikely ever to operate. They can be changed when PCRE is  built,  and
7028         free(),  but you can replace the pointers to cause PCRE to use your own         they  can  also be set when pcre_exec() is called. For details of these
7029         functions. Since the block sizes are always the same,  and  are  always         interfaces, see the pcrebuild documentation and the  section  on  extra
7030         freed in reverse order, it may be possible to implement customized mem-         data for pcre_exec() in the pcreapi documentation.
        ory handlers that are more efficient than the standard functions.  
   
    Limiting PCRE's stack usage  
   
        PCRE has an internal counter that can be used to  limit  the  depth  of  
        recursion,  and  thus cause pcre_exec() to give an error code before it  
        runs out of stack. By default, the limit is very  large,  and  unlikely  
        ever  to operate. It can be changed when PCRE is built, and it can also  
        be set when pcre_exec() is called. For details of these interfaces, see  
        the pcrebuild and pcreapi documentation.  
7031    
7032         As a very rough rule of thumb, you should reckon on about 500 bytes per         As a very rough rule of thumb, you should reckon on about 500 bytes per
7033         recursion. Thus, if you want to limit your  stack  usage  to  8Mb,  you         recursion. Thus, if you want to limit your  stack  usage  to  8Mb,  you
7034         should  set  the  limit at 16000 recursions. A 64Mb stack, on the other         should  set  the  limit at 16000 recursions. A 64Mb stack, on the other
7035         hand, can support around 128000 recursions. The pcretest  test  program         hand, can support around 128000 recursions.
7036         has a command line option (-S) that can be used to increase the size of  
7037         its stack.         In Unix-like environments, the pcretest test program has a command line
7038           option (-S) that can be used to increase the size of its stack. As long
7039           as the stack is large enough, another option (-M) can be used  to  find
7040           the  smallest  limits  that allow a particular pattern to match a given
7041           subject string. This is done by  calling  pcre_exec()  repeatedly  with
7042           different limits.
7043    
7044     Changing stack size in Unix-like systems     Changing stack size in Unix-like systems
7045    
7046         In Unix-like environments, there is not often a problem with the  stack         In  Unix-like environments, there is not often a problem with the stack
7047         unless  very  long  strings  are  involved, though the default limit on         unless very long strings are involved,  though  the  default  limit  on
7048         stack size varies from system to system. Values from 8Mb  to  64Mb  are         stack  size  varies  from system to system. Values from 8Mb to 64Mb are
7049         common. You can find your default limit by running the command:         common. You can find your default limit by running the command:
7050    
7051           ulimit -s           ulimit -s
7052    
7053         Unfortunately,  the  effect  of  running out of stack is often SIGSEGV,         Unfortunately, the effect of running out of  stack  is  often  SIGSEGV,
7054         though sometimes a more explicit error message is given. You  can  nor-         though  sometimes  a more explicit error message is given. You can nor-
7055         mally increase the limit on stack size by code such as this:         mally increase the limit on stack size by code such as this:
7056    
7057           struct rlimit rlim;           struct rlimit rlim;
# Line 7046  PCRE DISCUSSION OF STACK USAGE Line 7059  PCRE DISCUSSION OF STACK USAGE
7059           rlim.rlim_cur = 100*1024*1024;           rlim.rlim_cur = 100*1024*1024;
7060           setrlimit(RLIMIT_STACK, &rlim);           setrlimit(RLIMIT_STACK, &rlim);
7061    
7062         This  reads  the current limits (soft and hard) using getrlimit(), then         This reads the current limits (soft and hard) using  getrlimit(),  then
7063         attempts to increase the soft limit to  100Mb  using  setrlimit().  You         attempts  to  increase  the  soft limit to 100Mb using setrlimit(). You
7064         must do this before calling pcre_exec().         must do this before calling pcre_exec().
7065    
7066     Changing stack size in Mac OS X     Changing stack size in Mac OS X
7067    
7068         Using setrlimit(), as described above, should also work on Mac OS X. It         Using setrlimit(), as described above, should also work on Mac OS X. It
7069         is also possible to set a stack size when linking a program. There is a         is also possible to set a stack size when linking a program. There is a
7070         discussion   about   stack  sizes  in  Mac  OS  X  at  this  web  site:         discussion  about  stack  sizes  in  Mac  OS  X  at  this   web   site:
7071         http://developer.apple.com/qa/qa2005/qa1419.html.         http://developer.apple.com/qa/qa2005/qa1419.html.
7072    
7073    
# Line 7067  AUTHOR Line 7080  AUTHOR
7080    
7081  REVISION  REVISION
7082    
7083         Last updated: 09 July 2008         Last updated: 03 January 2010
7084         Copyright (c) 1997-2008 University of Cambridge.         Copyright (c) 1997-2010 University of Cambridge.
7085  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
7086    
7087    

Legend:
Removed from v.469  
changed lines
  Added in v.487

webmaster@exim.org
ViewVC Help
Powered by ViewVC 1.1.12