
Diff of /code/trunk/doc/html/pcreperform.html


revision 63 by nigel, Sat Feb 24 21:40:03 2007 UTC revision 75 by nigel, Sat Feb 24 21:40:37 2007 UTC
# Line 3  Line 3 
3  <title>pcreperform specification</title>  <title>pcreperform specification</title>
4  </head>  </head>
5  <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">  <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
6  This HTML document has been generated automatically from the original man page.  <h1>pcreperform man page</h1>
7  If there is any nonsense in it, please consult the man page, in case the  <p>
8  conversion went wrong.<br>  Return to the <a href="index.html">PCRE index page</a>.
9  <ul>  </p>
10  <li><a name="TOC1" href="#SEC1">PCRE PERFORMANCE</a>  <p>
11  </ul>  This page is part of the PCRE HTML documentation. It was generated automatically
12  <br><a name="SEC1" href="#TOC1">PCRE PERFORMANCE</a><br>  from the original man page. If there is any nonsense in it, please consult the
13    man page, in case the conversion went wrong.
14    <br>
15    <br><b>
16    PCRE PERFORMANCE
17    </b><br>
18  <P>  <P>
19  Certain items that may appear in regular expression patterns are more efficient  Certain items that may appear in regular expression patterns are more efficient
20  than others. It is more efficient to use a character class like [aeiou] than a  than others. It is more efficient to use a character class like [aeiou] than a
21  set of alternatives such as (a|e|i|o|u). In general, the simplest construction  set of alternatives such as (a|e|i|o|u). In general, the simplest construction
22  that provides the required behaviour is usually the most efficient. Jeffrey  that provides the required behaviour is usually the most efficient. Jeffrey
23  Friedl's book contains a lot of discussion about optimizing regular expressions  Friedl's book contains a lot of useful general discussion about optimizing
24  for efficient performance.  regular expressions for efficient performance. This document contains a few
25    observations about PCRE.
26    </P>
27    <P>
28    Using Unicode character properties (the \p, \P, and \X escapes) is slow,
29    because PCRE has to scan a structure that contains data for over fifteen
30    thousand characters whenever it needs a character's property. If you can find
31    an alternative pattern that does not use character properties, it will probably
32    be faster.
33  </P>  </P>
34  <P>  <P>
35  When a pattern begins with .* not in parentheses, or in parentheses that are  When a pattern begins with .* not in parentheses, or in parentheses that are
# Line 27  optimization, because the . metacharacte Line 40  optimization, because the . metacharacte
40  the subject string contains newlines, the pattern may match from the character  the subject string contains newlines, the pattern may match from the character
41  immediately following one of them instead of from the very start. For example,  immediately following one of them instead of from the very start. For example,
42  the pattern  the pattern
 </P>  
 <P>  
43  <pre>  <pre>
44    .*second    .*second
45  </PRE>  </pre>
 </P>  
 <P>  
46  matches the subject "first\nand second" (where \n stands for a newline  matches the subject "first\nand second" (where \n stands for a newline
47  character), with the match starting at the seventh character. In order to do  character), with the match starting at the seventh character. In order to do
48  this, PCRE has to retry the match starting after every newline in the subject.  this, PCRE has to retry the match starting after every newline in the subject.
# Line 48  having to scan along the subject looking Line 57  having to scan along the subject looking
57  Beware of patterns that contain nested indefinite repeats. These can take a  Beware of patterns that contain nested indefinite repeats. These can take a
58  long time to run when applied to a string that does not match. Consider the  long time to run when applied to a string that does not match. Consider the
59  pattern fragment  pattern fragment
 </P>  
 <P>  
60  <pre>  <pre>
61    (a+)*    (a+)*
62  </PRE>  </pre>
 </P>  
 <P>  
63  This can match "aaaa" in 33 different ways, and this number increases very  This can match "aaaa" in 33 different ways, and this number increases very
64  rapidly as the string gets longer. (The * repeat can match 0, 1, 2, 3, or 4  rapidly as the string gets longer. (The * repeat can match 0, 1, 2, 3, or 4
65  times, and for each of those cases other than 0, the + repeats can match  times, and for each of those cases other than 0, the + repeats can match
# Line 64  variation, and this can take an extremel Line 69  variation, and this can take an extremel
69  </P>  </P>
70  <P>  <P>
71  An optimization catches some of the more simple cases such as  An optimization catches some of the more simple cases such as
 </P>  
 <P>  
72  <pre>  <pre>
73    (a+)*b    (a+)*b
74  </PRE>  </pre>
 </P>  
 <P>  
75  where a literal character follows. Before embarking on the standard matching  where a literal character follows. Before embarking on the standard matching
76  procedure, PCRE checks that there is a "b" later in the subject string, and if  procedure, PCRE checks that there is a "b" later in the subject string, and if
77  there is not, it fails the match immediately. However, when there is no  there is not, it fails the match immediately. However, when there is no
78  following literal this optimization cannot be used. You can see the difference  following literal this optimization cannot be used. You can see the difference
79  by comparing the behaviour of  by comparing the behaviour of
 </P>  
 <P>  
80  <pre>  <pre>
81    (a+)*\d    (a+)*\d
82  </PRE>  </pre>
 </P>  
 <P>  
83  with the pattern above. The former gives a failure almost instantly when  with the pattern above. The former gives a failure almost instantly when
84  applied to a whole line of "a" characters, whereas the latter takes an  applied to a whole line of "a" characters, whereas the latter takes an
85  appreciable time with strings longer than about 20 characters.  appreciable time with strings longer than about 20 characters.
86  </P>  </P>
87  <P>  <P>
88  Last updated: 03 February 2003  In many cases, the solution to this kind of performance issue is to use an
89    atomic group or a possessive quantifier.
90    </P>
91    <P>
92    Last updated: 09 September 2004
93  <br>  <br>
94  Copyright &copy; 1997-2003 University of Cambridge.  Copyright &copy; 1997-2004 University of Cambridge.
95    <p>
96    Return to the <a href="index.html">PCRE index page</a>.
97    </p>
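
Reviewer note: the two rewrites discussed in the page text can be checked for equivalence with a short script. This is a minimal sketch that uses Python's `re` module as a stand-in, not PCRE itself, so it says nothing about PCRE's internal optimizations or timings. The pattern `a*\d` is an assumed flat replacement for `(a+)*\d` (it matches the same text but drops the capture group), and the helper names and subject strings are hypothetical.

```python
import re

# Character class versus the equivalent set of single-character alternatives.
CLASS_FORM = re.compile(r"[aeiou]")
ALTERNATION_FORM = re.compile(r"(a|e|i|o|u)")

# Nested indefinite repeat versus a single repeat that accepts the same text.
# The flat form is an illustrative assumption; it does not capture anything.
NESTED_FORM = re.compile(r"(a+)*\d")
FLAT_FORM = re.compile(r"a*\d")


def match_spans(pattern, subject):
    """Return the (start, end) span of every non-overlapping match."""
    return [m.span() for m in pattern.finditer(subject)]


def same_spans(first, second, subjects):
    """True if both patterns match the same spans in every subject."""
    return all(match_spans(first, s) == match_spans(second, s) for s in subjects)


# Subjects are kept short so that the nested form cannot backtrack for long.
VOWEL_SUBJECTS = ["regular expression", "rhythm", ""]
DIGIT_SUBJECTS = ["aaa1", "xaa7", "aaaa", "1", ""]

if __name__ == "__main__":
    print(same_spans(CLASS_FORM, ALTERNATION_FORM, VOWEL_SUBJECTS))
    print(same_spans(NESTED_FORM, FLAT_FORM, DIGIT_SUBJECTS))
```

If the simpler form matches the same spans on representative subjects, it is usually the safer choice, in line with the page's advice that the simplest construction providing the required behaviour tends to be the most efficient. Where the capture group is needed, the atomic group or possessive quantifier mentioned in the added paragraph is the alternative to consider.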

