/[pcre]/code/tags/pcre-1.03/Performance
ViewVC logotype

Contents of /code/tags/pcre-1.03/Performance

Parent Directory Parent Directory | Revision Log Revision Log


Revision 3 - (hide annotations) (download)
Sat Feb 24 21:38:01 2007 UTC (7 years, 2 months ago) by nigel
Original Path: code/trunk/Performance
File size: 7165 byte(s)
Load pcre-1.00 into code/trunk.

1 nigel 3 Some comparisons of PCRE with the original Henry Spencer (1986) regular
2     expression functions were done on a SPARCstation IPC using gcc version 2.6.3
3     with -O optimization, to give some idea as to how the two libraries compare.
4     This is not a major statistical investigation.
5    
6    
7     Code size
8     ---------
9    
10     The code size of PCRE is a bit over twice the size of the Henry Spencer
11     functions (roughly 33K vs 14K bytes on a SPARCstation with gcc -O).
12    
13    
14     Store size for compiled expressions
15     -----------------------------------
16    
17     For expressions that are compatible with both libraries, PCRE uses less store
18     for the examples tried, except in some cases that involve the use of character
19     classes. Except in the special case of a negated charcter class containing only
20     one character (e.g. [^a]), PCRE uses a 32-byte bit map for each character
21     class, in order to get the maximum matching speed. By contrast the Spencer code
22     uses a strchr() call.
23    
24     The Spencer functions have an overhead of 92 bytes per expression, because
25     there is a table for up to 10 matched substrings held with every compiled
26     expression. In contrast, PCRE's overhead is just 9 bytes, since it requires the
27     caller to supply a vector to receive the offsets of the matched substrings. In
28     the table below, the size without the overhead is shown in brackets.
29    
30     PCRE Spencer Pattern
31     ---- ------- -------
32    
33     18 (09) 109 (17) /^$/
34     25 (16) 120 (28) /^.*nter/
35     26 (17) 121 (29) /^12.34/
36     37 (28) 126 (34) /the quick brown fox/
37     50 (41) 114 (22) /^[]cde]/
38     50 (41) 114 (22) /^[^]cde]/
39     51 (42) 125 (33) /^[.^$|()*+?{,}]+/
40     52 (43) 126 (34) /^[0-9]+$/
41     56 (47) 153 (61) /^(abc)(abc)?zz/
42     57 (48) 133 (41) /^xxx[0-9]+$/
43     57 (48) 145 (53) /([0-9a-fA-F:]+)$/
44     62 (53) 171 (79) /^([^!]+)!(.+)=apquxz\.ixr\.zzz\.ac\.uk$$/
45     70 (61) 170 (78) /^(b+|a)(b+|a)?c/
46     74 (65) 173 (81) /^(ba|b*)(ba|b*)?bc/
47     99 (90) 235 (143) /^(a(b(c)))(d(e(f)))(h(i(j)))$/
48     119 (110) 157 (65) /^.+[0-9][0-9][0-9]$/
49     165 (156) 446 (354) /^[a-zA-Z0-9][a-zA-Z0-9\-]*(\.[a-zA-Z0-9][a-zA-z0-9\-]*)*\.$/
50     451 (442) 605 (513) /^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/
51    
52    
53     Compilation time
54     ----------------
55    
56     Timing was done using the clock() function to time 2000 compilations of each
57     expression and then dividing by twice the number of clocks per second, to get a
58     value in milliseconds. The variation observed over several runs was never more
59     than 0.01:
60    
61     PCRE Spencer Pattern
62     ---- ------- -------
63    
64     0.04 0.07 /^$/
65     0.06 0.12 /^.*nter/
66     0.06 0.13 /^12.34/
67     0.06 0.09 /^[]cde]/
68     0.07 0.14 /^[0-9]+$/
69     0.07 0.10 /^[^]cde]/
70     0.08 0.17 /^xxx[0-9]+$/
71     0.08 0.14 /the quick brown fox/
72     0.09 0.14 /^[.^$|()*+?{,}]+/
73     0.10 0.33 /([0-9a-fA-F:]+)$/
74     0.12 0.26 /^.+[0-9][0-9][0-9]$/
75     0.12 0.42 /^(abc)(abc)?zz/
76     0.14 0.51 /^(b+|a)(b+|a)?c/
77     0.15 0.53 /^(ba|b*)(ba|b*)?bc/
78     0.19 0.51 /^([^!]+)!(.+)=apquxz\.ixr\.zzz\.ac\.uk$/
79     0.34 1.59 /^(a(b(c)))(d(e(f)))(h(i(j)))$/
80     0.47 1.32 /^[a-zA-Z0-9][a-zA-Z0-9\-]*(\.[a-zA-Z0-9][a-zA-z0-9\-]*)*\.$/
81     0.66 1.78 /^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/
82    
83    
84     Execution time
85     --------------
86    
87     Execution timing was done in a similar manner. Blank entries in the "pattern"
88     column below indicate the use of the same pattern as before.
89    
90     PCRE Spencer Subject Pattern
91     ---- ------- ------- -------
92    
93     0.03 0.02 <null string> /^$/
94     0.04 0.04 enter /^.*nter/
95     0.04 0.04 uponter
96     0.03 0.03 12\r34 /^12.34/
97     0.03 0.03 0 /^[0-9]+$/
98     0.04 0.03 100
99     0.03 0.03 ]thing /^[]cde]/
100     0.03 0.03 ething
101     0.03 0.03 athing /^[^]cde]/
102     0.04 0.04 xxx0 /^xxx[0-9]+$/
103     0.04 0.04 xxx1234
104     0.04 0.07 .^\$(*+)|{?,?} /^[.^$|()*+?{,}]+/
105     0.03 0.03 the quick brown fox /the quick brown fox/
106     0.06 0.08 What do you know about the quick brown fox?
107     0.04 0.07 0abc /([0-9a-fA-F:]+)$/
108     0.04 0.07 abc
109     0.05 0.13 5f03:12C0::932e
110     0.06 0.07 x123 /^.+[0-9][0-9][0-9]$/
111     0.06 0.07 123456
112     0.06 0.09 abczz /^(abc)(abc)?zz/
113     0.06 0.12 abcabczz /^(abc)(abc)?zz/
114     /^([^!]+)!(.+)=apquxz\.ixr\.zzz\.ac\.uk$/
115     0.23 0.28 abc!pqr=apquxz.ixr.zzz.ac.uk
116     0.09 0.15 bc /^(b+|a)(b+|a)?c/
117     0.09 0.15 bbc
118     0.08 0.15 bbbc
119     0.09 0.15 bac
120     0.09 0.15 bbac
121     0.07 0.14 aac
122     0.09 0.15 abbbbbbbbbbbc
123     0.09 0.15 bbbbbbbbbbbac
124     0.09 0.18 babc /^(ba|b*)(ba|b*)?bc/
125     0.12 0.24 bbabc
126     0.07 0.15 bababc
127     0.06 0.10 a. /^[a-zA-Z0-9][a-zA-Z0-9\-]*(\.[a-zA-Z0-9][a-zA-z0-9\-]*)*\.$/
128     0.13 0.34 ab-c.pq-r.
129     0.24 0.58 sxk.zzz.ac.uk.
130     0.12 0.34 x-.y-.
131     0.20 0.38 abcdefhij /^(a(b(c)))(d(e(f)))(h(i(j)))$/
132     /^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/
133     0.18 0.30 From abcd Mon Sep 01 12:33:02 1997
134    
135     In general, PCRE runs faster than the Spencer function, but remember, this
136     is just for one particular compiler on one set of hardware and operating
137     system. Until comprehensive tests have been run in other environments, the most
138     one can plausibly say is that it is probably no worse on average for the kinds
139     of expression tested here.
140    
141    
142     Speeding up matching
143     --------------------
144    
145     A character class is much more efficient than a set of bracketed alternatives.
146     Matching /^[abc]{12}/ against "abcabcabcabc" took 0.03 ms, whereas
147     /^(a|b|c){12}/ took 0.33 ms. This is because brackets and alternatives involve
148     recursion.
149    
150    
151     Serious test
152     ------------
153    
154     One of the tests of PCRE is the monster regular expression from "Mastering
155     Regular Expressions" (O'Reilly's "hip owls" book, 1997, ISBN 1-56592-257-3)
156     which recognizes email addresses. There are two versions, unoptimized and
157     optimized. For interest, here are their timings, again on a SPARCstation IPC.
158     The compile times were 55 ms and 94 ms, and the compiled expressions
159     occupied 11010 and 15426 bytes of store, respectively. The following strings
160     were matched in the times shown (unoptimized first):
161    
162     0.34 0.38 user@dom.ain
163     0.38 0.42 <user@dom.ain>
164     0.88 0.60 Alan Other <user@dom.ain>
165     1.87 0.82 "A. Other" <user.1234@dom.ain> (a comment)
166     1.77 1.19 A. Other <user.1234@dom.ain> (a comment)
167     2.21 0.42 "/s=user/ou=host/o=place/prmd=uu.yy/admd= /c=gb/"@x400-re.lay
168    
169     The optimization of the expression clearly has a dramatic effect in some cases.
170    
171     Philip Hazel <ph10@cus.cam.ac.uk>
172     October 1997

webmaster@exim.org
ViewVC Help
Powered by ViewVC 1.1.12