/[pcre]/code/tags/pcre-1.03/Performance
ViewVC logotype

Contents of /code/tags/pcre-1.03/Performance

Parent Directory Parent Directory | Revision Log Revision Log


Revision 10 - (show annotations) (download)
Sat Feb 24 21:38:15 2007 UTC (7 years, 6 months ago) by nigel
File size: 7165 byte(s)
Tag code/trunk as code/tags/pcre-1.03.

1 Some comparisons of PCRE with the original Henry Spencer (1986) regular
2 expression functions were done on a SPARCstation IPC using gcc version 2.6.3
3 with -O optimization, to give some idea as to how the two libraries compare.
4 This is not a major statistical investigation.
5
6
7 Code size
8 ---------
9
10 The code size of PCRE is a bit over twice the size of the Henry Spencer
11 functions (roughly 33K vs 14K bytes on a SPARCstation with gcc -O).
12
13
14 Store size for compiled expressions
15 -----------------------------------
16
17 For expressions that are compatible with both libraries, PCRE uses less store
18 for the examples tried, except in some cases that involve the use of character
19 classes. Except in the special case of a negated charcter class containing only
20 one character (e.g. [^a]), PCRE uses a 32-byte bit map for each character
21 class, in order to get the maximum matching speed. By contrast the Spencer code
22 uses a strchr() call.
23
24 The Spencer functions have an overhead of 92 bytes per expression, because
25 there is a table for up to 10 matched substrings held with every compiled
26 expression. In contrast, PCRE's overhead is just 9 bytes, since it requires the
27 caller to supply a vector to receive the offsets of the matched substrings. In
28 the table below, the size without the overhead is shown in brackets.
29
30 PCRE Spencer Pattern
31 ---- ------- -------
32
33 18 (09) 109 (17) /^$/
34 25 (16) 120 (28) /^.*nter/
35 26 (17) 121 (29) /^12.34/
36 37 (28) 126 (34) /the quick brown fox/
37 50 (41) 114 (22) /^[]cde]/
38 50 (41) 114 (22) /^[^]cde]/
39 51 (42) 125 (33) /^[.^$|()*+?{,}]+/
40 52 (43) 126 (34) /^[0-9]+$/
41 56 (47) 153 (61) /^(abc)(abc)?zz/
42 57 (48) 133 (41) /^xxx[0-9]+$/
43 57 (48) 145 (53) /([0-9a-fA-F:]+)$/
44 62 (53) 171 (79) /^([^!]+)!(.+)=apquxz\.ixr\.zzz\.ac\.uk$$/
45 70 (61) 170 (78) /^(b+|a)(b+|a)?c/
46 74 (65) 173 (81) /^(ba|b*)(ba|b*)?bc/
47 99 (90) 235 (143) /^(a(b(c)))(d(e(f)))(h(i(j)))$/
48 119 (110) 157 (65) /^.+[0-9][0-9][0-9]$/
49 165 (156) 446 (354) /^[a-zA-Z0-9][a-zA-Z0-9\-]*(\.[a-zA-Z0-9][a-zA-z0-9\-]*)*\.$/
50 451 (442) 605 (513) /^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/
51
52
53 Compilation time
54 ----------------
55
56 Timing was done using the clock() function to time 2000 compilations of each
57 expression and then dividing by twice the number of clocks per second, to get a
58 value in milliseconds. The variation observed over several runs was never more
59 than 0.01:
60
61 PCRE Spencer Pattern
62 ---- ------- -------
63
64 0.04 0.07 /^$/
65 0.06 0.12 /^.*nter/
66 0.06 0.13 /^12.34/
67 0.06 0.09 /^[]cde]/
68 0.07 0.14 /^[0-9]+$/
69 0.07 0.10 /^[^]cde]/
70 0.08 0.17 /^xxx[0-9]+$/
71 0.08 0.14 /the quick brown fox/
72 0.09 0.14 /^[.^$|()*+?{,}]+/
73 0.10 0.33 /([0-9a-fA-F:]+)$/
74 0.12 0.26 /^.+[0-9][0-9][0-9]$/
75 0.12 0.42 /^(abc)(abc)?zz/
76 0.14 0.51 /^(b+|a)(b+|a)?c/
77 0.15 0.53 /^(ba|b*)(ba|b*)?bc/
78 0.19 0.51 /^([^!]+)!(.+)=apquxz\.ixr\.zzz\.ac\.uk$/
79 0.34 1.59 /^(a(b(c)))(d(e(f)))(h(i(j)))$/
80 0.47 1.32 /^[a-zA-Z0-9][a-zA-Z0-9\-]*(\.[a-zA-Z0-9][a-zA-z0-9\-]*)*\.$/
81 0.66 1.78 /^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/
82
83
84 Execution time
85 --------------
86
87 Execution timing was done in a similar manner. Blank entries in the "pattern"
88 column below indicate the use of the same pattern as before.
89
90 PCRE Spencer Subject Pattern
91 ---- ------- ------- -------
92
93 0.03 0.02 <null string> /^$/
94 0.04 0.04 enter /^.*nter/
95 0.04 0.04 uponter
96 0.03 0.03 12\r34 /^12.34/
97 0.03 0.03 0 /^[0-9]+$/
98 0.04 0.03 100
99 0.03 0.03 ]thing /^[]cde]/
100 0.03 0.03 ething
101 0.03 0.03 athing /^[^]cde]/
102 0.04 0.04 xxx0 /^xxx[0-9]+$/
103 0.04 0.04 xxx1234
104 0.04 0.07 .^\$(*+)|{?,?} /^[.^$|()*+?{,}]+/
105 0.03 0.03 the quick brown fox /the quick brown fox/
106 0.06 0.08 What do you know about the quick brown fox?
107 0.04 0.07 0abc /([0-9a-fA-F:]+)$/
108 0.04 0.07 abc
109 0.05 0.13 5f03:12C0::932e
110 0.06 0.07 x123 /^.+[0-9][0-9][0-9]$/
111 0.06 0.07 123456
112 0.06 0.09 abczz /^(abc)(abc)?zz/
113 0.06 0.12 abcabczz /^(abc)(abc)?zz/
114 /^([^!]+)!(.+)=apquxz\.ixr\.zzz\.ac\.uk$/
115 0.23 0.28 abc!pqr=apquxz.ixr.zzz.ac.uk
116 0.09 0.15 bc /^(b+|a)(b+|a)?c/
117 0.09 0.15 bbc
118 0.08 0.15 bbbc
119 0.09 0.15 bac
120 0.09 0.15 bbac
121 0.07 0.14 aac
122 0.09 0.15 abbbbbbbbbbbc
123 0.09 0.15 bbbbbbbbbbbac
124 0.09 0.18 babc /^(ba|b*)(ba|b*)?bc/
125 0.12 0.24 bbabc
126 0.07 0.15 bababc
127 0.06 0.10 a. /^[a-zA-Z0-9][a-zA-Z0-9\-]*(\.[a-zA-Z0-9][a-zA-z0-9\-]*)*\.$/
128 0.13 0.34 ab-c.pq-r.
129 0.24 0.58 sxk.zzz.ac.uk.
130 0.12 0.34 x-.y-.
131 0.20 0.38 abcdefhij /^(a(b(c)))(d(e(f)))(h(i(j)))$/
132 /^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/
133 0.18 0.30 From abcd Mon Sep 01 12:33:02 1997
134
135 In general, PCRE runs faster than the Spencer function, but remember, this
136 is just for one particular compiler on one set of hardware and operating
137 system. Until comprehensive tests have been run in other environments, the most
138 one can plausibly say is that it is probably no worse on average for the kinds
139 of expression tested here.
140
141
142 Speeding up matching
143 --------------------
144
145 A character class is much more efficient than a set of bracketed alternatives.
146 Matching /^[abc]{12}/ against "abcabcabcabc" took 0.03 ms, whereas
147 /^(a|b|c){12}/ took 0.33 ms. This is because brackets and alternatives involve
148 recursion.
149
150
151 Serious test
152 ------------
153
154 One of the tests of PCRE is the monster regular expression from "Mastering
155 Regular Expressions" (O'Reilly's "hip owls" book, 1997, ISBN 1-56592-257-3)
156 which recognizes email addresses. There are two versions, unoptimized and
157 optimized. For interest, here are their timings, again on a SPARCstation IPC.
158 The compile times were 55 ms and 94 ms, and the compiled expressions
159 occupied 11010 and 15426 bytes of store, respectively. The following strings
160 were matched in the times shown (unoptimized first):
161
162 0.34 0.38 user@dom.ain
163 0.38 0.42 <user@dom.ain>
164 0.88 0.60 Alan Other <user@dom.ain>
165 1.87 0.82 "A. Other" <user.1234@dom.ain> (a comment)
166 1.77 1.19 A. Other <user.1234@dom.ain> (a comment)
167 2.21 0.42 "/s=user/ou=host/o=place/prmd=uu.yy/admd= /c=gb/"@x400-re.lay
168
169 The optimization of the expression clearly has a dramatic effect in some cases.
170
171 Philip Hazel <ph10@cus.cam.ac.uk>
172 October 1997

webmaster@exim.org
ViewVC Help
Powered by ViewVC 1.1.12