Diff of /code/trunk/maint/Tech.Notes

revision 87 by nigel, Sat Feb 24 21:41:21 2007 UTC revision 91 by nigel, Sat Feb 24 21:41:34 2007 UTC
# Line 1  Line 1 
1  Technical Notes about PCRE  Technical Notes about PCRE
2  --------------------------  --------------------------
3    
4    These are very rough technical notes that record potentially useful information
5    about PCRE internals.
6    
7  Historical note 1  Historical note 1
8  -----------------  -----------------
9    
# Line 21  the pattern, as is expected in Unix and Line 24  the pattern, as is expected in Unix and
24  Historical note 2  Historical note 2
25  -----------------  -----------------
26    
27  By contrast, the code originally written by Henry Spencer and subsequently  By contrast, the code originally written by Henry Spencer (which was
28  heavily modified for Perl actually compiles the expression twice: once in a  subsequently heavily modified for Perl) compiles the expression twice: once in
29  dummy mode in order to find out how much store will be needed, and then for  a dummy mode in order to find out how much store will be needed, and then for
30  real. The execution function operates by backtracking and maximizing (or,  real. (The Perl version probably doesn't do this any more; I'm talking about
31  optionally, minimizing in Perl) the amount of the subject that matches  the original library.) The execution function operates by backtracking and
32  individual wild portions of the pattern. This is an "NFA algorithm" in Friedl's  maximizing (or, optionally, minimizing in Perl) the amount of the subject that
33  terminology.  matches individual wild portions of the pattern. This is an "NFA algorithm" in
34    Friedl's terminology.
35    
36  OK, here's the real stuff  OK, here's the real stuff
37  -------------------------  -------------------------
# Line 43  then a second pass to do the real compil Line 47  then a second pass to do the real compil
47  predicted amount of store. The idea is that this is going to turn out faster  predicted amount of store. The idea is that this is going to turn out faster
48  because the first pass is degenerate and the second pass can just store stuff  because the first pass is degenerate and the second pass can just store stuff
49  straight into the vector, which it knows is big enough. It does make the  straight into the vector, which it knows is big enough. It does make the
50  compiling functions bigger, of course, but they have got quite big anyway to  compiling functions bigger, of course, but they have become quite big anyway to
51  handle all the Perl stuff.  handle all the Perl stuff.
52    
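The two-pass idea in the paragraph above can be sketched as follows. This is
only a toy, not PCRE's compiler: the names toy_pass and toy_compile and the
opcode values are hypothetical, and the "pattern language" is just literal
characters, each stored as an opcode byte plus the character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical opcodes for this sketch only; not PCRE's real values. */
enum { TOY_END = 0, TOY_CHAR = 1 };

/* Walk the pattern once. With out == NULL this is the degenerate pass that
   only counts bytes; otherwise bytes are stored with no bounds checking,
   because the caller has already sized the vector. */
static size_t toy_pass(const char *pattern, uint8_t *out)
{
    size_t n = 0;
    for (const char *p = pattern; *p != '\0'; p++) {
        if (out != NULL) {
            out[n] = TOY_CHAR;
            out[n + 1] = (uint8_t)*p;
        }
        n += 2;
    }
    if (out != NULL) out[n] = TOY_END;
    return n + 1;
}

/* Two-pass compile: measure, allocate exactly that much, compile for real. */
static uint8_t *toy_compile(const char *pattern, size_t *length)
{
    size_t needed = toy_pass(pattern, NULL);   /* first pass: size only */
    uint8_t *code = malloc(needed);
    if (code == NULL) return NULL;
    size_t written = toy_pass(pattern, code);  /* second pass: real output */
    assert(written == needed);                 /* prediction must be exact */
    *length = written;
    return code;
}
```

Sharing one routine between both passes keeps the predicted and actual sizes
in step, at the cost of a test on every store; that is one plausible reading
of why the compiling functions get bigger.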
53  Traditional matching function  Traditional matching function
# Line 63  pcre_dfa_exec(). This implements a DFA m Line 67  pcre_dfa_exec(). This implements a DFA m
67  simultaneously for all possible matches that start at one point in the subject  simultaneously for all possible matches that start at one point in the subject
68  string. (Going back to my roots: see Historical Note 1 above.) This function  string. (Going back to my roots: see Historical Note 1 above.) This function
69  interprets the same compiled pattern data as pcre_exec(); however, not all the  interprets the same compiled pattern data as pcre_exec(); however, not all the
70  facilities are available, and those that are don't always work in quite the  facilities are available, and those that are do not always work in quite the
71  same way. See the user documentation for details.  same way. See the user documentation for details.
72    
73  Format of compiled patterns  Format of compiled patterns
# Line 157  Match by Unicode property Line 161  Match by Unicode property
161    
162  OP_PROP and OP_NOTPROP are used for positive and negative matches of a  OP_PROP and OP_NOTPROP are used for positive and negative matches of a
163  character by testing its Unicode property (the \p and \P escape sequences).  character by testing its Unicode property (the \p and \P escape sequences).
164  Each is followed by a single byte that encodes the desired property value.  Each is followed by two bytes that encode the desired property as a type and a
165    value.
166    
167  Repeats of these items use the OP_TYPESTAR etc. set of opcodes, followed by two  Repeats of these items use the OP_TYPESTAR etc. set of opcodes, followed by
168  bytes: OP_PROP or OP_NOTPROP and then the desired property value.  three bytes: OP_PROP or OP_NOTPROP and then the desired property type and
169    value.
170    
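As a rough illustration of the layout described in the v.91 text (an opcode
followed by a type byte and a value byte, and a repeat opcode followed by
those three bytes), here is a minimal sketch. The opcode numbers and the
function names are hypothetical and are not taken from the PCRE sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode numbers for illustration; the real ones differ. */
enum { TOY_OP_PROP = 10, TOY_OP_NOTPROP = 11, TOY_OP_TYPESTAR = 20 };

/* Store OP_PROP or OP_NOTPROP followed by two bytes: type, then value.
   Returns the number of bytes written (3). */
static size_t toy_emit_prop(uint8_t *code, int negated,
                            uint8_t prop_type, uint8_t prop_value)
{
    assert(code != NULL);
    code[0] = negated ? TOY_OP_NOTPROP : TOY_OP_PROP;
    code[1] = prop_type;
    code[2] = prop_value;
    return 3;
}

/* A starred repeat: the repeat opcode, then the same three bytes.
   Returns the number of bytes written (4). */
static size_t toy_emit_prop_star(uint8_t *code, int negated,
                                 uint8_t prop_type, uint8_t prop_value)
{
    assert(code != NULL);
    code[0] = TOY_OP_TYPESTAR;
    return 1 + toy_emit_prop(code + 1, negated, prop_type, prop_value);
}
```

Under these assumptions a single \p item occupies three bytes and its starred
repeat four, one byte more in each case than the v.87 description implies.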
171    
172  Matching literal characters  Matching literal characters
# Line 339  at compile time, and so does not cause a Line 345  at compile time, and so does not cause a
345  data.  data.
346    
347  Philip Hazel  Philip Hazel
348  January 2006  June 2006
