Diff of /code/trunk/maint/Tech.Notes

revision 63 by nigel, Sat Feb 24 21:40:03 2007 UTC revision 71 by nigel, Sat Feb 24 21:40:24 2007 UTC
# Line 48  These items are all just one byte long Line 48  These items are all just one byte long
48    
49    OP_END                 end of pattern    OP_END                 end of pattern
50    OP_ANY                 match any character    OP_ANY                 match any character
51      OP_ANYBYTE             match any single byte, even in UTF-8 mode
52    OP_SOD                 match start of data: \A    OP_SOD                 match start of data: \A
53      OP_SOM                 start of match (subject + offset): \G
54    OP_CIRC                ^ (start of data, or after \n in multiline)    OP_CIRC                ^ (start of data, or after \n in multiline)
55    OP_NOT_WORD_BOUNDARY   \B    OP_NOT_WORD_BOUNDARY   \B
56    OP_WORD_BOUNDARY       \b    OP_WORD_BOUNDARY       \b
# Line 61  These items are all just one byte long Line 63  These items are all just one byte long
63    OP_EODN                match end of data or \n at end: \Z    OP_EODN                match end of data or \n at end: \Z
64    OP_EOD                 match end of data: \z    OP_EOD                 match end of data: \z
65    OP_DOLL                $ (end of data, or before \n in multiline)    OP_DOLL                $ (end of data, or before \n in multiline)
   OP_RECURSE             match the pattern recursively  
66    
67    
68  Repeating single characters  Repeating single characters
# Line 119  instances of OP_CHARS are used. Line 120  instances of OP_CHARS are used.
120  Character classes  Character classes
121  -----------------  -----------------
122    
123  When characters less than 256 are involved, OP_CLASS is used for a character  If there is only one character, OP_CHARS is used for a positive class,
 class. If there is only one character, OP_CHARS is used for a positive class,  
124  and OP_NOT for a negative one (that is, for something like [^a]). However, in  and OP_NOT for a negative one (that is, for something like [^a]). However, in
125  UTF-8 mode, this applies only to characters with values < 128, because OP_NOT  UTF-8 mode, this applies only to characters with values < 128, because OP_NOT
126  is confined to single bytes.  is confined to single bytes.
# Line 129  Another set of repeating opcodes (OP_NOT Line 129  Another set of repeating opcodes (OP_NOT
129  negated, single-character class. The normal ones (OP_STAR etc.) are used for a  negated, single-character class. The normal ones (OP_STAR etc.) are used for a
130  repeated positive single-character class.  repeated positive single-character class.
131    
132  OP_CLASS is followed by a 32-byte bit map containing a 1 bit for every  When there's more than one character in a class and all the characters are less
133  character that is acceptable. The bits are counted from the least significant  than 256, OP_CLASS is used for a positive class, and OP_NCLASS for a negative
134  end of each byte.  one. In either case, the opcode is followed by a 32-byte bit map containing a 1
135    bit for every character that is acceptable. The bits are counted from the least
136    significant end of each byte.
137    
138    The reason for having both OP_CLASS and OP_NCLASS is so that, in UTF-8 mode,
139    subject characters with values greater than 255 can be handled correctly. For
140    OP_CLASS they don't match, whereas for OP_NCLASS they do.
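
The bit map layout described above can be sketched in C as follows. This is
only an illustration, not the PCRE source: the function names and
CLASS_MAP_SIZE are hypothetical, and the only facts taken from the notes are
the 32-byte map, the least-significant-first bit order, and the different
treatment of characters above 255 by OP_CLASS and OP_NCLASS.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names; only the 32-byte layout comes from the notes. */
enum { CLASS_MAP_SIZE = 32 };

/* Mark character c (0..255) as acceptable. Bits are counted from the
   least significant end of each byte, so c lives in byte c/8, bit c%8. */
void class_map_set(unsigned char map[CLASS_MAP_SIZE], unsigned int c)
{
    map[c / 8] |= (unsigned char)(1u << (c % 8));
}

/* Return 1 if character c (0..255) has its bit set, else 0. */
int class_map_test(const unsigned char map[CLASS_MAP_SIZE], unsigned int c)
{
    return (map[c / 8] >> (c % 8)) & 1;
}

/* Decide whether subject character c matches the class.
   is_nclass is 0 for OP_CLASS and 1 for OP_NCLASS. The map already holds
   a 1 bit for every acceptable character, so it is never inverted here;
   the opcode matters only for characters that fall outside the map. */
int class_matches(const unsigned char map[CLASS_MAP_SIZE], int is_nclass,
                  unsigned long c)
{
    if (c > 255)
        return is_nclass;  /* OP_CLASS: no match; OP_NCLASS: match */
    return class_map_test(map, (unsigned int)c);
}
```

With a map that has only the bit for 'a' set, class_matches() accepts 'a' and
rejects 'b' for either opcode, while a UTF-8 character such as U+0100 is
rejected for OP_CLASS and accepted for OP_NCLASS.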
141    
142  For classes containing characters with values > 255, OP_XCLASS is used. It  For classes containing characters with values > 255, OP_XCLASS is used. It
143  optionally uses a bit map (if any characters lie within it), followed by a list  optionally uses a bit map (if any characters lie within it), followed by a list
# Line 243  same scheme is used, with a "reference n Line 249  same scheme is used, with a "reference n
249  conditional subpattern always starts with one of the assertions.  conditional subpattern always starts with one of the assertions.
250    
251    
252    Recursion
253    ---------
254    
255    Recursion either matches the current regex, or some subexpression. The opcode
256    OP_RECURSE is followed by a value which is the offset to the starting bracket
257    from the start of the whole pattern.
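
As a rough sketch of how such an offset might be followed, the C fragment
below locates the target bracket from an OP_RECURSE item. The notes do not say
how wide the value is or how it is stored; the two-byte, most-significant-
byte-first encoding used here is an assumption, and both function names are
hypothetical.

```c
#include <stddef.h>

/* Assumption: the value after OP_RECURSE occupies two bytes, most
   significant byte first. The notes do not state the width. */
size_t read_offset2(const unsigned char *p)
{
    return ((size_t)p[0] << 8) | (size_t)p[1];
}

/* Given the start of the compiled pattern and a pointer to an OP_RECURSE
   opcode, return a pointer to the starting bracket it refers to. The
   offset is measured from the start of the whole pattern, not from the
   OP_RECURSE item itself. */
const unsigned char *recurse_target(const unsigned char *code_start,
                                    const unsigned char *op_recurse)
{
    return code_start + read_offset2(op_recurse + 1);
}
```

Because the offset is absolute, an offset of zero refers to the start of the
whole pattern, which corresponds to recursing into the current regex.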
258    
259    
260    Callout
261    -------
262    
263    OP_CALLOUT is followed by one byte of data that holds a callout number in the
264    range 0 to 255.
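
A minimal sketch of writing and reading such an item is shown below. The
opcode value DEMO_OP_CALLOUT and both function names are hypothetical; only
the layout (opcode, then one data byte holding 0 to 255) comes from the notes.

```c
#include <stddef.h>

/* Hypothetical opcode value, chosen for illustration only. */
enum { DEMO_OP_CALLOUT = 0x52 };

/* Write a callout item (opcode plus one data byte) into out.
   Returns the number of bytes written (2), or 0 if the number does not
   fit in one byte or fewer than 2 bytes are available. */
size_t emit_callout(unsigned char *out, size_t avail, int number)
{
    if (number < 0 || number > 255 || avail < 2)
        return 0;
    out[0] = (unsigned char)DEMO_OP_CALLOUT;
    out[1] = (unsigned char)number;
    return 2;
}

/* Read the callout number from the byte that follows the opcode. */
int callout_number(const unsigned char *op_callout)
{
    return op_callout[1];
}
```

The range check mirrors the single data byte: a callout number above 255
cannot be represented in this layout.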
265    
266    
267  Changing options  Changing options
268  ----------------  ----------------
269    
# Line 257  at compile time, and so does not cause a Line 278  at compile time, and so does not cause a
278  data.  data.
279    
280  Philip Hazel  Philip Hazel
281  August 2002  August 2003
