Diff of /code/trunk/ChangeLog

revision 89 by nigel, Sat Feb 24 21:41:27 2007 UTC revision 91 by nigel, Sat Feb 24 21:41:34 2007 UTC
# Line 1
1  ChangeLog for PCRE
2  ------------------
3    
4    Version 6.7 04-Jul-06
5    ---------------------
6    
7     1. In order to handle tests when input lines are enormously long, pcretest has
8        been re-factored so that it automatically extends its buffers when
9        necessary. The code is crude, but this _is_ just a test program. The
10        default size has been increased from 32K to 50K.
11    
12     2. The code in pcre_study() was using the value of the re argument before
13        testing it for NULL. (Of course, in any sensible call of the function, it
14        won't be NULL.)
15    
16     3. The memmove() emulation function in pcre_internal.h, which is used on
17        systems that lack both memmove() and bcopy() - that is, hardly ever -
18        was missing a "static" storage class specifier.
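
As a rough illustration of item 3, the sketch below shows a memmove()
replacement declared "static". The name toy_memmove and the copying strategy
are assumptions made for this example; it is not the code in pcre_internal.h.

```c
#include <assert.h>
#include <stddef.h>

/* "static" gives the function internal linkage, so every source file that
   includes the header gets a private copy and the linker never sees two
   external definitions with the same name. The name is hypothetical. */
static void *toy_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dest;
    const unsigned char *s = (const unsigned char *)src;

    assert(dest != NULL && src != NULL);
    if (d < s) {
        /* Destination starts below the source: copy forwards. */
        while (n-- > 0)
            *d++ = *s++;
    } else if (d > s) {
        /* Destination starts above the source: copy backwards so that
           overlapping bytes are read before they are overwritten. */
        d += n;
        s += n;
        while (n-- > 0)
            *--d = *--s;
    }
    return dest;
}
```

Without the storage class specifier, a definition placed in a header would be
emitted with external linkage by every file that includes it.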
19    
20     4. When UTF-8 mode was not set, PCRE looped when compiling certain patterns
21        containing an extended class (one that cannot be represented by a bitmap
22        because it contains high-valued characters or Unicode property items, e.g.
23        [\pZ]). Almost always one would set UTF-8 mode when processing such a
24        pattern, but PCRE should not loop if you do not (it no longer does).
25        [Detail: two cases were found: (a) a repeated subpattern containing an
26        extended class; (b) a recursive reference to a subpattern that followed a
27        previous extended class. It wasn't skipping over the extended class
28        correctly when UTF-8 mode was not set.]
29    
30     5. A negated single-character class was not being recognized as fixed-length
31        in lookbehind assertions such as (?<=[^f]), leading to an incorrect
32        compile error "lookbehind assertion is not fixed length".
33    
34     6. The RunPerlTest auxiliary script was showing an unexpected difference
35        between PCRE and Perl for UTF-8 tests. It turns out that it is hard to
36        write a Perl script that can interpret lines of an input file either as
37        byte characters or as UTF-8, which is what "perltest" was being required to
38        do for the non-UTF-8 and UTF-8 tests, respectively. Essentially what you
39        can't do is switch easily at run time between having the "use utf8;" pragma
40        or not. In the end, I fudged it by using the RunPerlTest script to insert
41        "use utf8;" explicitly for the UTF-8 tests.
42    
43     7. In multiline (/m) mode, PCRE was matching ^ after a terminating newline at
44        the end of the subject string, contrary to the documentation and to what
45        Perl does. This was true of both matching functions. Now it matches only at
46        the start of the subject and immediately after *internal* newlines.
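
The rule in item 7 can be sketched as a small predicate. This is only an
illustration: the function name is hypothetical, and the sketch assumes that
LF is the only newline character.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if ^ may match at `offset` in multiline mode under the rule in
   item 7, else 0. Only '\n' is treated as a newline here, an assumption
   made to keep the sketch short. */
int caret_matches_multiline(const char *subject, size_t length, size_t offset)
{
    assert(subject != NULL && offset <= length);
    if (offset == 0)
        return 1;                        /* start of the subject */
    if (offset == length)
        return 0;                        /* after a terminating newline */
    return subject[offset - 1] == '\n';  /* after an internal newline */
}
```

For the subject "ab\ncd\n", the predicate accepts offsets 0 and 3 and rejects
offset 6, which is the position after the terminating newline.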
47    
48     8. A call of pcre_fullinfo() from pcretest to get the option bits was passing
49        a pointer to an int instead of a pointer to an unsigned long int. This
50        caused problems on 64-bit systems.
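
The kind of mismatch described in item 8 can be illustrated as follows. The
two functions are hypothetical stand-ins; the only property borrowed from
pcre_fullinfo() is that the result is written through an untyped pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an info call that reports option bits through
   an untyped pointer. The callee always writes an unsigned long, so the
   caller must supply storage of exactly that type. */
int toy_get_options(unsigned long stored_options, void *where)
{
    assert(where != NULL);
    *(unsigned long *)where = stored_options;
    return 0;
}

/* Correct caller: the receiving variable is an unsigned long. Declaring it
   as int would let the callee write sizeof(unsigned long) bytes into an
   int-sized object, which overruns it wherever long is wider than int. */
unsigned long toy_read_options(unsigned long stored_options)
{
    unsigned long options = 0;
    toy_get_options(stored_options, &options);
    return options;
}
```

Because the parameter is a void pointer, the compiler cannot diagnose the
wrong type at the call site, which is why the bug showed up only at run time.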
51    
52     9. Applied a patch from the folks at Google to pcrecpp.cc, to fix "another
53        instance of the 'standard' template library not being so standard".
54    
55    10. There was no check on the number of named subpatterns nor the maximum
56        length of a subpattern name. The product of these values is used to compute
57        the size of the memory block for a compiled pattern. By supplying a very
58        long subpattern name and a large number of named subpatterns, the size
59        computation could be caused to overflow. This is now prevented by limiting
60        the length of names to 32 characters, and the number of named subpatterns
61        to 10,000.
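
A minimal sketch of the check described in item 10, assuming a simplified
name table in which each entry occupies exactly the name length. The macro
and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NAME_SIZE   32      /* limit on name length from item 10 */
#define TOY_MAX_NAME_COUNT  10000   /* limit on named subpatterns from item 10 */

/* Computes the bytes needed for a name table of `count` entries of
   `name_length` bytes each. Returns 0 and stores the size on success,
   -1 if either limit is exceeded. Rejecting oversized inputs first keeps
   the product at or below 32 * 10000 = 320000. */
int toy_name_table_size(size_t name_length, size_t count, size_t *size_out)
{
    assert(size_out != NULL);
    if (name_length > TOY_MAX_NAME_SIZE || count > TOY_MAX_NAME_COUNT)
        return -1;
    *size_out = name_length * count;
    return 0;
}
```

The point of the design is that both factors are bounded before they are
multiplied, so the multiplication itself can never wrap.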
62    
63    11. Subpatterns that are repeated with specific counts have to be replicated in
64        the compiled pattern. The size of memory for this was computed from the
65        length of the subpattern and the repeat count. The latter is limited to
66        65535, but there was no limit on the former, meaning that integer overflow
67        could in principle occur. The compiled length of a repeated subpattern is
68        now limited to 30,000 bytes in order to prevent this.
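
The arithmetic behind item 11 is that 65535 * 30000 = 1,966,050,000, which is
below 2,147,483,647, the largest value of a 32-bit signed integer. A sketch of
the bounded computation, with invented names and assuming a 32-bit size
field:

```c
#include <assert.h>

#define TOY_MAX_REPEAT     65535L   /* repeat-count limit from item 11 */
#define TOY_MAX_DUPLENGTH  30000L   /* limit on compiled subpattern length */

/* Returns the space needed to replicate a subpattern, or -1 if either
   limit is exceeded. The arithmetic is done in long long so that the
   check itself cannot overflow. */
long long toy_replicated_size(long compiled_length, long repeat_count)
{
    long long size;

    if (compiled_length < 0 || compiled_length > TOY_MAX_DUPLENGTH)
        return -1;
    if (repeat_count < 0 || repeat_count > TOY_MAX_REPEAT)
        return -1;
    size = (long long)compiled_length * (long long)repeat_count;
    /* With both limits enforced, the result fits in 31 bits. */
    assert(size <= 2147483647LL);
    return size;
}
```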
69    
70    12. Added the optional facility to have named substrings with the same name.
71    
72    13. Added the ability to use a named substring as a condition, using the
73        Python syntax: (?(name)yes|no). This overloads (?(R)... and names that
74        are numbers (not recommended). Forward references are permitted.
75    
76    14. Added forward references in named backreferences (if you see what I mean).
77    
78    15. In UTF-8 mode, with the PCRE_DOTALL option set, a quantified dot in the
79        pattern could run off the end of the subject. For example, the pattern
80        "(?s)(.{1,5})"8 did this with the subject "ab".
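
The class of bug in item 15 is avoided by testing for the end of the subject
on every step, including while skipping continuation bytes. The following is
a sketch with a hypothetical function name; it assumes valid UTF-8 input and
is not the matcher's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Advances over at most `max_chars` UTF-8 characters starting at `p`,
   never moving past `end`. Returns the number of characters consumed and
   stores the new position in *next. Assumes valid UTF-8 input. */
size_t toy_skip_utf8_chars(const unsigned char *p, const unsigned char *end,
                           size_t max_chars, const unsigned char **next)
{
    size_t count = 0;

    assert(p != NULL && end != NULL && next != NULL && p <= end);
    while (count < max_chars && p < end) {  /* end test on every character */
        p++;                                /* lead byte */
        while (p < end && (*p & 0xC0) == 0x80)
            p++;                            /* continuation bytes 10xxxxxx */
        count++;
    }
    *next = p;
    return count;
}
```

With the two-byte subject "ab" and a maximum of 5, the loop stops after two
characters with the position exactly at the end of the subject.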
81    
82    16. If PCRE_DOTALL or PCRE_MULTILINE were set, pcre_dfa_exec() behaved as if
83        PCRE_CASELESS was set when matching characters that were quantified with ?
84        or *.
85    
86    17. A character class other than a single negated character that had a minimum
87        but no maximum quantifier - for example [ab]{6,} - was not handled
88        correctly by pcre_dfa_exec(). It would match only one character.
89    
90    18. A valid (though odd) pattern that looked like a POSIX character
91        class but used an invalid character after [ (for example [[,abc,]]) caused
92        pcre_compile() to give the error "Failed: internal error: code overflow" or
93        in some cases to crash with a glibc free() error. This could even happen if
94        the pattern terminated after [[ but there just happened to be a sequence of
95        letters, a binary zero, and a closing ] in the memory that followed.
96    
97    19. Perl's treatment of octal escapes in the range \400 to \777 has changed
98        over the years. Originally (before any Unicode support), just the bottom 8
99        bits were taken. Thus, for example, \500 really meant \100. Nowadays the
100        output from "man perlunicode" includes this:
101    
102          The regular expression compiler produces polymorphic opcodes.  That
103          is, the pattern adapts to the data and automatically switches to
104          the Unicode character scheme when presented with Unicode data--or
105          instead uses a traditional byte scheme when presented with byte
106          data.
107    
108        Sadly, a wide octal escape does not cause a switch, and in a string with
109        no other multibyte characters, these octal escapes are treated as before.
110        Thus, in Perl, the pattern  /\500/ actually matches \100 but the pattern
111        /\500|\x{1ff}/ matches \500 or \777 because the whole thing is treated as a
112        Unicode string.
113    
114        I have not perpetrated such confusion in PCRE. Up till now, it took just
115        the bottom 8 bits, as in old Perl. I have now made octal escapes with
116        values greater than \377 illegal in non-UTF-8 mode. In UTF-8 mode they
117        translate to the appropriate multibyte character.
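
The old and new treatments of wide octal escapes in item 19 can be compared
in a short sketch. The function names are invented, and the sketch assumes an
escape of at most three octal digits, so the value never exceeds \777.

```c
#include <assert.h>
#include <stddef.h>

/* Old behaviour: keep only the bottom 8 bits, so \500 becomes \100. */
unsigned int toy_octal_old(unsigned int value)
{
    return value & 0xFFu;
}

/* New behaviour, as described in item 19. Writes the character to buf
   (at least 2 bytes) and returns the number of bytes written, or -1 when
   the escape is rejected. */
int toy_octal_new(unsigned int value, int utf8, unsigned char *buf)
{
    assert(buf != NULL && value <= 0777u);
    if (!utf8) {
        if (value > 0377u)
            return -1;                  /* illegal in non-UTF-8 mode */
        buf[0] = (unsigned char)value;
        return 1;
    }
    if (value < 0x80u) {
        buf[0] = (unsigned char)value;  /* one-byte UTF-8 character */
        return 1;
    }
    buf[0] = (unsigned char)(0xC0u | (value >> 6));     /* 110xxxxx */
    buf[1] = (unsigned char)(0x80u | (value & 0x3Fu));  /* 10xxxxxx */
    return 2;
}
```

For \500 (0x140), the old rule yields \100, the new rule rejects the escape
in non-UTF-8 mode, and in UTF-8 mode it produces the bytes 0xC5 0x80.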
118    
119    20. Applied some refactoring to reduce the number of warnings from Microsoft
120        and Borland compilers. This has included removing the fudge introduced
121        seven years ago for the OS/2 compiler (see 2.02/2 below) because it caused
122        a warning about an unused variable.
123    
124    21. PCRE has not included VT (character 0x0b) in the set of whitespace
125        characters since release 4.0, because Perl (from release 5.004) does not.
126        [Or at least, is documented not to: some releases seem to be in conflict
127        with the documentation.] However, when a pattern was studied with
128        pcre_study() and all its branches started with \s, PCRE still included VT
129        as a possible starting character. Of course, this did no harm; it just
130        caused an unnecessary match attempt.
131    
132    22. Removed a now-redundant internal flag bit that recorded the fact that case
133        dependency changed within the pattern. This was once needed for "required
134        byte" processing, but is no longer used. This recovers a now-scarce options
135        bit. Also moved the least significant internal flag bit to the most-
136        significant bit of the word, which was not previously used (hangover from
137        the days when it was an int rather than a uint) to free up another bit for
138        the future.
139    
140    23. Added support for CRLF line endings as well as CR and LF. As well as the
141        default being selectable at build time, it can now be changed at runtime
142        via the PCRE_NEWLINE_xxx flags. There are now options for pcregrep to
143        specify that it is scanning data with non-default line endings.
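
What selecting a newline convention at run time means for the matcher can be
sketched as below. The enum is a hypothetical stand-in for the
PCRE_NEWLINE_xxx flags, and the function is not part of the PCRE API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical newline modes, standing in for the PCRE_NEWLINE_xxx flags. */
typedef enum { TOY_NEWLINE_CR, TOY_NEWLINE_LF, TOY_NEWLINE_CRLF } toy_newline;

/* Returns the length of the newline sequence at subject[offset] under the
   given mode, or 0 if there is none at that position. */
size_t toy_newline_length(const char *subject, size_t length, size_t offset,
                          toy_newline mode)
{
    assert(subject != NULL && offset <= length);
    if (offset >= length)
        return 0;
    switch (mode) {
    case TOY_NEWLINE_CR:
        return subject[offset] == '\r' ? 1 : 0;
    case TOY_NEWLINE_LF:
        return subject[offset] == '\n' ? 1 : 0;
    case TOY_NEWLINE_CRLF:
        /* Both characters must be present; a lone CR or LF does not count. */
        return (offset + 1 < length && subject[offset] == '\r' &&
                subject[offset + 1] == '\n') ? 2 : 0;
    }
    return 0;
}
```

Returning a length rather than a yes/no answer lets the caller step over a
two-character CRLF in one move.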
144    
145    24. Changed the definition of CXXLINK to make it agree with the definition of
146        LINK in the Makefile, by replacing LDFLAGS with CXXFLAGS.
147    
148    25. Applied Ian Taylor's patches to avoid using another stack frame for tail
149        recursions. This makes a big difference to stack usage for some patterns.
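
The general technique behind item 25 is to replace a call in tail position by
updating the arguments and looping. The toy literal matcher below is an
assumption made for illustration and has nothing to do with the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Recursive form: the final call is a tail call, but unless the compiler
   optimises it away, each step still consumes a stack frame. */
int toy_match_literal_recursive(const char *pattern, const char *subject)
{
    assert(pattern != NULL && subject != NULL);
    if (*pattern == '\0')
        return 1;
    if (*pattern != *subject)
        return 0;
    return toy_match_literal_recursive(pattern + 1, subject + 1);
}

/* Same logic with the tail call replaced by updating the arguments and
   looping, so stack usage stays constant however long the pattern is. */
int toy_match_literal_loop(const char *pattern, const char *subject)
{
    assert(pattern != NULL && subject != NULL);
    for (;;) {
        if (*pattern == '\0')
            return 1;
        if (*pattern != *subject)
            return 0;
        pattern++;
        subject++;
    }
}
```

Both functions return the same results; only the stack depth differs.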
150    
151    26. If a subpattern containing a named recursion or subroutine reference such
152        as (?P>B) was quantified, for example (xxx(?P>B)){3}, the calculation of
153        the space required for the compiled pattern went wrong and gave too small a
154        value. Depending on the environment, this could lead to "Failed: internal
155        error: code overflow at offset 49" or "glibc detected double free or
156        corruption" errors.
157    
158    27. Applied patches from Google (a) to support the new newline modes and (b) to
159        advance over multibyte UTF-8 characters in GlobalReplace.
160    
161    28. Changed free() to pcre_free() in pcredemo.c. Apparently this makes a
162        difference for some implementations of PCRE on some versions of Windows.
163    
164    29. Added some extra testing facilities to pcretest:
165    
166        \q<number>   in a data line sets the "match limit" value
167        \Q<number>   in a data line sets the "match recursion limit" value
168        -S <number>  sets the stack size, where <number> is in megabytes
169    
170        The -S option isn't available for Windows.
171    
172    
173  Version 6.6 06-Feb-06
174  ---------------------
175    
