Diff of /code/trunk/ChangeLog

revision 63 by nigel, Sat Feb 24 21:40:03 2007 UTC
revision 71 by nigel, Sat Feb 24 21:40:24 2007 UTC

ChangeLog for PCRE
------------------

Version 4.4 13-Aug-03
---------------------

 1. In UTF-8 mode, a character class containing characters with values between
    127 and 255 was not handled correctly if the compiled pattern was studied.
    In fixing this, I have also improved the studying algorithm for such
    classes (slightly).
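
    In UTF-8 mode a character with a value from 128 to 255 occupies two bytes
    in the subject. When studying records the bytes that can start a match,
    it therefore has to record the UTF-8 lead byte (0xC2 or 0xC3) rather than
    the character value itself. The C sketch below illustrates only that
    mapping. The start_map type and the function names are hypothetical and
    are not PCRE's internal code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 256-bit map of the bytes that may start a match. */
typedef struct { unsigned char bits[32]; } start_map;

static void clear_map(start_map *m)
{
  memset(m->bits, 0, sizeof m->bits);
}

static void set_bit(start_map *m, unsigned int byte)
{
  m->bits[byte / 8] |= (unsigned char)(1u << (byte % 8));
}

static int test_bit(const start_map *m, unsigned int byte)
{
  return (m->bits[byte / 8] >> (byte % 8)) & 1;
}

/* Record the first byte that a class character c (0-255) has in the
   subject. In UTF-8 mode, 128-191 are encoded as 0xC2 xx and 192-255 as
   0xC3 xx, so the lead byte is set instead of c itself. */
static void add_class_char(start_map *m, unsigned int c, int utf8)
{
  assert(c <= 255);
  if (utf8 && c >= 128)
    set_bit(m, 0xC0u | (c >> 6));
  else
    set_bit(m, c);
}
```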

 2. Three internal functions had redundant arguments passed to them. Removal
    might give a very teeny performance improvement.

 3. Documentation bug: the value of the capture_top field in a callout is *one
    more than* the number of the highest numbered captured substring.
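
    In other words, capture_top is an exclusive upper bound. A callout that
    walks the captured substrings should therefore loop while the index is
    less than capture_top. A minimal sketch follows. It uses a hypothetical
    cut-down structure rather than the real callout block, and assumes that
    unset groups below capture_top have a start offset of -1.

```c
#include <assert.h>

/* Hypothetical cut-down view of a callout block. offsets holds
   start/end pairs; pair 0 is the whole match so far. */
typedef struct {
  const int *offsets;
  int capture_top;   /* one more than the highest numbered capture */
} callout_view;

/* Count the substrings 1 .. capture_top-1 that are actually set.
   Unset groups are assumed to have a start offset of -1. */
static int count_set_captures(const callout_view *cb)
{
  int i, n = 0;
  assert(cb->capture_top >= 1);
  for (i = 1; i < cb->capture_top; i++)   /* '<', not '<=' */
    if (cb->offsets[2 * i] >= 0) n++;
  return n;
}
```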

 4. The Makefile linked pcretest and pcregrep with -lpcre, which could result
    in incorrectly linking with a previously installed version. They now link
    explicitly with libpcre.la.

 5. configure.in no longer needs to recognize Cygwin specially.

 6. A problem in pcre.in for Windows platforms is fixed.

 7. If a pattern was successfully studied, and the -d (or /D) flag was given to
    pcretest, it used to include the size of the study block as part of its
    output. Unfortunately, the structure contains a field that has a different
    size on different hardware architectures. This meant that the tests that
    showed this size failed. As the block is currently always of a fixed size,
    this information isn't actually particularly useful in pcretest output, so
    I have just removed it.

 8. Three pre-processor statements accidentally did not start in column 1.
    Sadly, there are *still* compilers around that complain, even though
    standard C has not required this for well over a decade. Sigh.

 9. In pcretest, the code for checking callouts passed small integers in the
    callout_data field, which is a void * field. However, some picky compilers
    complained about the casts involved for this on 64-bit systems. Now
    pcretest passes the address of the small integer instead, which should get
    rid of the warnings.
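
    The difference can be sketched as follows. Converting a pointer to int
    discards bits wherever pointers are wider than int, which is what 64-bit
    compilers warn about. Passing the address of an int avoids any
    pointer/integer conversion at all. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callout helper; callout_data is a void * field.
   Old style: data = (void *)3; n = (int)data; these casts draw warnings
   on 64-bit systems, where pointers are wider than int.
   New style: the pointer refers to an int owned by the caller. */
static int read_callout_data(void *callout_data)
{
  assert(callout_data != NULL);
  return *(const int *)callout_data;
}
```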

10. By default, when in UTF-8 mode, PCRE now checks for valid UTF-8 strings at
    both compile and run time, and gives an error if an invalid UTF-8 sequence
    is found. There is an option for disabling this check in cases where the
    string is known to be correct and/or the maximum performance is wanted.
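
    To illustrate what such a check involves, here is a sketch of a strict
    validator that follows the RFC 3629 rules: no overlong forms, no
    surrogates, and nothing above 0x10FFFF. It is an assumption that PCRE
    applies exactly these rules, and utf8_first_error() is a hypothetical
    name.

```c
#include <assert.h>
#include <stddef.h>

/* Return -1 if s[0..len) is well-formed UTF-8 under RFC 3629 rules,
   otherwise the byte offset at which the first bad sequence starts. */
static long utf8_first_error(const unsigned char *s, size_t len)
{
  size_t i = 0;
  assert(s != NULL || len == 0);
  while (i < len) {
    unsigned char c = s[i];
    size_t need, k;
    unsigned long cp;
    if (c < 0x80) { i++; continue; }                /* plain ASCII */
    else if (c >= 0xC2 && c <= 0xDF) { need = 1; cp = c & 0x1F; }
    else if (c >= 0xE0 && c <= 0xEF) { need = 2; cp = c & 0x0F; }
    else if (c >= 0xF0 && c <= 0xF4) { need = 3; cp = c & 0x07; }
    else return (long)i;            /* stray continuation or bad lead */
    if (len - i <= need) return (long)i;            /* truncated */
    for (k = 1; k <= need; k++) {
      if ((s[i + k] & 0xC0) != 0x80) return (long)i;
      cp = (cp << 6) | (s[i + k] & 0x3F);
    }
    /* Reject overlong forms, UTF-16 surrogates and values > 0x10FFFF. */
    if ((need == 2 && cp < 0x800) || (need == 3 && cp < 0x10000) ||
        (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
      return (long)i;
    i += need + 1;
  }
  return -1;
}
```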

11. In response to a bug report, I changed one line in Makefile.in from

        -Wl,--out-implib,.libs/lib@WIN_PREFIX@pcreposix.dll.a \
    to
        -Wl,--out-implib,.libs/@WIN_PREFIX@libpcreposix.dll.a \

    to look similar to other lines, but I have no way of telling whether this
    is the right thing to do, as I do not use Windows. No doubt I'll get told
    if it's wrong...


Version 4.3 21-May-03
---------------------

1. Two instances of @WIN_PREFIX@ omitted from the Windows targets in the
   Makefile.

2. Some refactoring to improve the quality of the code:

   (i)   The utf8_table... variables are now declared "const".

   (ii)  The code for \cx, which used the "case flipping" table to upper case
         lower case letters, now just subtracts 32. This is ASCII-specific,
         but the whole concept of \cx is ASCII-specific, so it seems
         reasonable.

   (iii) PCRE was using its character types table to recognize decimal and
         hexadecimal digits in the pattern. This is silly, because it handles
         only 0-9, a-f, and A-F, but the character types table is locale-
         specific, which means strange things might happen. A private
         table is now used for this - though it costs 256 bytes, a table is
         much faster than multiple explicit tests. Of course, the standard
         character types table is still used for matching digits in subject
         strings against \d.

   (iv)  Strictly, the identifier ESC_t is reserved by POSIX (all identifiers
         ending in _t are). So I've renamed it as ESC_tee.
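
   Items (ii) and (iii) can be sketched together in C. The \cx function
   assumes the usual Perl convention that, after upper casing, bit 0x40 is
   flipped, so \cA and \ca both give 0x01. The hex-digit table is filled from
   explicit ASCII ranges. Unlike the locale-dependent character types table,
   its contents therefore never change. It is built at run time here only for
   brevity. All names are hypothetical.

```c
#include <assert.h>

/* \cx: upper-case an ASCII lower case letter by subtracting 32, then flip
   bit 0x40, so \cA (and \ca) give 0x01 and \c? gives 0x7F. */
static int control_char(int x)
{
  assert(x >= 0 && x < 128);
  if (x >= 'a' && x <= 'z') x -= 32;
  return x ^ 0x40;
}

/* Private 256-entry table: 1 for 0-9, a-f and A-F, 0 otherwise. It is
   built from explicit ASCII ranges, so it does not depend on the locale. */
static unsigned char hex_digit_table[256];

static void init_hex_digit_table(void)
{
  int c;
  for (c = 0; c < 256; c++)
    hex_digit_table[c] = (unsigned char)((c >= '0' && c <= '9') ||
                                         (c >= 'a' && c <= 'f') ||
                                         (c >= 'A' && c <= 'F'));
}

/* One table lookup replaces several range comparisons per character. */
static int is_hex_digit(unsigned char c)
{
  return hex_digit_table[c];
}
```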

3. The first argument for regexec() in the POSIX wrapper should have been
   defined as "const".

4. Changed pcretest to use malloc() for its buffers so that they can be
   Electric Fenced for debugging.

5. There were several places in the code where, in UTF-8 mode, PCRE would try
   to read one or more bytes before the start of the subject string. Often this
   had no effect on PCRE's behaviour, but in some circumstances it could
   provoke a segmentation fault.

6. A lookbehind at the start of a pattern in UTF-8 mode could also cause PCRE
   to try to read one or more bytes before the start of the subject string.

7. A lookbehind in a pattern matched in non-UTF-8 mode on a PCRE compiled with
   UTF-8 support could misbehave in various ways if the subject string
   contained bytes with the 0x80 bit set and the 0x40 bit unset in a lookbehind
   area. (PCRE was not checking the UTF-8 mode flag, and so was trying to move
   back over UTF-8 characters.)
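
   Items 5 to 7 all concern stepping backwards through the subject. The
   following sketch shows a backward step that respects both the start of the
   subject and the UTF-8 mode flag. A byte with 0x80 set and 0x40 clear is a
   UTF-8 continuation byte, and it is skipped only in UTF-8 mode. The function
   is hypothetical, not PCRE's own code.

```c
#include <assert.h>
#include <stddef.h>

/* Move p back by one character without ever reading before start.
   Continuation bytes (10xxxxxx) are skipped only when the subject is
   being treated as UTF-8; in byte mode every byte is one character. */
static const unsigned char *back_one_char(const unsigned char *start,
                                          const unsigned char *p, int utf8)
{
  assert(p > start);
  p--;
  if (utf8)
    while (p > start && (*p & 0xC0) == 0x80) p--;
  return p;
}
```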


Version 4.2 14-Apr-03
---------------------

1. Typo "#if SUPPORT_UTF8" instead of "#ifdef SUPPORT_UTF8" fixed.

2. Changes to the building process, supplied by Ronald Landheer-Cieslak
     [ON_WINDOWS]: new variable, "#" on non-Windows platforms
     [NOT_ON_WINDOWS]: new variable, "#" on Windows platforms
     [WIN_PREFIX]: new variable, "cyg" for Cygwin
     * Makefile.in: use autoconf substitution for OBJEXT, EXEEXT, BUILD_OBJEXT
       and BUILD_EXEEXT
     Note: automatic setting of the BUILD variables is not yet working
     set CPPFLAGS and BUILD_CPPFLAGS (but don't use yet) - should be used at
       compile-time but not at link-time
     [LINK]: use for linking executables only
     make different versions for Windows and non-Windows
     [LINKLIB]: new variable, copy of UNIX-style LINK, used for linking
       libraries
     [LINK_FOR_BUILD]: new variable
     [OBJEXT]: use throughout
     [EXEEXT]: use throughout
     <winshared>: new target
     <wininstall>: new target
     <dftables.o>: use native compiler
     <dftables>: use native linker
     <install>: handle Windows platform correctly
     <clean>: ditto
     <check>: ditto
     copy DLL to top builddir before testing

   As part of these changes, -no-undefined was removed again. This was reported
   to give trouble on HP-UX 11.0, so getting rid of it seems like a good idea
   in any case.

3. Some tidies to get rid of compiler warnings:

   . In the match_data structure, match_limit was an unsigned long int, whereas
     match_call_count was an int. I've made them both unsigned long ints.

   . In pcretest the fact that a const uschar * doesn't automatically cast to
     a void * provoked a warning.

   . Turning on some more compiler warnings threw up some "shadow" variables
     and a few more missing casts.

4. If PCRE was compiled with UTF-8 support, but called without the PCRE_UTF8
   option, a class that contained a single character with a value between 128
   and 255 (e.g. /[\xFF]/) caused PCRE to crash.

5. If PCRE was compiled with UTF-8 support, but called without the PCRE_UTF8
   option, a class that contained several characters, but with at least one
   whose value was between 128 and 255 caused PCRE to crash.


Version 4.1 12-Mar-03
---------------------

1. Compiling with gcc -pedantic found a couple of places where casts were
needed, and a string in dftables.c that was longer than standard compilers are
required to support.

2. Compiling with Sun's compiler found a few more places where the code could
be tidied up in order to avoid warnings.

3. The variables for cross-compiling were called HOST_CC and HOST_CFLAGS; the
first of these names is deprecated in the latest Autoconf in favour of the name
CC_FOR_BUILD, because "host" is typically used to mean the system on which the
compiled code will be run. I can't find a reference for HOST_CFLAGS, but by
analogy I have changed it to CFLAGS_FOR_BUILD.

4. Added -no-undefined to the linking command in the Makefile, because this is
apparently helpful for Windows. To make it work, also added "-L. -lpcre" to the
linking step for the pcreposix library.

5. PCRE was failing to diagnose the case of two named groups with the same
name.
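
A check for this can be as simple as comparing each new name with the ones
already seen. The sketch below is hypothetical and does not claim to be how
PCRE does it. It is quadratic in the number of names, which is acceptable on
the assumption that patterns rarely contain many named groups.

```c
#include <assert.h>
#include <string.h>

/* Return the index of the first name that repeats an earlier one, or -1
   if all names are distinct. */
static int first_duplicate_name(const char *const names[], int count)
{
  int i, j;
  assert(count >= 0);
  for (i = 1; i < count; i++)
    for (j = 0; j < i; j++)
      if (strcmp(names[i], names[j]) == 0) return i;
  return -1;
}
```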

6. A problem with one of PCRE's optimizations was discovered. PCRE remembers a
literal character that is needed in the subject for a match, and scans along to
ensure that it is present before embarking on the full matching process. This
saves time in cases of nested unlimited repeats that are never going to match.
Problem: the scan can take a lot of time if the subject is very long (e.g.
megabytes), thus penalizing straightforward matches. It is now done only if the
amount of subject to be scanned is less than 1000 bytes.
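
The revised rule can be sketched as follows. The pre-scan for the needed
character is done only when fewer than 1000 bytes of subject remain.
Otherwise the matcher simply goes ahead. The function name and the limit
parameter are hypothetical; memchr() is the standard C library function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 0 if the byte that every match must contain is definitely absent
   from [p, end), and 1 otherwise. When limit or more bytes remain, the scan
   is skipped and the answer is "maybe", so long subjects are not penalized
   for straightforward matches. */
static int needed_char_may_be_present(const unsigned char *p,
                                      const unsigned char *end,
                                      unsigned char needed, size_t limit)
{
  assert(p <= end);
  if ((size_t)(end - p) >= limit) return 1;   /* too long: just try */
  return memchr(p, needed, (size_t)(end - p)) != NULL;
}
```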

7. A lesser problem with the same optimization is that it was recording the
first character of an anchored pattern as "needed", thus provoking a search
right along the subject, even when the first match of the pattern was going to
fail. The "needed" character is no longer set for anchored patterns, unless it
follows something in the pattern that is of non-fixed length. Thus, it still
fulfils its original purpose of finding quick non-matches in cases of nested
unlimited repeats, but isn't used for simple anchored patterns such as /^abc/.


Version 4.0 17-Feb-03
---------------------

1. If a comment in an extended regex that started immediately after a meta-item
extended to the end of string, PCRE compiled incorrect data. This could lead to
