Contents of /code/trunk/ChangeLog

Revision 41
Sat Feb 24 21:39:17 2007 UTC by nigel
File size: 20293 byte(s)
Load pcre-2.08a into code/trunk.

1 ChangeLog for PCRE
2 ------------------
3
4
5 Version 2.09 14-Sep-99
6 ----------------------
7
8 1. Add support for the /+ modifier to perltest (to output $` like it does in
9 pcretest).
10
11 2. Add support for the /g modifier to perltest.
12
13 3. Fix pcretest so that it behaves even more like Perl for /g when the pattern
14 matches null strings.
15
16 4. Fix perltest so that it doesn't do unwanted things when fed an empty
17 pattern. Perl treats empty patterns specially - it reuses the most recent
18 pattern, which is not what we want. Replace // by /(?#)/ in order to avoid this
19 effect.
20
21 5. The POSIX interface was broken in that it was just handing over the POSIX
22 captured string vector to pcre_exec(), but (since release 2.00) PCRE has
23 required a bigger vector, with some working space on the end. This means that
24 the POSIX wrapper now has to get and free some memory, and copy the results.
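
A minimal sketch of the get/copy/free pattern described in item 5. It does not use the real PCRE or POSIX types: match_slot and fake_exec() are hypothetical stand-ins, and the factor of three used to size the temporary vector (two integers per captured pair plus one third as working space) is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a POSIX-style result slot (not the real regmatch_t). */
typedef struct { int so; int eo; } match_slot;

/* Hypothetical stand-in for the matcher: reports one captured pair (0..3)
   and scribbles on the tail of the vector, as a matcher that needs working
   space on the end might do. */
static int fake_exec(const char *subject, int *ovector, int ovecsize)
{
    (void)subject;
    if (ovecsize < 3) return -1;
    ovector[0] = 0;
    ovector[1] = 3;
    ovector[ovecsize - 1] = -99;   /* working space, not a result */
    return 1;                      /* number of pairs set */
}

/* Wrapper: get a bigger temporary vector, run the matcher, copy the
   results into the caller's slots, then free the temporary vector. */
int wrapper_exec(const char *subject, match_slot *pmatch, size_t nmatch)
{
    size_t slots = nmatch > 0 ? nmatch : 1;
    int ovecsize = (int)(slots * 3);   /* assumed sizing: results + workspace */
    int *ovector;
    int rc;
    size_t i;

    assert(pmatch != NULL || nmatch == 0);
    ovector = malloc((size_t)ovecsize * sizeof *ovector);
    if (ovector == NULL) return -1;

    rc = fake_exec(subject, ovector, ovecsize);

    for (i = 0; i < nmatch; i++) {
        if (rc > 0 && i < (size_t)rc) {
            pmatch[i].so = ovector[2 * i];
            pmatch[i].eo = ovector[2 * i + 1];
        } else {
            pmatch[i].so = pmatch[i].eo = -1;   /* slot not set */
        }
    }
    free(ovector);
    return rc;
}
```

The caller's array never sees the working space; only the captured pairs are copied back.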
25
26
27 Version 2.08 31-Aug-99
28 ----------------------
29
30 1. When startoffset was not zero and the pattern began with ".*", PCRE was not
31 trying to match at the startoffset position, but instead was moving forward to
32 the next newline as if a previous match had failed.
33
34 2. pcretest was not making use of PCRE_NOTEMPTY when repeating for /g and /G,
35 and could get into a loop if a null string was matched other than at the start
36 of the subject.
37
38 3. Added definitions of PCRE_MAJOR and PCRE_MINOR to pcre.h so the version can
39 be distinguished at compile time, and for completeness also added PCRE_DATE.
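
A sketch of how calling code might use these definitions to test the version at compile time. The fallback values and the names AT_LEAST_2_08, pcre_is_at_least_2_08() and require_pcre_2_08() are illustrative only; a real build would pick up PCRE_MAJOR and PCRE_MINOR from pcre.h.

```c
#include <assert.h>

/* Stand-in values so the sketch compiles on its own; normally these
   come from pcre.h. */
#ifndef PCRE_MAJOR
#define PCRE_MAJOR 2
#define PCRE_MINOR 8
#endif

/* Compile-time test: is this release 2.08 or later? */
#if PCRE_MAJOR > 2 || (PCRE_MAJOR == 2 && PCRE_MINOR >= 8)
#define AT_LEAST_2_08 1
#else
#define AT_LEAST_2_08 0
#endif

/* Expose the result of the compile-time test to ordinary code. */
int pcre_is_at_least_2_08(void)
{
    return AT_LEAST_2_08;
}

/* Abort early if the library headers were too old. */
void require_pcre_2_08(void)
{
    assert(pcre_is_at_least_2_08());
}
```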
40
41 5. Added Paul Sokolovsky's minor changes to make it easy to compile a Win32 DLL
42 in GnuWin32 environments.
43
44
45 Version 2.07 29-Jul-99
46 ----------------------
47
48 1. The documentation is now supplied in plain text form and HTML as well as in
49 the form of man page sources.
50
51 2. C++ compilers don't like assigning (void *) values to other pointer types.
52 In particular this affects malloc(). Although there is no problem in Standard
53 C, I've put in casts to keep C++ compilers happy.
54
55 3. Typo in pcretest.c; a cast of (unsigned char *) in the POSIX regexec() call
56 should be (const char *).
57
58 4. If NOPOSIX is defined, pcretest.c compiles without POSIX support. This may
59 be useful for users of non-Unix systems who don't want to bother with the POSIX stuff.
60 However, I haven't made this a standard facility. The documentation doesn't
61 mention it, and the Makefile doesn't support it.
62
63 5. The Makefile now contains an "install" target, with editable destinations at
64 the top of the file. The pcretest program is not installed.
65
66 6. pgrep -V now gives the PCRE version number and date.
67
68 7. Fixed bug: a zero repetition after a literal string (e.g. /abcde{0}/) was
69 causing the entire string to be ignored, instead of just the last character.
70
71 8. If a pattern like /"([^\\"]+|\\.)*"/ is applied in the normal way to a
72 non-matching string, it can take a very, very long time, even for strings of
73 quite modest length, because of the nested recursion. PCRE now does better in
74 some of these cases. It does this by remembering the last required literal
75 character in the pattern, and pre-searching the subject to ensure it is present
76 before running the real match. In other words, it applies a heuristic to detect
77 some types of certain failure quickly, and in the above example, if presented
78 with a string that has no trailing " it gives "no match" very quickly.
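
The pre-search in item 8 can be sketched roughly as follows. This is not the PCRE source: the function name is hypothetical, and the choice to begin the search just past the candidate start position is an assumption of the sketch (the entry does not say where the search begins). Without that assumption the opening " in the example subject would itself satisfy the check.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if a full match attempt is worth running, 0 if the required
   character cannot be found after the candidate start position.
   required_char < 0 means no required character is known. */
int required_char_present(const char *subject, size_t length,
                          size_t start, int required_char)
{
    assert(subject != NULL);
    if (required_char < 0) return 1;        /* nothing to check */
    if (start + 1 >= length) return 0;      /* no bytes left after start */
    return memchr(subject + start + 1, required_char,
                  length - start - 1) != NULL;
}
```

For the pattern above the required character is the closing ", so a subject such as "aaaa with no closing quote is rejected by a single linear scan instead of the nested backtracking.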
79
80 9. A new runtime option PCRE_NOTEMPTY causes null string matches to be ignored;
81 other alternatives are tried instead.
82
83
84 Version 2.06 09-Jun-99
85 ----------------------
86
87 1. Change pcretest's output for amount of store used to show just the code
88 space, because the remainder (the data block) varies in size between 32-bit and
89 64-bit systems.
90
91 2. Added an extra argument to pcre_exec() to supply an offset in the subject to
92 start matching at. This allows lookbehinds to work when searching for multiple
93 occurrences in a string.
94
95 3. Added additional options to pcretest for testing multiple occurrences:
96
97   /+  outputs the rest of the string that follows a match
98   /g  loops for multiple occurrences, using the new startoffset argument
99   /G  loops for multiple occurrences by passing an incremented pointer
100
101 4. PCRE wasn't doing the "first character" optimization for patterns starting
102 with \b or \B, though it was doing it for other lookbehind assertions. That is,
103 it wasn't noticing that a match for a pattern such as /\bxyz/ has to start with
104 the letter 'x'. On long subject strings, this gives a significant speed-up.
105
106
107 Version 2.05 21-Apr-99
108 ----------------------
109
110 1. Changed the type of magic_number from int to long int so that it works
111 properly on 16-bit systems.
112
113 2. Fixed a bug which caused patterns starting with .* not to work correctly
114 when the subject string contained newline characters. PCRE was assuming
115 anchoring for such patterns in all cases, which is not correct because .* will
116 not pass a newline unless PCRE_DOTALL is set. It now assumes anchoring only if
117 DOTALL is set at top level; otherwise it knows that patterns starting with .*
118 must be retried after every newline in the subject.
119
120
121 Version 2.04 18-Feb-99
122 ----------------------
123
124 1. For parenthesized subpatterns with repeats whose minimum was zero, the
125 computation of the store needed to hold the pattern was incorrect (too large).
126 If such patterns were nested a few deep, this could multiply and become a real
127 problem.
128
129 2. Added /M option to pcretest to show the memory requirement of a specific
130 pattern. Made -m a synonym of -s (which does this globally) for compatibility.
131
132 3. Subpatterns of the form (regex){n,m} (i.e. limited maximum) were being
133 compiled in such a way that the backtracking after subsequent failure was
134 pessimal. Something like (a){0,3} was compiled as (a)?(a)?(a)? instead of
135 ((a)((a)(a)?)?)? with disastrous performance if the maximum was of any size.
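
To make the shape of the nested form in item 3 concrete, here is a small sketch that builds it as text for a group with minimum zero and a given maximum. The function name and the fixed-size static buffer are choices made for this illustration; PCRE works on compiled code, not on pattern text, and the extra parentheses here only show the nesting.

```c
#include <assert.h>
#include <string.h>

/* Builds the nested expansion of group{0,max}: each further copy of the
   group sits inside the previous optional one, so once one copy fails
   the remaining copies are not tried.
   Returns a pointer to a static buffer (not reentrant), or NULL if max < 1
   or the result would not fit. */
const char *expand_nested(const char *group, int max)
{
    static char out[256];
    size_t glen, need;
    char *p = out;
    int i;

    assert(group != NULL);
    if (max < 1) return NULL;
    glen = strlen(group);
    /* max copies of the group, "(" and ")?" per outer level, final "?", NUL */
    need = (size_t)max * glen + (size_t)(max - 1) * 3 + 2;
    if (need > sizeof out) return NULL;

    for (i = 0; i < max - 1; i++) {     /* open the outer levels */
        *p++ = '(';
        memcpy(p, group, glen);
        p += glen;
    }
    memcpy(p, group, glen);             /* innermost copy */
    p += glen;
    *p++ = '?';
    for (i = 0; i < max - 1; i++) {     /* close the outer levels */
        *p++ = ')';
        *p++ = '?';
    }
    *p = '\0';
    return out;
}
```

Calling expand_nested("(a)", 3) gives the string ((a)((a)(a)?)?)? quoted in the entry.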
136
137
138 Version 2.03 02-Feb-99
139 ----------------------
140
141 1. Fixed typo and small mistake in man page.
142
143 2. Added 4th condition (GPL supersedes if conflict) and created separate
144 LICENCE file containing the conditions.
145
146 3. Updated pcretest so that patterns such as /abc\/def/ work like they do in
147 Perl, that is the internal \ allows the delimiter to be included in the
148 pattern. Locked out the use of \ as a delimiter. If \ immediately follows
149 the final delimiter, add \ to the end of the pattern (to test the error).
150
151 4. Added the convenience functions for extracting substrings after a successful
152 match. Updated pcretest to make it able to test these functions.
153
154
155 Version 2.02 14-Jan-99
156 ----------------------
157
158 1. Initialized the working variables associated with each extraction so that
159 their saving and restoring doesn't refer to uninitialized store.
160
161 2. Put dummy code into study.c in order to trick the optimizer of the IBM C
162 compiler for OS/2 into generating correct code. Apparently IBM isn't going to
163 fix the problem.
164
165 3. Pcretest: the timing code wasn't using LOOPREPEAT for timing execution
166 calls, and wasn't printing the correct value for compiling calls. Increased the
167 default value of LOOPREPEAT, and the number of significant figures in the
168 times.
169
170 4. Changed "/bin/rm" in the Makefile to "-rm" so it works on Windows NT.
171
172 5. Renamed "deftables" as "dftables" to get it down to 8 characters, to avoid
173 a building problem on Windows NT with a FAT file system.
174
175
176 Version 2.01 21-Oct-98
177 ----------------------
178
179 1. Changed the API for pcre_compile() to allow for the provision of a pointer
180 to character tables built by pcre_maketables() in the current locale. If NULL
181 is passed, the default tables are used.
182
183
184 Version 2.00 24-Sep-98
185 ----------------------
186
187 1. Since the (?>) facility is in Perl 5.005, don't require PCRE_EXTRA to enable
188 it any more.
189
190 2. Allow quantification of (?>) groups, and make it work correctly.
191
192 3. The first character computation wasn't working for (?>) groups.
193
194 4. Correct the implementation of \Z (it is permitted to match on the \n at the
195 end of the subject) and add 5.005's \z, which really does match only at the
196 very end of the subject.
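
The difference between the two assertions in item 4 comes down to one extra case. A sketch, with hypothetical function names and the subject given as a pointer plus length:

```c
#include <assert.h>
#include <stddef.h>

/* \z: true only at the very end of the subject. */
int at_small_z(size_t pos, size_t length)
{
    return pos == length;
}

/* \Z: true at the very end, or immediately before a newline that is the
   last character of the subject. */
int at_big_z(const char *subject, size_t pos, size_t length)
{
    assert(subject != NULL && pos <= length);
    if (pos == length) return 1;
    return pos + 1 == length && subject[pos] == '\n';
}
```

With the four-character subject "abc\n", position 3 satisfies \Z but not \z, and position 4 satisfies both.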
197
198 5. Remove the \X "cut" facility; Perl doesn't have it, and (?> is neater.
199
200 6. Remove the ability to specify CASELESS, MULTILINE, DOTALL, and
201 DOLLAR_ENDONLY at runtime, to make it possible to implement the Perl 5.005
202 localized options. All options to pcre_study() were also removed.
203
204 7. Add other new features from 5.005:
205
206   (?<=            positive lookbehind
207   (?<!            negative lookbehind
208   (?imsx-imsx)    added the unsetting capability
209                   such a setting is global if at outer level; local otherwise
210   (?imsx-imsx:)   non-capturing groups with option setting
211   (?(cond)re|re)  conditional pattern matching
212
213 A backreference to itself in a repeated group matches the previous
214 captured string.
215
216 8. General tidying up of studying (both automatic and via "study")
217 consequential on the addition of new assertions.
218
219 9. As in 5.005, unlimited repeated groups that could match an empty substring
220 are no longer faulted at compile time. Instead, the loop is forcibly broken at
221 runtime if any iteration does actually match an empty substring.
222
223 10. Include the RunTest script in the distribution.
224
225 11. Added tests from the Perl 5.005_02 distribution. This showed up a few
226 discrepancies, some of which were old and were also with respect to 5.004. They
227 have now been fixed.
228
229
230 Version 1.09 28-Apr-98
231 ----------------------
232
233 1. A negated single character class followed by a quantifier with a minimum
234 value of one (e.g. [^x]{1,6} ) was not compiled correctly. This could lead to
235 program crashes, or just wrong answers. This did not apply to negated classes
236 containing more than one character, or to minima other than one.
237
238
239 Version 1.08 27-Mar-98
240 ----------------------
241
242 1. Add PCRE_UNGREEDY to invert the greediness of quantifiers.
243
244 2. Add (?U) and (?X) to set PCRE_UNGREEDY and PCRE_EXTRA respectively. The
245 latter must appear before anything that relies on it in the pattern.
246
247
248 Version 1.07 16-Feb-98
249 ----------------------
250
251 1. A pattern such as /((a)*)*/ was not being diagnosed as in error (unlimited
252 repeat of a potentially empty string).
253
254
255 Version 1.06 23-Jan-98
256 ----------------------
257
258 1. Added Markus Oberhumer's little patches for C++.
259
260 2. Literal strings longer than 255 characters were broken.
261
262
263 Version 1.05 23-Dec-97
264 ----------------------
265
266 1. Negated character classes containing more than one character were failing if
267 PCRE_CASELESS was set at run time.
268
269
270 Version 1.04 19-Dec-97
271 ----------------------
272
273 1. Corrected the man page, where some "const" qualifiers had been omitted.
274
275 2. Made debugging output print "{0,xxx}" instead of just "{,xxx}" to agree with
276 input syntax.
277
278 3. Fixed memory leak which occurred when a regex with back references was
279 matched with an offsets vector that wasn't big enough. The temporary memory
280 that is used in this case wasn't being freed if the match failed.
281
282 4. Tidied pcretest to ensure it frees memory that it gets.
283
284 5. Temporary memory was being obtained in the case where the passed offsets
285 vector was exactly big enough.
286
287 6. Corrected definition of offsetof() from change 5 below.
288
289 7. I had screwed up change 6 below and broken the rules for the use of
290 setjmp(). Now fixed.
291
292
293 Version 1.03 18-Dec-97
294 ----------------------
295
296 1. An erroneous regex with a missing opening parenthesis was correctly
297 diagnosed, but PCRE attempted to access brastack[-1], which could cause crashes
298 on some systems.
299
300 2. Replaced offsetof(real_pcre, code) by offsetof(real_pcre, code[0]) because
301 it was reported that one broken compiler failed on the former because "code" is
302 also an independent variable.
303
304 3. The erroneous regex a[]b caused an array overrun reference.
305
306 4. A regex ending with a one-character negative class (e.g. /[^k]$/) did not
307 fail on data ending with that character. (It was going on too far, and checking
308 the next character, typically a binary zero.) This was specific to the
309 optimized code for single-character negative classes.
310
311 5. Added a contributed patch from the TIN world which does the following:
312
313 + Add an undef for memmove, in case the system defines a macro for it.
314
315 + Add a definition of offsetof(), in case there isn't one. (I don't know
316 the reason behind this - offsetof() is part of the ANSI standard - but
317 it does no harm).
318
319 + Reduce the ifdef's in pcre.c using macro DPRINTF, thereby eliminating
320 most of the places where whitespace preceded '#'. I have given up and
321 allowed the remaining 2 cases to be at the margin.
322
323 + Rename some variables in pcre to eliminate shadowing. This seems very
324 pedantic, but does no harm, of course.
325
326 6. Moved the call to setjmp() into its own function, to get rid of warnings
327 from gcc -Wall, and avoided calling it at all unless PCRE_EXTRA is used.
328
329 7. Constructs such as \d{8,} were compiling into the equivalent of
330 \d{8}\d{0,65527} instead of \d{8}\d* which didn't make much difference to the
331 outcome, but in this particular case used more store than had been allocated,
332 which caused the bug to be discovered because it threw up an internal error.
333
334 8. The debugging code in both pcre and pcretest for outputting the compiled
335 form of a regex was going wrong in the case of back references followed by
336 curly-bracketed repeats.
337
338
339 Version 1.02 12-Dec-97
340 ----------------------
341
342 1. Typos in pcre.3 and comments in the source fixed.
343
344 2. Applied a contributed patch to get rid of places where it used to remove
345 'const' from variables, and fixed some signed/unsigned and uninitialized
346 variable warnings.
347
348 3. Added the "runtest" target to Makefile.
349
350 4. Set default compiler flag to -O2 rather than just -O.
351
352
353 Version 1.01 19-Nov-97
354 ----------------------
355
356 1. PCRE was failing to diagnose unlimited repeat of empty string for patterns
357 like /([ab]*)*/, that is, for classes with more than one character in them.
358
359 2. Likewise, it wasn't diagnosing patterns with "once-only" subpatterns, such
360 as /((?>a*))*/ (a PCRE_EXTRA facility).
361
362
363 Version 1.00 18-Nov-97
364 ----------------------
365
366 1. Added compile-time macros to support systems such as SunOS4 which don't have
367 memmove() or strerror() but have other things that can be used instead.
368
369 2. Arranged that "make clean" removes the executables.
370
371
372 Version 0.99 27-Oct-97
373 ----------------------
374
375 1. Fixed bug in code for optimizing classes with only one character. It was
376 initializing a 32-byte map regardless, which could cause it to run off the end
377 of the memory it had got.
378
379 2. Added, conditional on PCRE_EXTRA, the proposed (?>REGEX) construction.
380
381
382 Version 0.98 22-Oct-97
383 ----------------------
384
385 1. Fixed bug in code for handling temporary memory usage when there are more
386 back references than supplied space in the ovector. This could cause segfaults.
387
388
389 Version 0.97 21-Oct-97
390 ----------------------
391
392 1. Added the \X "cut" facility, conditional on PCRE_EXTRA.
393
394 2. Optimized negated single characters not to use a bit map.
395
396 3. Brought error texts together as macro definitions; clarified some of them;
397 fixed one that was wrong - it said "range out of order" when it meant "invalid
398 escape sequence".
399
400 4. Changed some char * arguments to const char *.
401
402 5. Added PCRE_NOTBOL and PCRE_NOTEOL (from POSIX).
403
404 6. Added the POSIX-style API wrapper in pcreposix.a and testing facilities in
405 pcretest.
406
407
408 Version 0.96 16-Oct-97
409 ----------------------
410
411 1. Added a simple "pgrep" utility to the distribution.
412
413 2. Fixed an incompatibility with Perl: "{" is now treated as a normal character
414 unless it appears in one of the precise forms "{ddd}", "{ddd,}", or "{ddd,ddd}"
415 where "ddd" means "one or more decimal digits".
416
417 3. Fixed serious bug. If a pattern had a back reference, but the call to
418 pcre_exec() didn't supply a large enough ovector to record the related
419 identifying subpattern, the match always failed. PCRE now remembers the number
420 of the largest back reference, and gets some temporary memory in which to save
421 the offsets during matching if necessary, in order to ensure that
422 backreferences always work.
423
424 4. Increased the compatibility with Perl in a number of ways:
425
426 (a) . no longer matches \n by default; an option PCRE_DOTALL is provided
427 to request this handling. The option can be set at compile or exec time.
428
429 (b) $ matches before a terminating newline by default; an option
430 PCRE_DOLLAR_ENDONLY is provided to override this (but not in multiline
431 mode). The option can be set at compile or exec time.
432
433 (c) The handling of \ followed by a digit other than 0 is now supposed to be
434 the same as Perl's. If the decimal number it represents is less than 10
435 or there aren't that many previous left capturing parentheses, an octal
436 escape is read. Inside a character class, it's always an octal escape,
437 even if it is a single digit.
438
439 (d) An escaped but undefined alphabetic character is taken as a literal,
440 unless PCRE_EXTRA is set. Currently this just reserves the remaining
441 escapes.
442
443 (e) {0} is now permitted. (The previous item is removed from the compiled
444 pattern).
445
446 5. Changed all the names of code files so that the basic parts are no longer
447 than 10 characters, and abolished the teeny "globals.c" file.
448
449 6. Changed the handling of character classes; they are now done with a 32-byte
450 bit map always.
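
A 32-byte map gives one bit for each of the 256 possible character values. The sketch below shows one way such a map can be built and tested; the type and function names, and the bit order within each byte, are assumptions of this illustration rather than PCRE's internal layout.

```c
#include <assert.h>
#include <string.h>

/* 32 bytes * 8 bits = 256 bits, one per character value. */
typedef struct { unsigned char bits[32]; } class_map;

void class_clear(class_map *m)
{
    assert(m != NULL);
    memset(m->bits, 0, sizeof m->bits);
}

void class_add(class_map *m, unsigned char c)
{
    m->bits[c / 8] |= (unsigned char)(1u << (c % 8));
}

/* Add every character from lo to hi inclusive. */
void class_add_range(class_map *m, unsigned char lo, unsigned char hi)
{
    unsigned int c;
    for (c = lo; c <= hi; c++) class_add(m, (unsigned char)c);
}

/* Negating a class is just inverting every bit. */
void class_negate(class_map *m)
{
    size_t i;
    for (i = 0; i < sizeof m->bits; i++)
        m->bits[i] = (unsigned char)~m->bits[i];
}

/* Membership test: one index, one shift, one mask. */
int class_has(const class_map *m, unsigned char c)
{
    return (m->bits[c / 8] >> (c % 8)) & 1;
}
```

Testing a character costs the same whether the class is [a-c] or a negated class covering nearly everything.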
451
452 7. Added the -d and /D options to pcretest to make it possible to look at the
453 internals of compilation without having to recompile pcre.
454
455
456 Version 0.95 23-Sep-97
457 ----------------------
458
459 1. Fixed bug in pre-pass concerning escaped "normal" characters such as \x5c or
460 \x20 at the start of a run of normal characters. These were being treated as
461 real characters, instead of the source characters being re-checked.
462
463
464 Version 0.94 18-Sep-97
465 ----------------------
466
467 1. The functions are now thread-safe, with the caveat that the global variables
468 containing pointers to malloc() and free() or alternative functions are the
469 same for all threads.
470
471 2. Get pcre_study() to generate a bitmap of initial characters for non-
472 anchored patterns when this is possible, and use it if passed to pcre_exec().
473
474
475 Version 0.93 15-Sep-97
476 ----------------------
477
478 1. /(b)|(:+)/ was computing an incorrect first character.
479
480 2. Add pcre_study() to the API and the passing of pcre_extra to pcre_exec(),
481 but not actually doing anything yet.
482
483 3. Treat "-" characters in classes that cannot be part of ranges as literals,
484 as Perl does (e.g. [-az] or [az-]).
485
486 4. Set the anchored flag if a branch starts with .* or .*? because that tests
487 all possible positions.
488
489 5. Split up into different modules to avoid including unneeded functions in a
490 compiled binary. However, compile and exec are still in one module. The "study"
491 function is split off.
492
493 6. The character tables are now in a separate module whose source is generated
494 by an auxiliary program - but can then be edited by hand if required. There are
495 now no calls to isalnum(), isspace(), isdigit(), isxdigit(), tolower() or
496 toupper() in the code.
497
498 7. Turn the malloc/free function variables into pcre_malloc and pcre_free and
499 make them global. Abolish the function for setting them, as the caller can now
500 set them directly.
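
A sketch of what "set them directly" looks like from the caller's side. To keep it self-contained it does not link against PCRE: demo_malloc and demo_free are stand-ins for the library's global pointers, and the library_* and counting_* functions are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for the library's global function pointers; they default to
   the standard allocator. */
void *(*demo_malloc)(size_t) = malloc;
void (*demo_free)(void *) = free;

/* Caller-supplied replacements that count live blocks. */
static int live_blocks = 0;

static void *counting_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL) live_blocks++;
    return p;
}

static void counting_free(void *p)
{
    if (p != NULL) live_blocks--;
    free(p);
}

/* Library-side code always goes through the pointers. */
void *library_get_block(size_t n)
{
    assert(demo_malloc != NULL);
    return demo_malloc(n);
}

void library_release_block(void *p)
{
    assert(demo_free != NULL);
    demo_free(p);
}

/* The caller installs its functions by plain assignment; no setter needed. */
void install_counting_allocator(void)
{
    demo_malloc = counting_malloc;
    demo_free = counting_free;
}

int live_block_count(void)
{
    return live_blocks;
}
```

Because the pointers are ordinary globals, whatever the caller installs applies to every thread in the program.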
501
502
503 Version 0.92 11-Sep-97
504 ----------------------
505
506 1. A repeat with a fixed maximum and a minimum of 1 for an ordinary character
507 (e.g. /a{1,3}/) was broken (I mis-optimized it).
508
509 2. Caseless matching was not working in character classes if the characters in
510 the pattern were in upper case.
511
512 3. Make ranges like [W-c] work in the same way as Perl for caseless matching.
513
514 4. Make PCRE_ANCHORED public and accept as a compile option.
515
516 5. Add an options word to pcre_exec() and accept PCRE_ANCHORED and
517 PCRE_CASELESS at run time. Add escapes \A and \I to pcretest to cause it to
518 pass them.
519
520 6. Give an error if bad option bits passed at compile or run time.
521
522 7. Add PCRE_MULTILINE at compile and exec time, and (?m) as well. Add \M to
523 pcretest to cause it to pass that flag.
524
525 8. Add pcre_info(), to get the number of identifying subpatterns, the stored
526 options, and the first character, if set.
527
528 9. Recognize C+ or C{n,m} where n >= 1 as providing a fixed starting character.
529
530
531 Version 0.91 10-Sep-97
532 ----------------------
533
534 1. PCRE was failing to diagnose unlimited repeats of subpatterns that could
535 match the empty string as in /(a*)*/. It was looping and ultimately crashing.
536
537 2. PCRE was looping on encountering an indefinitely repeated back reference to
538 a subpattern that had matched an empty string, e.g. /(a|)\1*/. It now does what
539 Perl does - treats the match as successful.
540
541 ****
