
Contents of /code/trunk/ChangeLog



Revision 43
Sat Feb 24 21:39:21 2007 UTC by nigel
File size: 21362 byte(s)
Load pcre-3.0 into code/trunk.

1 ChangeLog for PCRE
2 ------------------
3
4
5 Version 3.0 01-Feb-00
6 ---------------------
7
8 1. Add support for the /+ modifier to perltest (to output $` like it does in
9 pcretest).
10
11 2. Add support for the /g modifier to perltest.
12
13 3. Fix pcretest so that it behaves even more like Perl for /g when the pattern
14 matches null strings.
15
16 4. Fix perltest so that it doesn't do unwanted things when fed an empty
17 pattern. Perl treats empty patterns specially - it reuses the most recent
18 pattern, which is not what we want. Replace // by /(?#)/ in order to avoid this
19 effect.
20
21 5. The POSIX interface was broken in that it was just handing over the POSIX
22 captured string vector to pcre_exec(), but (since release 2.00) PCRE has
23 required a bigger vector, with some working space on the end. This means that
24 the POSIX wrapper now has to get and free some memory, and copy the results.
25
26 6. Added some simple autoconf support, placing the test data and the
27 documentation in separate directories, re-organizing some of the
28 information files, and making it build pcre-config (a GNU standard). Also added
29 libtool support for building PCRE as a shared library, which is now the
30 default.
31
32 7. Got rid of the leading zero in the definition of PCRE_MINOR because 08 and
33 09 are not valid octal constants. Single digits will be used for minor values
34 less than 10.
35
36 8. Defined REG_EXTENDED and REG_NOSUB as zero in the POSIX header, so that
37 existing programs that set these in the POSIX interface can use PCRE without
38 modification.
39
40 9. Added a new function, pcre_fullinfo() with an extensible interface. It can
41 return all that pcre_info() returns, plus additional data. The pcre_info()
42 function is retained for compatibility, but is considered to be obsolete.
43
44 10. Added experimental recursion feature (?R) to handle one common case that
45 Perl 5.6 will be able to do with (?p{...}).
46
47 11. Added support for POSIX character classes like [:alpha:], which Perl is
48 adopting.
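Item 5 above says the POSIX wrapper must now obtain a larger working vector, run the match, copy the captured offsets back, and free the memory. The following is a minimal sketch of that shape only, not the code in the wrapper itself. The names demo_match, demo_matcher, wrapped_exec and demo_run are hypothetical, and the factor of three ints per capture slot is an assumption taken from later PCRE documentation rather than from this entry.

```c
#include <stdlib.h>

/* Hypothetical stand-in for POSIX regmatch_t, so the sketch needs no
   regex header. */
typedef struct { int start; int end; } demo_match;

/* Hypothetical matcher type: fills offset pairs into ovector and returns
   the number of pairs set, or a negative value for "no match". */
typedef int (*demo_matcher)(int *ovector, int ovecsize);

/* Get a vector with extra working space, run the matcher, copy the
   results into the caller's array, then free the vector. */
int wrapped_exec(demo_matcher run, demo_match *out, size_t nmatch)
{
    size_t i;
    int rc;
    int *ovector = NULL;

    if (nmatch > 0) {
        /* Assumption: three ints per slot, two for offsets, one spare. */
        ovector = (int *)malloc(nmatch * 3 * sizeof(int));
        if (ovector == NULL) return -1;
    }

    rc = run(ovector, (int)(nmatch * 3));

    for (i = 0; i < nmatch; i++) {
        if (rc > 0 && i < (size_t)rc) {
            out[i].start = ovector[2 * i];
            out[i].end = ovector[2 * i + 1];
        } else {
            out[i].start = out[i].end = -1;   /* slot not set */
        }
    }

    free(ovector);
    return rc;
}

/* Toy matcher used only to exercise the wrapper: reports one match
   covering offsets 2..5. */
int demo_run(int *ovector, int ovecsize)
{
    if (ovecsize < 2) return 0;
    ovector[0] = 2;
    ovector[1] = 5;
    return 1;
}
```

Calling wrapped_exec(demo_run, m, 2) with a two-element array m fills the first slot and marks the second as unset.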
49
50
51 Version 2.08 31-Aug-99
52 ----------------------
53
54 1. When startoffset was not zero and the pattern began with ".*", PCRE was not
55 trying to match at the startoffset position, but instead was moving forward to
56 the next newline as if a previous match had failed.
57
58 2. pcretest was not making use of PCRE_NOTEMPTY when repeating for /g and /G,
59 and could get into a loop if a null string was matched other than at the start
60 of the subject.
61
62 3. Added definitions of PCRE_MAJOR and PCRE_MINOR to pcre.h so the version can
63 be distinguished at compile time, and for completeness also added PCRE_DATE.
64
65 5. Added Paul Sokolovsky's minor changes to make it easy to compile a Win32 DLL
66 in GnuWin32 environments.
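Item 3 above adds PCRE_MAJOR and PCRE_MINOR so that the version can be tested at compile time, and item 7 of Version 3.0 explains why the minor number later lost its leading zero. The sketch below illustrates both points. The fallback values and the macro DEMO_HAS_VERSION_MACROS are hypothetical and are there only so that the fragment compiles without pcre.h.

```c
/* Hypothetical fallback values, used only when pcre.h is not included. */
#ifndef PCRE_MAJOR
#define PCRE_MAJOR 3
#define PCRE_MINOR 0   /* a single digit; 08 or 09 would be rejected */
#endif

/* Compile-time test for "version 2.08 or later". */
#if PCRE_MAJOR > 2 || (PCRE_MAJOR == 2 && PCRE_MINOR >= 8)
#define DEMO_HAS_VERSION_MACROS 1
#else
#define DEMO_HAS_VERSION_MACROS 0
#endif

/* In C a leading zero makes an integer constant octal, so 010 is eight.
   The digits 8 and 9 do not exist in octal, which is why 08 and 09 are
   not valid constants. */
int octal_ten(void)
{
    return 010;
}
```

With the fallback values DEMO_HAS_VERSION_MACROS is 1, and octal_ten() returns 8.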
67
68
69 Version 2.07 29-Jul-99
70 ----------------------
71
72 1. The documentation is now supplied in plain text form and HTML as well as in
73 the form of man page sources.
74
75 2. C++ compilers don't like assigning (void *) values to other pointer types.
76 In particular this affects malloc(). Although there is no problem in Standard
77 C, I've put in casts to keep C++ compilers happy.
78
79 3. Typo in pcretest.c; a cast of (unsigned char *) in the POSIX regexec() call
80 should be (const char *).
81
82 4. If NOPOSIX is defined, pcretest.c compiles without POSIX support. This may
83 be useful for non-Unix systems that don't want to bother with the POSIX stuff.
84 However, I haven't made this a standard facility. The documentation doesn't
85 mention it, and the Makefile doesn't support it.
86
87 5. The Makefile now contains an "install" target, with editable destinations at
88 the top of the file. The pcretest program is not installed.
89
90 6. pgrep -V now gives the PCRE version number and date.
91
92 7. Fixed bug: a zero repetition after a literal string (e.g. /abcde{0}/) was
93 causing the entire string to be ignored, instead of just the last character.
94
95 8. If a pattern like /"([^\\"]+|\\.)*"/ is applied in the normal way to a
96 non-matching string, it can take a very, very long time, even for strings of
97 quite modest length, because of the nested recursion. PCRE now does better in
98 some of these cases. It does this by remembering the last required literal
99 character in the pattern, and pre-searching the subject to ensure it is present
100 before running the real match. In other words, it applies a heuristic to detect
101 some types of certain failure quickly, and in the above example, if presented
102 with a string that has no trailing " it gives "no match" very quickly.
103
104 9. A new runtime option PCRE_NOTEMPTY causes null string matches to be ignored;
105 other alternatives are tried instead.
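Item 8 above describes pre-searching the subject for the last required literal character before running the real match. A minimal sketch of that check follows. The function name required_char_present and its "from" parameter are hypothetical, and how PCRE itself picks the search start is not stated in this entry.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the required literal occurs in subject[from..length-1],
   0 if it does not. A result of 0 means "no match" can be reported
   without starting the expensive backtracking match at all. */
int required_char_present(const char *subject, size_t length,
                          size_t from, int required)
{
    if (from >= length) return 0;
    return memchr(subject + from, required, length - from) != NULL;
}
```

For the pattern in item 8 the required character is the closing double quote. A subject consisting of an opening quote followed by abc fails the check when searched from offset 1, so the matcher never has to run.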
106
107
108 Version 2.06 09-Jun-99
109 ----------------------
110
111 1. Change pcretest's output for amount of store used to show just the code
112 space, because the remainder (the data block) varies in size between 32-bit and
113 64-bit systems.
114
115 2. Added an extra argument to pcre_exec() to supply an offset in the subject to
116 start matching at. This allows lookbehinds to work when searching for multiple
117 occurrences in a string.
118
119 3. Added additional options to pcretest for testing multiple occurrences:
120
121 /+   outputs the rest of the string that follows a match
122 /g   loops for multiple occurrences, using the new startoffset argument
123 /G   loops for multiple occurrences by passing an incremented pointer
124
125 4. PCRE wasn't doing the "first character" optimization for patterns starting
126 with \b or \B, though it was doing it for other lookbehind assertions. That is,
127 it wasn't noticing that a match for a pattern such as /\bxyz/ has to start with
128 the letter 'x'. On long subject strings, this gives a significant speed-up.
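Item 2 above says that a start offset lets lookbehinds work when searching for multiple occurrences, which is also the difference between /g and /G in item 3. The toy function below is not PCRE code. It hand-codes the behaviour of the pattern (?<=a)b, and the name find_b_after_a is hypothetical.

```c
#include <stddef.h>

/* Toy stand-in for matching (?<=a)b: find the first 'b' at or after
   "start" that is preceded by 'a'. Returns its offset, or -1. */
long find_b_after_a(const char *subject, size_t length, size_t start)
{
    size_t i;

    for (i = start; i < length; i++) {
        /* The lookbehind may inspect subject[i - 1] even when that
           character lies before "start". */
        if (subject[i] == 'b' && i > 0 && subject[i - 1] == 'a')
            return (long)i;
    }
    return -1;
}
```

With the subject "ab", starting at offset 1 finds the b because the preceding a is still visible. Passing an incremented pointer instead (the subject "b" with offset 0) hides the a, and the search fails.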
129
130
131 Version 2.05 21-Apr-99
132 ----------------------
133
134 1. Changed the type of magic_number from int to long int so that it works
135 properly on 16-bit systems.
136
137 2. Fixed a bug which caused patterns starting with .* not to work correctly
138 when the subject string contained newline characters. PCRE was assuming
139 anchoring for such patterns in all cases, which is not correct because .* will
140 not pass a newline unless PCRE_DOTALL is set. It now assumes anchoring only if
141 DOTALL is set at top level; otherwise it knows that patterns starting with .*
142 must be retried after every newline in the subject.
143
144
145 Version 2.04 18-Feb-99
146 ----------------------
147
148 1. For parenthesized subpatterns with repeats whose minimum was zero, the
149 computation of the store needed to hold the pattern was incorrect (too large).
150 If such patterns were nested a few deep, this could multiply and become a real
151 problem.
152
153 2. Added /M option to pcretest to show the memory requirement of a specific
154 pattern. Made -m a synonym of -s (which does this globally) for compatibility.
155
156 3. Subpatterns of the form (regex){n,m} (i.e. limited maximum) were being
157 compiled in such a way that the backtracking after subsequent failure was
158 pessimal. Something like (a){0,3} was compiled as (a)?(a)?(a)? instead of
159 ((a)((a)(a)?)?)? with disastrous performance if the maximum was of any size.
160
161
162 Version 2.03 02-Feb-99
163 ----------------------
164
165 1. Fixed typo and small mistake in man page.
166
167 2. Added 4th condition (GPL supersedes if conflict) and created separate
168 LICENCE file containing the conditions.
169
170 3. Updated pcretest so that patterns such as /abc\/def/ work like they do in
171 Perl, that is the internal \ allows the delimiter to be included in the
172 pattern. Locked out the use of \ as a delimiter. If \ immediately follows
173 the final delimiter, add \ to the end of the pattern (to test the error).
174
175 4. Added the convenience functions for extracting substrings after a successful
176 match. Updated pcretest to make it able to test these functions.
177
178
179 Version 2.02 14-Jan-99
180 ----------------------
181
182 1. Initialized the working variables associated with each extraction so that
183 their saving and restoring doesn't refer to uninitialized store.
184
185 2. Put dummy code into study.c in order to trick the optimizer of the IBM C
186 compiler for OS/2 into generating correct code. Apparently IBM isn't going to
187 fix the problem.
188
189 3. Pcretest: the timing code wasn't using LOOPREPEAT for timing execution
190 calls, and wasn't printing the correct value for compiling calls. Increased the
191 default value of LOOPREPEAT, and the number of significant figures in the
192 times.
193
194 4. Changed "/bin/rm" in the Makefile to "-rm" so it works on Windows NT.
195
196 5. Renamed "deftables" as "dftables" to get it down to 8 characters, to avoid
197 a building problem on Windows NT with a FAT file system.
198
199
200 Version 2.01 21-Oct-98
201 ----------------------
202
203 1. Changed the API for pcre_compile() to allow for the provision of a pointer
204 to character tables built by pcre_maketables() in the current locale. If NULL
205 is passed, the default tables are used.
206
207
208 Version 2.00 24-Sep-98
209 ----------------------
210
211 1. Since the (?>) facility is in Perl 5.005, don't require PCRE_EXTRA to enable
212 it any more.
213
214 2. Allow quantification of (?>) groups, and make it work correctly.
215
216 3. The first character computation wasn't working for (?>) groups.
217
218 4. Correct the implementation of \Z (it is permitted to match on the \n at the
219 end of the subject) and add 5.005's \z, which really does match only at the
220 very end of the subject.
221
222 5. Remove the \X "cut" facility; Perl doesn't have it, and (?> is neater.
223
224 6. Remove the ability to specify CASELESS, MULTILINE, DOTALL, and
225 DOLLAR_ENDONLY at runtime, to make it possible to implement the Perl 5.005
226 localized options. All options to pcre_study() were also removed.
227
228 7. Add other new features from 5.005:
229
230 (?<=            positive lookbehind
231 (?<!            negative lookbehind
232 (?imsx-imsx)    added the unsetting capability
233                 such a setting is global if at outer level; local otherwise
234 (?imsx-imsx:)   non-capturing groups with option setting
235 (?(cond)re|re)  conditional pattern matching
236
237 A backreference to itself in a repeated group matches the previous
238 captured string.
239
240 8. General tidying up of studying (both automatic and via "study")
241 consequential on the addition of new assertions.
242
243 9. As in 5.005, unlimited repeated groups that could match an empty substring
244 are no longer faulted at compile time. Instead, the loop is forcibly broken at
245 runtime if any iteration does actually match an empty substring.
246
247 10. Include the RunTest script in the distribution.
248
249 11. Added tests from the Perl 5.005_02 distribution. This showed up a few
250 discrepancies, some of which were old and were also with respect to 5.004. They
251 have now been fixed.
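Item 9 above says that an unlimited repeat is broken at runtime as soon as one iteration matches an empty substring. The sketch below shows the idea on a toy version of /(a*)*/. It is an illustration only, not the matcher's real logic, and the names match_a_star and repeat_group are hypothetical.

```c
#include <stddef.h>

/* Toy inner group a*: returns how many 'a' characters it consumed. */
size_t match_a_star(const char *s)
{
    size_t n = 0;

    while (s[n] == 'a') n++;
    return n;
}

/* Toy outer repeat (a*)*: keeps iterating the group, but stops as soon
   as an iteration consumes nothing, instead of looping forever. */
int repeat_group(const char *s, size_t *consumed)
{
    size_t pos = 0;
    int iterations = 0;

    for (;;) {
        size_t n = match_a_star(s + pos);

        iterations++;
        if (n == 0) break;   /* empty iteration: break the loop */
        pos += n;
    }
    *consumed = pos;
    return iterations;
}
```

On the subject "aaab" the first iteration consumes three characters, the second consumes none, and the loop ends after two iterations.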
252
253
254 Version 1.09 28-Apr-98
255 ----------------------
256
257 1. A negated single character class followed by a quantifier with a minimum
258 value of one (e.g. [^x]{1,6} ) was not compiled correctly. This could lead to
259 program crashes, or just wrong answers. This did not apply to negated classes
260 containing more than one character, or to minima other than one.
261
262
263 Version 1.08 27-Mar-98
264 ----------------------
265
266 1. Add PCRE_UNGREEDY to invert the greediness of quantifiers.
267
268 2. Add (?U) and (?X) to set PCRE_UNGREEDY and PCRE_EXTRA respectively. The
269 latter must appear before anything that relies on it in the pattern.
270
271
272 Version 1.07 16-Feb-98
273 ----------------------
274
275 1. A pattern such as /((a)*)*/ was not being diagnosed as in error (unlimited
276 repeat of a potentially empty string).
277
278
279 Version 1.06 23-Jan-98
280 ----------------------
281
282 1. Added Markus Oberhumer's little patches for C++.
283
284 2. Literal strings longer than 255 characters were broken.
285
286
287 Version 1.05 23-Dec-97
288 ----------------------
289
290 1. Negated character classes containing more than one character were failing if
291 PCRE_CASELESS was set at run time.
292
293
294 Version 1.04 19-Dec-97
295 ----------------------
296
297 1. Corrected the man page, where some "const" qualifiers had been omitted.
298
299 2. Made debugging output print "{0,xxx}" instead of just "{,xxx}" to agree with
300 input syntax.
301
302 3. Fixed memory leak which occurred when a regex with back references was
303 matched with an offsets vector that wasn't big enough. The temporary memory
304 that is used in this case wasn't being freed if the match failed.
305
306 4. Tidied pcretest to ensure it frees memory that it gets.
307
308 5. Temporary memory was being obtained in the case where the passed offsets
309 vector was exactly big enough.
310
311 6. Corrected definition of offsetof() from change 5 below.
312
313 7. I had screwed up change 6 below and broken the rules for the use of
314 setjmp(). Now fixed.
315
316
317 Version 1.03 18-Dec-97
318 ----------------------
319
320 1. An erroneous regex with a missing opening parenthesis was correctly
321 diagnosed, but PCRE attempted to access brastack[-1], which could cause crashes
322 on some systems.
323
324 2. Replaced offsetof(real_pcre, code) by offsetof(real_pcre, code[0]) because
325 it was reported that one broken compiler failed on the former because "code" is
326 also an independent variable.
327
328 3. The erroneous regex a[]b caused an array overrun reference.
329
330 4. A regex ending with a one-character negative class (e.g. /[^k]$/) did not
331 fail on data ending with that character. (It was going on too far, and checking
332 the next character, typically a binary zero.) This was specific to the
333 optimized code for single-character negative classes.
334
335 5. Added a contributed patch from the TIN world which does the following:
336
337 + Add an undef for memmove, in case the system defines a macro for it.
338
339 + Add a definition of offsetof(), in case there isn't one. (I don't know
340 the reason behind this - offsetof() is part of the ANSI standard - but
341 it does no harm).
342
343 + Reduce the ifdef's in pcre.c using macro DPRINTF, thereby eliminating
344 most of the places where whitespace preceded '#'. I have given up and
345 allowed the remaining 2 cases to be at the margin.
346
347 + Rename some variables in pcre to eliminate shadowing. This seems very
348 pedantic, but does no harm, of course.
349
350 6. Moved the call to setjmp() into its own function, to get rid of warnings
351 from gcc -Wall, and avoided calling it at all unless PCRE_EXTRA is used.
352
353 7. Constructs such as \d{8,} were compiling into the equivalent of
354 \d{8}\d{0,65527} instead of \d{8}\d* which didn't make much difference to the
355 outcome, but in this particular case used more store than had been allocated,
356 which caused the bug to be discovered because it threw up an internal error.
357
358 8. The debugging code in both pcre and pcretest for outputting the compiled
359 form of a regex was going wrong in the case of back references followed by
360 curly-bracketed repeats.
361
362
363 Version 1.02 12-Dec-97
364 ----------------------
365
366 1. Typos in pcre.3 and comments in the source fixed.
367
368 2. Applied a contributed patch to get rid of places where it used to remove
369 'const' from variables, and fixed some signed/unsigned and uninitialized
370 variable warnings.
371
372 3. Added the "runtest" target to Makefile.
373
374 4. Set default compiler flag to -O2 rather than just -O.
375
376
377 Version 1.01 19-Nov-97
378 ----------------------
379
380 1. PCRE was failing to diagnose unlimited repeat of empty string for patterns
381 like /([ab]*)*/, that is, for classes with more than one character in them.
382
383 2. Likewise, it wasn't diagnosing patterns with "once-only" subpatterns, such
384 as /((?>a*))*/ (a PCRE_EXTRA facility).
385
386
387 Version 1.00 18-Nov-97
388 ----------------------
389
390 1. Added compile-time macros to support systems such as SunOS4 which don't have
391 memmove() or strerror() but have other things that can be used instead.
392
393 2. Arranged that "make clean" removes the executables.
394
395
396 Version 0.99 27-Oct-97
397 ----------------------
398
399 1. Fixed bug in code for optimizing classes with only one character. It was
400 initializing a 32-byte map regardless, which could cause it to run off the end
401 of the memory it had got.
402
403 2. Added, conditional on PCRE_EXTRA, the proposed (?>REGEX) construction.
404
405
406 Version 0.98 22-Oct-97
407 ----------------------
408
409 1. Fixed bug in code for handling temporary memory usage when there are more
410 back references than supplied space in the ovector. This could cause segfaults.
411
412
413 Version 0.97 21-Oct-97
414 ----------------------
415
416 1. Added the \X "cut" facility, conditional on PCRE_EXTRA.
417
418 2. Optimized negated single characters not to use a bit map.
419
420 3. Brought error texts together as macro definitions; clarified some of them;
421 fixed one that was wrong - it said "range out of order" when it meant "invalid
422 escape sequence".
423
424 4. Changed some char * arguments to const char *.
425
426 5. Added PCRE_NOTBOL and PCRE_NOTEOL (from POSIX).
427
428 6. Added the POSIX-style API wrapper in pcreposix.a and testing facilities in
429 pcretest.
430
431
432 Version 0.96 16-Oct-97
433 ----------------------
434
435 1. Added a simple "pgrep" utility to the distribution.
436
437 2. Fixed an incompatibility with Perl: "{" is now treated as a normal character
438 unless it appears in one of the precise forms "{ddd}", "{ddd,}", or "{ddd,ddd}"
439 where "ddd" means "one or more decimal digits".
440
441 3. Fixed serious bug. If a pattern had a back reference, but the call to
442 pcre_exec() didn't supply a large enough ovector to record the related
443 identifying subpattern, the match always failed. PCRE now remembers the number
444 of the largest back reference, and gets some temporary memory in which to save
445 the offsets during matching if necessary, in order to ensure that
446 backreferences always work.
447
448 4. Increased the compatibility with Perl in a number of ways:
449
450 (a) . no longer matches \n by default; an option PCRE_DOTALL is provided
451 to request this handling. The option can be set at compile or exec time.
452
453 (b) $ matches before a terminating newline by default; an option
454 PCRE_DOLLAR_ENDONLY is provided to override this (but not in multiline
455 mode). The option can be set at compile or exec time.
456
457 (c) The handling of \ followed by a digit other than 0 is now supposed to be
458 the same as Perl's. If the decimal number it represents is less than 10
459 or there aren't that many previous left capturing parentheses, an octal
460 escape is read. Inside a character class, it's always an octal escape,
461 even if it is a single digit.
462
463 (d) An escaped but undefined alphabetic character is taken as a literal,
464 unless PCRE_EXTRA is set. Currently this just reserves the remaining
465 escapes.
466
467 (e) {0} is now permitted. (The previous item is removed from the compiled
468 pattern).
469
470 5. Changed all the names of code files so that the basic parts are no longer
471 than 10 characters, and abolished the teeny "globals.c" file.
472
473 6. Changed the handling of character classes; they are now done with a 32-byte
474 bit map always.
475
476 7. Added the -d and /D options to pcretest to make it possible to look at the
477 internals of compilation without having to recompile pcre.
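Item 6 above says character classes are now always held as a 32-byte bit map, that is, one bit for each of the 256 possible character values. A minimal sketch of such a map follows. The type and function names are hypothetical, and the bit ordering within each byte is an assumption.

```c
#include <string.h>

/* 256 bits, one per possible character value. */
typedef struct { unsigned char bits[32]; } class_map;

void class_clear(class_map *m)
{
    memset(m->bits, 0, sizeof m->bits);
}

void class_add(class_map *m, unsigned char c)
{
    m->bits[c / 8] |= (unsigned char)(1u << (c % 8));
}

/* Add every character from lo to hi inclusive. The counter is wider
   than unsigned char so that hi == 255 cannot wrap around. */
void class_add_range(class_map *m, unsigned char lo, unsigned char hi)
{
    unsigned int c;

    for (c = lo; c <= hi; c++)
        class_add(m, (unsigned char)c);
}

/* Membership test: one index, one shift, one mask. */
int class_has(const class_map *m, unsigned char c)
{
    return (m->bits[c / 8] >> (c % 8)) & 1;
}
```

A class such as [a-z] would be built with class_clear() followed by class_add_range(&m, 'a', 'z'). Testing a character then costs the same however many characters the class contains.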
478
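Item 2 of this version lists the three precise forms in which "{" starts a repeat count: "{ddd}", "{ddd,}" and "{ddd,ddd}". In any other position it is an ordinary character. The check can be sketched as below. The function name is_counted_repeat is hypothetical, and this is not the code used in pcre.c.

```c
#include <ctype.h>

/* Return 1 if p points at "{ddd}", "{ddd,}" or "{ddd,ddd}", where ddd
   is one or more decimal digits. Return 0 otherwise, in which case the
   "{" is to be treated as a normal character. */
int is_counted_repeat(const char *p)
{
    if (*p++ != '{') return 0;

    /* The minimum is mandatory. */
    if (!isdigit((unsigned char)*p)) return 0;
    while (isdigit((unsigned char)*p)) p++;

    if (*p == '}') return 1;          /* {ddd}  */
    if (*p++ != ',') return 0;
    if (*p == '}') return 1;          /* {ddd,} */

    /* The maximum, if present, must also be all digits. */
    if (!isdigit((unsigned char)*p)) return 0;
    while (isdigit((unsigned char)*p)) p++;

    return *p == '}';                 /* {ddd,ddd} */
}
```

Under this rule "{3}", "{3,}" and "{3,7}" are repeat counts, while "{,7}", "{a}" and an unterminated "{3,7" are literal text.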
479
480 Version 0.95 23-Sep-97
481 ----------------------
482
483 1. Fixed bug in pre-pass concerning escaped "normal" characters such as \x5c or
484 \x20 at the start of a run of normal characters. These were being treated as
485 real characters, instead of the source characters being re-checked.
486
487
488 Version 0.94 18-Sep-97
489 ----------------------
490
491 1. The functions are now thread-safe, with the caveat that the global variables
492 containing pointers to malloc() and free() or alternative functions are the
493 same for all threads.
494
495 2. Get pcre_study() to generate a bitmap of initial characters for non-
496 anchored patterns when this is possible, and use it if passed to pcre_exec().
497
498
499 Version 0.93 15-Sep-97
500 ----------------------
501
502 1. /(b)|(:+)/ was computing an incorrect first character.
503
504 2. Add pcre_study() to the API and the passing of pcre_extra to pcre_exec(),
505 but not actually doing anything yet.
506
507 3. Treat "-" characters in classes that cannot be part of ranges as literals,
508 as Perl does (e.g. [-az] or [az-]).
509
510 4. Set the anchored flag if a branch starts with .* or .*? because that tests
511 all possible positions.
512
513 5. Split up into different modules to avoid including unneeded functions in a
514 compiled binary. However, compile and exec are still in one module. The "study"
515 function is split off.
516
517 6. The character tables are now in a separate module whose source is generated
518 by an auxiliary program - but can then be edited by hand if required. There are
519 now no calls to isalnum(), isspace(), isdigit(), isxdigit(), tolower() or
520 toupper() in the code.
521
522 7. Turn the malloc/free function variables into pcre_malloc and pcre_free and
523 make them global. Abolish the function for setting them, as the caller can now
524 set them directly.
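Item 7 above makes the memory functions global variables that the caller sets directly. The sketch below shows that arrangement using stand-in names (demo_malloc, demo_free, counting_malloc, demo_compile), all hypothetical, so that it compiles without pcre.h. In PCRE the variables are the pcre_malloc and pcre_free named in the item.

```c
#include <stdlib.h>

/* Global function-pointer variables, defaulting to the C library. */
void *(*demo_malloc)(size_t) = malloc;
void (*demo_free)(void *) = free;

static size_t demo_alloc_count = 0;

/* A replacement allocator a caller might install, here one that just
   counts how many blocks were requested. */
void *counting_malloc(size_t size)
{
    demo_alloc_count++;
    return malloc(size);
}

size_t demo_allocations(void)
{
    return demo_alloc_count;
}

/* Library-side code never calls malloc() by name; it always goes
   through the variable, so the caller's choice takes effect. */
void *demo_compile(size_t size)
{
    return demo_malloc(size);
}
```

A caller installs its own allocator with a plain assignment, demo_malloc = counting_malloc, before calling into the library. No setter function is needed.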
525
526
527 Version 0.92 11-Sep-97
528 ----------------------
529
530 1. A repeat with a fixed maximum and a minimum of 1 for an ordinary character
531 (e.g. /a{1,3}/) was broken (I mis-optimized it).
532
533 2. Caseless matching was not working in character classes if the characters in
534 the pattern were in upper case.
535
536 3. Make ranges like [W-c] work in the same way as Perl for caseless matching.
537
538 4. Make PCRE_ANCHORED public and accept as a compile option.
539
540 5. Add an options word to pcre_exec() and accept PCRE_ANCHORED and
541 PCRE_CASELESS at run time. Add escapes \A and \I to pcretest to cause it to
542 pass them.
543
544 6. Give an error if bad option bits passed at compile or run time.
545
546 7. Add PCRE_MULTILINE at compile and exec time, and (?m) as well. Add \M to
547 pcretest to cause it to pass that flag.
548
549 8. Add pcre_info(), to get the number of identifying subpatterns, the stored
550 options, and the first character, if set.
551
552 9. Recognize C+ or C{n,m} where n >= 1 as providing a fixed starting character.
553
554
555 Version 0.91 10-Sep-97
556 ----------------------
557
558 1. PCRE was failing to diagnose unlimited repeats of subpatterns that could
559 match the empty string as in /(a*)*/. It was looping and ultimately crashing.
560
561 2. PCRE was looping on encountering an indefinitely repeated back reference to
562 a subpattern that had matched an empty string, e.g. /(a|)\1*/. It now does what
563 Perl does - treats the match as successful.
564
565 ****
