--- code/trunk/ChangeLog 2007/04/17 15:55:53 152
+++ code/trunk/ChangeLog 2007/08/15 14:20:05 215
@@ -1,7 +1,188 @@
 ChangeLog for PCRE
 ------------------
 
-Version 7.1 12-Mar-07
+Version 7.3 09-Aug-07
+---------------------
+
+ 1. In the rejigging of the build system that eventually resulted in 7.1, the
+    line "#include <pcre.h>" was included in pcre_internal.h. The use of angle
+    brackets there is not right, since it causes compilers to look for an
+    installed pcre.h, not the version that is in the source that is being
+    compiled (which of course may be different). I have changed it back to:
+
+      #include "pcre.h"
+
+    I have a vague recollection that the change was concerned with compiling in
+    different directories, but in the new build system, that is taken care of
+    by the VPATH setting in the Makefile.
+
+ 2. The pattern .*$ when run in not-DOTALL UTF-8 mode with newline=any failed
+    when the subject happened to end in the byte 0x85 (e.g. if the last
+    character was \x{1ec5}). *Character* 0x85 is one of the "any" newline
+    characters but of course it shouldn't be taken as a newline when it is part
+    of another character. The bug was that, for an unlimited repeat of . in
+    not-DOTALL UTF-8 mode, PCRE was advancing by bytes rather than by
+    characters when looking for a newline.
+
+ 3. A small performance improvement in the DOTALL UTF-8 mode .* case.
+
+ 4. Debugging: adjusted the names of opcodes for different kinds of parentheses
+    in debug output.
+
+ 5. Arrange to use "%I64d" instead of "%lld" and "%I64u" instead of "%llu" for
+    long printing in the pcrecpp unittest when running under MinGW.
+
+ 6. ESC_K was left out of the EBCDIC table.
+
+ 7. Change 7.0/38 introduced a new limit on the number of nested non-capturing
+    parentheses; I made it 1000, which seemed large enough. Unfortunately, the
+    limit also applies to "virtual nesting" when a pattern is recursive, and in
+    this case 1000 isn't so big. I have been able to remove this limit at the
+    expense of backing off one optimization in certain circumstances. Normally,
+    when pcre_exec() would call its internal match() function recursively and
+    immediately return the result unconditionally, it uses a "tail recursion"
+    feature to save stack. However, when a subpattern that can match an empty
+    string has an unlimited repetition quantifier, it no longer makes this
+    optimization. That gives it a stack frame in which to save the data for
+    checking that an empty string has been matched. Previously this was taken
+    from the 1000-entry workspace that had been reserved. So now there is no
+    explicit limit, but more stack is used.
+
+ 8. Applied Daniel's patches to solve problems with the import/export magic
+    syntax that is required for Windows, and which was going wrong for the
+    pcreposix and pcrecpp parts of the library. These were overlooked when this
+    problem was solved for the main library.
+
+ 9. There were some crude static tests to avoid integer overflow when computing
+    the size of patterns that contain repeated groups with explicit upper
+    limits. As the maximum quantifier is 65535, the maximum group length was
+    set at 30,000 so that the product of these two numbers did not overflow a
+    32-bit integer. However, it turns out that people want to use groups that
+    are longer than 30,000 bytes (though not repeat them that many times).
+    Change 7.0/17 (the refactoring of the way the pattern size is computed) has
+    made it possible to implement the integer overflow checks in a much more
+    dynamic way, which I have now done. The artificial limitation on group
+    length has been removed - we now have only the limit on the total length of
+    the compiled pattern, which depends on the LINK_SIZE setting.
+
+10. Fixed a bug in the documentation for get/copy named substring when
+    duplicate names are permitted. If none of the named substrings are set, the
+    functions return PCRE_ERROR_NOSUBSTRING (7); the doc said they returned an
+    empty string.
+
+11. Because Perl interprets \Q...\E at a high level, and ignores orphan \E
+    instances, patterns such as [\Q\E] or [\E] or even [^\E] cause an error,
+    because the ] is interpreted as the first data character and the
+    terminating ] is not found. PCRE has been made compatible with Perl in this
+    regard. Previously, it interpreted [\Q\E] as an empty class, and [\E] could
+    cause memory overwriting.
+
+10. Like Perl, PCRE automatically breaks an unlimited repeat after an empty
+    string has been matched (to stop an infinite loop). It was not recognizing
+    a conditional subpattern that could match an empty string if that
+    subpattern was within another subpattern. For example, it looped when
+    trying to match (((?(1)X|))*) but it was OK with ((?(1)X|)*) where the
+    condition was not nested. This bug has been fixed.
+
+12. A pattern like \X?\d or \P{L}?\d in non-UTF-8 mode could cause a backtrack
+    past the start of the subject in the presence of bytes with the top bit
+    set, for example "\x8aBCD".
+
+13. Added Perl 5.10 experimental backtracking controls (*FAIL), (*F), (*PRUNE),
+    (*SKIP), (*THEN), (*COMMIT), and (*ACCEPT).
+
+14. Optimized (?!) to (*FAIL).
+
+15. Updated the test for a valid UTF-8 string to conform to the later RFC 3629.
+    This restricts code points to be within the range 0 to 0x10FFFF, excluding
+    the surrogate range 0xD800 to 0xDFFF. Previously, PCRE allowed the
+    full range 0 to 0x7FFFFFFF, as defined by RFC 2279. Internally, it still
+    does: it's just the validity check that is more restrictive.
+
+16. Inserted checks for integer overflows during escape sequence (backslash)
+    processing, and also fixed erroneous offset values for syntax errors during
+    backslash processing.
+
+17. Fixed another case of looking too far back in non-UTF-8 mode (cf 12 above)
+    for patterns like [\PPP\x8a]{1,}\x80 with the subject "A\x80".
+
+18. An unterminated class in a pattern like (?1)\c[ with a "forward reference"
+    caused an overrun.
+
+
+Version 7.2 19-Jun-07
+---------------------
+
+ 1. If the fr_FR locale cannot be found for test 3, try the "french" locale,
+    which is apparently normally available under Windows.
+
+ 2. Re-jig the pcregrep tests with different newline settings in an attempt
+    to make them independent of the local environment's newline setting.
+
+ 3. Add code to configure.ac to remove -g from the CFLAGS default settings.
+
+ 4. Some of the "internals" tests were previously cut out when the link size
+    was not 2, because the output contained actual offsets. The recent new
+    "Z" feature of pcretest means that these can be cut out, making the tests
+    usable with all link sizes.
+
+ 5. Implemented Stan Switzer's goto replacement for longjmp() when not using
+    stack recursion. This gives a massive performance boost under BSD, but just
+    a small improvement under Linux. However, it saves one field in the frame
+    in all cases.
+
+ 6. Added more features from the forthcoming Perl 5.10:
+
+    (a) (?-n) (where n is a string of digits) is a relative subroutine or
+        recursion call. It refers to the nth most recently opened parentheses.
+
+    (b) (?+n) is also a relative subroutine call; it refers to the nth next
+        to be opened parentheses.
+
+    (c) Conditions that refer to capturing parentheses can be specified
+        relatively, for example, (?(-2)... or (?(+3)...
+
+    (d) \K resets the start of the current match so that everything before
+        is not part of it.
+
+    (e) \k{name} is synonymous with \k<name> and \k'name' (.NET compatible).
+
+    (f) \g{name} is another synonym - part of Perl 5.10's unification of
+        reference syntax.
+
+    (g) (?| introduces a group in which the numbering of parentheses in each
+        alternative starts with the same number.
+
+    (h) \h, \H, \v, and \V match horizontal and vertical whitespace.
+
+ 7. Added two new calls to pcre_fullinfo(): PCRE_INFO_OKPARTIAL and
+    PCRE_INFO_JCHANGED.
+
+ 8. A pattern such as (.*(.)?)* caused pcre_exec() to fail by either not
+    terminating or by crashing. Diagnosed by Viktor Griph; it was in the code
+    for detecting groups that can match an empty string.
+
+ 9. A pattern with a very large number of alternatives (more than several
+    hundred) was running out of internal workspace during the pre-compile
+    phase, where pcre_compile() figures out how much memory will be needed. A
+    bit of new cunning has reduced the workspace needed for groups with
+    alternatives. The 1000-alternative test pattern now uses 12 bytes of
+    workspace instead of running out of the 4096 that are available.
+
+10. Inserted some missing (unsigned int) casts to get rid of compiler warnings.
+
+11. Applied patch from Google to remove an optimization that didn't quite work.
+    The report of the bug said:
+
+      pcrecpp::RE("a*").FullMatch("aaa") matches, while
+      pcrecpp::RE("a*?").FullMatch("aaa") does not, and
+      pcrecpp::RE("a*?\\z").FullMatch("aaa") does again.
+
+12. If \p or \P was used in non-UTF-8 mode on a character greater than 127
+    it matched the wrong number of bytes.
+
+
+Version 7.1 24-Apr-07
 ---------------------
 
  1. Applied Bob Rossi and Daniel G's patches to convert the build system to one
@@ -149,6 +330,8 @@
 
 23. Added some casts to kill warnings from HP-UX ia64 compiler.
 
+24. Added a man page for pcre-config.
+
 
 Version 7.0 19-Dec-06
 ---------------------
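Change 2 in the 7.3 section depends on the difference between bytes and characters in UTF-8. U+1EC5 is encoded as the bytes E1 BB 85, so its last byte has the same value as the code point of NEL (U+0085), which is one of the newline=any characters. The following is a minimal sketch that contrasts a byte-wise scan with a character-wise scan. It assumes well-formed UTF-8 input and looks only at single-character newlines. All of the helper names are hypothetical and are not PCRE internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if code point c is one of the single
   characters that newline=any recognizes: LF, VT, FF, CR (0x0A-0x0D),
   NEL (U+0085), LS (U+2028) and PS (U+2029). */
static int is_any_newline(uint32_t c)
{
    return (c >= 0x0a && c <= 0x0d) || c == 0x85 || c == 0x2028 || c == 0x2029;
}

/* Hypothetical helper: decode the UTF-8 character starting at s[i]. The
   input is assumed to be valid UTF-8. Stores the code point in *cp and
   returns the character's length in bytes. */
static size_t utf8_decode(const unsigned char *s, size_t i, uint32_t *cp)
{
    unsigned char b = s[i];
    assert((b & 0xc0) != 0x80);   /* must not start on a continuation byte */
    if (b < 0x80) { *cp = b; return 1; }
    if (b < 0xe0) {
        *cp = ((uint32_t)(b & 0x1f) << 6) | (s[i + 1] & 0x3f);
        return 2;
    }
    if (b < 0xf0) {
        *cp = ((uint32_t)(b & 0x0f) << 12) | ((uint32_t)(s[i + 1] & 0x3f) << 6)
            | (s[i + 2] & 0x3f);
        return 3;
    }
    *cp = ((uint32_t)(b & 0x07) << 18) | ((uint32_t)(s[i + 1] & 0x3f) << 12)
        | ((uint32_t)(s[i + 2] & 0x3f) << 6) | (s[i + 3] & 0x3f);
    return 4;
}

/* Faulty approach: treat every byte as a character. Returns the offset of
   the first "newline", or len if there is none. */
static size_t find_newline_by_bytes(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (is_any_newline(s[i])) return i;
    return len;
}

/* Corrected approach: step over whole characters. A 0x85 byte that belongs
   to a multibyte character is therefore never tested on its own. */
static size_t find_newline_by_chars(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        uint32_t c;
        size_t n = utf8_decode(s, i, &c);
        if (is_any_newline(c)) return i;
        i += n;
    }
    return len;
}
```

For the subject bytes 'a', 'b', E1, BB, 85, the byte-wise scan reports a newline at offset 4, which is the tail of U+1EC5. The character-wise scan correctly finds no newline and returns the length, 5. A genuine NEL, encoded as C2 85, is still found by the character-wise scan.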
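Change 15 in the 7.3 section narrows the UTF-8 validity check from RFC 2279 to RFC 3629. The following is a minimal sketch of an RFC 3629-style validator. It decodes each sequence and rejects any of the following:

- stray continuation bytes
- lead bytes of 0xF8 and above
- truncated sequences
- overlong encodings
- surrogates (0xD800 to 0xDFFF)
- values above 0x10FFFF

This is a generic illustration, not PCRE's own check. The changelog entry does not say whether PCRE also rejects overlong forms, so that test is an assumption of the sketch. The function name `utf8_valid_rfc3629` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RFC 3629-style check: returns 1 if s[0..len) is valid UTF-8,
   0 otherwise. A generic sketch, not PCRE's own validation routine. */
static int utf8_valid_rfc3629(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t n;        /* length of this sequence in bytes */
        uint32_t cp;     /* decoded code point */
        uint32_t min;    /* smallest code point that needs n bytes */

        if (b < 0x80) { i++; continue; }                  /* ASCII */
        else if ((b & 0xe0) == 0xc0) { n = 2; cp = b & 0x1f; min = 0x80; }
        else if ((b & 0xf0) == 0xe0) { n = 3; cp = b & 0x0f; min = 0x800; }
        else if ((b & 0xf8) == 0xf0) { n = 4; cp = b & 0x07; min = 0x10000; }
        else return 0;   /* stray continuation byte, or a lead byte of 0xF8
                            and above (5- and 6-byte forms were RFC 2279 only) */

        if (n > len - i) return 0;                        /* truncated */
        for (size_t k = 1; k < n; k++) {
            if ((s[i + k] & 0xc0) != 0x80) return 0;      /* not 10xxxxxx */
            cp = (cp << 6) | (s[i + k] & 0x3f);
        }

        if (cp < min) return 0;                           /* overlong form */
        if (cp >= 0xd800 && cp <= 0xdfff) return 0;       /* surrogate */
        if (cp > 0x10ffff) return 0;                      /* beyond U+10FFFF */
        i += n;
    }
    return 1;
}
```

For example, F4 8F BF BF (U+10FFFF) is accepted. Both F4 90 80 80 (U+110000) and ED A0 80 (U+D800) are rejected. RFC 2279 would have allowed both of the rejected sequences.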
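Change 9 in the 7.3 section replaces a static cap with a dynamic overflow check. The old cap worked because 30,000 times 65,535 is 1,966,050,000, which is below the signed 32-bit maximum of 2,147,483,647. A dynamic check can instead test, before multiplying, whether the product would exceed the space that remains. It does this by comparing the repeat count against `room / group_len`. The following is a minimal sketch of that technique. The helper `add_repeated_group` and its `limit` parameter are hypothetical; in PCRE the real limit depends on the LINK_SIZE setting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: add `copies` copies of a group that compiles to
   `group_len` bytes onto the running size *total. Returns 0, leaving *total
   unchanged, if the result would exceed `limit`; otherwise updates *total
   and returns 1. The multiplication is performed only once it is known
   to fit. */
static int add_repeated_group(size_t *total, size_t group_len,
                              size_t copies, size_t limit)
{
    size_t room;
    if (*total > limit) return 0;
    room = limit - *total;                     /* bytes still available */
    if (group_len != 0 && copies > room / group_len)
        return 0;                              /* group_len * copies > room */
    *total += group_len * copies;              /* cannot overflow here */
    return 1;
}
```

With this kind of check, a group longer than 30,000 bytes is accepted whenever the total still fits. Only a combination of length and repeat count that really exceeds the limit is refused.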