ChangeLog for PCRE
------------------

Version 4.3 21-May-03
---------------------

1. Two instances of @WIN_PREFIX@ omitted from the Windows targets in the
   Makefile.

2. Some refactoring to improve the quality of the code:

   (i)   The utf8_table... variables are now declared "const".

   (ii)  The code for \cx, which used the "case flipping" table to convert
         lower case letters to upper case, now just subtracts 32. This is
         ASCII-specific, but the whole concept of \cx is ASCII-specific, so it
         seems reasonable.

   (iii) PCRE was using its character types table to recognize decimal and
         hexadecimal digits in the pattern. This is silly, because it handles
         only 0-9, a-f, and A-F, but the character types table is
         locale-specific, which means strange things might happen. A private
         table is now used for this - though it costs 256 bytes, a table is
         much faster than multiple explicit tests. Of course, the standard
         character types table is still used for matching digits in subject
         strings against \d.

   (iv)  Strictly, the identifier ESC_t is reserved by POSIX (all identifiers
         ending in _t are). So I've renamed it as ESC_tee.

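   The changes in 2(ii) and 2(iii) can be illustrated with a minimal sketch.
   This is not PCRE's source: the names digit_tab, DIGIT_DEC, DIGIT_HEX,
   init_digit_tab, is_dec_digit, is_hex_digit and control_escape are all
   hypothetical, and ASCII is assumed throughout. The final flip of the 0x40
   bit is the documented behaviour of \cx rather than something stated in the
   entry above.

```c
#include <assert.h>

/* Hypothetical flag bits for a private digit table (not PCRE's names). */
enum { DIGIT_DEC = 0x01, DIGIT_HEX = 0x02 };

/* One entry per byte value: 256 bytes, independent of the locale. */
static unsigned char digit_tab[256];

/* Fill the table once; only 0-9, a-f and A-F are ever marked. */
static void init_digit_tab(void)
{
    int c;
    for (c = '0'; c <= '9'; c++) digit_tab[c] = DIGIT_DEC | DIGIT_HEX;
    for (c = 'a'; c <= 'f'; c++) digit_tab[c] |= DIGIT_HEX;
    for (c = 'A'; c <= 'F'; c++) digit_tab[c] |= DIGIT_HEX;
}

/* A single table lookup replaces several explicit range tests. */
static int is_dec_digit(int c)
{
    return (digit_tab[(unsigned char)c] & DIGIT_DEC) != 0;
}

static int is_hex_digit(int c)
{
    return (digit_tab[(unsigned char)c] & DIGIT_HEX) != 0;
}

/* Value of \cx for the character x that follows "\c" (ASCII assumed):
   lower case letters are folded by subtracting 32, then bit 0x40 is
   flipped. */
static int control_escape(int x)
{
    assert(x >= 0 && x < 128);          /* ASCII-only, as the entry notes */
    if (x >= 'a' && x <= 'z') x -= 32;
    return x ^ 0x40;
}
```

   With this sketch, \ca and \cA both give 1, and \c; gives 0x7B. The table
   must be filled by calling init_digit_tab() before the first lookup.
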
3. The first argument for regexec() in the POSIX wrapper should have been
   defined as "const".

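   A small sketch of why the "const" in item 3 matters to callers. It is
   written against the system's <regex.h> as a stand-in, on the assumption
   that the POSIX wrapper presents the same regexec() prototype; the helpers
   matches() and match_once() are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if subject matches, 0 otherwise.
   It only reads the compiled pattern, so it takes a pointer to const. */
static int matches(const regex_t *re, const char *subject)
{
    assert(re != NULL && subject != NULL);
    /* Passing "re" without a cast is valid only because regexec()
       declares its first argument as "const regex_t *". */
    return regexec(re, subject, 0, NULL, 0) == 0;
}

/* Hypothetical helper: compile, match once, free.
   Returns -1 if the pattern does not compile. */
static int match_once(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) return -1;
    rc = matches(&re, subject);
    regfree(&re);
    return rc;
}
```

   If the first argument were not "const", matches() would need a cast or
   would have to give up its own const qualifier.
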
4. Changed pcretest to use malloc() for its buffers so that they can be
   Electric Fenced for debugging.

5. There were several places in the code where, in UTF-8 mode, PCRE would try
   to read one or more bytes before the start of the subject string. Often this
   had no effect on PCRE's behaviour, but in some circumstances it could
   provoke a segmentation fault.

6. A lookbehind at the start of a pattern in UTF-8 mode could also cause PCRE
   to try to read one or more bytes before the start of the subject string.

7. A lookbehind in a pattern matched in non-UTF-8 mode on a PCRE compiled with
   UTF-8 support could misbehave in various ways if the subject string
   contained bytes with the 0x80 bit set and the 0x40 bit unset in a lookbehind
   area. (PCRE was not checking for the UTF-8 mode flag, and was trying to move
   back over UTF-8 characters.)

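   Items 5 to 7 all concern stepping backwards through the subject. The
   sketch below shows the two checks involved: never move before the start of
   the subject, and skip continuation bytes only when UTF-8 mode is on. It is
   an illustration under those assumptions, not PCRE's code; back_one_char()
   and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: move p back by one character without ever going
   before start. Returns NULL if p is already at start. */
static const unsigned char *back_one_char(const unsigned char *start,
                                          const unsigned char *p,
                                          int utf8)
{
    assert(start != NULL && p != NULL && p >= start);
    if (p == start) return NULL;        /* nothing before the subject */
    p--;
    if (utf8) {
        /* Skip continuation bytes (10xxxxxx: 0x80 set, 0x40 unset),
           checking the bound before every further step back. */
        while (p > start && (*p & 0xC0) == 0x80) p--;
    }
    return p;
}
```

   In non-UTF-8 mode a byte such as 0xA9 is an ordinary one-byte character,
   so the function steps back exactly one byte.
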

Version 4.2 14-Apr-03
---------------------
