News about PCRE releases
------------------------

Release 8.33 28-May-2013
------------------------

A number of bugs are fixed, and some performance improvements have been made.
There are also some new features, of which these are the most important:

. The behaviour of the backtracking verbs has been rationalized and
  documented in more detail.

. JIT now supports callouts and all of the backtracking verbs.

. Unicode validation has been updated in the light of Unicode Corrigendum #9,
  which points out that "non characters" are not "characters that may not
  appear in Unicode strings" but rather "characters that are reserved for
  internal use and have only local meaning".

. (*LIMIT_MATCH=d) and (*LIMIT_RECURSION=d) have been added so that the
  creator of a pattern can specify lower (but not higher) limits for the
  matching process.

. The PCRE_NEVER_UTF option is available to prevent pattern-writers from using
  the (*UTF) feature, as this could be a security issue.

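As a rough illustration of the "lower (but not higher)" rule for
(*LIMIT_MATCH=d), the C sketch below parses a leading limit setting and
combines it with the limit chosen by the calling program. The function names
are invented for this example and are not part of the PCRE API. The parsing is
deliberately simplified (a single setting at the very start of the pattern, no
overflow handling), and PCRE's own implementation may well differ.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not part of the PCRE API. If the pattern begins with
   (*LIMIT_MATCH=d), store d in *value and return 1; otherwise return 0. */
static int leading_match_limit(const char *pattern, unsigned long *value)
{
  static const char prefix[] = "(*LIMIT_MATCH=";
  size_t n = sizeof(prefix) - 1;
  unsigned long d = 0;
  const char *p;

  assert(pattern != NULL && value != NULL);
  if (strncmp(pattern, prefix, n) != 0) return 0;
  p = pattern + n;
  if (!isdigit((unsigned char)*p)) return 0;        /* at least one digit */
  while (isdigit((unsigned char)*p))
    d = d * 10 + (unsigned long)(*p++ - '0');
  if (*p != ')') return 0;                          /* must be closed */
  *value = d;
  return 1;
}

/* The pattern can lower the caller's limit but can never raise it. */
static unsigned long effective_match_limit(const char *pattern,
                                           unsigned long caller_limit)
{
  unsigned long requested;
  if (leading_match_limit(pattern, &requested) && requested < caller_limit)
    return requested;
  return caller_limit;
}
```

With a caller limit of 1000, a pattern starting with (*LIMIT_MATCH=500) would
be matched under a limit of 500 in this sketch, while one starting with
(*LIMIT_MATCH=5000) would stay at 1000.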

Release 8.32 30-November-2012
-----------------------------

This release fixes a number of bugs, but also has some new features. These are
the highlights:

. There is now support for 32-bit character strings and UTF-32. Like the
  16-bit support, this is done by compiling a separate 32-bit library.

. \X now matches a Unicode extended grapheme cluster.

. Case-independent matching of Unicode characters that have more than one
  "other case" now makes all three (or more) characters equivalent. This
  applies, for example, to Greek Sigma, which has two lowercase versions.

. Unicode character properties are updated to Unicode 6.2.0.

. The EBCDIC support, which had decayed, has had a spring clean.

. A number of JIT optimizations have been added, which give faster JIT
  execution speed. In addition, a new direct interface to JIT execution is
  available. This bypasses some of the sanity checks of pcre_exec() to give a
  noticeable speed-up.

. A number of issues in pcregrep have been fixed, making it more compatible
  with GNU grep. In particular, --exclude and --include (and variants) apply
  to all files now, not just those obtained from scanning a directory
  recursively. In Windows environments, the default action for directories is
  now "skip" instead of "read" (which provokes an error).

. If the --only-matching (-o) option in pcregrep is specified multiple
  times, each one causes appropriate output. For example, -o1 -o2 outputs the
  substrings matched by the 1st and 2nd capturing parentheses. A separating
  string can be specified by --om-separator (default empty).

. When PCRE is built via Autotools using a version of gcc that has the
  "visibility" feature, it is used to hide internal library functions that are
  not part of the public API.


Release 8.31 06-July-2012
-------------------------

This is mainly a bug-fixing release, with a small number of developments:

. The JIT compiler now supports partial matching and the (*MARK) and
  (*COMMIT) verbs.

. PCRE_INFO_MAXLOOKBEHIND can be used to find the longest lookbehind in a
  pattern.

. There should be a performance improvement when using the heap instead of the
  stack for recursion.

. pcregrep can now be linked with libedit as an alternative to libreadline.

. pcregrep now has a --file-list option where the list of files to scan is
  given as a file.

. pcregrep now recognizes binary files and there are related options.

. The Unicode tables have been updated to 6.1.0.

As always, the full list of changes is in the ChangeLog file.


Release 8.30 04-February-2012
-----------------------------

Release 8.30 introduces a major new feature: support for 16-bit character
strings, compiled as a separate library. There are a few changes to the
8-bit library, in addition to some bug fixes.

. The pcre_info() function, which has been obsolete for over 10 years, has
  been removed.

. When a compiled pattern was saved to a file and later reloaded on a host
  with different endianness, PCRE used to swap the bytes in some of the data
  fields automatically. With the advent of the 16-bit library, where more of
  this swapping is needed, it is no longer done automatically. Instead, the
  bad endianness is detected and a specific error is given. The user can then
  call a new function called pcre_pattern_to_host_byte_order() (or an
  equivalent 16-bit function) to do the swap.

. In UTF-8 mode, the values 0xd800 to 0xdfff are not legal Unicode
  code points and are now faulted. (They are the so-called "surrogates"
  that are reserved for coding high values in UTF-16.)

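The detect-then-swap approach described for reloaded patterns can be pictured
with the following C sketch. It is not the code of
pcre_pattern_to_host_byte_order(), which has more to deal with than this. The
marker value, the layout (an array of 32-bit fields whose first word is a
known marker) and all the names are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative marker value, assumed for this sketch only. */
#define SKETCH_MAGIC 0x50435245u

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
  return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
         ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Classify the first word of reloaded data:
   1 = host byte order, -1 = opposite byte order, 0 = not recognised. */
static int byte_order_of(uint32_t header_word)
{
  if (header_word == SKETCH_MAGIC) return 1;
  if (swap32(header_word) == SKETCH_MAGIC) return -1;
  return 0;
}

/* Swap every field in place if the marker shows the opposite byte order.
   Returns 1 if the data is usable afterwards, 0 if the marker is unknown. */
static int to_host_byte_order(uint32_t *fields, int count)
{
  int i, order;
  assert(fields != NULL && count > 0);
  order = byte_order_of(fields[0]);
  if (order < 0)
    for (i = 0; i < count; i++) fields[i] = swap32(fields[i]);
  return order != 0;
}
```

Keeping detection separate from swapping mirrors the behaviour described
above: the mismatch is reported first, and the swap happens only when the
caller asks for it.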

Release 8.21 12-Dec-2011
------------------------

This is almost entirely a bug-fix release. The only new feature is the ability
to obtain the size of the memory used by the JIT compiler.


Release 8.20 21-Oct-2011
------------------------

The main change in this release is the inclusion of Zoltan Herczeg's
just-in-time compiler support, which can be accessed by building PCRE with
--enable-jit. Large performance benefits can be had in many situations. 8.20
also fixes an unfortunate bug that was introduced in 8.13 as well as tidying up
a number of infelicities and differences from Perl.


Release 8.13 16-Aug-2011
------------------------

This is mainly a bug-fix release. There has been a lot of internal refactoring.
The Unicode tables have been updated. The only new feature in the library is
the passing of *MARK information to callouts. Some additions have been made to
pcretest to make testing easier and more comprehensive. There is a new option
for pcregrep to adjust its internal buffer size.


Release 8.12 15-Jan-2011
------------------------

This release fixes some bugs in pcregrep, one of which caused the tests to fail
on 64-bit big-endian systems. There are no changes to the code of the library.


Release 8.11 10-Dec-2010
------------------------

A number of bugs in the library and in pcregrep have been fixed. As always, see
ChangeLog for details. The following are the non-bug-fix changes:

. Added --match-limit and --recursion-limit to pcregrep.

. Added an optional parentheses number to the -o and --only-matching options
  of pcregrep.

. Changed the way PCRE_PARTIAL_HARD affects the matching of $, \z, \Z, \b, and
  \B.

. Added PCRE_ERROR_SHORTUTF8 to make it possible to distinguish between a
  bad UTF-8 sequence and one that is incomplete when using PCRE_PARTIAL_HARD.

. Recognize (*NO_START_OPT) at the start of a pattern to set the
  PCRE_NO_START_OPTIMIZE option, which is now allowed at compile time.


Release 8.10 25-Jun-2010
------------------------

There are two major additions: support for (*MARK) and friends, and the option
PCRE_UCP, which changes the behaviour of \b, \d, \s, and \w (and their
opposites) so that they make use of Unicode properties. There are also a number
of lesser new features, and several bugs have been fixed. A new option,
--line-buffered, has been added to pcregrep, for use when it is connected to
pipes.


Release 8.02 19-Mar-2010
------------------------

Another bug-fix release.


Release 8.01 19-Jan-2010
------------------------

This is a bug-fix release. Several bugs in the code itself and some bugs and
infelicities in the build system have been fixed.


Release 8.00 19-Oct-09
----------------------

Bugs have been fixed in the library and in pcregrep. There are also some
enhancements. Restrictions on patterns used for partial matching have been
removed, extra information is given for partial matches, the partial matching
process has been improved, and an option to make a partial match override a
full match is available. The "study" process has been enhanced by finding a
lower bound matching length. Groups with duplicate numbers may now have
duplicated names without the use of PCRE_DUPNAMES. However, they may not have
different names. The documentation has been revised to reflect these changes.
The version number has been expanded to 3 digits as it is clear that the rate
of change is not slowing down.


Release 7.9 11-Apr-09
---------------------

Mostly bugfixes and tidies with just a couple of minor functional additions.


Release 7.8 05-Sep-08
---------------------

More bug fixes, plus a performance improvement in Unicode character property
lookup.


Release 7.7 07-May-08
---------------------

This is once again mainly a bug-fix release, but there are a couple of new
features.


Release 7.6 28-Jan-08
---------------------

The main reason for having this release so soon after 7.5 is that it fixes a
potential buffer overflow problem in pcre_compile() when run in UTF-8 mode. In
addition, the CMake configuration files have been brought up to date.


Release 7.5 10-Jan-08
---------------------

This is mainly a bug-fix release. However, the ability to link pcregrep with
libz or libbz2 and the ability to link pcretest with libreadline have been
added. Also the --line-offsets and --file-offsets options were added to
pcregrep.


Release 7.4 21-Sep-07
---------------------

The only change of specification is the addition of options to control whether
\R matches any Unicode line ending (the default) or just CR, LF, and CRLF.
Otherwise, the changes are bug fixes and a refactoring to reduce the number of
relocations needed in a shared library. There have also been some documentation
updates, in particular, some more information about using CMake to build PCRE
has been added to the NON-UNIX-USE file.


Release 7.3 28-Aug-07
---------------------

Most changes are bug fixes. Some that are not:

1. There is some support for Perl 5.10's experimental "backtracking control
   verbs" such as (*PRUNE).

2. UTF-8 checking is now as per RFC 3629 instead of RFC 2279; this is more
   restrictive in the strings it accepts.

3. Checking for potential integer overflow has been made more dynamic, and as a
   consequence there is no longer a hard limit on the size of a subpattern that
   has a limited repeat count.

4. When CRLF is a valid line-ending sequence, pcre_exec() and pcre_dfa_exec()
   no longer advance by two characters instead of one when an unanchored match
   fails at CRLF if there are explicit CR or LF matches within the pattern.
   This gets rid of some anomalous effects that previously occurred.

5. Some PCRE-specific settings for varying the newline options at the start of
   a pattern have been added.

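One way to see why RFC 3629 checking (item 2 above) is more restrictive than
RFC 2279 is to look at which bytes may begin a character. RFC 2279 allowed
sequences of up to six bytes, whereas RFC 3629 stops at four bytes and at code
point U+10FFFF. The helper below is a hypothetical sketch written for this
note, not PCRE's validation code, and it examines the lead byte only.

```c
/* Hypothetical helper, not PCRE code: length in bytes of the UTF-8 sequence
   introduced by `lead` under RFC 3629, or 0 if `lead` cannot start one. */
static int rfc3629_sequence_length(unsigned char lead)
{
  if (lead < 0x80) return 1;                   /* single-byte character */
  if (lead >= 0xc2 && lead <= 0xdf) return 2;
  if (lead >= 0xe0 && lead <= 0xef) return 3;
  if (lead >= 0xf0 && lead <= 0xf4) return 4;
  /* What remains is a continuation byte (0x80..0xbf) or one of the values
     0xc0, 0xc1 and 0xf5..0xff, which never appear in RFC 3629 UTF-8. */
  return 0;
}
```

Lead bytes such as 0xf8 and 0xfc, which introduced five-byte and six-byte
sequences under RFC 2279, are rejected here. A full validator would also have
to check the continuation bytes and the decoded value.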

Release 7.2 19-Jun-07
---------------------

WARNING: saved patterns that were compiled by earlier versions of PCRE must be
recompiled for use with 7.2 (necessitated by the addition of \K, \h, \H, \v,
and \V).

Correction to the notes for 7.1: the note about shared libraries for Windows is
wrong. Previously, three libraries were built, but each could function
independently. For example, the pcreposix library also included all the
functions from the basic pcre library. The change is that the three libraries
are no longer independent. They are like the Unix libraries. To use the
pcreposix functions, for example, you need to link with both the pcreposix and
the basic pcre library.

Some more features from Perl 5.10 have been added:

  (?-n) and (?+n) relative references for recursion and subroutines.

  (?(-n) and (?(+n) relative references as conditions.

  \k{name} and \g{name} are synonyms for \k<name>.

  \K to reset the start of the matched string; for example, (foo)\Kbar
    matches bar preceded by foo, but only sets bar as the matched string.

  (?| introduces a group where the capturing parentheses in each alternative
    start from the same number; for example, (?|(abc)|(xyz)) sets capturing
    parentheses number 1 in both cases.

  \h and \v match horizontal and vertical whitespace, respectively; \H and \V
    match the opposites.


Release 7.1 24-Apr-07
---------------------

There is only one new feature in this release: a linebreak setting of
PCRE_NEWLINE_ANYCRLF. It is a cut-down version of PCRE_NEWLINE_ANY, which
recognizes only CRLF, CR, and LF as linebreaks.

A few bugs are fixed (see ChangeLog for details), but the major change is a
complete re-implementation of the build system. This now has full Autotools
support and so is now "standard" in some sense. It should help with compiling
PCRE in a wide variety of environments.

NOTE: when building shared libraries for Windows, three dlls are now built,
called libpcre, libpcreposix, and libpcrecpp. Previously, everything was
included in a single dll.

Another important change is that the dftables auxiliary program is no longer
compiled and run at "make" time by default. Instead, a default set of character
tables (assuming ASCII coding) is used. If you want to use dftables to generate
the character tables as previously, add --enable-rebuild-chartables to the
"configure" command. You must do this if you are compiling PCRE to run on a
system that uses EBCDIC code.

There is a discussion about character tables in the README file. The default is
not to use dftables so that there is no problem when cross-compiling.

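The PCRE_NEWLINE_ANYCRLF rule described at the start of the notes for 7.1
(only CRLF, CR, and LF count as linebreaks) can be sketched in C as below.
This is an illustration written for this note, with a hypothetical function
name, and is not taken from the PCRE source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not PCRE code: return the length of the linebreak
   that starts at s[pos] under an ANYCRLF-style rule, or 0 if none does. */
static size_t anycrlf_length(const char *s, size_t len, size_t pos)
{
  assert(s != NULL);
  if (pos >= len) return 0;
  if (s[pos] == '\r')                      /* CRLF counts as one linebreak */
    return (pos + 1 < len && s[pos + 1] == '\n') ? 2 : 1;
  return (s[pos] == '\n') ? 1 : 0;
}
```

Characters such as vertical tab or form feed, which PCRE_NEWLINE_ANY treats
as newlines, are ordinary characters under this rule.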

Release 7.0 19-Dec-06
---------------------

This release has a new major number because there have been some internal
upheavals to facilitate the addition of new optimizations and other facilities,
and to make subsequent maintenance and extension easier. Compilation is likely
to be a bit slower, but there should be no major effect on runtime performance.
Previously compiled patterns are NOT upwards compatible with this release. If
you have saved compiled patterns from a previous release, you will have to
re-compile them. Important changes that are visible to users are:

1. The Unicode property tables have been updated to Unicode 5.0.0, which adds
   some more scripts.

2. The option PCRE_NEWLINE_ANY causes PCRE to recognize any Unicode newline
   sequence as a newline.

3. The \R escape matches a single Unicode newline sequence as a single unit.

4. New features that will appear in Perl 5.10 are now in PCRE. These include
   alternative Perl syntax for named parentheses, and Perl syntax for
   recursion.

5. The C++ wrapper interface has been extended by the addition of a
   QuoteMeta function and the ability to allow copy construction and
   assignment.

For a complete list of changes, see the ChangeLog file.


Release 6.7 04-Jul-06
---------------------

The main additions to this release are the ability to use the same name for
multiple sets of parentheses, and support for CRLF line endings in both the
library and pcregrep (and in pcretest for testing).

Thanks to Ian Taylor, the stack usage for many kinds of pattern has been
significantly reduced for certain subject strings.


Release 6.5 01-Feb-06
---------------------

Important changes in this release:

1. A number of new features have been added to pcregrep.

2. The Unicode property tables have been updated to Unicode 4.1.0, and the
   supported properties have been extended with script names such as "Arabic",
   and the derived properties "Any" and "L&". This has necessitated a change to
   the internal format of compiled patterns. Any saved compiled patterns that
   use \p or \P must be recompiled.

3. The specification of recursion in patterns has been changed so that all
   recursive subpatterns are automatically treated as atomic groups. Thus, for
   example, (?R) is treated as if it were (?>(?R)). This is necessary because
   otherwise there are situations where recursion does not work.

See the ChangeLog for a complete list of changes, which include a number of bug
fixes and tidies.


Release 6.0 07-Jun-05
---------------------

The release number has been increased to 6.0 because of the addition of several
major new pieces of functionality.

A new function, pcre_dfa_exec(), which implements pattern matching using a DFA
algorithm, has been added. This has a number of advantages for certain cases,
though it does run more slowly, and lacks the ability to capture substrings. On
the other hand, it does find all matches, not just the first, and it works
better for partial matching. The pcrematching man page discusses the
differences.

The pcretest program has been enhanced so that it can make use of the new
pcre_dfa_exec() matching function and the extra features it provides.

The distribution now includes a C++ wrapper library. This is built
automatically if a C++ compiler is found. The pcrecpp man page discusses this
interface.

The code itself has been re-organized into many more files, one for each
function, so it no longer requires everything to be linked in when static
linkage is used. As a consequence, some internal functions have had to have
their names exposed. These functions all have names starting with _pcre_. They
are undocumented, and are not intended for use by outside callers.

The pcregrep program has been enhanced with new functionality such as
multiline-matching and options for outputting more matching context. See the
ChangeLog for a complete list of changes to the library and the utility
programs.


Release 5.0 13-Sep-04
---------------------

The licence under which PCRE is released has been changed to the more
conventional "BSD" licence.

In the code, some bugs have been fixed, and there are also some major changes
in this release (which is why I've increased the number to 5.0). Some changes
are internal rearrangements, and some provide a number of new facilities. The
new features are:

1. There's an "automatic callout" feature that inserts callouts before every
   item in the regex, and there's a new callout field that gives the position
   in the pattern - useful for debugging and tracing.

2. The extra_data structure can now be used to pass in a set of character
   tables at exec time. This is useful if compiled regexes are saved and
   re-used at a later time when the tables may not be at the same address. If
   the default internal tables are used, the pointer saved with the compiled
   pattern is now set to NULL, which means that you don't need to do anything
   special unless you are using custom tables.

3. It is possible, with some restrictions on the content of the regex, to
   request "partial" matching. A special return code is given if all of the
   subject string matched part of the regex. This could be useful for testing
   an input field as it is being typed.

4. There is now some optional support for Unicode character properties, which
   means that pattern items such as \p{Lu} and \X can now be used. Only the
   general category properties are supported. If PCRE is compiled with this
   support, an additional 90K data structure is included, which increases the
   size of the library dramatically.

5. There is support for saving compiled patterns and re-using them later.

6. There is support for running regular expressions that were compiled on a
   different host with the opposite endianness.

7. The pcretest program has been extended to accommodate the new features.

The main internal rearrangement is that sequences of literal characters are no
longer handled as strings. Instead, each character is handled on its own. This
makes some UTF-8 handling easier, and makes the support of partial matching
possible. Compiled patterns containing long literal strings will be larger as a
result of this change; I hope that performance will not be much affected.

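The idea behind "partial" matching (new feature 3 in the 5.0 list above) can
be pictured with a toy C matcher for a single anchored literal. This is only
a sketch of the three possible outcomes for an input field that is still
being typed. It is not PCRE's algorithm, and the names and return values are
invented for the example.

```c
#include <assert.h>
#include <string.h>

/* Illustrative result codes; PCRE's own values are not shown here. */
enum { SKETCH_MATCH = 0, SKETCH_NOMATCH = -1, SKETCH_PARTIAL = -2 };

/* Toy matcher for an anchored literal, not PCRE's algorithm. A partial
   match is reported when the whole subject matches a leading part of the
   literal, so that further input could still complete the match. */
static int literal_partial_match(const char *literal, const char *subject)
{
  size_t lit_len, sub_len;
  assert(literal != NULL && subject != NULL);
  lit_len = strlen(literal);
  sub_len = strlen(subject);
  if (sub_len >= lit_len)   /* enough input: does it begin with the literal? */
    return (strncmp(literal, subject, lit_len) == 0) ?
      SKETCH_MATCH : SKETCH_NOMATCH;
  return (strncmp(literal, subject, sub_len) == 0) ?
    SKETCH_PARTIAL : SKETCH_NOMATCH;
}
```

A form handler built on this sketch could keep accepting keystrokes while the
result is SKETCH_PARTIAL and reject the field as soon as it becomes
SKETCH_NOMATCH.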

Release 4.5 01-Dec-03
---------------------

Again mainly a bug-fix and tidying release, with only a couple of new features:

1. It's possible now to compile PCRE so that it does not use recursive
function calls when matching. Instead it gets memory from the heap. This slows
things down, but may be necessary on systems with limited stacks.

2. UTF-8 string checking has been tightened to reject overlong sequences and to
check that a starting offset points to the start of a character. Failure of the
latter returns a new error code: PCRE_ERROR_BADUTF8_OFFSET.

3. PCRE can now be compiled for systems that use EBCDIC code.

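The two UTF-8 checks mentioned in item 2 of the 4.5 notes can be sketched in
C as follows. The helpers are hypothetical and written for this note: they
show what "start of a character" and "overlong" mean at the byte level, cover
only the two-byte overlong case, and are not PCRE's validation code.

```c
#include <assert.h>
#include <stddef.h>

/* A byte of the form 10xxxxxx continues a character; any other byte can
   start one. */
static int is_char_start(unsigned char byte)
{
  return (byte & 0xc0) != 0x80;
}

/* Hypothetical check, not PCRE code: 1 if `offset` falls at the start of a
   character (or at the very end of the string), 0 otherwise. */
static int offset_is_valid(const unsigned char *s, size_t len, size_t offset)
{
  assert(s != NULL);
  return offset <= len && (offset == len || is_char_start(s[offset]));
}

/* 1 if the two-byte sequence b0 b1 encodes a value below 0x80. Such a value
   fits in a single byte, so the two-byte form is overlong. */
static int is_overlong_2byte(unsigned char b0, unsigned char b1)
{
  unsigned int value;
  if ((b0 & 0xe0) != 0xc0 || (b1 & 0xc0) != 0x80)
    return 0;                              /* not a two-byte form at all */
  value = ((unsigned int)(b0 & 0x1f) << 6) | (unsigned int)(b1 & 0x3f);
  return value < 0x80;
}
```

For the four bytes 0x61 0xc3 0xa9 0x62, offset 2 points into the middle of
the two-byte character and is rejected by this sketch, which is the situation
that PCRE_ERROR_BADUTF8_OFFSET reports.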

Release 4.4 21-Aug-03
---------------------

This is mainly a bug-fix and tidying release. The only new feature is that PCRE
checks UTF-8 strings for validity by default. There is an option to suppress
this, just in case anybody wants that teeny extra bit of performance.


Releases 4.1 - 4.3
------------------

Sorry, I forgot about updating the NEWS file for these releases. Please take a
look at ChangeLog.


Release 4.0 17-Feb-03
---------------------

There have been a lot of changes for the 4.0 release, adding additional
functionality and mending bugs. Below is a list of the highlights of the new
functionality. For full details of these features, please consult the
documentation. For a complete list of changes, see the ChangeLog file.

1. Support for Perl's \Q...\E escapes.

2. "Possessive quantifiers" ?+, *+, ++, and {,}+ which come from Sun's Java
package. They provide some syntactic sugar for simple cases of "atomic
grouping".

3. Support for the \G assertion. It is true when the current matching position
is at the start point of the match.

4. A new feature that provides some of the functionality that Perl provides
with (?{...}). The facility is termed a "callout". The way it is done in PCRE
is for the caller to provide an optional function, by setting pcre_callout to
its entry point. To get the function called, the regex must include (?C) at
appropriate points.

5. Support for recursive calls to individual subpatterns. This makes it really
easy to get totally confused.

6. Support for named subpatterns. The Python syntax (?P<name>...) is used to
name a group.

7. Several extensions to UTF-8 support; it is now fairly complete. There is an
option for pcregrep to make it operate in UTF-8 mode.

8. The single man page has been split into a number of separate man pages.
These also give rise to individual HTML pages which are put in a separate
directory. There is an index.html page that lists them all. Some hyperlinking
between the pages has been installed.


Release 3.5 15-Aug-01
---------------------

1. The configuring system has been upgraded to use later versions of autoconf
and libtool. By default it builds both a shared and a static library if the OS
supports it. You can use --disable-shared or --disable-static on the configure
command if you want only one of them.

2. The pcretest utility is now installed along with pcregrep because it is
useful for users (to test regexes) and by doing this, it automatically gets
relinked by libtool. The documentation has been turned into a man page, so
there are now .1, .txt, and .html versions in /doc.

3. Upgrades to pcregrep:
   (i)   Added long-form option names like GNU grep.
   (ii)  Added --help to list all options with an explanatory phrase.
   (iii) Added -r, --recursive to recurse into sub-directories.
   (iv)  Added -f, --file to read patterns from a file.

4. Added --enable-newline-is-cr and --enable-newline-is-lf to the configure
script, to force use of CR or LF instead of \n in the source. On non-Unix
systems, the value can be set in config.h.

5. The limit of 200 on non-capturing parentheses is a _nesting_ limit, not an
absolute limit. Changed the text of the error message to make this clear, and
likewise updated the man page.

6. The limit of 99 on the number of capturing subpatterns has been removed.
The new limit is 65535, which I hope will not be a "real" limit.


Release 3.3 01-Aug-00
---------------------

There is some support for UTF-8 character strings. This is incomplete and
experimental. The documentation describes what is and what is not implemented.
Otherwise, this is just a bug-fixing release.


Release 3.0 01-Feb-00
---------------------

1. A "configure" script is now used to configure PCRE for Unix systems. It
builds a Makefile, a config.h file, and the pcre-config script.

2. PCRE is built as a shared library by default.

3. There is support for POSIX classes such as [:alpha:].

5. There is an experimental recursion feature.

----------------------------------------------------------------------------
IMPORTANT FOR THOSE UPGRADING FROM VERSIONS BEFORE 2.00

Please note that there has been a change in the API such that a larger
ovector is required at matching time, to provide some additional workspace.
The new man page has details. This change was necessary in order to support
some of the new functionality in Perl 5.005.

IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.00

Another (I hope this is the last!) change has been made to the API for the
pcre_compile() function. An additional argument has been added to make it
possible to pass over a pointer to character tables built in the current
locale by pcre_maketables(). To use the default tables, this new argument
should be passed as NULL.

IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.05

Yet another (and again I hope this really is the last) change has been made
to the API for the pcre_exec() function. An additional argument has been
added to make it possible to start the match other than at the start of the
subject string. This is important if there are lookbehinds. The new man
page has the details, but if you just want to convert existing programs, all
you need to do is to stick in a new fifth argument to pcre_exec(), with a
value of zero. For example, change

  pcre_exec(pattern, extra, subject, length, options, ovec, ovecsize)
to
  pcre_exec(pattern, extra, subject, length, 0, options, ovec, ovecsize)

****