different configurations, and it also runs some of them with valgrind, all of
which can take quite some time.

. Run perltest.pl on the test data for tests 1, 4, and 6. The output
should match the PCRE test output, apart from the version identification at
the start of each test. The other tests are not Perl-compatible (they use
various PCRE-specific features or options).

* "Ends with literal string" - note that a single character doesn't gain much
over the existing "required byte" (reqbyte) feature that just remembers one
data unit.
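
The trade-off can be illustrated with two fast-reject checks, sketched in C under stated assumptions: both helper names are hypothetical, the literal is assumed to have been extracted already from the end of the pattern, and the search is a plain memchr()/memcmp() loop, not anything PCRE actually does. A subject that fails either check cannot match, so the matcher need not be run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fast-reject helpers: if the subject lacks what the pattern
   requires, there is no point in running the matcher. */

/* Single-unit check in the spirit of reqbyte: one memchr() call. */
static bool has_required_unit(const unsigned char *subject, size_t len,
                              unsigned char req)
{
  return len > 0 && memchr(subject, req, len) != NULL;
}

/* Whole-literal check: the pattern is assumed to end with lit, so any match
   needs lit somewhere in the subject. This rejects more subjects than the
   single-unit check, at the cost of comparing more units. */
static bool has_required_literal(const unsigned char *subject, size_t len,
                                 const unsigned char *lit, size_t litlen)
{
  assert(lit != NULL || litlen == 0);
  if (litlen == 0) return true;
  if (litlen > len) return false;
  for (size_t i = 0; i + litlen <= len; i++)
  {
    /* Skip ahead to the next place where the first unit of lit occurs. */
    const unsigned char *p = memchr(subject + i, lit[0], len - litlen + 1 - i);
    if (p == NULL) return false;
    i = (size_t)(p - subject);
    if (memcmp(p, lit, litlen) == 0) return true;
  }
  return false;
}
```

For a pattern ending in xyz, the subject "zebra xy" passes the single-unit test for z but fails the whole-literal test, which is the extra rejection a remembered string would buy.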

* These probably need to go in pcre_study():

o Remember an initial string rather than just 1 char?

o A required data unit from alternatives - not just the last unit, but an
earlier one if common to all alternatives.

o Friedl's book (Mastering Regular Expressions) contains other ideas.
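
Finding a required data unit from alternatives amounts to intersecting, across all the alternatives, the sets of units each one must contain. A minimal C sketch, assuming each alternative has already been reduced to a plain literal string (walking real compiled patterns inside pcre_study() is not attempted); common_required_units() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: sets common[c] for every byte value c that occurs in
   every one of the n literal alternatives, and returns how many there are. */
static int common_required_units(const char *const *alts, size_t n,
                                 bool common[256])
{
  int count = 0;
  assert(alts != NULL || n == 0);

  /* Start from "everything" and narrow down; no alternatives means none. */
  for (int c = 0; c < 256; c++) common[c] = (n > 0);

  for (size_t i = 0; i < n; i++)
  {
    bool seen[256] = { false };
    for (const unsigned char *p = (const unsigned char *)alts[i]; *p != 0; p++)
      seen[*p] = true;
    /* Keep only the units that this alternative also contains. */
    for (int c = 0; c < 256; c++)
      if (!seen[c]) common[c] = false;
  }

  for (int c = 0; c < 256; c++)
    if (common[c]) count++;
  return count;
}
```

For cat|cow the last units differ, but c is common to both alternatives, so it could still serve as a required unit.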

. Perl 6 will be a revolution. Is it a revolution too far for PCRE?

. Unicode

* There has been a request for direct support of 16-bit characters and
UTF-16 (Bugzilla #1049). However, since Unicode is moving beyond purely
16-bit characters, is this worth it at all? One possible way of handling
16-bit characters would be to "load" them in the same way that UTF-8
characters are loaded. Another possibility is to provide a set of
translation functions, and build an index during translation so that the
returned offsets can automatically be translated (using the index) after a
match.

* A different approach to Unicode might be to use a typedef to do everything
in unsigned shorts instead of unsigned chars. Actually, we'd have to have a
new typedef to distinguish data from bits of compiled pattern that are in
bytes, I think. There would need to be conversion functions in and out. I
don't think this is particularly trivial - and anyway, Unicode now has
characters that need more than 16 bits, so is this at all sensible? I
suspect not.
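
The translation-function idea (convert the subject, keep an index, map match offsets back through it) might look like this in C. It is a sketch, not PCRE code: utf16_to_utf8_indexed() is a hypothetical name, and replacing unpaired surrogates with U+FFFD is an assumption made here for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical translation function. Converts UTF-16 in[0..inlen) to UTF-8
   in out and fills idx so that idx[k] is the UTF-16 offset of the character
   containing UTF-8 offset k; idx[outlen] = inlen so end offsets map too.
   out needs room for 3*inlen bytes, idx for 3*inlen+1 entries.
   Unpaired surrogates become U+FFFD (an assumption of this sketch).
   Returns the UTF-8 length. */
static size_t utf16_to_utf8_indexed(const uint16_t *in, size_t inlen,
                                    unsigned char *out, size_t *idx)
{
  size_t o = 0;
  assert(out != NULL && idx != NULL);
  for (size_t i = 0; i < inlen; )
  {
    size_t start = i;
    uint32_t c = in[i++];
    if (c >= 0xD800 && c <= 0xDBFF && i < inlen &&
        in[i] >= 0xDC00 && in[i] <= 0xDFFF)
      c = 0x10000 + ((c - 0xD800) << 10) + (uint32_t)(in[i++] - 0xDC00);
    else if (c >= 0xD800 && c <= 0xDFFF)
      c = 0xFFFD;

    size_t first = o;
    if (c < 0x80)
      out[o++] = (unsigned char)c;
    else if (c < 0x800)
    {
      out[o++] = (unsigned char)(0xC0 | (c >> 6));
      out[o++] = (unsigned char)(0x80 | (c & 0x3F));
    }
    else if (c < 0x10000)
    {
      out[o++] = (unsigned char)(0xE0 | (c >> 12));
      out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
      out[o++] = (unsigned char)(0x80 | (c & 0x3F));
    }
    else
    {
      out[o++] = (unsigned char)(0xF0 | (c >> 18));
      out[o++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
      out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
      out[o++] = (unsigned char)(0x80 | (c & 0x3F));
    }
    /* Every UTF-8 byte of this character maps back to its UTF-16 start. */
    for (size_t k = first; k < o; k++) idx[k] = start;
  }
  idx[o] = inlen;
  return o;
}
```

After a match on the UTF-8 copy, each offset v in the ovector would be reported to the caller as idx[v]. The index costs one size_t per UTF-8 byte, which buys a single array lookup per offset.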

. Allow errorptr and erroroffset to be NULL. I don't like this idea.

. Line endings:

support --outputfile=name.

. Consider making UTF-8 and UCP the default for PCRE n.0 for some n > 8.
(And now presumably UTF-16 and UCP for the 16-bit library.)

. Add a user pointer to pcre_malloc/free functions -- some option would be
needed to retain backward compatibility.
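
One possible shape for this, sketched in C: new-style callbacks receive a user pointer, and backward compatibility comes from falling back to old-style hooks with the same signatures as pcre_malloc and pcre_free when no new callbacks are supplied. All names here (alloc_ctx, ctx_malloc(), ctx_free(), legacy_malloc, legacy_free, and the counting allocator) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator context: new-style callbacks take a user pointer. */
typedef struct alloc_ctx {
  void *(*malloc_fn)(size_t size, void *user_data);
  void (*free_fn)(void *ptr, void *user_data);
  void *user_data;
} alloc_ctx;

/* Old-style global hooks with the pcre_malloc/pcre_free signatures; they are
   used whenever no context (or no new-style callback) is supplied. */
static void *(*legacy_malloc)(size_t) = malloc;
static void (*legacy_free)(void *) = free;

static void *ctx_malloc(const alloc_ctx *ctx, size_t size)
{
  if (ctx != NULL && ctx->malloc_fn != NULL)
    return ctx->malloc_fn(size, ctx->user_data);
  return legacy_malloc(size);
}

static void ctx_free(const alloc_ctx *ctx, void *ptr)
{
  if (ctx != NULL && ctx->free_fn != NULL)
    ctx->free_fn(ptr, ctx->user_data);
  else
    legacy_free(ptr);
}

/* Example user allocator: counts live blocks in caller-owned state rather
   than in a global. */
typedef struct { int live; } alloc_stats;

static void *counting_malloc(size_t size, void *user_data)
{
  assert(user_data != NULL);
  void *p = malloc(size);
  if (p != NULL) ((alloc_stats *)user_data)->live++;
  return p;
}

static void counting_free(void *ptr, void *user_data)
{
  assert(user_data != NULL);
  if (ptr != NULL) ((alloc_stats *)user_data)->live--;
  free(ptr);
}
```

Because the user pointer travels with each call, two threads or two libraries can use different allocators without touching shared global state. A context must supply both callbacks or neither, otherwise memory would be freed by the wrong allocator.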

. Wild thought: the ability to compile from PCRE's internal byte code to a real
FSM and a very fast (third) matcher to process the result. There would be
even more restrictions than for pcre_dfa_exec(), however. This is not easy.
This is probably obsolete now that we have the JIT support.

. Should pcretest have some private locale data, to avoid relying on the
available locales for the test data, since different OS have different ideas?

. A user is going to supply a patch to generalize the API for user-specific
memory allocation so that it is more flexible in threaded environments. This
was promised a long time ago, and never appeared. However, this is a live
issue not only for threaded environments, but for libraries that use PCRE and
do not want to be beholden to their caller's memory allocation.

. Write a wrapper to maintain a structure with specified runtime parameters,
such as recursion limit, and pass these to PCRE each time it is called. Also
maybe malloc and free. A user sent a prototype. This relates to the previous
item.

. Write a function that generates random matching strings for a compiled regex.
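
Such a generator walks the compiled pattern and makes a random choice wherever the pattern allows one (which alternative, how many repeats, which character from a class). The C sketch below does this over a tiny hypothetical node tree, not over PCRE's real byte code, and uses rand() only for brevity; node, node_type, and gen_node() are illustration names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mini "compiled pattern": a tree of these nodes. */
typedef enum { N_LIT, N_RANGE, N_ALT, N_SEQ, N_REPEAT } node_type;

typedef struct node {
  node_type type;
  const char *lit;                 /* N_LIT: literal text */
  char lo, hi;                     /* N_RANGE: class [lo-hi] */
  const struct node *const *kids;  /* N_ALT, N_SEQ, N_REPEAT: children */
  int nkids;
  int min, max;                    /* N_REPEAT: count bounds */
} node;

/* Append a random string that n matches to buf, starting at len. cap is the
   buffer size including the terminating NUL. Returns the new length, or -1
   if the buffer is too small. */
static int gen_node(const node *n, char *buf, int len, int cap)
{
  switch (n->type)
  {
    case N_LIT:
    {
      int l = (int)strlen(n->lit);
      if (len + l >= cap) return -1;
      memcpy(buf + len, n->lit, (size_t)l);
      len += l;
      break;
    }
    case N_RANGE:
      if (len + 1 >= cap) return -1;
      buf[len++] = (char)(n->lo + rand() % (n->hi - n->lo + 1));
      break;
    case N_ALT:     /* pick one alternative */
      assert(n->nkids > 0);
      len = gen_node(n->kids[rand() % n->nkids], buf, len, cap);
      break;
    case N_SEQ:     /* each item in order */
      for (int i = 0; i < n->nkids && len >= 0; i++)
        len = gen_node(n->kids[i], buf, len, cap);
      break;
    case N_REPEAT:  /* pick a count in [min, max], then repeat the child */
    {
      assert(n->max >= n->min);
      int count = n->min + rand() % (n->max - n->min + 1);
      for (int i = 0; i < count && len >= 0; i++)
        len = gen_node(n->kids[0], buf, len, cap);
      break;
    }
  }
  if (len >= 0) buf[len] = '\0';
  return len;
}
```

Real byte code would also bring anchors, lookarounds, and back references, which need constraint tracking rather than independent random choices.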

. Pcregrep: an option to specify the output line separator, either as a string
or select from a fixed list. This is not dead easy, because at the moment it

Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 14 January 2012