<li><a name="TOC17" href="#SEC17">DUPLICATE SUBPATTERN NAMES</a>
<li><a name="TOC18" href="#SEC18">FINDING ALL POSSIBLE MATCHES</a>
<li><a name="TOC19" href="#SEC19">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
<li><a name="TOC20" href="#SEC20">SEE ALSO</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE NATIVE API</a><br>
<P>
</P>
<br><a name="SEC2" href="#TOC1">PCRE API OVERVIEW</a><br>
<P>
PCRE has its own native API, which is described in this document. There are
also some wrapper functions that correspond to the POSIX regular expression
API. These are described in the
<a href="pcreposix.html"><b>pcreposix</b></a>
documentation. Both of these APIs define a set of C function calls. A C++
A second matching function, <b>pcre_dfa_exec()</b>, which is not
Perl-compatible, is also provided. This uses a different algorithm for the
matching. The alternative algorithm finds all possible matches (at a given
point in the subject), and scans the subject just once. However, this algorithm
does not return captured substrings. A description of the two matching
algorithms and their advantages and disadvantages is given in the
<a href="pcrematching.html"><b>pcrematching</b></a>
documentation.
</P>
</P>
<br><a name="SEC3" href="#TOC1">NEWLINES</a><br>
<P>
PCRE supports four different conventions for indicating line breaks in
strings: a single CR (carriage return) character, a single LF (linefeed)
character, the two-character sequence CRLF, or any Unicode newline sequence.
The Unicode newline sequences are the three just mentioned, plus the single
characters VT (vertical tab, U+000B), FF (formfeed, U+000C), NEL (next line,
U+0085), LS (line separator, U+2028), and PS (paragraph separator, U+2029).
</P>
<P>
Each of the first three conventions is used by at least one operating system as
its standard newline sequence. When PCRE is built, a default can be specified.
If none is specified, the default is LF, which is the Unix standard. When PCRE
is run, the default can be overridden, either when a pattern is compiled or
when it is matched.
</P>
<P>
In the PCRE documentation the word "newline" is used to mean "the character or
pair of characters that indicate a line break". The choice of newline
convention affects the handling of the dot, circumflex, and dollar
metacharacters, the handling of #-comments in /x mode, and, when CRLF is a
recognized line ending sequence, the match position advancement for a
non-anchored pattern. The choice of newline convention does not affect the
interpretation of the \n or \r escape sequences.
</P>
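<P>
The C sketch below compares the four conventions by deciding whether a newline starts at a given position. The enum and the function newline_length() are hypothetical and written only for illustration. They are not part of the PCRE API. The sketch handles only byte (non-UTF-8) subjects, so it does not recognize the UTF-8 encodings of LS and PS. NEL is treated as the single byte 0x85.
</P>

```c
#include <stddef.h>

/* Hypothetical names for the four conventions; not PCRE's constants. */
enum nl_convention { NL_CR, NL_LF, NL_CRLF, NL_ANY };

/* Return the length in bytes of the newline sequence starting at p, or 0
   if none starts there. Byte mode only: LS and PS (which need UTF-8) are
   deliberately not recognized under NL_ANY. */
size_t newline_length(const unsigned char *p, const unsigned char *end,
                      enum nl_convention conv)
{
    if (p >= end)
        return 0;
    switch (conv) {
    case NL_CR:
        return *p == '\r' ? 1 : 0;
    case NL_LF:
        return *p == '\n' ? 1 : 0;
    case NL_CRLF:
        /* Only the complete two-character sequence counts. */
        return (end - p >= 2 && p[0] == '\r' && p[1] == '\n') ? 2 : 0;
    case NL_ANY:
        /* CR on its own is a newline, but CRLF is taken as one unit. */
        if (*p == '\r')
            return (end - p >= 2 && p[1] == '\n') ? 2 : 1;
        /* LF, VT, FF, and NEL are single-byte newlines in byte mode. */
        if (*p == '\n' || *p == 0x0b || *p == 0x0c || *p == 0x85)
            return 1;
        return 0;
    }
    return 0;
}
```

<P>
Take the CR in "a\r\nb" as an example. newline_length() returns 2 there under NL_CRLF and NL_ANY, 1 under NL_CR, and 0 under NL_LF.
</P>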
<br><a name="SEC4" href="#TOC1">MULTITHREADING</a><br>
<P>
PCRE_CONFIG_NEWLINE
</pre>
The output is an integer whose value specifies the default character sequence
that is recognized as meaning "newline". The four values that are supported
are: 10 for LF, 13 for CR, 3338 for CRLF, and -1 for ANY. The default should
normally be the standard sequence for your operating system.
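As a small illustration, the hypothetical helper below turns this value into a readable name. The only arithmetic worth noting is that 3338 equals (13 &#60;&#60; 8) | 10, which puts the CR code in the high byte and the LF code in the low byte.

```c
/* Hypothetical helper, not part of the PCRE API: translate the integer
   reported for PCRE_CONFIG_NEWLINE into a short name. The values are the
   four listed in the text; 3338 is (13 << 8) | 10. */
const char *newline_config_name(int value)
{
    switch (value) {
    case 10:             return "LF";
    case 13:             return "CR";
    case (13 << 8) | 10: return "CRLF";   /* 3338 */
    case -1:             return "ANY";
    default:             return "unknown";
    }
}
```

In a real program, the value would come from a call to <b>pcre_config()</b> with PCRE_CONFIG_NEWLINE as the first argument and the address of an int variable as the second.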
<pre>
PCRE_CONFIG_LINK_SIZE
</pre>
argument, which is an address (see below).
</P>
<P>
The <i>options</i> argument contains various bit settings that affect the
compilation. It should be zero if no options are required. The available
options are described below. Some of them, in particular those that are
compatible with Perl, can also be set and unset from within the pattern (see
including those that indicate newline. Without it, a dot does not match when
the current position is at a newline. This option is equivalent to Perl's /s
option, and it can be changed within a pattern by a (?s) option setting. A
negative class such as [^a] always matches newline characters, independent of
the setting of this option.
<pre>
PCRE_DUPNAMES
</pre>
PCRE_NEWLINE_CR
PCRE_NEWLINE_LF
PCRE_NEWLINE_CRLF
PCRE_NEWLINE_ANY
</pre>
These options override the default newline definition that was chosen when PCRE
was built. Setting the first or the second specifies that a newline is
indicated by a single character (CR or LF, respectively). Setting
PCRE_NEWLINE_CRLF specifies that a newline is indicated by the two-character
CRLF sequence. Setting PCRE_NEWLINE_ANY specifies that any Unicode newline
sequence should be recognized. The Unicode newline sequences are the three just
mentioned, plus the single characters VT (vertical tab, U+000B), FF (formfeed,
U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
(paragraph separator, U+2029). The last two are recognized only in UTF-8 mode.
</P>
<P>
The newline setting in the options word uses three bits that are treated
as a number, giving eight possibilities. Currently only five are used (default
plus the four values above). This means that if you set more than one newline
option, the combination may or may not be sensible. For example,
PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to PCRE_NEWLINE_CRLF, but
other combinations yield unused numbers and cause an error.
</P>
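<P>
Here is a small model of a three-bit newline field that makes the arithmetic concrete. The MODEL_NEWLINE_* macros and the shift of 20 are assumptions chosen for illustration. Real code must use the PCRE_NEWLINE_* constants from pcre.h and should leave validation to PCRE itself.
</P>

```c
/* Illustrative model only: the shift and the MODEL_* names are
   assumptions, not PCRE's definitions. */
#define NL_SHIFT           20
#define NL_MASK            (7UL << NL_SHIFT)
#define MODEL_NEWLINE_CR   (1UL << NL_SHIFT)
#define MODEL_NEWLINE_LF   (2UL << NL_SHIFT)
#define MODEL_NEWLINE_CRLF (3UL << NL_SHIFT)  /* same bits as CR | LF */
#define MODEL_NEWLINE_ANY  (4UL << NL_SHIFT)

/* Extract the three-bit newline field as a number from 0 to 7;
   0 means that no newline option is set, so the default applies. */
unsigned long newline_field(unsigned long options)
{
    return (options & NL_MASK) >> NL_SHIFT;
}

/* Only five of the eight values are used: 0 (default) and 1 to 4. */
int newline_field_valid(unsigned long options)
{
    return newline_field(options) <= 4;
}
```

<P>
In this model, MODEL_NEWLINE_CR | MODEL_NEWLINE_LF gives the field value 3, which is the same as MODEL_NEWLINE_CRLF. MODEL_NEWLINE_CR | MODEL_NEWLINE_ANY gives 5, which is one of the unused numbers.
</P>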
<P>
The only time that a line break is specially recognized when compiling a
pattern is when PCRE_EXTENDED is set and an unescaped # outside a character
class is encountered. This indicates a comment that lasts until after the next
line break sequence. In other circumstances, line break sequences are treated
as literal data, except that in PCRE_EXTENDED mode, both CR and LF are treated
as whitespace characters and are therefore ignored.
</P>
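<P>
The sketch below models how an extended-mode comment ends. skip_extended_comment() and the nl_mode enum are hypothetical and do not reproduce PCRE's internal code. The function assumes that <i>p</i> already points at an unescaped # outside a character class, and it covers only the CR, LF, and CRLF conventions. In this model, a lone LF does not end the comment under CRLF, because it is not a complete line break sequence.
</P>

```c
/* Hypothetical model, not PCRE's code: three of the conventions. */
enum nl_mode { NLM_CR, NLM_LF, NLM_CRLF };

/* p points at an unescaped '#' outside a character class. Return a
   pointer just past the line break sequence that ends the comment, or
   end if the comment runs to the end of the pattern. */
const char *skip_extended_comment(const char *p, const char *end,
                                  enum nl_mode mode)
{
    for (p++; p < end; p++) {
        if (mode == NLM_CR && *p == '\r')
            return p + 1;
        if (mode == NLM_LF && *p == '\n')
            return p + 1;
        /* Under CRLF only the complete two-character sequence counts. */
        if (mode == NLM_CRLF && *p == '\r' && p + 1 < end && p[1] == '\n')
            return p + 2;
    }
    return end;
}
```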
<P>
The newline option that is set at compile time becomes the default that is used
for <b>pcre_exec()</b> and <b>pcre_dfa_exec()</b>, but it can be overridden.
<pre>
PCRE_NO_AUTO_CAPTURE
</pre>
<P>
The following table lists the error codes that may be returned by
<b>pcre_compile2()</b>, along with the error messages that may be returned by
both compiling functions. As PCRE has developed, some error codes have fallen
out of use. To avoid confusion, they have not been re-used.
<pre>
0 no error
1 \ at end of pattern
7 invalid escape sequence in character class
8 range out of order in character class
9 nothing to repeat
10 [this code is not in use]
11 internal error: unexpected repeat
12 unrecognized character after (?
13 POSIX named classes are supported only within a class
16 erroffset passed as NULL
17 unknown option bit(s) set
18 missing ) after comment
19 [this code is not in use]
20 regular expression too large
21 failed to get memory
22 unmatched parentheses
30 unknown POSIX class name
31 POSIX collating elements are not supported
32 this version of PCRE is not compiled with PCRE_UTF8 support
33 [this code is not in use]
34 character value in \x{...} sequence is too large
35 invalid condition (?(0)
36 \C not allowed in lookbehind assertion
39 closing ) for (?C expected
40 recursive call could loop indefinitely
41 unrecognized character after (?P
42 syntax error in subpattern name (missing terminator)
43 two named subpatterns have the same name
44 invalid UTF-8 string
45 support for \P, \p, and \X has not been compiled
49 too many named subpatterns (maximum 10,000)
50 repeated subpattern is too long
51 octal value is greater than \377 (not in UTF-8 mode)
52 internal error: overran compiling workspace
53 internal error: previously-checked referenced subpattern not found
54 DEFINE group contains more than one branch
55 repeating a DEFINE group is not allowed
56 inconsistent NEWLINE options
</PRE>
</P>
<br><a name="SEC9" href="#TOC1">STUDYING A PATTERN</a><br>
</P>
<P>
If there is a fixed first byte, for example, from a pattern such as
(cat|cow|coyote), its value is returned. Otherwise, if either
<br>
<br>
(a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
their parentheses numbers. For example, consider the following pattern (assume
PCRE_EXTENDED is set, so white space, including newlines, is ignored):
<pre>
(?<date> (?<year>(\d\d)?\d\d) - (?<month>\d\d) - (?<day>\d\d) )
</pre>
There are four named subpatterns, so the table has four entries, and each entry
in the table is eight bytes long. The table is as follows, with non-printing
PCRE_NEWLINE_CR
PCRE_NEWLINE_LF
PCRE_NEWLINE_CRLF
PCRE_NEWLINE_ANY
</pre>
These options override the newline definition that was chosen or defaulted when
the pattern was compiled. For details, see the description of
<b>pcre_compile()</b> above. During matching, the newline choice affects the
behaviour of the dot, circumflex, and dollar metacharacters. It may also alter
the way the match position is advanced after a match failure for an unanchored
pattern. When PCRE_NEWLINE_CRLF or PCRE_NEWLINE_ANY is set, and a match attempt
fails when the current position is at a CRLF sequence, the match position is
advanced by two characters instead of one; in other words, to after the CRLF.
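The advancement rule can be sketched as follows. next_start_offset() is a hypothetical helper and is not PCRE's internal code. It models byte-mode subjects only, because in UTF-8 mode "one character" can be more than one byte. Under this rule, a new match attempt never starts between the CR and the LF of a CRLF pair.

```c
#include <stddef.h>

/* Hypothetical model of the rule above for byte-mode subjects: after a
   failed attempt at offset pos, return the offset of the next attempt.
   crlf_is_newline should be nonzero when PCRE_NEWLINE_CRLF or
   PCRE_NEWLINE_ANY is in force. */
size_t next_start_offset(const char *subject, size_t length, size_t pos,
                         int crlf_is_newline)
{
    if (crlf_is_newline && pos + 1 < length &&
        subject[pos] == '\r' && subject[pos + 1] == '\n')
        return pos + 2;   /* skip the whole CRLF pair */
    return pos + 1;       /* otherwise move on by one byte */
}
```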
<pre>
PCRE_NOTBOL
</pre>
other endianness. This is the error that PCRE gives when the magic number is
not present.
<pre>
PCRE_ERROR_UNKNOWN_OPCODE (-5)
</pre>
While running the pattern match, an unknown item was encountered in the
compiled pattern. This error could be caused by a bug in PCRE or by overwriting
<b>pcre_extra</b> structure (or defaulted) was reached. See the description
above.
<pre>
PCRE_ERROR_CALLOUT (-9)
</pre>
This error is never generated by <b>pcre_exec()</b> itself. It is provided for
PCRE_ERROR_BADCOUNT (-15)
</pre>
This error is given if the value of the <i>ovecsize</i> argument is negative.
<pre>
PCRE_ERROR_RECURSIONLIMIT (-21)
</pre>
The internal recursion limit, as specified by the <i>match_limit_recursion</i>
field in a <b>pcre_extra</b> structure (or defaulted), was reached. See the
description above.
<pre>
PCRE_ERROR_NULLWSLIMIT (-22)
</pre>
When a group that can match an empty substring is repeated with an unbounded
upper limit, the subject position at the start of the group must be remembered,
so that a test for an empty string can be made when the end of the group is
reached. Some workspace is required for this; if it runs out, this error is
given.
<pre>
PCRE_ERROR_BADNEWLINE (-23)
</pre>
An invalid combination of PCRE_NEWLINE_<i>xxx</i> options was given.
</P>
<P>
Error numbers -16 to -20 are not used by <b>pcre_exec()</b>.
</P>
<br><a name="SEC15" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
<P>
<i>buffersize</i>, while for <b>pcre_get_substring()</b> a new block of memory is
obtained via <b>pcre_malloc</b>, and its address is returned via
<i>stringptr</i>. The yield of the function is the length of the string, not
including the terminating zero, or one of these error codes:
<pre>
PCRE_ERROR_NOMEMORY (-6)
</pre>
memory that is obtained via <b>pcre_malloc</b>. The address of the memory block
is returned via <i>listptr</i>, which is also the start of the list of string
pointers. The end of the list is marked by a NULL pointer. The yield of the
function is zero if all went well, or the error code
<pre>
PCRE_ERROR_NOMEMORY (-6)
</pre>
To extract a substring by name, you first have to find the associated number.
For example, for this pattern
<pre>
(a+)b(?<xxx>\d+)...
</pre>
the number of the subpattern called "xxx" is 2. If the name is known to be
unique (PCRE_DUPNAMES was not set), you can find the number from the name by
fourth are pointers to variables that are updated by the function. After it
has run, they point to the first and last entries in the name-to-number table
for the given name. The function itself returns the length of each entry, or
PCRE_ERROR_NOSUBSTRING (-7) if there are none. The format of the table is
described above in the section entitled <i>Information about a pattern</i>.
Given all the relevant entries for the name, you can extract each of their
numbers, and hence the captured data, if any.
</P>
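<P>
The following sketch looks up a name in a name-to-number table laid out as described above. Each entry has a fixed size and starts with the subpattern number (most significant byte first), followed by the zero-terminated name. The function table_lookup() is hypothetical. For unique names, <b>pcre_get_stringnumber()</b> already does this lookup. The data is the table for the date pattern shown earlier, written out by hand with the undefined padding bytes set to zero.
</P>

```c
#include <string.h>

/* Hypothetical helper: scan a name-to-number table of name_count entries,
   each entry_size bytes long. Bytes 0-1 of an entry hold the subpattern
   number (most significant byte first); the zero-terminated name follows.
   Returns the number, or -1 if the name is absent. A linear scan is used
   for clarity. */
int table_lookup(const unsigned char *table, int name_count, int entry_size,
                 const char *name)
{
    for (int i = 0; i < name_count; i++) {
        const unsigned char *entry = table + (size_t)i * entry_size;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}

/* Table for the date pattern: four entries of eight bytes each, in
   alphabetical order; undefined padding bytes are shown as 0 here. */
static const unsigned char date_table[] = {
    0, 1, 'd', 'a', 't', 'e', 0,   0,
    0, 5, 'd', 'a', 'y', 0,   0,   0,
    0, 4, 'm', 'o', 'n', 't', 'h', 0,
    0, 2, 'y', 'e', 'a', 'r', 0,   0,
};
```

<P>
In real code, the table pointer, the entry count, and the entry size would come from <b>pcre_fullinfo()</b> with PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT, and PCRE_INFO_NAMEENTRYSIZE.
</P>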
<br><a name="SEC18" href="#TOC1">FINDING ALL POSSIBLE MATCHES</a><br>
<P>
</P>
<P>
The function <b>pcre_dfa_exec()</b> is called to match a subject string against
a compiled pattern, using a matching algorithm that scans the subject string
just once and does not backtrack. This has different characteristics from the
normal algorithm, and is not compatible with Perl. Some of the features of PCRE
patterns are not supported. Nevertheless, there are times when this kind of
matching can be useful. For a discussion of the two matching algorithms, see
the
<a href="pcrematching.html"><b>pcrematching</b></a>
documentation.
</P>
PCRE_DFA_SHORTEST
</pre>
Setting the PCRE_DFA_SHORTEST option causes the matching algorithm to stop as
soon as it has found one match. Because of the way the alternative algorithm
works, this is necessarily the shortest possible match at the first possible
matching point in the subject string.
<pre>
PCRE_DFA_RESTART
</pre>
On success, the yield of the function is a number greater than zero, which is
the number of matched substrings. The substrings themselves are returned in
<i>ovector</i>. Each string uses two elements; the first is the offset to the
start, and the second is the offset to the end. In fact, all the strings have
the same start offset. (Space could have been saved by giving this only once,
but it was decided to retain some compatibility with the way <b>pcre_exec()</b>
returns data, even though the meaning of the strings is different.)
</P>
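<P>
The hypothetical helper copy_dfa_match() below illustrates this layout by copying one match out of <i>ovector</i>. It is not part of the PCRE API. The example in the code comment was worked out by hand from the rules described here. It was not captured from a run of <b>pcre_dfa_exec()</b>.
</P>

```c
#include <stddef.h>
#include <string.h>

/* Example, worked out by hand: for the subject
   "This is <something> <something else> <something further> no more"
   and the pattern <.*>, the three matches all start at offset 8 and end
   at 56, 36, and 19, longest first, giving rc = 3 and
   ovector = {8, 56, 8, 36, 8, 19}. */

/* Hypothetical helper: rc is the positive value returned by
   pcre_dfa_exec() and ovector holds rc (start, end) pairs. Copy match i
   into buf as a zero-terminated string and return its length, or -1 if
   i is out of range or buf is too small. */
int copy_dfa_match(const char *subject, const int *ovector, int rc, int i,
                   char *buf, size_t bufsize)
{
    if (i < 0 || i >= rc)
        return -1;
    int start = ovector[2 * i];           /* the same for every i */
    int len = ovector[2 * i + 1] - start;
    if (len < 0 || (size_t)len + 1 > bufsize)
        return -1;
    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```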
<P>
The strings are returned in reverse order of length; that is, the longest
<pre>
PCRE_ERROR_DFA_UCOND (-17)
</pre>
This return is given if <b>pcre_dfa_exec()</b> encounters a condition item that
uses a back reference for the condition, or a test for recursion in a specific
group. These are not supported.
<pre>
PCRE_ERROR_DFA_UMLIMIT (-18)
</pre>
error is given if the output vector is not large enough. This should be
extremely rare, as a vector of size 1000 is used.
</P>
<br><a name="SEC20" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcrebuild</b>(3), <b>pcrecallout</b>(3), <b>pcrecpp</b>(3),
<b>pcrematching</b>(3), <b>pcrepartial</b>(3), <b>pcreposix</b>(3),
<b>pcreprecompile</b>(3), <b>pcresample</b>(3), <b>pcrestack</b>(3).
</P>
<P>
Last updated: 30 November 2006
<br>
Copyright © 1997-2006 University of Cambridge.
<p>