| 440 |
pointed to by \fIerroffset\fP, which must not be NULL. If it is, an immediate |
pointed to by \fIerroffset\fP, which must not be NULL. If it is, an immediate |
| 441 |
error is given. Some errors are not detected until checks are carried out when |
error is given. Some errors are not detected until checks are carried out when |
| 442 |
the whole pattern has been scanned; in this case the offset is set to the end |
the whole pattern has been scanned; in this case the offset is set to the end |
| 443 |
of the pattern. |
of the pattern. |
| 444 |
.P |
.P |
| 445 |
Note that the offset is in bytes, not characters, even in UTF-8 mode. It may |
Note that the offset is in bytes, not characters, even in UTF-8 mode. It may |
| 446 |
point into the middle of a UTF-8 character (for example, when |
point into the middle of a UTF-8 character (for example, when |
| 447 |
PCRE_ERROR_BADUTF8 is returned for an invalid UTF-8 string). |
PCRE_ERROR_BADUTF8 is returned for an invalid UTF-8 string). |
| 448 |
.P |
.P |
| 523 |
.sp |
.sp |
| 524 |
PCRE_DOTALL |
PCRE_DOTALL |
| 525 |
.sp |
.sp |
| 526 |
If this bit is set, a dot metacharater in the pattern matches all characters, |
If this bit is set, a dot metacharacter in the pattern matches a character of |
| 527 |
including those that indicate newline. Without it, a dot does not match when |
any value, including one that indicates a newline. However, it only ever |
| 528 |
the current position is at a newline. This option is equivalent to Perl's /s |
matches one character, even if newlines are coded as CRLF. Without this option, |
| 529 |
option, and it can be changed within a pattern by a (?s) option setting. A |
a dot does not match when the current position is at a newline. This option is |
| 530 |
negative class such as [^a] always matches newline characters, independent of |
equivalent to Perl's /s option, and it can be changed within a pattern by a |
| 531 |
the setting of this option. |
(?s) option setting. A negative class such as [^a] always matches newline |
| 532 |
|
characters, independent of the setting of this option. |
| 533 |
.sp |
.sp |
| 534 |
PCRE_DUPNAMES |
PCRE_DUPNAMES |
| 535 |
.sp |
.sp |
| 551 |
ignored. This is equivalent to Perl's /x option, and it can be changed within a |
ignored. This is equivalent to Perl's /x option, and it can be changed within a |
| 552 |
pattern by a (?x) option setting. |
pattern by a (?x) option setting. |
| 553 |
.P |
.P |
| 554 |
|
Which characters are interpreted as newlines |
| 555 |
|
is controlled by the options passed to \fBpcre_compile()\fP or by a special |
| 556 |
|
sequence at the start of the pattern, as described in the section entitled |
| 557 |
|
.\" HTML <a href="pcrepattern.html#newlines"> |
| 558 |
|
.\" </a> |
| 559 |
|
"Newline conventions" |
| 560 |
|
.\" |
| 561 |
|
in the \fBpcrepattern\fP documentation. Note that the end of this type of |
| 562 |
|
comment is a literal newline sequence in the pattern; escape sequences that |
| 563 |
|
happen to represent a newline do not count. |
| 564 |
|
.P |
| 565 |
This option makes it possible to include comments inside complicated patterns. |
This option makes it possible to include comments inside complicated patterns. |
| 566 |
Note, however, that this applies only to data characters. Whitespace characters |
Note, however, that this applies only to data characters. Whitespace characters |
| 567 |
may never appear within special character sequences in a pattern, for example |
may never appear within special character sequences in a pattern, for example |
| 568 |
within the sequence (?( which introduces a conditional subpattern. |
within the sequence (?( that introduces a conditional subpattern. |
| 569 |
.sp |
.sp |
| 570 |
PCRE_EXTRA |
PCRE_EXTRA |
| 571 |
.sp |
.sp |
| 640 |
PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to PCRE_NEWLINE_CRLF, but |
PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to PCRE_NEWLINE_CRLF, but |
| 641 |
other combinations may yield unused numbers and cause an error. |
other combinations may yield unused numbers and cause an error. |
| 642 |
.P |
.P |
| 643 |
The only time that a line break is specially recognized when compiling a |
The only time that a line break in a pattern is specially recognized when |
| 644 |
pattern is if PCRE_EXTENDED is set, and an unescaped # outside a character |
compiling is when PCRE_EXTENDED is set. CR and LF are whitespace characters, |
| 645 |
class is encountered. This indicates a comment that lasts until after the next |
and so are ignored in this mode. Also, an unescaped # outside a character class |
| 646 |
line break sequence. In other circumstances, line break sequences are treated |
indicates a comment that lasts until after the next line break sequence. In |
| 647 |
as literal data, except that in PCRE_EXTENDED mode, both CR and LF are treated |
other circumstances, line break sequences in patterns are treated as literal |
| 648 |
as whitespace characters and are therefore ignored. |
data. |
| 649 |
.P |
.P |
| 650 |
The newline option that is set at compile time becomes the default that is used |
The newline option that is set at compile time becomes the default that is used |
| 651 |
for \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, but it can be overridden. |
for \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, but it can be overridden. |
| 660 |
.sp |
.sp |
| 661 |
PCRE_UCP |
PCRE_UCP |
| 662 |
.sp |
.sp |
| 663 |
This option changes the way PCRE processes \eb, \ed, \es, \ew, and some of the |
This option changes the way PCRE processes \eB, \eb, \eD, \ed, \eS, \es, \eW, |
| 664 |
POSIX character classes. By default, only ASCII characters are recognized, but |
\ew, and some of the POSIX character classes. By default, only ASCII characters |
| 665 |
if PCRE_UCP is set, Unicode properties are used instead to classify characters. |
are recognized, but if PCRE_UCP is set, Unicode properties are used instead to |
| 666 |
More details are given in the section on |
classify characters. More details are given in the section on |
| 667 |
.\" HTML <a href="pcre.html#genericchartypes"> |
.\" HTML <a href="pcre.html#genericchartypes"> |
| 668 |
.\" </a> |
.\" </a> |
| 669 |
generic character types |
generic character types |
| 868 |
The two optimizations just described can be disabled by setting the |
The two optimizations just described can be disabled by setting the |
| 869 |
PCRE_NO_START_OPTIMIZE option when calling \fBpcre_exec()\fP or |
PCRE_NO_START_OPTIMIZE option when calling \fBpcre_exec()\fP or |
| 870 |
\fBpcre_dfa_exec()\fP. You might want to do this if your pattern contains |
\fBpcre_dfa_exec()\fP. You might want to do this if your pattern contains |
| 871 |
callouts, or make use of (*MARK), and you make use of these in cases where |
callouts or (*MARK), and you want to make use of these facilities in cases |
| 872 |
matching fails. See the discussion of PCRE_NO_START_OPTIMIZE |
where matching fails. See the discussion of PCRE_NO_START_OPTIMIZE |
| 873 |
.\" HTML <a href="#execoptions"> |
.\" HTML <a href="#execoptions"> |
| 874 |
.\" </a> |
.\" </a> |
| 875 |
below. |
below. |
| 1466 |
.\" HREF |
.\" HREF |
| 1467 |
\fBpcredemo\fP |
\fBpcredemo\fP |
| 1468 |
.\" |
.\" |
| 1469 |
sample program. In the most general case, you have to check to see if the |
sample program. In the most general case, you have to check to see if the |
| 1470 |
newline convention recognizes CRLF as a newline, and if so, and the current |
newline convention recognizes CRLF as a newline, and if so, and the current |
| 1471 |
character is CR followed by LF, advance the starting offset by two characters |
character is CR followed by LF, advance the starting offset by two characters |
| 1472 |
instead of one. |
instead of one. |
| 1473 |
.sp |
.sp |
| 1563 |
If PCRE_PARTIAL_HARD is set, it overrides PCRE_PARTIAL_SOFT. In this case, if a |
If PCRE_PARTIAL_HARD is set, it overrides PCRE_PARTIAL_SOFT. In this case, if a |
| 1564 |
partial match is found, \fBpcre_exec()\fP immediately returns |
partial match is found, \fBpcre_exec()\fP immediately returns |
| 1565 |
PCRE_ERROR_PARTIAL, without considering any other alternatives. In other words, |
PCRE_ERROR_PARTIAL, without considering any other alternatives. In other words, |
| 1566 |
when PCRE_PARTIAL_HARD is set, a partial match is considered to be more |
when PCRE_PARTIAL_HARD is set, a partial match is considered to be more |
| 1567 |
important than an alternative complete match. |
important than an alternative complete match. |
| 1568 |
.P |
.P |
| 1569 |
In both cases, the portion of the string that was inspected when the partial |
In both cases, the portion of the string that was inspected when the partial |
| 1580 |
.sp |
.sp |
| 1581 |
The subject string is passed to \fBpcre_exec()\fP as a pointer in |
The subject string is passed to \fBpcre_exec()\fP as a pointer in |
| 1582 |
\fIsubject\fP, a length (in bytes) in \fIlength\fP, and a starting byte offset |
\fIsubject\fP, a length (in bytes) in \fIlength\fP, and a starting byte offset |
| 1583 |
in \fIstartoffset\fP. If this is negative or greater than the length of the |
in \fIstartoffset\fP. If this is negative or greater than the length of the |
| 1584 |
subject, \fBpcre_exec()\fP returns PCRE_ERROR_BADOFFSET. |
subject, \fBpcre_exec()\fP returns PCRE_ERROR_BADOFFSET. When the starting |
| 1585 |
.P |
offset is zero, the search for a match starts at the beginning of the subject, |
| 1586 |
In UTF-8 mode, the byte offset must point to the start of a UTF-8 character (or |
and this is by far the most common case. In UTF-8 mode, the byte offset must |
| 1587 |
the end of the subject). Unlike the pattern string, the subject may contain |
point to the start of a UTF-8 character (or the end of the subject). Unlike the |
| 1588 |
binary zero bytes. When the starting offset is zero, the search for a match |
pattern string, the subject may contain binary zero bytes. |
|
starts at the beginning of the subject, and this is by far the most common |
|
|
case. |
|
| 1589 |
.P |
.P |
| 1590 |
A non-zero starting offset is useful when searching for another match in the |
A non-zero starting offset is useful when searching for another match in the |
| 1591 |
same subject by calling \fBpcre_exec()\fP again after a previous success. |
same subject by calling \fBpcre_exec()\fP again after a previous success. |
| 1614 |
.\" HREF |
.\" HREF |
| 1615 |
\fBpcredemo\fP |
\fBpcredemo\fP |
| 1616 |
.\" |
.\" |
| 1617 |
sample program. In the most general case, you have to check to see if the |
sample program. In the most general case, you have to check to see if the |
| 1618 |
newline convention recognizes CRLF as a newline, and if so, and the current |
newline convention recognizes CRLF as a newline, and if so, and the current |
| 1619 |
character is CR followed by LF, advance the starting offset by two characters |
character is CR followed by LF, advance the starting offset by two characters |
| 1620 |
instead of one. |
instead of one. |
| 1621 |
.P |
.P |
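The rule described above for advancing past an empty match can be sketched in plain C. This is only an illustration: `next_start_offset` and its `crlf_is_newline` flag are hypothetical names, not part of the PCRE API, and the caller is assumed to have already worked out whether the active newline convention recognizes CRLF. The sketch also works in bytes, so in UTF-8 mode a real caller would additionally need to step over any continuation bytes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a PCRE function): given the offset at which an
 * empty match was found, return the offset at which the next match attempt
 * should start.
 *
 * crlf_is_newline is assumed to be non-zero when the newline convention in
 * force treats CRLF as a newline (the caller decides this).
 */
static int next_start_offset(const char *subject, int length, int offset,
                             int crlf_is_newline)
{
    assert(subject != NULL && offset >= 0 && offset < length);

    /* Step over CR LF as a unit so the next attempt does not start
       between the two bytes of one newline. */
    if (crlf_is_newline &&
        subject[offset] == '\r' &&
        offset + 1 < length &&
        subject[offset + 1] == '\n')
        return offset + 2;

    /* Otherwise advance by a single byte. */
    return offset + 1;
}
```

A caller would pass the returned value as \fIstartoffset\fP on the next call to \fBpcre_exec()\fP, stopping once it reaches \fIlength\fP.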
| 1772 |
PCRE_ERROR_BADUTF8 (-10) |
PCRE_ERROR_BADUTF8 (-10) |
| 1773 |
.sp |
.sp |
| 1774 |
A string that contains an invalid UTF-8 byte sequence was passed as a subject. |
A string that contains an invalid UTF-8 byte sequence was passed as a subject. |
| 1775 |
However, if PCRE_PARTIAL_HARD is set and the problem is a truncated UTF-8 |
However, if PCRE_PARTIAL_HARD is set and the problem is a truncated UTF-8 |
| 1776 |
character at the end of the subject, PCRE_ERROR_SHORTUTF8 is used instead. |
character at the end of the subject, PCRE_ERROR_SHORTUTF8 is used instead. |
| 1777 |
.sp |
.sp |
| 1778 |
PCRE_ERROR_BADUTF8_OFFSET (-11) |
PCRE_ERROR_BADUTF8_OFFSET (-11) |
| 1779 |
.sp |
.sp |
| 1780 |
The UTF-8 byte sequence that was passed as a subject was valid, but the value |
The UTF-8 byte sequence that was passed as a subject was valid, but the value |
| 1781 |
of \fIstartoffset\fP did not point to the beginning of a UTF-8 character or the |
of \fIstartoffset\fP did not point to the beginning of a UTF-8 character or the |
| 1782 |
end of the subject. |
end of the subject. |
| 1783 |
.sp |
.sp |
| 1784 |
PCRE_ERROR_PARTIAL (-12) |
PCRE_ERROR_PARTIAL (-12) |
| 1817 |
.sp |
.sp |
| 1818 |
PCRE_ERROR_BADOFFSET (-24) |
PCRE_ERROR_BADOFFSET (-24) |
| 1819 |
.sp |
.sp |
| 1820 |
The value of \fIstartoffset\fP was negative or greater than the length of the |
The value of \fIstartoffset\fP was negative or greater than the length of the |
| 1821 |
subject, that is, the value in \fIlength\fP. |
subject, that is, the value in \fIlength\fP. |
| 1822 |
.sp |
.sp |
| 1823 |
PCRE_ERROR_SHORTUTF8 (-25) |
PCRE_ERROR_SHORTUTF8 (-25) |
| 1824 |
.sp |
.sp |
| 1825 |
The subject string ended with an incomplete (truncated) UTF-8 character, and |
The subject string ended with an incomplete (truncated) UTF-8 character, and |
| 1826 |
the PCRE_PARTIAL_HARD option was set. Without this option, PCRE_ERROR_BADUTF8 |
the PCRE_PARTIAL_HARD option was set. Without this option, PCRE_ERROR_BADUTF8 |
| 1827 |
is returned in this situation. |
is returned in this situation. |
| 1828 |
.P |
.P |
| 1829 |
Error numbers -16 to -20 and -22 are not used by \fBpcre_exec()\fP. |
Error numbers -16 to -20 and -22 are not used by \fBpcre_exec()\fP. |
| 2252 |
.rs |
.rs |
| 2253 |
.sp |
.sp |
| 2254 |
.nf |
.nf |
| 2255 |
Last updated: 06 November 2010 |
Last updated: 13 November 2010 |
| 2256 |
Copyright (c) 1997-2010 University of Cambridge. |
Copyright (c) 1997-2010 University of Cambridge. |
| 2257 |
.fi |
.fi |