| 1515 |
\fBpcre\fP |
\fBpcre\fP |
| 1516 |
.\" |
.\" |
| 1517 |
page. If an invalid UTF-8 sequence of bytes is found, \fBpcre_exec()\fP returns |
page. If an invalid UTF-8 sequence of bytes is found, \fBpcre_exec()\fP returns |
| 1518 |
the error PCRE_ERROR_BADUTF8. If \fIstartoffset\fP contains an invalid value, |
the error PCRE_ERROR_BADUTF8. If \fIstartoffset\fP contains a value that does |
| 1519 |
|
not point to the start of a UTF-8 character (or to the end of the subject), |
| 1520 |
PCRE_ERROR_BADUTF8_OFFSET is returned. |
PCRE_ERROR_BADUTF8_OFFSET is returned. |
| 1521 |
.P |
.P |
| 1522 |
If you already know that your subject is valid, and you want to skip these |
If you already know that your subject is valid, and you want to skip these |
| 1524 |
calling \fBpcre_exec()\fP. You might want to do this for the second and |
calling \fBpcre_exec()\fP. You might want to do this for the second and |
| 1525 |
subsequent calls to \fBpcre_exec()\fP if you are making repeated calls to find |
subsequent calls to \fBpcre_exec()\fP if you are making repeated calls to find |
| 1526 |
all the matches in a single subject string. However, you should be sure that |
all the matches in a single subject string. However, you should be sure that |
| 1527 |
the value of \fIstartoffset\fP points to the start of a UTF-8 character. When |
the value of \fIstartoffset\fP points to the start of a UTF-8 character (or the |
| 1528 |
PCRE_NO_UTF8_CHECK is set, the effect of passing an invalid UTF-8 string as a |
end of the subject). When PCRE_NO_UTF8_CHECK is set, the effect of passing an |
| 1529 |
subject, or a value of \fIstartoffset\fP that does not point to the start of a |
invalid UTF-8 string as a subject or an invalid value of \fIstartoffset\fP is |
| 1530 |
UTF-8 character, is undefined. Your program may crash. |
undefined. Your program may crash. |
| 1531 |
.sp |
.sp |
| 1532 |
PCRE_PARTIAL_HARD |
PCRE_PARTIAL_HARD |
| 1533 |
PCRE_PARTIAL_SOFT |
PCRE_PARTIAL_SOFT |
| 1556 |
.\" |
.\" |
| 1557 |
documentation. |
documentation. |
| 1558 |
. |
. |
| 1559 |
|
. |
| 1560 |
.SS "The string to be matched by \fBpcre_exec()\fP" |
.SS "The string to be matched by \fBpcre_exec()\fP" |
| 1561 |
.rs |
.rs |
| 1562 |
.sp |
.sp |
| 1563 |
The subject string is passed to \fBpcre_exec()\fP as a pointer in |
The subject string is passed to \fBpcre_exec()\fP as a pointer in |
| 1564 |
\fIsubject\fP, a length (in bytes) in \fIlength\fP, and a starting byte offset |
\fIsubject\fP, a length (in bytes) in \fIlength\fP, and a starting byte offset |
| 1565 |
in \fIstartoffset\fP. In UTF-8 mode, the byte offset must point to the start of |
in \fIstartoffset\fP. If this is negative or greater than the length of the |
| 1566 |
a UTF-8 character. Unlike the pattern string, the subject may contain binary |
subject, \fBpcre_exec()\fP returns PCRE_ERROR_BADOFFSET. |
| 1567 |
zero bytes. When the starting offset is zero, the search for a match starts at |
.P |
| 1568 |
the beginning of the subject, and this is by far the most common case. |
In UTF-8 mode, the byte offset must point to the start of a UTF-8 character (or |
| 1569 |
|
the end of the subject). Unlike the pattern string, the subject may contain |
| 1570 |
|
binary zero bytes. When the starting offset is zero, the search for a match |
| 1571 |
|
starts at the beginning of the subject, and this is by far the most common |
| 1572 |
|
case. |
| 1573 |
.P |
.P |
| 1574 |
A non-zero starting offset is useful when searching for another match in the |
A non-zero starting offset is useful when searching for another match in the |
| 1575 |
same subject by calling \fBpcre_exec()\fP again after a previous success. |
same subject by calling \fBpcre_exec()\fP again after a previous success. |
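The offset rules added in this hunk can be sketched as a small standalone check. This is only an illustration of the rules as worded above, not the library's code: the helper name check_startoffset and the SKETCH_ constants are hypothetical. The value -24 comes from this document. The value used for the UTF-8 offset error is an assumption that should be checked against pcre.h.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the PCRE error codes named in the text. Only -24 is
   stated in this document; the BADUTF8_OFFSET value is an assumption, so
   real code should use the constants from pcre.h instead. */
enum {
  SKETCH_OK = 0,
  SKETCH_BADUTF8_OFFSET = -11,   /* assumed value */
  SKETCH_BADOFFSET = -24         /* value given in the text */
};

/* Hypothetical helper: classify a starting byte offset as the text
   describes. `utf8` is non-zero when the pattern is in UTF-8 mode. */
static int check_startoffset(const char *subject, int length,
                             int startoffset, int utf8)
{
  assert(subject != NULL && length >= 0);

  /* Negative, or greater than the length of the subject. */
  if (startoffset < 0 || startoffset > length)
    return SKETCH_BADOFFSET;

  /* The end of the subject is always an acceptable position. */
  if (startoffset == length)
    return SKETCH_OK;

  /* In UTF-8 mode the offset must not land on a continuation byte
     (bit pattern 10xxxxxx), which is never the start of a character. */
  if (utf8 && ((unsigned char)subject[startoffset] & 0xC0) == 0x80)
    return SKETCH_BADUTF8_OFFSET;

  return SKETCH_OK;
}
```

A caller that sets PCRE_NO_UTF8_CHECK on repeated calls could run a check of this kind on its own offsets, since the text says an invalid offset is undefined behaviour once the library's check is disabled.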
| 1589 |
set to 4, it finds the second occurrence of "iss" because it is able to look |
set to 4, it finds the second occurrence of "iss" because it is able to look |
| 1590 |
behind the starting point to discover that it is preceded by a letter. |
behind the starting point to discover that it is preceded by a letter. |
| 1591 |
.P |
.P |
| 1592 |
|
Finding all the matches in a subject is tricky when the pattern can match an |
| 1593 |
|
empty string. It is possible to emulate Perl's /g behaviour by first trying the |
| 1594 |
|
match again at the same offset, with the PCRE_NOTEMPTY_ATSTART and |
| 1595 |
|
PCRE_ANCHORED options, and then if that fails, advancing the starting offset |
| 1596 |
|
and trying an ordinary match again. There is some code that demonstrates how to |
| 1597 |
|
do this in the |
| 1598 |
|
.\" HREF |
| 1599 |
|
\fBpcredemo\fP |
| 1600 |
|
.\" |
| 1601 |
|
sample program. In the most general case, you have to check to see if the |
| 1602 |
|
newline convention recognizes CRLF as a newline; if it does, and the current |
| 1603 |
|
character is CR followed by LF, advance the starting offset by two characters |
| 1604 |
|
instead of one. |
| 1605 |
|
.P |
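The "advance the starting offset" step from the paragraph added above might look roughly like the following. It is a minimal sketch, not the pcredemo code: the helper name next_startoffset and its parameters are hypothetical, and the caller is assumed to have already worked out whether the newline convention recognizes CRLF. Skipping UTF-8 continuation bytes is an added assumption, made so that the new offset still points to the start of a character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: after an empty match at `offset`, and a failed retry
   at the same offset with PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED, compute
   the offset at which to try an ordinary match again.
   Precondition: offset < length (at the end of the subject, stop looping). */
static int next_startoffset(const char *subject, int length, int offset,
                            int crlf_is_newline, int utf8)
{
  assert(subject != NULL && offset >= 0 && offset < length);

  /* CR followed by LF is a single newline: step over both bytes. */
  if (crlf_is_newline && offset + 1 < length &&
      subject[offset] == '\r' && subject[offset + 1] == '\n')
    return offset + 2;

  /* Otherwise advance by one character. */
  offset++;
  if (utf8) {
    /* Skip continuation bytes (10xxxxxx) of a multibyte character. */
    while (offset < length &&
           ((unsigned char)subject[offset] & 0xC0) == 0x80)
      offset++;
  }
  return offset;
}
```

The returned value would be passed as the next startoffset. When the empty match was already at the end of the subject there is nothing left to advance over, so the loop should stop instead of calling the helper.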
| 1606 |
If a non-zero starting offset is passed when the pattern is anchored, one |
If a non-zero starting offset is passed when the pattern is anchored, one |
| 1607 |
attempt to match at the given offset is made. This can only succeed if the |
attempt to match at the given offset is made. This can only succeed if the |
| 1608 |
pattern does not require the match to be at the start of the subject. |
pattern does not require the match to be at the start of the subject. |
| 1609 |
. |
. |
| 1610 |
|
. |
| 1611 |
.SS "How \fBpcre_exec()\fP returns captured substrings" |
.SS "How \fBpcre_exec()\fP returns captured substrings" |
| 1612 |
.rs |
.rs |
| 1613 |
.sp |
.sp |
| 1790 |
PCRE_ERROR_BADNEWLINE (-23) |
PCRE_ERROR_BADNEWLINE (-23) |
| 1791 |
.sp |
.sp |
| 1792 |
An invalid combination of PCRE_NEWLINE_\fIxxx\fP options was given. |
An invalid combination of PCRE_NEWLINE_\fIxxx\fP options was given. |
| 1793 |
|
.sp |
| 1794 |
|
PCRE_ERROR_BADOFFSET (-24) |
| 1795 |
|
.sp |
| 1796 |
|
The value of \fIstartoffset\fP was negative or greater than the length of the |
| 1797 |
|
subject, that is, the value in \fIlength\fP. |
| 1798 |
.P |
.P |
| 1799 |
Error numbers -16 to -20 and -22 are not used by \fBpcre_exec()\fP. |
Error numbers -16 to -20 and -22 are not used by \fBpcre_exec()\fP. |
| 1800 |
. |
. |
| 2222 |
.rs |
.rs |
| 2223 |
.sp |
.sp |
| 2224 |
.nf |
.nf |
| 2225 |
Last updated: 01 November 2010 |
Last updated: 06 November 2010 |
| 2226 |
Copyright (c) 1997-2010 University of Cambridge. |
Copyright (c) 1997-2010 University of Cambridge. |
| 2227 |
.fi |
.fi |