| 1376 |
</b><br> |
</b><br> |
| 1377 |
<P> |
<P> |
| 1378 |
The subject string is passed to <b>pcre_exec()</b> as a pointer in |
The subject string is passed to <b>pcre_exec()</b> as a pointer in |
| 1379 |
<i>subject</i>, a length in <i>length</i>, and a starting byte offset in |
<i>subject</i>, a length (in bytes) in <i>length</i>, and a starting byte offset |
| 1380 |
<i>startoffset</i>. In UTF-8 mode, the byte offset must point to the start of a |
in <i>startoffset</i>. In UTF-8 mode, the byte offset must point to the start of |
| 1381 |
UTF-8 character. Unlike the pattern string, the subject may contain binary zero |
a UTF-8 character. Unlike the pattern string, the subject may contain binary |
| 1382 |
bytes. When the starting offset is zero, the search for a match starts at the |
zero bytes. When the starting offset is zero, the search for a match starts at |
| 1383 |
beginning of the subject, and this is by far the most common case. |
the beginning of the subject, and this is by far the most common case. |
| 1384 |
</P> |
</P> |
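The UTF-8 requirement on <i>startoffset</i> described above can be checked before calling <b>pcre_exec()</b>. The following is a minimal sketch, not part of PCRE: the helper name is_utf8_char_start is hypothetical, and treating an offset equal to the subject length as acceptable is an assumption of this sketch. It relies only on the fact that UTF-8 continuation bytes have the bit pattern 10xxxxxx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a PCRE function): returns 1 if `offset` does not
   fall in the middle of a UTF-8 character of `subject`, 0 otherwise.
   `length` is the subject length in bytes; the subject may contain binary
   zero bytes, so no strlen() is used. */
int is_utf8_char_start(const char *subject, int length, int offset)
{
    assert(subject != NULL && length >= 0);

    if (offset < 0 || offset > length)
        return 0;                 /* outside the subject */
    if (offset == length)
        return 1;                 /* end of subject: assumed acceptable here */

    /* A continuation byte looks like 10xxxxxx; any other byte starts a
       character (or is a single-byte character). */
    return ((unsigned char)subject[offset] & 0xC0) != 0x80;
}
```

For the four-byte subject "a", 0xC3, 0xA9, "b", offsets 0, 1, and 3 are accepted and offset 2 is rejected, because it points at the second byte of a two-byte character.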
| 1385 |
<P> |
<P> |
| 1386 |
A non-zero starting offset is useful when searching for another match in the |
A non-zero starting offset is useful when searching for another match in the |
| 1418 |
kinds of parenthesized subpattern that do not cause substrings to be captured. |
kinds of parenthesized subpattern that do not cause substrings to be captured. |
| 1419 |
</P> |
</P> |
| 1420 |
<P> |
<P> |
| 1421 |
Captured substrings are returned to the caller via a vector of integer offsets |
Captured substrings are returned to the caller via a vector of integers whose |
| 1422 |
whose address is passed in <i>ovector</i>. The number of elements in the vector |
address is passed in <i>ovector</i>. The number of elements in the vector is |
| 1423 |
is passed in <i>ovecsize</i>, which must be a non-negative number. <b>Note</b>: |
passed in <i>ovecsize</i>, which must be a non-negative number. <b>Note</b>: this |
| 1424 |
this argument is NOT the size of <i>ovector</i> in bytes. |
argument is NOT the size of <i>ovector</i> in bytes. |
| 1425 |
</P> |
</P> |
| 1426 |
<P> |
<P> |
| 1427 |
The first two-thirds of the vector is used to pass back captured substrings, |
The first two-thirds of the vector is used to pass back captured substrings, |
| 1428 |
each substring using a pair of integers. The remaining third of the vector is |
each substring using a pair of integers. The remaining third of the vector is |
| 1429 |
used as workspace by <b>pcre_exec()</b> while matching capturing subpatterns, |
used as workspace by <b>pcre_exec()</b> while matching capturing subpatterns, |
| 1430 |
and is not available for passing back information. The length passed in |
and is not available for passing back information. The number passed in |
| 1431 |
<i>ovecsize</i> should always be a multiple of three. If it is not, it is |
<i>ovecsize</i> should always be a multiple of three. If it is not, it is |
| 1432 |
rounded down. |
rounded down. |
| 1433 |
</P> |
</P> |
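The sizing rules in the two paragraphs above come down to simple integer arithmetic. The sketch below assumes two hypothetical helpers, ovecsize_for_captures and usable_pairs, which are not part of PCRE. It only restates the rules: one pair per returned substring in the first two-thirds of the vector, one further third as workspace, and rounding down to a multiple of three.

```c
#include <assert.h>

/* Hypothetical helper: smallest ovecsize (in elements, not bytes) able to
   return the whole match plus `capture_count` capturing subpatterns.
   Each substring needs a pair of integers, and a further third of the
   vector is reserved as workspace, hence three elements per substring. */
int ovecsize_for_captures(int capture_count)
{
    assert(capture_count >= 0);
    return (capture_count + 1) * 3;
}

/* Hypothetical helper: number of offset pairs that a vector of `ovecsize`
   elements can return. Integer division by three models the rounding down
   to a multiple of three; two-thirds of that, in pairs, is ovecsize / 3. */
int usable_pairs(int ovecsize)
{
    assert(ovecsize >= 0);
    return ovecsize / 3;
}
```

With a declaration such as int ovector[30], the value to pass as <i>ovecsize</i> is 30, that is sizeof(ovector)/sizeof(ovector[0]), and not sizeof(ovector). Under this sketch, a vector of 10 elements behaves like one of 9 and can return 3 pairs.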
| 1434 |
<P> |
<P> |
| 1435 |
When a match is successful, information about captured substrings is returned |
When a match is successful, information about captured substrings is returned |
| 1436 |
in pairs of integers, starting at the beginning of <i>ovector</i>, and |
in pairs of integers, starting at the beginning of <i>ovector</i>, and |
| 1437 |
continuing up to two-thirds of its length at the most. The first element of a |
continuing up to two-thirds of its length at the most. The first element of |
| 1438 |
pair is set to the offset of the first character in a substring, and the second |
each pair is set to the byte offset of the first character in a substring, and |
| 1439 |
is set to the offset of the first character after the end of a substring. The |
the second is set to the byte offset of the first character after the end of a |
| 1440 |
first pair, <i>ovector[0]</i> and <i>ovector[1]</i>, identify the portion of the |
substring. <b>Note</b>: these values are always byte offsets, even in UTF-8 |
| 1441 |
subject string matched by the entire pattern. The next pair is used for the |
mode. They are not character counts. |
| 1442 |
first capturing subpattern, and so on. The value returned by <b>pcre_exec()</b> |
</P> |
| 1443 |
is one more than the highest numbered pair that has been set. For example, if |
<P> |
| 1444 |
two substrings have been captured, the returned value is 3. If there are no |
The first pair of integers, <i>ovector[0]</i> and <i>ovector[1]</i>, identify the |
| 1445 |
capturing subpatterns, the return value from a successful match is 1, |
portion of the subject string matched by the entire pattern. The next pair is |
| 1446 |
indicating that just the first pair of offsets has been set. |
used for the first capturing subpattern, and so on. The value returned by |
| 1447 |
|
<b>pcre_exec()</b> is one more than the highest numbered pair that has been set. |
| 1448 |
|
For example, if two substrings have been captured, the returned value is 3. If |
| 1449 |
|
there are no capturing subpatterns, the return value from a successful match is |
| 1450 |
|
1, indicating that just the first pair of offsets has been set. |
| 1451 |
</P> |
</P> |
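The pair layout described above can be illustrated without calling the library at all. The following is a sketch under stated assumptions: copy_capture is a hypothetical helper, <i>rc</i> stands for the positive value returned by a successful <b>pcre_exec()</b> call, and the offsets in the usage note are written out by hand rather than produced by a real match. PCRE ships its own convenience function for this job, <b>pcre_copy_substring()</b>, which is preferable in real code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not a PCRE function): copies substring number `n`
   (0 = whole match, 1 = first capturing subpattern, ...) into `buf` and
   zero-terminates it. Returns the substring length in bytes, or -1 if
   pair `n` was not set or the buffer is too small. */
int copy_capture(const char *subject, const int *ovector, int rc,
                 int n, char *buf, int bufsize)
{
    assert(subject != NULL && ovector != NULL && buf != NULL);

    if (n < 0 || n >= rc)
        return -1;                    /* rc is one more than the highest pair set */

    int start = ovector[2 * n];       /* byte offset of the first character */
    int end   = ovector[2 * n + 1];   /* byte offset just past the substring */
    if (start < 0 || end < start)
        return -1;                    /* subpattern did not take part in the match */

    int len = end - start;            /* a byte count, even in UTF-8 mode */
    if (len >= bufsize)
        return -1;                    /* no room for the terminating zero */

    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```

For the subject "hello world" and a pattern with two capturing subpatterns that match "hello" and "world", the vector would begin 0, 11, 0, 5, 6, 11 and the return value would be 3. Under those assumed values, copy_capture with n set to 2 yields "world" and a length of 5.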
| 1452 |
<P> |
<P> |
| 1453 |
If a capturing subpattern is matched repeatedly, it is the last portion of the |
If a capturing subpattern is matched repeatedly, it is the last portion of the |
| 1456 |
<P> |
<P> |
| 1457 |
If the vector is too small to hold all the captured substring offsets, it is |
If the vector is too small to hold all the captured substring offsets, it is |
| 1458 |
used as far as possible (up to two-thirds of its length), and the function |
used as far as possible (up to two-thirds of its length), and the function |
| 1459 |
returns a value of zero. In particular, if the substring offsets are not of |
returns a value of zero. If the substring offsets are not of interest, |
| 1460 |
interest, <b>pcre_exec()</b> may be called with <i>ovector</i> passed as NULL and |
<b>pcre_exec()</b> may be called with <i>ovector</i> passed as NULL and |
| 1461 |
<i>ovecsize</i> as zero. However, if the pattern contains back references and |
<i>ovecsize</i> as zero. However, if the pattern contains back references and |
| 1462 |
the <i>ovector</i> is not big enough to remember the related substrings, PCRE |
the <i>ovector</i> is not big enough to remember the related substrings, PCRE |
| 1463 |
has to get additional memory for use during matching. Thus it is usually |
has to get additional memory for use during matching. Thus it is usually |
| 1976 |
</P> |
</P> |
| 1977 |
<br><a name="SEC22" href="#TOC1">REVISION</a><br> |
<br><a name="SEC22" href="#TOC1">REVISION</a><br> |
| 1978 |
<P> |
<P> |
| 1979 |
Last updated: 12 April 2008 |
Last updated: 24 August 2008 |
| 1980 |
<br> |
<br> |
| 1981 |
Copyright © 1997-2008 University of Cambridge. |
Copyright © 1997-2008 University of Cambridge. |
| 1982 |
<br> |
<br> |