| 20 |
.br |
.br |
| 21 |
.B int pcre_exec(const pcre *\fIcode\fR, "const pcre_extra *\fIextra\fR," |
.B int pcre_exec(const pcre *\fIcode\fR, "const pcre_extra *\fIextra\fR," |
| 22 |
.ti +5n |
.ti +5n |
| 23 |
.B "const char *\fIsubject\fR," int \fIlength\fR, int \fIoptions\fR, |
.B "const char *\fIsubject\fR," int \fIlength\fR, int \fIstartoffset\fR, |
| 24 |
.ti +5n |
.ti +5n |
| 25 |
.B int *\fIovector\fR, int \fIovecsize\fR); |
.B int \fIoptions\fR, int *\fIovector\fR, int \fIovecsize\fR); |
| 26 |
.PP |
.PP |
| 27 |
.br |
.br |
| 28 |
.B int pcre_copy_substring(const char *\fIsubject\fR, int *\fIovector\fR, |
.B int pcre_copy_substring(const char *\fIsubject\fR, int *\fIovector\fR, |
| 249 |
The tables are built in memory that is obtained via \fBpcre_malloc\fR. The |
The tables are built in memory that is obtained via \fBpcre_malloc\fR. The |
| 250 |
pointer that is passed to \fBpcre_compile\fR is saved with the compiled |
pointer that is passed to \fBpcre_compile\fR is saved with the compiled |
| 251 |
pattern, and the same tables are used via this pointer by \fBpcre_study()\fR |
pattern, and the same tables are used via this pointer by \fBpcre_study()\fR |
| 252 |
and \fBpcre_match()\fR. Thus for any single pattern, compilation, studying and |
and \fBpcre_exec()\fR. Thus for any single pattern, compilation, studying and |
| 253 |
matching all happen in the same locale, but different patterns can be compiled |
matching all happen in the same locale, but different patterns can be compiled |
| 254 |
in different locales. It is the caller's responsibility to ensure that the |
in different locales. It is the caller's responsibility to ensure that the |
| 255 |
memory containing the tables remains available for as long as it is needed. |
memory containing the tables remains available for as long as it is needed. |
| 293 |
pattern has been studied, the result of the study should be passed in the |
pattern has been studied, the result of the study should be passed in the |
| 294 |
\fIextra\fR argument. Otherwise this must be NULL. |
\fIextra\fR argument. Otherwise this must be NULL. |
| 295 |
|
|
|
The subject string is passed as a pointer in \fIsubject\fR and a length in |
|
|
\fIlength\fR. Unlike the pattern string, it may contain binary zero characters. |
|
|
|
|
| 296 |
The PCRE_ANCHORED option can be passed in the \fIoptions\fR argument, whose |
The PCRE_ANCHORED option can be passed in the \fIoptions\fR argument, whose |
| 297 |
unused bits must be zero. However, if a pattern was compiled with |
unused bits must be zero. However, if a pattern was compiled with |
| 298 |
PCRE_ANCHORED, or turned out to be anchored by virtue of its contents, it |
PCRE_ANCHORED, or turned out to be anchored by virtue of its contents, it |
| 313 |
it. Setting this without PCRE_MULTILINE (at compile time) causes dollar never |
it. Setting this without PCRE_MULTILINE (at compile time) causes dollar never |
| 314 |
to match. |
to match. |
| 315 |
|
|
| 316 |
|
The subject string is passed as a pointer in \fIsubject\fR, a length in |
| 317 |
|
\fIlength\fR, and a starting offset in \fIstartoffset\fR. Unlike the pattern |
| 318 |
|
string, it may contain binary zero characters. When the starting offset is |
| 319 |
|
zero, the search for a match starts at the beginning of the subject, and this |
| 320 |
|
is by far the most common case. |
| 321 |
|
|
| 322 |
|
A non-zero starting offset is useful when searching for another match in the |
| 323 |
|
same subject by calling \fBpcre_exec()\fR again after a previous success. |
| 324 |
|
Setting \fIstartoffset\fR differs from just passing over a shortened string and |
| 325 |
|
setting PCRE_NOTBOL in the case of a pattern that begins with any kind of |
| 326 |
|
lookbehind. For example, consider the pattern |
| 327 |
|
|
| 328 |
|
\\Biss\\B |
| 329 |
|
|
| 330 |
|
which finds occurrences of "iss" in the middle of words. (\\B matches only if |
| 331 |
|
the current position in the subject is not a word boundary.) When applied to |
| 332 |
|
the string "Mississippi", the first call to \fBpcre_exec()\fR finds the first |
| 333 |
|
occurrence. If \fBpcre_exec()\fR is called again with just the remainder of the |
| 334 |
|
subject, namely "issippi", it does not match, because \\B is always false at the |
| 335 |
|
start of the subject, which is deemed to be a word boundary. However, if |
| 336 |
|
\fBpcre_exec()\fR is passed the entire string again, but with \fIstartoffset\fR |
| 337 |
|
set to 4, it finds the second occurrence of "iss" because it is able to look |
| 338 |
|
behind the starting point to discover that it is preceded by a letter. |
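
The difference between a shortened subject and a non-zero starting offset can be sketched without linking against PCRE at all. The helper below, find_iss(), is hypothetical and is not part of the PCRE API: it hard-codes the single pattern \\Biss\\B and takes a subject, a length, and a starting offset in the same spirit as the arguments described above. It assumes ASCII word characters (letters, digits, underscore) and treats both ends of the subject as word boundaries.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for matching the fixed pattern \Biss\B.
   This is an illustration only, not PCRE code. */

/* 1 if subject[i] exists and is a word character, else 0.
   Positions outside the subject count as non-word, so the start
   and end of the subject are treated as word boundaries. */
static int is_word(const char *s, int len, int i)
{
    return i >= 0 && i < len &&
           (isalnum((unsigned char)s[i]) || s[i] == '_');
}

/* \B holds at pos when the characters on either side of pos are
   both word characters or both non-word characters. */
static int not_boundary(const char *s, int len, int pos)
{
    return is_word(s, len, pos - 1) == is_word(s, len, pos);
}

/* Search for "iss" with \B on both sides, starting at startoffset.
   The whole subject stays visible, so the check at the starting
   point can look at the character before it.
   Returns the match offset, or -1 if there is no match. */
int find_iss(const char *subject, int length, int startoffset)
{
    int i;
    for (i = startoffset; i + 3 <= length; i++) {
        if (memcmp(subject + i, "iss", 3) == 0 &&
            not_boundary(subject, length, i) &&
            not_boundary(subject, length, i + 3))
            return i;
    }
    return -1;
}
```

With the subject "Mississippi", find_iss(s, 11, 0) reports offset 1. Passing only the remainder, find_iss(s + 4, 7, 0), reports no match, because the helper cannot see the letter before the new start. Passing the whole subject with a starting offset, find_iss(s, 11, 4), reports offset 4.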
| 339 |
|
|
| 340 |
|
If a non-zero starting offset is passed when the pattern is anchored, one |
| 341 |
|
attempt to match at the given offset is tried. This can only succeed if the |
| 342 |
|
pattern does not require the match to be at the start of the subject. |
| 343 |
|
|
| 344 |
In general, a pattern matches a certain portion of the subject, and in |
In general, a pattern matches a certain portion of the subject, and in |
| 345 |
addition, further substrings from the subject may be picked out by parts of the |
addition, further substrings from the subject may be picked out by parts of the |
| 346 |
pattern. Following the usage in Jeffrey Friedl's book, this is called |
pattern. Following the usage in Jeffrey Friedl's book, this is called |
| 755 |
The \\A, \\Z, and \\z assertions differ from the traditional circumflex and |
The \\A, \\Z, and \\z assertions differ from the traditional circumflex and |
| 756 |
dollar (described below) in that they only ever match at the very start and end |
dollar (described below) in that they only ever match at the very start and end |
| 757 |
of the subject string, whatever options are set. They are not affected by the |
of the subject string, whatever options are set. They are not affected by the |
| 758 |
PCRE_NOTBOL or PCRE_NOTEOL options. The difference between \\Z and \\z is that |
PCRE_NOTBOL or PCRE_NOTEOL options. If the \fIstartoffset\fR argument of |
| 759 |
\\Z matches before a newline that is the last character of the string as well |
\fBpcre_exec()\fR is non-zero, \\A can never match. The difference between \\Z |
| 760 |
as at the end of the string, whereas \\z matches only at the end. |
and \\z is that \\Z matches before a newline that is the last character of the |
| 761 |
|
string as well as at the end of the string, whereas \\z matches only at the |
| 762 |
|
end. |
| 763 |
|
|
| 764 |
|
|
| 765 |
.SH CIRCUMFLEX AND DOLLAR |
.SH CIRCUMFLEX AND DOLLAR |
| 766 |
Outside a character class, in the default matching mode, the circumflex |
Outside a character class, in the default matching mode, the circumflex |
| 767 |
character is an assertion which is true only if the current matching point is |
character is an assertion which is true only if the current matching point is |
| 768 |
at the start of the subject string. Inside a character class, circumflex has an |
at the start of the subject string. If the \fIstartoffset\fR argument of |
| 769 |
entirely different meaning (see below). |
\fBpcre_exec()\fR is non-zero, circumflex can never match. Inside a character |
| 770 |
|
class, circumflex has an entirely different meaning (see below). |
| 771 |
|
|
| 772 |
Circumflex need not be the first character of the pattern if a number of |
Circumflex need not be the first character of the pattern if a number of |
| 773 |
alternatives are involved, but it should be the first thing in each alternative |
alternatives are involved, but it should be the first thing in each alternative |
| 794 |
addition to matching at the start and end of the subject string. For example, |
addition to matching at the start and end of the subject string. For example, |
| 795 |
the pattern /^abc$/ matches the subject string "def\\nabc" in multiline mode, |
the pattern /^abc$/ matches the subject string "def\\nabc" in multiline mode, |
| 796 |
but not otherwise. Consequently, patterns that are anchored in single line mode |
but not otherwise. Consequently, patterns that are anchored in single line mode |
| 797 |
because all branches start with "^" are not anchored in multiline mode. The |
because all branches start with "^" are not anchored in multiline mode, and a |
| 798 |
PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is set. |
match for circumflex is possible when the \fIstartoffset\fR argument of |
| 799 |
|
\fBpcre_exec()\fR is non-zero. The PCRE_DOLLAR_ENDONLY option is ignored if |
| 800 |
|
PCRE_MULTILINE is set. |
| 801 |
|
|
| 802 |
Note that the sequences \\A, \\Z, and \\z can be used to match the start and |
Note that the sequences \\A, \\Z, and \\z can be used to match the start and |
| 803 |
end of the subject in both modes, and if all branches of a pattern start with |
end of the subject in both modes, and if all branches of a pattern start with |
| 1249 |
preceded by "foo". |
preceded by "foo". |
| 1250 |
|
|
| 1251 |
Assertion subpatterns are not capturing subpatterns, and may not be repeated, |
Assertion subpatterns are not capturing subpatterns, and may not be repeated, |
| 1252 |
because it makes no sense to assert the same thing several times. If an |
because it makes no sense to assert the same thing several times. If any kind |
| 1253 |
assertion contains capturing subpatterns within it, these are always counted |
of assertion contains capturing subpatterns within it, these are counted for |
| 1254 |
for the purposes of numbering the capturing subpatterns in the whole pattern. |
the purposes of numbering the capturing subpatterns in the whole pattern. |
| 1255 |
Substring capturing is carried out for positive assertions, but it does not |
However, substring capturing is carried out only for positive assertions, |
| 1256 |
make sense for negative assertions. |
because it does not make sense for negative assertions. |
| 1257 |
|
|
| 1258 |
Assertions count towards the maximum of 200 parenthesized subpatterns. |
Assertions count towards the maximum of 200 parenthesized subpatterns. |
| 1259 |
|
|
| 1420 |
.br |
.br |
| 1421 |
Phone: +44 1223 334714 |
Phone: +44 1223 334714 |
| 1422 |
|
|
| 1423 |
|
Last updated: 10 June 1999 |
| 1424 |
|
.br |
| 1425 |
Copyright (c) 1997-1999 University of Cambridge. |
Copyright (c) 1997-1999 University of Cambridge. |