<li><a name="TOC6" href="#SEC6">ERROR MESSAGES</a>
<li><a name="TOC7" href="#SEC7">MEMORY USAGE</a>
<li><a name="TOC8" href="#SEC8">AUTHOR</a>
<li><a name="TOC9" href="#SEC9">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF POSIX API</a><br>
<P>
call the native ones, it is also necessary to add <b>-lpcre</b>.
</P>
<P>
I have implemented only those POSIX option bits that can be reasonably mapped
to PCRE native options. In addition, the option REG_EXTENDED is defined with
the value zero. This has no effect, but since programs that are written to the
POSIX interface often use it, this makes it easier to slot in PCRE as a
replacement library. Other POSIX options are not even defined.
</P>
<P>
There are also some other options that are not defined by POSIX. These have
been added at the request of users who want to make use of certain
PCRE-specific features via the POSIX calling interface.
</P>
<P>
When PCRE is called via these functions, it is only the API that is POSIX-like
internal form. The pattern is a C string terminated by a binary zero, and
is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer
to a <b>regex_t</b> structure that is used as a base for storing information
about the compiled regular expression.
</P>
<P>
The argument <i>cflags</i> is either zero, or contains one or more of the bits
<pre>
REG_DOTALL
</pre>
The PCRE_DOTALL option is set when the regular expression is passed for
compilation to the native function. Note that REG_DOTALL is not part of the
POSIX standard.
<pre>
REG_ICASE
</pre>
The PCRE_CASELESS option is set when the regular expression is passed for
compilation to the native function.
<pre>
REG_NEWLINE
</pre>
The PCRE_MULTILINE option is set when the regular expression is passed for
compilation to the native function. Note that this does <i>not</i> mimic the
defined POSIX behaviour for REG_NEWLINE (see the following section).
<pre>
REG_NOSUB
</pre>
The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is passed
for compilation to the native function. In addition, when a pattern that is
compiled with this flag is passed to <b>regexec()</b> for matching, the
<i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no captured strings
are returned.
<pre>
REG_UNGREEDY
</pre>
The PCRE_UNGREEDY option is set when the regular expression is passed for
compilation to the native function. Note that REG_UNGREEDY is not part of the
POSIX standard.
<pre>
REG_UTF8
</pre>
The PCRE_UTF8 option is set when the regular expression is passed for
compilation to the native function. This causes the pattern itself and all data
strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF8
is not part of the POSIX standard.
</P>
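<P>
As a minimal sketch of how these bits combine, the C function below ORs
REG_EXTENDED, REG_ICASE and REG_NOSUB together and reports only whether the
pattern matches. The helper name <b>matches_ci()</b> and the USE_PCRE build
switch are hypothetical: when USE_PCRE is defined the sketch includes
<b>pcreposix.h</b> (link with <b>-lpcreposix -lpcre</b>), otherwise it falls
back to the system's own <b>regex.h</b>. Passing REG_EXTENDED keeps the same
pattern acceptable to both, since PCRE defines that bit as zero; PCRE-only bits
such as REG_DOTALL, REG_UNGREEDY and REG_UTF8 are left out so that the fallback
still compiles.
</P>

```c
#include <stddef.h>

#ifdef USE_PCRE                 /* hypothetical build switch */
#include <pcreposix.h>          /* link with -lpcreposix -lpcre */
#else
#include <regex.h>              /* system POSIX regex as a fallback */
#endif

/* Hypothetical helper: returns 1 if pattern matches subject, ignoring
   case; 0 if it does not; -1 if the pattern fails to compile. */
int matches_ci(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only success or failure is wanted, so no captures. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;              /* do not touch re after a failed compile */

    /* With REG_NOSUB, nmatch and pmatch may be 0 and NULL. */
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

<P>
For example, calling <b>matches_ci("world", "Hello WORLD")</b> yields 1, while an
unbalanced pattern such as "(" yields -1 because <b>regcomp()</b> rejects it.
</P>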
<P>
In the absence of these flags, no options are passed to the native function.
particular, the way it handles newline characters in the subject string is the
Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
<i>some</i> of the effects specified for REG_NEWLINE. It does not affect the way
newlines are matched by . (they are not) or by a negative class such as [^a]
(they are).
</P>
<P>
is public: <i>re_nsub</i> contains the number of capturing subpatterns in
the regular expression. Various error codes are defined in the header file.
</P>
<P>
NOTE: If the yield of <b>regcomp()</b> is non-zero, you must not attempt to
use the contents of the <i>preg</i> structure. If, for example, you pass it to
<b>regexec()</b>, the result is undefined and your program is likely to crash.
</P>
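<P>
The following minimal C sketch follows that rule: if <b>regcomp()</b> fails, it
asks <b>regerror()</b> for a readable message and then leaves the structure
alone, calling neither <b>regexec()</b> nor <b>regfree()</b>. The helper name
<b>check_pattern()</b> and the USE_PCRE build switch are hypothetical. Handing
the same <i>preg</i> to <b>regerror()</b> follows the usual POSIX calling
convention and is the only use the sketch makes of the structure after a
failure.
</P>

```c
#include <stddef.h>

#ifdef USE_PCRE                 /* hypothetical build switch */
#include <pcreposix.h>          /* link with -lpcreposix -lpcre */
#else
#include <regex.h>              /* system POSIX regex as a fallback */
#endif

/* Hypothetical helper: validates pattern. Returns 0 if it compiles;
   otherwise returns the regcomp() error code and stores a readable
   message in errbuf (truncated to errlen bytes). */
int check_pattern(const char *pattern, char *errbuf, size_t errlen)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    if (rc != 0) {
        /* Only regerror() sees re now; regexec() and regfree() are
           not called on a structure whose compile failed. */
        regerror(rc, &re, errbuf, errlen);
        return rc;
    }

    regfree(&re);               /* compiled successfully: release it */
    if (errlen > 0)
        errbuf[0] = '\0';       /* empty message means "no error" */
    return 0;
}
```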
<br><a name="SEC4" href="#TOC1">MATCHING NEWLINE CHARACTERS</a><br>
<P>
This area is not simple, because POSIX and Perl take different views of things.
<br><a name="SEC5" href="#TOC1">MATCHING A PATTERN</a><br>
<P>
The function <b>regexec()</b> is called to match a compiled pattern <i>preg</i>
against a given <i>string</i>, which is by default terminated by a zero byte
(but see REG_STARTEND below), subject to the options in <i>eflags</i>. These can
be:
<pre>
REG_NOTBOL
</pre>
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
function.
<pre>
REG_NOTEMPTY
</pre>
The PCRE_NOTEMPTY option is set when calling the underlying PCRE matching
function. Note that REG_NOTEMPTY is not part of the POSIX standard. However,
setting this option can give more POSIX-like behaviour in some situations.
<pre>
REG_NOTEOL
</pre>
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
function.
<pre>
REG_STARTEND
</pre>
The string is considered to start at <i>string</i> + <i>pmatch[0].rm_so</i> and
to have a terminating NUL located at <i>string</i> + <i>pmatch[0].rm_eo</i>
(there need not actually be a NUL at that location), regardless of the value of
<i>nmatch</i>. This is a BSD extension, compatible with but not specified by
IEEE Standard 1003.2 (POSIX.2), and should be used with caution in software
intended to be portable to other systems. Note that a non-zero <i>rm_so</i> does
not imply REG_NOTBOL; REG_STARTEND affects only the location of the string, not
how it is matched.
</P>
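<P>
A minimal C sketch of the REG_NOTBOL and REG_NOTEOL flags: the hypothetical
helper <b>search_with_flags()</b> compiles a pattern, passes the caller's
<i>eflags</i> straight to <b>regexec()</b>, and returns its result. REG_STARTEND
is deliberately not exercised, since it is the BSD extension that the text
above advises using with caution in portable code. The USE_PCRE build switch is
an assumption.
</P>

```c
#include <stddef.h>

#ifdef USE_PCRE                 /* hypothetical build switch */
#include <pcreposix.h>          /* link with -lpcreposix -lpcre */
#else
#include <regex.h>              /* system POSIX regex as a fallback */
#endif

/* Hypothetical helper: returns regexec()'s result for pattern against
   subject using the given eflags (0 = match, REG_NOMATCH = no match),
   or -1 if the pattern fails to compile. */
int search_with_flags(const char *pattern, const char *subject, int eflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* REG_NOTBOL: the start of subject is not the start of a line, so ^
       cannot match there. REG_NOTEOL: likewise for the end and $. */
    rc = regexec(&re, subject, 0, NULL, eflags);
    regfree(&re);
    return rc;
}
```

<P>
A typical use is scanning a line in pieces: after one match, the search resumes
from a pointer into the middle of the line, and passing REG_NOTBOL stops ^ from
matching at that artificial start.
</P>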
<P>
If the pattern was compiled with the REG_NOSUB flag, no data about any matched
strings is returned. The <i>nmatch</i> and <i>pmatch</i> arguments of
<b>regexec()</b> are ignored.
</P>
<P>
If the value of <i>nmatch</i> is zero, or if the value of <i>pmatch</i> is NULL,
no data about any matched strings is returned.
</P>
<P>
Otherwise, the portion of the string that was matched, and also any captured
substrings, are returned via the <i>pmatch</i> argument, which points to an
array of <i>nmatch</i> structures of type <i>regmatch_t</i>, containing the
members <i>rm_so</i> and <i>rm_eo</i>. These contain the offset to the first
character of each substring and the offset to the first character after the end
of each substring, respectively. The 0th element of the vector relates to the
entire portion of <i>string</i> that was matched; subsequent elements relate to
the capturing subpatterns of the regular expression. Unused entries in the
array have both structure members set to -1.
</P>
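<P>
The following minimal C sketch shows how the <i>pmatch</i> vector is read. The
hypothetical helper <b>extract_pair()</b> compiles the pattern
<b>([a-z]+)=([0-9]+)(;)?</b>, which has three capturing subpatterns, and asks for
four <b>regmatch_t</b> entries: element 0 for the whole match and one per
subpattern. Matched against "key=42", the optional third group does not
participate, so its entry is expected to hold -1 in both members. The USE_PCRE
build switch is an assumption.
</P>

```c
#include <stddef.h>

#ifdef USE_PCRE                 /* hypothetical build switch */
#include <pcreposix.h>          /* link with -lpcreposix -lpcre */
#else
#include <regex.h>              /* system POSIX regex as a fallback */
#endif

/* Hypothetical helper: matches "name=digits" with an optional ';'.
   Fills m[0..3], stores re_nsub in *nsub, and returns regexec()'s
   result, or -1 if the pattern fails to compile. */
int extract_pair(const char *subject, regmatch_t m[4], size_t *nsub)
{
    regex_t re;
    int rc;

    if (regcomp(&re, "([a-z]+)=([0-9]+)(;)?", REG_EXTENDED) != 0)
        return -1;

    *nsub = re.re_nsub;         /* number of capturing subpatterns */
    rc = regexec(&re, subject, 4, m, 0);
    regfree(&re);               /* offsets in m index subject, not re */
    return rc;
}
```

<P>
On success, capture <i>i</i> is the <b>m[i].rm_eo - m[i].rm_so</b> bytes starting
at <b>subject + m[i].rm_so</b>, provided <b>m[i].rm_so</b> is not -1. For
"key=42", m[1] is expected to cover "key" (offsets 0 and 3) and m[2] to cover
"42" (offsets 4 and 6).
</P>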
<P>
A successful match yields a zero return; various error codes are defined in the
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QH, England.
<br>
</P>
<br><a name="SEC9" href="#TOC1">REVISION</a><br>
<P>
Last updated: 02 September 2009
<br>
Copyright © 1997-2009 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>