| 3 |
<title>pcreposix specification</title> |
<title>pcreposix specification</title> |
| 4 |
</head> |
</head> |
| 5 |
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB"> |
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB"> |
| 6 |
This HTML document has been generated automatically from the original man page. |
<h1>pcreposix man page</h1> |
| 7 |
If there is any nonsense in it, please consult the man page, in case the |
<p> |
| 8 |
conversion went wrong.<br> |
Return to the <a href="index.html">PCRE index page</a>. |
| 9 |
|
</p> |
| 10 |
|
<p> |
| 11 |
|
This page is part of the PCRE HTML documentation. It was generated automatically |
| 12 |
|
from the original man page. If there is any nonsense in it, please consult the |
| 13 |
|
man page, in case the conversion went wrong. |
| 14 |
|
<br> |
| 15 |
<ul> |
<ul> |
| 16 |
<li><a name="TOC1" href="#SEC1">SYNOPSIS OF POSIX API</a> |
<li><a name="TOC1" href="#SEC1">SYNOPSIS OF POSIX API</a> |
| 17 |
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a> |
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a> |
| 19 |
<li><a name="TOC4" href="#SEC4">MATCHING NEWLINE CHARACTERS</a> |
<li><a name="TOC4" href="#SEC4">MATCHING NEWLINE CHARACTERS</a> |
| 20 |
<li><a name="TOC5" href="#SEC5">MATCHING A PATTERN</a> |
<li><a name="TOC5" href="#SEC5">MATCHING A PATTERN</a> |
| 21 |
<li><a name="TOC6" href="#SEC6">ERROR MESSAGES</a> |
<li><a name="TOC6" href="#SEC6">ERROR MESSAGES</a> |
| 22 |
<li><a name="TOC7" href="#SEC7">STORAGE</a> |
<li><a name="TOC7" href="#SEC7">MEMORY USAGE</a> |
| 23 |
<li><a name="TOC8" href="#SEC8">AUTHOR</a> |
<li><a name="TOC8" href="#SEC8">AUTHOR</a> |
| 24 |
</ul> |
</ul> |
| 25 |
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF POSIX API</a><br> |
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF POSIX API</a><br> |
| 46 |
This set of functions provides a POSIX-style API to the PCRE regular expression |
This set of functions provides a POSIX-style API to the PCRE regular expression |
| 47 |
package. See the |
package. See the |
| 48 |
<a href="pcreapi.html"><b>pcreapi</b></a> |
<a href="pcreapi.html"><b>pcreapi</b></a> |
| 49 |
documentation for a description of the native API, which contains additional |
documentation for a description of PCRE's native API, which contains additional |
| 50 |
functionality. |
functionality. |
| 51 |
</P> |
</P> |
| 52 |
<P> |
<P> |
| 54 |
the PCRE native API. Their prototypes are defined in the <b>pcreposix.h</b> |
the PCRE native API. Their prototypes are defined in the <b>pcreposix.h</b> |
| 55 |
header file, and on Unix systems the library itself is called |
header file, and on Unix systems the library itself is called |
| 56 |
<b>pcreposix.a</b>, so can be accessed by adding <b>-lpcreposix</b> to the |
<b>pcreposix.a</b>, so can be accessed by adding <b>-lpcreposix</b> to the |
| 57 |
command for linking an application which uses them. Because the POSIX functions |
command for linking an application that uses them. Because the POSIX functions |
| 58 |
call the native ones, it is also necessary to add \fR-lpcre\fR. |
call the native ones, it is also necessary to add <b>-lpcre</b>. |
| 59 |
</P> |
</P> |
| 60 |
<P> |
<P> |
| 61 |
I have implemented only those option bits that can be reasonably mapped to PCRE |
I have implemented only those option bits that can be reasonably mapped to PCRE |
| 81 |
constants whose names start with "REG_"; these are used for setting options and |
constants whose names start with "REG_"; these are used for setting options and |
| 82 |
identifying error codes. |
identifying error codes. |
| 83 |
</P> |
</P> |
| 84 |
|
<P> |
| 85 |
|
</P> |
| 86 |
<br><a name="SEC3" href="#TOC1">COMPILING A PATTERN</a><br> |
<br><a name="SEC3" href="#TOC1">COMPILING A PATTERN</a><br> |
| 87 |
<P> |
<P> |
| 88 |
The function <b>regcomp()</b> is called to compile a pattern into an |
The function <b>regcomp()</b> is called to compile a pattern into an |
| 89 |
internal form. The pattern is a C string terminated by a binary zero, and |
internal form. The pattern is a C string terminated by a binary zero, and |
| 90 |
is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer |
is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer |
| 91 |
to a regex_t structure which is used as a base for storing information about |
to a <b>regex_t</b> structure that is used as a base for storing information |
| 92 |
the compiled expression. |
about the compiled expression. |
| 93 |
</P> |
</P> |
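<P>
As a minimal sketch of the call described above, the following helper compiles
a pattern into a local <b>regex_t</b> and releases it again. The helper name
<b>try_compile</b> and the <b>USE_PCREPOSIX</b> macro are assumptions made for
this illustration, not part of PCRE. Without the macro, the sketch falls back to
the system <b>regex.h</b>, which declares functions with the same POSIX
signatures, so it can be built without PCRE installed.
</P>

```c
/* Hypothetical build switch: define USE_PCREPOSIX and link with
   -lpcreposix -lpcre to use the PCRE wrapper; otherwise the system
   POSIX regex header is used. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>
#endif
#include <stddef.h>

/* Compile `pattern` with the given cflags (zero or REG_* bits).
   Returns 0 on success, otherwise the nonzero code from regcomp(). */
int try_compile(const char *pattern, int cflags)
{
    regex_t preg;
    int rc = regcomp(&preg, pattern, cflags);

    if (rc == 0) {
        /* Successful compilation allocated memory tied to preg. */
        regfree(&preg);
    }
    return rc;
}
```

<P>
For example, <b>try_compile("ab*c", 0)</b> should yield zero, while an
unterminated character class such as <b>"[a"</b> should yield a nonzero error
code.
</P>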
| 94 |
<P> |
<P> |
| 95 |
The argument <i>cflags</i> is either zero, or contains one or more of the bits |
The argument <i>cflags</i> is either zero, or contains one or more of the bits |
| 96 |
defined by the following macros: |
defined by the following macros: |
|
</P> |
|
|
<P> |
|
| 97 |
<pre> |
<pre> |
| 98 |
REG_ICASE |
REG_ICASE |
| 99 |
</PRE> |
</pre> |
|
</P> |
|
|
<P> |
|
| 100 |
The PCRE_CASELESS option is set when the expression is passed for compilation |
The PCRE_CASELESS option is set when the expression is passed for compilation |
| 101 |
to the native function. |
to the native function. |
|
</P> |
|
|
<P> |
|
| 102 |
<pre> |
<pre> |
| 103 |
REG_NEWLINE |
REG_NEWLINE |
| 104 |
</PRE> |
</pre> |
|
</P> |
|
|
<P> |
|
| 105 |
The PCRE_MULTILINE option is set when the expression is passed for compilation |
The PCRE_MULTILINE option is set when the expression is passed for compilation |
| 106 |
to the native function. Note that this does <i>not</i> mimic the defined POSIX |
to the native function. Note that this does <i>not</i> mimic the defined POSIX |
| 107 |
behaviour for REG_NEWLINE (see the following section). |
behaviour for REG_NEWLINE (see the following section). |
| 127 |
It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never |
It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never |
| 128 |
intended to be a POSIX engine. The following table lists the different |
intended to be a POSIX engine. The following table lists the different |
| 129 |
possibilities for matching newline characters in PCRE: |
possibilities for matching newline characters in PCRE: |
|
</P> |
|
|
<P> |
|
| 130 |
<pre> |
<pre> |
| 131 |
Default Change with |
Default Change with |
| 132 |
</PRE> |
|
|
</P> |
|
|
<P> |
|
|
<pre> |
|
| 133 |
. matches newline no PCRE_DOTALL |
. matches newline no PCRE_DOTALL |
| 134 |
newline matches [^a] yes not changeable |
newline matches [^a] yes not changeable |
| 135 |
$ matches \n at end yes PCRE_DOLLARENDONLY |
$ matches \n at end yes PCRE_DOLLAR_ENDONLY |
| 136 |
$ matches \n in middle no PCRE_MULTILINE |
$ matches \n in middle no PCRE_MULTILINE |
| 137 |
^ matches \n in middle no PCRE_MULTILINE |
^ matches \n in middle no PCRE_MULTILINE |
| 138 |
</PRE> |
</pre> |
|
</P> |
|
|
<P> |
|
| 139 |
This is the equivalent table for POSIX: |
This is the equivalent table for POSIX: |
|
</P> |
|
|
<P> |
|
| 140 |
<pre> |
<pre> |
| 141 |
Default Change with |
Default Change with |
| 142 |
</PRE> |
|
| 143 |
</P> |
. matches newline yes REG_NEWLINE |
| 144 |
<P> |
newline matches [^a] yes REG_NEWLINE |
| 145 |
<pre> |
$ matches \n at end no REG_NEWLINE |
| 146 |
. matches newline yes REG_NEWLINE |
$ matches \n in middle no REG_NEWLINE |
| 147 |
newline matches [^a] yes REG_NEWLINE |
^ matches \n in middle no REG_NEWLINE |
| 148 |
$ matches \n at end no REG_NEWLINE |
</pre> |
|
$ matches \n in middle no REG_NEWLINE |
|
|
^ matches \n in middle no REG_NEWLINE |
|
|
</PRE> |
|
|
</P> |
|
|
<P> |
|
| 149 |
PCRE's behaviour is the same as Perl's, except that there is no equivalent for |
PCRE's behaviour is the same as Perl's, except that there is no equivalent for |
| 150 |
PCRE_DOLLARENDONLY in Perl. In both PCRE and Perl, there is no way to stop |
PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way to stop |
| 151 |
newline from matching [^a]. |
newline from matching [^a]. |
| 152 |
</P> |
</P> |
| 153 |
<P> |
<P> |
| 154 |
The default POSIX newline handling can be obtained by setting PCRE_DOTALL and |
The default POSIX newline handling can be obtained by setting PCRE_DOTALL and |
| 155 |
PCRE_DOLLARENDONLY, but there is no way to make PCRE behave exactly as for the |
PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE behave exactly as for the |
| 156 |
REG_NEWLINE action. |
REG_NEWLINE action. |
| 157 |
</P> |
</P> |
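<P>
The POSIX table above can be checked against a system POSIX implementation.
The sketch below deliberately uses the system <b>regex.h</b> rather than
<b>pcreposix.h</b>, because the PCRE wrapper maps REG_NEWLINE to PCRE_MULTILINE
and would give different answers. It is meant to be built on its own, and the
helper name <b>posix_match</b> is an assumption made for this illustration.
</P>

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `pattern` matches somewhere in `subject`, 0 if it does
   not, and -1 if the pattern fails to compile. Uses the system POSIX
   regex functions, not the PCRE wrapper. */
int posix_match(const char *pattern, const char *subject, int cflags)
{
    regex_t preg;
    int rc;

    if (regcomp(&preg, pattern, cflags) != 0) {
        return -1;
    }
    rc = regexec(&preg, subject, 0, NULL, 0);
    regfree(&preg);
    return rc == 0 ? 1 : 0;
}
```

<P>
On a conforming implementation, <b>posix_match("a.b", "a\nb", 0)</b> should
report a match and the same call with REG_NEWLINE should not, while
<b>posix_match("^b", "a\nb", 0)</b> should fail and succeed once REG_NEWLINE is
set.
</P>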
| 158 |
<br><a name="SEC5" href="#TOC1">MATCHING A PATTERN</a><br> |
<br><a name="SEC5" href="#TOC1">MATCHING A PATTERN</a><br> |
| 159 |
<P> |
<P> |
| 160 |
The function <b>regexec()</b> is called to match a pre-compiled pattern |
The function <b>regexec()</b> is called to match a compiled pattern <i>preg</i> |
| 161 |
<i>preg</i> against a given <i>string</i>, which is terminated by a zero byte, |
against a given <i>string</i>, which is terminated by a zero byte, subject to |
| 162 |
subject to the options in <i>eflags</i>. These can be: |
the options in <i>eflags</i>. These can be: |
|
</P> |
|
|
<P> |
|
| 163 |
<pre> |
<pre> |
| 164 |
REG_NOTBOL |
REG_NOTBOL |
| 165 |
</PRE> |
</pre> |
|
</P> |
|
|
<P> |
|
| 166 |
The PCRE_NOTBOL option is set when calling the underlying PCRE matching |
The PCRE_NOTBOL option is set when calling the underlying PCRE matching |
| 167 |
function. |
function. |
|
</P> |
|
|
<P> |
|
| 168 |
<pre> |
<pre> |
| 169 |
REG_NOTEOL |
REG_NOTEOL |
| 170 |
</PRE> |
</pre> |
|
</P> |
|
|
<P> |
|
| 171 |
The PCRE_NOTEOL option is set when calling the underlying PCRE matching |
The PCRE_NOTEOL option is set when calling the underlying PCRE matching |
| 172 |
function. |
function. |
| 173 |
</P> |
</P> |
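<P>
The effect of the two <i>eflags</i> bits can be sketched as follows. The helper
name <b>match_with_eflags</b> and the <b>USE_PCREPOSIX</b> macro are assumptions
made for this illustration. The call passes zero for <i>nmatch</i> and NULL for
<i>pmatch</i>, using the standard POSIX argument order, because only the
match/no-match result is wanted here.
</P>

```c
/* Hypothetical build switch: define USE_PCREPOSIX and link with
   -lpcreposix -lpcre to use the PCRE wrapper; otherwise the system
   POSIX regex header is used. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>
#endif
#include <stddef.h>

/* Returns 1 if `pattern` matches `subject` under `eflags`, 0 if
   regexec() reports anything other than a match, and -1 if the
   pattern fails to compile. */
int match_with_eflags(const char *pattern, const char *subject, int eflags)
{
    regex_t preg;
    int rc;

    if (regcomp(&preg, pattern, 0) != 0) {
        return -1;
    }
    /* No substring offsets are requested: nmatch is 0, pmatch is NULL. */
    rc = regexec(&preg, subject, 0, NULL, eflags);
    regfree(&preg);
    return rc == 0 ? 1 : 0;
}
```

<P>
With this helper, <b>"^abc"</b> should match <b>"abcdef"</b> when <i>eflags</i>
is zero but not when REG_NOTBOL is set, and <b>"def$"</b> should match when
<i>eflags</i> is zero but not when REG_NOTEOL is set.
</P>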
| 195 |
message, including the zero, is limited to <i>errbuf_size</i>. The yield of the |
message, including the zero, is limited to <i>errbuf_size</i>. The yield of the |
| 196 |
function is the size of buffer needed to hold the whole message. |
function is the size of buffer needed to hold the whole message. |
| 197 |
</P> |
</P> |
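<P>
Because the yield is the size needed for the whole message, one possible
pattern is to call <b>regerror()</b> twice: once with a zero-length buffer to
learn the size, and once to fill a buffer of that size. The helper name
<b>compile_error_message</b> and the <b>USE_PCREPOSIX</b> macro are assumptions
made for this illustration.
</P>

```c
/* Hypothetical build switch: define USE_PCREPOSIX and link with
   -lpcreposix -lpcre to use the PCRE wrapper; otherwise the system
   POSIX regex header is used. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>
#endif
#include <stddef.h>
#include <stdlib.h>

/* If `pattern` fails to compile, returns a malloc'd copy of the error
   message (the caller frees it). Returns NULL if the pattern compiles
   or if memory cannot be allocated. */
char *compile_error_message(const char *pattern)
{
    regex_t preg;
    size_t needed;
    char *buf;
    int rc = regcomp(&preg, pattern, 0);

    if (rc == 0) {
        regfree(&preg);
        return NULL;
    }
    /* First call: a zero-sized buffer, so only the required size
       (including the terminating zero) is obtained. */
    needed = regerror(rc, &preg, NULL, 0);
    buf = malloc(needed);
    if (buf != NULL) {
        /* Second call: the buffer is large enough for the whole message. */
        regerror(rc, &preg, buf, needed);
    }
    return buf;
}
```

<P>
The text of the message depends on the library in use, so callers should treat
it as display text only.
</P>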
| 198 |
<br><a name="SEC7" href="#TOC1">STORAGE</a><br> |
<br><a name="SEC7" href="#TOC1">MEMORY USAGE</a><br> |
| 199 |
<P> |
<P> |
| 200 |
Compiling a regular expression causes memory to be allocated and associated |
Compiling a regular expression causes memory to be allocated and associated |
| 201 |
with the <i>preg</i> structure. The function <b>regfree()</b> frees all such |
with the <i>preg</i> structure. The function <b>regfree()</b> frees all such |
| 210 |
Cambridge CB2 3QG, England. |
Cambridge CB2 3QG, England. |
| 211 |
</P> |
</P> |
| 212 |
<P> |
<P> |
| 213 |
Last updated: 03 February 2003 |
Last updated: 07 September 2004 |
| 214 |
<br> |
<br> |
| 215 |
Copyright © 1997-2003 University of Cambridge. |
Copyright © 1997-2004 University of Cambridge. |
| 216 |
|
<p> |
| 217 |
|
Return to the <a href="index.html">PCRE index page</a>. |
| 218 |
|
</p> |