Contents of /code/trunk/doc/html/pcreposix.html

Revision 93
Sat Feb 24 21:41:42 2007 UTC by nigel
Load pcre-7.0 into code/trunk.

<html>
<head>
<title>pcreposix specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcreposix man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SYNOPSIS OF POSIX API</a>
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
<li><a name="TOC3" href="#SEC3">COMPILING A PATTERN</a>
<li><a name="TOC4" href="#SEC4">MATCHING NEWLINE CHARACTERS</a>
<li><a name="TOC5" href="#SEC5">MATCHING A PATTERN</a>
<li><a name="TOC6" href="#SEC6">ERROR MESSAGES</a>
<li><a name="TOC7" href="#SEC7">MEMORY USAGE</a>
<li><a name="TOC8" href="#SEC8">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF POSIX API</a><br>
<P>
<b>#include &#60;pcreposix.h&#62;</b>
</P>
<P>
<b>int regcomp(regex_t *<i>preg</i>, const char *<i>pattern</i>,</b>
<b>int <i>cflags</i>);</b>
</P>
<P>
<b>int regexec(regex_t *<i>preg</i>, const char *<i>string</i>,</b>
<b>size_t <i>nmatch</i>, regmatch_t <i>pmatch</i>[], int <i>eflags</i>);</b>
</P>
<P>
<b>size_t regerror(int <i>errcode</i>, const regex_t *<i>preg</i>,</b>
<b>char *<i>errbuf</i>, size_t <i>errbuf_size</i>);</b>
</P>
<P>
<b>void regfree(regex_t *<i>preg</i>);</b>
</P>
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
<P>
This set of functions provides a POSIX-style API to the PCRE regular expression
package. See the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation for a description of PCRE's native API, which contains much
additional functionality.
</P>
<P>
The functions described here are just wrapper functions that ultimately call
the PCRE native API. Their prototypes are defined in the <b>pcreposix.h</b>
header file, and on Unix systems the library itself is called
<b>pcreposix.a</b>, so can be accessed by adding <b>-lpcreposix</b> to the
command for linking an application that uses them. Because the POSIX functions
call the native ones, it is also necessary to add <b>-lpcre</b>.
</P>
<P>
I have implemented only those option bits that can be reasonably mapped to PCRE
native options. In addition, the option REG_EXTENDED is defined with the value
zero. This has no effect, but since programs that are written to the POSIX
interface often use it, this makes it easier to slot in PCRE as a replacement
library. Other POSIX options are not even defined.
</P>
<P>
When PCRE is called via these functions, it is only the API that is POSIX-like
in style. The syntax and semantics of the regular expressions themselves are
still those of Perl, subject to the setting of various PCRE options, as
described below. "POSIX-like in style" means that the API approximates to the
POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding
domains it is probably even less compatible.
</P>
<P>
The header for these functions is supplied as <b>pcreposix.h</b> to avoid any
potential clash with other POSIX libraries. It can, of course, be renamed or
aliased as <b>regex.h</b>, which is the "correct" name. It provides two
structure types, <i>regex_t</i> for compiled internal forms, and
<i>regmatch_t</i> for returning captured substrings. It also defines some
constants whose names start with "REG_"; these are used for setting options and
identifying error codes.
</P>
<br><a name="SEC3" href="#TOC1">COMPILING A PATTERN</a><br>
<P>
The function <b>regcomp()</b> is called to compile a pattern into an
internal form. The pattern is a C string terminated by a binary zero, and
is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer
to a <b>regex_t</b> structure that is used as a base for storing information
about the compiled regular expression.
</P>
<P>
The argument <i>cflags</i> is either zero, or contains one or more of the bits
defined by the following macros:
<pre>
  REG_DOTALL
</pre>
The PCRE_DOTALL option is set when the regular expression is passed for
compilation to the native function. Note that REG_DOTALL is not part of the
POSIX standard.
<pre>
  REG_ICASE
</pre>
The PCRE_CASELESS option is set when the regular expression is passed for
compilation to the native function.
<pre>
  REG_NEWLINE
</pre>
The PCRE_MULTILINE option is set when the regular expression is passed for
compilation to the native function. Note that this does <i>not</i> mimic the
defined POSIX behaviour for REG_NEWLINE (see the following section).
<pre>
  REG_NOSUB
</pre>
The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is passed
for compilation to the native function. In addition, when a pattern that is
compiled with this flag is passed to <b>regexec()</b> for matching, the
<i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no captured strings
are returned.
<pre>
  REG_UTF8
</pre>
The PCRE_UTF8 option is set when the regular expression is passed for
compilation to the native function. This causes the pattern itself and all data
strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF8
is not part of the POSIX standard.
</P>
<P>
In the absence of these flags, no options are passed to the native function.
This means that the regex is compiled with PCRE default semantics. In
particular, the way it handles newline characters in the subject string is the
Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
<i>some</i> of the effects specified for REG_NEWLINE. It does not affect the way
newlines are matched by . (they aren't) or by a negative class such as [^a]
(they are).
</P>
<P>
The yield of <b>regcomp()</b> is zero on success, and non-zero otherwise. The
<i>preg</i> structure is filled in on success, and one member of the structure
is public: <i>re_nsub</i> contains the number of capturing subpatterns in
the regular expression. Various error codes are defined in the header file.
</P>
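<P>
As a rough illustration of the compile step, the sketch below compiles a
pattern and reads back <i>re_nsub</i>. The helper name <b>count_groups()</b> is
hypothetical. The code includes the system <b>regex.h</b> only so that it
builds where PCRE is not installed; to use the wrapper, include
<b>pcreposix.h</b> instead and link with <b>-lpcreposix -lpcre</b>.
</P>

```c
#include <stddef.h>
#include <regex.h>   /* assumption: system header; for PCRE use <pcreposix.h> */

/* Hypothetical helper: compile a pattern and report how many capturing
   subpatterns it contains, or -1 if compilation fails. */
long count_groups(const char *pattern, int cflags)
{
    regex_t re;
    long n;

    /* REG_EXTENDED is defined as zero by pcreposix.h, so OR-ing it in has
       no effect there; a native POSIX library needs it for this syntax. */
    if (regcomp(&re, pattern, cflags | REG_EXTENDED) != 0)
        return -1;

    n = (long)re.re_nsub;   /* the one public member of regex_t */
    regfree(&re);           /* release memory tied to the compiled form */
    return n;
}
```

<P>
A caller might pass REG_ICASE or zero as <i>cflags</i>; a negative result means
that <b>regcomp()</b> returned a non-zero error code.
</P>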
<br><a name="SEC4" href="#TOC1">MATCHING NEWLINE CHARACTERS</a><br>
<P>
This area is not simple, because POSIX and Perl take different views of things.
It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never
intended to be a POSIX engine. The following table lists the different
possibilities for matching newline characters in PCRE:
<pre>
                          Default   Change with

  . matches newline          no     PCRE_DOTALL
  newline matches [^a]       yes    not changeable
  $ matches \n at end        yes    PCRE_DOLLAR_ENDONLY
  $ matches \n in middle     no     PCRE_MULTILINE
  ^ matches \n in middle     no     PCRE_MULTILINE
</pre>
This is the equivalent table for POSIX:
<pre>
                          Default   Change with

  . matches newline          yes    REG_NEWLINE
  newline matches [^a]       yes    REG_NEWLINE
  $ matches \n at end        no     REG_NEWLINE
  $ matches \n in middle     no     REG_NEWLINE
  ^ matches \n in middle     no     REG_NEWLINE
</pre>
PCRE's behaviour is the same as Perl's, except that there is no equivalent for
PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way to stop
newline from matching [^a].
</P>
<P>
The default POSIX newline handling can be obtained by setting PCRE_DOTALL and
PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE behave exactly as for the
REG_NEWLINE action.
</P>
<br><a name="SEC5" href="#TOC1">MATCHING A PATTERN</a><br>
<P>
The function <b>regexec()</b> is called to match a compiled pattern <i>preg</i>
against a given <i>string</i>, which is terminated by a zero byte, subject to
the options in <i>eflags</i>. These can be:
<pre>
  REG_NOTBOL
</pre>
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
function.
<pre>
  REG_NOTEOL
</pre>
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
function.
</P>
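<P>
The effect of the two <i>eflags</i> bits can be seen with anchored patterns. In
the sketch below, the helper name <b>match_with_eflags()</b> is hypothetical,
and the system <b>regex.h</b> is included only so that the code builds without
PCRE; with the wrapper, include <b>pcreposix.h</b> instead.
</P>

```c
#include <stddef.h>
#include <regex.h>   /* assumption: system header; for PCRE use <pcreposix.h> */

/* Hypothetical helper: compile a pattern, run one match with the given
   eflags, and return the regexec() result (0 on a match). If compilation
   fails, the regcomp() error code is returned instead. */
int match_with_eflags(const char *pattern, const char *subject, int eflags)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    if (rc != 0)
        return rc;

    /* nmatch is 0, so no offsets are requested and pmatch can be NULL. */
    rc = regexec(&re, subject, 0, NULL, eflags);
    regfree(&re);
    return rc;
}
```

<P>
With this helper, the pattern "^abc" would be expected to match "abcdef" when
<i>eflags</i> is zero and to yield REG_NOMATCH when REG_NOTBOL is set, because
the start of the string no longer counts as the beginning of a line.
</P>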
<P>
If the pattern was compiled with the REG_NOSUB flag, no data about any matched
strings is returned. The <i>nmatch</i> and <i>pmatch</i> arguments of
<b>regexec()</b> are ignored.
</P>
<P>
Otherwise, the portion of the string that was matched, and also any captured
substrings, are returned via the <i>pmatch</i> argument, which points to an
array of <i>nmatch</i> structures of type <i>regmatch_t</i>, containing the
members <i>rm_so</i> and <i>rm_eo</i>. These contain the offset to the first
character of each substring and the offset to the first character after the end
of each substring, respectively. The 0th element of the vector relates to the
entire portion of <i>string</i> that was matched; subsequent elements relate to
the capturing subpatterns of the regular expression. Unused entries in the
array have both structure members set to -1.
</P>
<P>
A successful match yields a zero return; various error codes are defined in the
header file, of which REG_NOMATCH is the "expected" failure code.
</P>
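<P>
The layout of the <i>pmatch</i> vector may be easier to follow with a small
example. The helper <b>find_offsets()</b> and the constant MAX_SLOTS below are
hypothetical names chosen for illustration, and the system <b>regex.h</b> is
included only so that the code builds without PCRE; with the wrapper, include
<b>pcreposix.h</b> instead.
</P>

```c
#include <stddef.h>
#include <regex.h>   /* assumption: system header; for PCRE use <pcreposix.h> */

#define MAX_SLOTS 4  /* illustrative: whole match plus up to three groups */

/* Hypothetical helper: match subject against pattern and copy the start and
   end offsets of each slot into so[] and eo[]. Returns the regexec() result,
   or the regcomp() error code if compilation fails. */
int find_offsets(const char *pattern, const char *subject,
                 long so[MAX_SLOTS], long eo[MAX_SLOTS])
{
    regex_t re;
    regmatch_t pmatch[MAX_SLOTS];
    int rc, i;

    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;

    rc = regexec(&re, subject, MAX_SLOTS, pmatch, 0);
    if (rc == 0) {
        for (i = 0; i < MAX_SLOTS; i++) {
            so[i] = (long)pmatch[i].rm_so;  /* first character of substring */
            eo[i] = (long)pmatch[i].rm_eo;  /* first character after its end */
        }
    }

    regfree(&re);
    return rc;
}
```

<P>
For the pattern "([0-9]+)-([0-9]+)" and the subject "id 12-345 x", slot 0
should cover offsets 3 to 9, slot 1 offsets 3 to 5, slot 2 offsets 6 to 9, and
the unused slot 3 should hold -1 in both members.
</P>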
<br><a name="SEC6" href="#TOC1">ERROR MESSAGES</a><br>
<P>
The <b>regerror()</b> function maps a non-zero error code from either
<b>regcomp()</b> or <b>regexec()</b> to a printable message. If <i>preg</i> is not
NULL, the error should have arisen from the use of that structure. A message
terminated by a binary zero is placed in <i>errbuf</i>. The length of the
message, including the zero, is limited to <i>errbuf_size</i>. The yield of the
function is the size of buffer needed to hold the whole message.
</P>
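<P>
Because the yield is the buffer size needed, one possible approach is to call
<b>regerror()</b> twice: once to learn the size and once to fetch the text. The
sketch below assumes the POSIX rule that <i>errbuf</i> is ignored when
<i>errbuf_size</i> is zero. The helper name <b>compile_error_text()</b> is
hypothetical, and the system <b>regex.h</b> is included only so that the code
builds without PCRE; with the wrapper, include <b>pcreposix.h</b> instead.
</P>

```c
#include <stdlib.h>
#include <regex.h>   /* assumption: system header; for PCRE use <pcreposix.h> */

/* Hypothetical helper: try to compile a pattern. Returns NULL if it compiles
   (or if memory runs out); otherwise returns a malloc'd error message that
   the caller must free(). */
char *compile_error_text(const char *pattern)
{
    regex_t re;
    size_t needed;
    char *msg;
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    if (rc == 0) {
        regfree(&re);
        return NULL;
    }

    /* First call: a zero-sized buffer, used only to learn the size. */
    needed = regerror(rc, &re, NULL, 0);

    msg = malloc(needed);
    if (msg != NULL)
        regerror(rc, &re, msg, needed);  /* second call: fetch the text */
    return msg;
}
```

<P>
A fixed-size buffer works as well; comparing the yield with the buffer size
then shows whether the message was truncated.
</P>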
<br><a name="SEC7" href="#TOC1">MEMORY USAGE</a><br>
<P>
Compiling a regular expression causes memory to be allocated and associated
with the <i>preg</i> structure. The function <b>regfree()</b> frees all such
memory, after which <i>preg</i> may no longer be used as a compiled expression.
</P>
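<P>
In practice each successful <b>regcomp()</b> is paired with one
<b>regfree()</b>. The sketch below shows that pairing in a yes/no matcher that
also uses REG_NOSUB and REG_ICASE. The helper name <b>is_match()</b> is
hypothetical, and the system <b>regex.h</b> is included only so that the code
builds without PCRE; with the wrapper, include <b>pcreposix.h</b> instead.
</P>

```c
#include <stddef.h>
#include <regex.h>   /* assumption: system header; for PCRE use <pcreposix.h> */

/* Hypothetical helper: caseless yes/no match.
   Returns 1 on a match, 0 on no match, -1 if the pattern does not compile. */
int is_match(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: no captured strings are wanted, so nmatch and pmatch
       are ignored by regexec(). */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;   /* nothing is freed here: compilation did not succeed */

    rc = regexec(&re, subject, 0, NULL, 0);

    regfree(&re);    /* after this, re must not be used as a compiled form */
    return rc == 0 ? 1 : 0;
}
```

<P>
A program that matches the same pattern many times would normally compile it
once, keep the <i>regex_t</i>, and call <b>regfree()</b> only when finished.
</P>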
<br><a name="SEC8" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service,
<br>
Cambridge CB2 3QH, England.
</P>
<P>
Last updated: 16 January 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
