.\" Contents of /code/trunk/doc/pcreposix.3, revision 903,
.\" committed Sat Jan 21 16:37:17 2012 UTC by ph10:
.\" "Source file tidies for 8.30-RC1 release; fix Makefile.am bugs for
.\" building symbolic links to man pages."
.TH PCREPOSIX 3
.SH NAME
PCRE - Perl-compatible regular expressions.
.SH "SYNOPSIS OF POSIX API"
.rs
.sp
.B #include <pcreposix.h>
.PP
.SM
.B int regcomp(regex_t *\fIpreg\fP, const char *\fIpattern\fP,
.ti +5n
.B int \fIcflags\fP);
.PP
.B int regexec(regex_t *\fIpreg\fP, const char *\fIstring\fP,
.ti +5n
.B size_t \fInmatch\fP, regmatch_t \fIpmatch\fP[], int \fIeflags\fP);
.PP
.B size_t regerror(int \fIerrcode\fP, const regex_t *\fIpreg\fP,
.ti +5n
.B char *\fIerrbuf\fP, size_t \fIerrbuf_size\fP);
.PP
.B void regfree(regex_t *\fIpreg\fP);
.
.SH DESCRIPTION
.rs
.sp
This set of functions provides a POSIX-style API for the PCRE regular
expression 8-bit library. See the
.\" HREF
\fBpcreapi\fP
.\"
documentation for a description of PCRE's native API, which contains much
additional functionality. There is no POSIX-style wrapper for PCRE's 16-bit
library.
.P
The functions described here are just wrapper functions that ultimately call
the PCRE native API. Their prototypes are defined in the \fBpcreposix.h\fP
header file, and on Unix systems the library itself is called
\fBpcreposix.a\fP, so it can be accessed by adding \fB-lpcreposix\fP to the
command for linking an application that uses them. Because the POSIX functions
call the native ones, it is also necessary to add \fB-lpcre\fP.
.P
I have implemented only those POSIX option bits that can be reasonably mapped
to PCRE native options. In addition, the option REG_EXTENDED is defined with
the value zero. This has no effect, but since programs that are written to the
POSIX interface often use it, this makes it easier to slot in PCRE as a
replacement library. Other POSIX options are not even defined.
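.P
As an illustration (not part of the PCRE distribution), here is a minimal C
sketch of the compile, match and free sequence. The helper name
\fBsimple_match()\fP and the macro \fBUSE_PCREPOSIX\fP are hypothetical. When
the macro is defined, the PCRE wrapper header is used. Otherwise the system
\fBregex.h\fP is used, so the same source can be tried against both libraries.
REG_EXTENDED is passed because it is harmless with PCRE (where it is zero) and
selects extended syntax in a system library. Patterns given to the helper
should therefore use syntax that means the same in Perl and in POSIX extended
expressions.

```c
#ifdef USE_PCREPOSIX
#include <pcreposix.h>  /* PCRE wrapper: link with -lpcreposix -lpcre */
#else
#include <regex.h>      /* system POSIX library, for comparison */
#endif
#include <stddef.h>     /* NULL */

/* Returns 1 if pattern matches subject, 0 if it does not, and -1 if the
   pattern fails to compile or matching reports an error. */
int simple_match(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    /* REG_EXTENDED is zero in pcreposix.h, so it changes nothing for PCRE,
       but it selects extended syntax in a system POSIX library. */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;              /* re must not be used after a failure */

    /* nmatch = 0 and pmatch = NULL: only "did it match?" is wanted. */
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);               /* release memory allocated by regcomp() */

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

Assuming the PCRE headers and libraries are installed where the compiler looks
for them, a command such as \fBcc -DUSE_PCREPOSIX prog.c -lpcreposix -lpcre\fP
builds the PCRE variant.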
.P
There are also some other options that are not defined by POSIX. These have
been added at the request of users who want to make use of certain
PCRE-specific features via the POSIX calling interface.
.P
When PCRE is called via these functions, it is only the API that is POSIX-like
in style. The syntax and semantics of the regular expressions themselves are
still those of Perl, subject to the setting of various PCRE options, as
described below. "POSIX-like in style" means that the API approximates the
POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding
domains it is probably even less compatible.
.P
The header for these functions is supplied as \fBpcreposix.h\fP to avoid any
potential clash with other POSIX libraries. It can, of course, be renamed or
aliased as \fBregex.h\fP, which is the "correct" name. It provides two
structure types, \fIregex_t\fP for compiled internal forms, and
\fIregmatch_t\fP for returning captured substrings. It also defines some
constants whose names start with "REG_"; these are used for setting options and
identifying error codes.
.
.
.SH "COMPILING A PATTERN"
.rs
.sp
The function \fBregcomp()\fP is called to compile a pattern into an
internal form. The pattern is a C string terminated by a binary zero, and
is passed in the argument \fIpattern\fP. The \fIpreg\fP argument is a pointer
to a \fBregex_t\fP structure that is used as a base for storing information
about the compiled regular expression.
.P
The argument \fIcflags\fP is either zero, or contains one or more of the bits
defined by the following macros:
.sp
REG_DOTALL
.sp
The PCRE_DOTALL option is set when the regular expression is passed for
compilation to the native function. Note that REG_DOTALL is not part of the
POSIX standard.
.sp
REG_ICASE
.sp
The PCRE_CASELESS option is set when the regular expression is passed for
compilation to the native function.
.sp
REG_NEWLINE
.sp
The PCRE_MULTILINE option is set when the regular expression is passed for
compilation to the native function. Note that this does \fInot\fP mimic the
defined POSIX behaviour for REG_NEWLINE (see the following section).
.sp
REG_NOSUB
.sp
The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is passed
for compilation to the native function. In addition, when a pattern that is
compiled with this flag is passed to \fBregexec()\fP for matching, the
\fInmatch\fP and \fIpmatch\fP arguments are ignored, and no captured strings
are returned.
.sp
REG_UCP
.sp
The PCRE_UCP option is set when the regular expression is passed for
compilation to the native function. This causes PCRE to use Unicode properties
when matching \ed, \ew, etc., instead of just recognizing ASCII values. Note
that REG_UCP is not part of the POSIX standard.
.sp
REG_UNGREEDY
.sp
The PCRE_UNGREEDY option is set when the regular expression is passed for
compilation to the native function. Note that REG_UNGREEDY is not part of the
POSIX standard.
.sp
REG_UTF8
.sp
The PCRE_UTF8 option is set when the regular expression is passed for
compilation to the native function. This causes the pattern itself and all data
strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF8
is not part of the POSIX standard.
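.P
Here is a short C sketch of combining these bits. The helper name is
hypothetical. It assumes, as in PCRE's header, that the option names are
preprocessor macros, so a PCRE-only bit such as REG_UTF8 can be tested with
\fB#ifdef\fP and the same code still compiles against a system
\fBregex.h\fP, which is what it includes here. To use PCRE, include
\fBpcreposix.h\fP instead and link with \fB-lpcreposix -lpcre\fP. With
REG_NOSUB set, no captured strings are returned, so a caller can pass a zero
\fInmatch\fP and a NULL \fIpmatch\fP to \fBregexec()\fP.

```c
#include <regex.h>
#include <stddef.h>   /* NULL, for callers of regexec() */

/* Compiles pattern as caseless and without capture data. REG_UTF8 is
   added only where the library defines it (PCRE does; POSIX does not).
   Returns regcomp()'s result: zero on success. */
int compile_caseless(regex_t *re, const char *pattern)
{
    int cflags = REG_EXTENDED | REG_ICASE | REG_NOSUB;
#ifdef REG_UTF8
    cflags |= REG_UTF8;   /* PCRE extension, not part of POSIX */
#endif
    return regcomp(re, pattern, cflags);
}
```

For example, after a successful \fBcompile_caseless(&re, "hello")\fP, the call
\fBregexec(&re, "Say HeLLo", 0, NULL, 0)\fP returns zero.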
.P
In the absence of these flags, no options are passed to the native function.
This means the regex is compiled with PCRE default semantics. In
particular, the way it handles newline characters in the subject string is the
Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
\fIsome\fP of the effects specified for REG_NEWLINE. It does not affect the way
newlines are matched by . (they are not) or by a negative class such as [^a]
(they are).
.P
The yield of \fBregcomp()\fP is zero on success, and non-zero otherwise. The
\fIpreg\fP structure is filled in on success, and one member of the structure
is public: \fIre_nsub\fP contains the number of capturing subpatterns in
the regular expression. Various error codes are defined in the header file.
.P
NOTE: If the yield of \fBregcomp()\fP is non-zero, you must not attempt to
use the contents of the \fIpreg\fP structure. If, for example, you pass it to
\fBregexec()\fP, the result is undefined and your program is likely to crash.
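.P
This minimal C sketch reads \fIre_nsub\fP and follows the rule above:
\fBregfree()\fP is called only when \fBregcomp()\fP has succeeded. The helper
name is hypothetical. The sketch includes the system \fBregex.h\fP so that it
compiles without PCRE. To use PCRE, include \fBpcreposix.h\fP and link with
\fB-lpcreposix -lpcre\fP.

```c
#include <regex.h>

/* Returns the number of capturing subpatterns in pattern, or -1 if it does
   not compile. On failure the contents of re must not be used at all, so
   regfree() is called only on the success path. */
long count_groups(const char *pattern)
{
    regex_t re;
    long n;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    n = (long)re.re_nsub;   /* the one public member of regex_t */
    regfree(&re);
    return n;
}
```

With this definition, \fBcount_groups("([a-z]+)-([0-9]+)")\fP returns 2, and
an unbalanced pattern such as \fB(\fP yields -1.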
.
.
.SH "MATCHING NEWLINE CHARACTERS"
.rs
.sp
This area is not simple, because POSIX and Perl take different views of things.
It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never
intended to be a POSIX engine. The following table lists the different
possibilities for matching newline characters in PCRE:
.sp
                              Default  Change with
.sp
  . matches newline           no       PCRE_DOTALL
  newline matches [^a]        yes      not changeable
  $ matches \en at end         yes      PCRE_DOLLAR_ENDONLY
  $ matches \en in middle      no       PCRE_MULTILINE
  ^ matches \en in middle      no       PCRE_MULTILINE
.sp
This is the equivalent table for POSIX:
.sp
                              Default  Change with
.sp
  . matches newline           yes      REG_NEWLINE
  newline matches [^a]        yes      REG_NEWLINE
  $ matches \en at end         no       REG_NEWLINE
  $ matches \en in middle      no       REG_NEWLINE
  ^ matches \en in middle      no       REG_NEWLINE
.sp
PCRE's behaviour is the same as Perl's, except that there is no equivalent for
PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way to stop
newline from matching [^a].
.P
The default POSIX newline handling can be obtained by setting PCRE_DOTALL and
PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE behave exactly as for the
REG_NEWLINE action.
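.P
As a small check of one row that the two tables share, the C sketch below
compiles ^b and tests it against a two-line subject. The helper name is
hypothetical. Without REG_NEWLINE, ^ matches only at the start of the subject,
so there is no match. With REG_NEWLINE (PCRE_MULTILINE in PCRE), ^ also matches
after the internal newline. The newline is written as character code 10,
assuming an ASCII-based character set. Rows where the tables disagree, such as
whether a dot matches newline, would give different results with PCRE and with
a system POSIX library. The sketch includes the system \fBregex.h\fP. To use
PCRE, include \fBpcreposix.h\fP and link with \fB-lpcreposix -lpcre\fP.

```c
#include <regex.h>
#include <stddef.h>   /* NULL */

/* Reports whether "^b" matches the subject "a", newline, "b" when compiled
   with cflags. Character code 10 is the ASCII line feed. Returns 1 on a
   match, 0 on no match or if the pattern does not compile. */
int caret_matches_after_newline(int cflags)
{
    const char subject[] = { 'a', 10, 'b', 0 };
    regex_t re;
    int rc;

    if (regcomp(&re, "^b", cflags) != 0)
        return 0;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```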
.
.
.SH "MATCHING A PATTERN"
.rs
.sp
The function \fBregexec()\fP is called to match a compiled pattern \fIpreg\fP
against a given \fIstring\fP, which is by default terminated by a zero byte
(but see REG_STARTEND below), subject to the options in \fIeflags\fP. These can
be:
.sp
REG_NOTBOL
.sp
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
function.
.sp
REG_NOTEMPTY
.sp
The PCRE_NOTEMPTY option is set when calling the underlying PCRE matching
function. Note that REG_NOTEMPTY is not part of the POSIX standard. However,
setting this option can give more POSIX-like behaviour in some situations.
.sp
REG_NOTEOL
.sp
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
function.
.sp
REG_STARTEND
.sp
The string is considered to start at \fIstring\fP + \fIpmatch[0].rm_so\fP and
to have a terminating NUL located at \fIstring\fP + \fIpmatch[0].rm_eo\fP
(there need not actually be a NUL at that location), regardless of the value of
\fInmatch\fP. This is a BSD extension, compatible with but not specified by
IEEE Standard 1003.2 (POSIX.2), and should be used with caution in software
intended to be portable to other systems. Note that a non-zero \fIrm_so\fP does
not imply REG_NOTBOL; REG_STARTEND affects only the location of the string, not
how it is matched.
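.P
The C sketch below searches only part of a subject by passing the bounds in
\fIpmatch\fP[0] with REG_STARTEND. The helper name is hypothetical. Because
REG_STARTEND is an extension, the code is guarded by \fB#ifdef\fP and reports
-1 where the macro is not defined. Details such as how ^ behaves at a non-zero
\fIrm_so\fP may vary between libraries, so the sketch only reports whether a
match is found. It includes the system \fBregex.h\fP. To use PCRE, include
\fBpcreposix.h\fP and link with \fB-lpcreposix -lpcre\fP.

```c
#include <regex.h>
#include <stddef.h>

/* Searches only subject[start, end) by passing the bounds in pmatch[0]
   with REG_STARTEND. Returns 1 on a match, 0 on no match, and -1 on an
   error or when the library does not define REG_STARTEND. */
int match_in_range(regex_t *re, const char *subject,
                   regoff_t start, regoff_t end)
{
#ifdef REG_STARTEND
    regmatch_t bounds[1];
    int rc;

    bounds[0].rm_so = start;
    bounds[0].rm_eo = end;    /* no NUL is needed at subject + end */
    rc = regexec(re, subject, 1, bounds, REG_STARTEND);
    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
#else
    (void)re; (void)subject; (void)start; (void)end;
    return -1;
#endif
}
```

For example, with [0-9]+$ compiled, \fBmatch_in_range(&re, "abc123def", 0, 6)\fP
returns 1 because the $ then applies at offset 6. The whole subject, which ends
in def, does not match.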
.P
If the pattern was compiled with the REG_NOSUB flag, no data about any matched
strings is returned. The \fInmatch\fP and \fIpmatch\fP arguments of
\fBregexec()\fP are ignored.
.P
If the value of \fInmatch\fP is zero, or if the value of \fIpmatch\fP is NULL,
no data about any matched strings is returned.
.P
Otherwise, the portion of the string that was matched, and also any captured
substrings, are returned via the \fIpmatch\fP argument, which points to an
array of \fInmatch\fP structures of type \fIregmatch_t\fP, containing the
members \fIrm_so\fP and \fIrm_eo\fP. These contain the offset to the first
character of each substring and the offset to the first character after the end
of each substring, respectively. The 0th element of the vector relates to the
entire portion of \fIstring\fP that was matched; subsequent elements relate to
the capturing subpatterns of the regular expression. Unused entries in the
array have both structure members set to -1.
.P
A successful match yields a zero return; various error codes are defined in the
header file, of which REG_NOMATCH is the "expected" failure code.
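.P
The following C sketch uses the \fIrm_so\fP and \fIrm_eo\fP offsets to copy
one captured substring. It treats an entry of -1 as an unset group and passes
REG_NOMATCH back to the caller. The names are hypothetical, and MAX_GROUPS is
an arbitrary limit chosen for the example. The sketch includes the system
\fBregex.h\fP. To use PCRE, include \fBpcreposix.h\fP and link with
\fB-lpcreposix -lpcre\fP.

```c
#include <regex.h>
#include <string.h>

#define MAX_GROUPS 10   /* hypothetical limit chosen for this sketch */

/* Copies capture group 'group' (0 = whole match) of the first match of re
   in subject into buf. Returns 0 on success, regexec()'s non-zero code
   (REG_NOMATCH when nothing matches), or -1 if the group is out of range,
   unset, or does not fit in buf. */
int copy_group(regex_t *re, const char *subject, size_t group,
               char *buf, size_t bufsize)
{
    regmatch_t m[MAX_GROUPS];
    size_t len;
    int rc;

    if (group >= MAX_GROUPS || group > re->re_nsub)
        return -1;
    rc = regexec(re, subject, MAX_GROUPS, m, 0);
    if (rc != 0)
        return rc;
    if (m[group].rm_so == -1)
        return -1;                      /* unused entry: both offsets are -1 */
    /* rm_eo is the offset of the first character after the substring. */
    len = (size_t)(m[group].rm_eo - m[group].rm_so);
    if (len + 1 > bufsize)
        return -1;
    memcpy(buf, subject + m[group].rm_so, len);
    buf[len] = 0;                       /* terminate the copy */
    return 0;
}
```

For the pattern ([a-z]+)(x)?-([0-9]+) and the subject "id: abc-42;", group 0
is abc-42, group 3 is 42, and group 2 is unset, so \fBcopy_group()\fP returns
-1 for it.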
.
.
.SH "ERROR MESSAGES"
.rs
.sp
The \fBregerror()\fP function maps a non-zero error code from either
\fBregcomp()\fP or \fBregexec()\fP to a printable message. If \fIpreg\fP is not
NULL, the error should have arisen from the use of that structure. A message
terminated by a binary zero is placed in \fIerrbuf\fP. The length of the
message, including the zero, is limited to \fIerrbuf_size\fP. The yield of the
function is the size of buffer needed to hold the whole message.
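.P
Because the yield is the size needed, a caller can ask for the size first and
then allocate a buffer of exactly that length. This C sketch does so. The
helper name is hypothetical. It relies on the POSIX rule that
\fBregerror()\fP ignores \fIerrbuf\fP when \fIerrbuf_size\fP is zero. The
sketch includes the system \fBregex.h\fP. To use PCRE, include
\fBpcreposix.h\fP and link with \fB-lpcreposix -lpcre\fP.

```c
#include <regex.h>
#include <stdlib.h>

/* Returns a malloc()ed, zero-terminated message for errcode, or NULL if
   memory runs out. The first call, with a zero-length buffer, asks
   regerror() only for the size needed; the second fills the buffer. */
char *regex_error_message(int errcode, const regex_t *re)
{
    size_t needed = regerror(errcode, re, NULL, 0);
    char *msg = malloc(needed);

    if (msg != NULL)
        regerror(errcode, re, msg, needed);
    return msg;   /* the caller must free() it */
}
```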
.
.
.SH "MEMORY USAGE"
.rs
.sp
Compiling a regular expression causes memory to be allocated and associated
with the \fIpreg\fP structure. The function \fBregfree()\fP frees all such
memory, after which \fIpreg\fP may no longer be used as a compiled expression.
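.P
When several patterns are compiled together, the rule that \fIpreg\fP must not
be used after a failed \fBregcomp()\fP means cleanup code has to track which
patterns actually compiled. This C sketch frees only the patterns that compiled
successfully if a later one fails. The helper names are hypothetical. The
sketch includes the system \fBregex.h\fP. To use PCRE, include
\fBpcreposix.h\fP and link with \fB-lpcreposix -lpcre\fP.

```c
#include <regex.h>
#include <stddef.h>

/* Compiles n patterns into res[]. On failure, frees the ones that did
   compile (never the one that failed) and returns regcomp()'s code. */
int compile_all(regex_t res[], const char *const patterns[], size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int rc = regcomp(&res[i], patterns[i], REG_EXTENDED);
        if (rc != 0) {
            while (i > 0)
                regfree(&res[--i]);   /* undo earlier successes only */
            return rc;
        }
    }
    return 0;
}

/* Releases everything compile_all() allocated; res[] must not be used as
   compiled expressions afterwards. */
void free_all(regex_t res[], size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        regfree(&res[i]);
}
```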
.
.
.SH AUTHOR
.rs
.sp
.nf
Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.
.fi
.
.
.SH REVISION
.rs
.sp
.nf
Last updated: 09 January 2012
Copyright (c) 1997-2012 University of Cambridge.
.fi

