NAME
     pcreposix - POSIX API for Perl-compatible regular expressions.

SYNOPSIS
     #include <pcreposix.h>

     int regcomp(regex_t *preg, const char *pattern,
          int cflags);

     int regexec(regex_t *preg, const char *string,
          size_t nmatch, regmatch_t pmatch[], int eflags);

     size_t regerror(int errcode, const regex_t *preg,
          char *errbuf, size_t errbuf_size);

     void regfree(regex_t *preg);

DESCRIPTION
     This set of functions provides a POSIX-style API to the PCRE
     regular expression package. See the pcre documentation for a
     description of the native API, which contains additional
     functionality.

     The functions described here are just wrapper functions that
     ultimately call the native API. Their prototypes are defined
     in the pcreposix.h header file, and on Unix systems the
     library itself is called libpcreposix.a, so it can be
     accessed by adding -lpcreposix to the command for linking an
     application that uses them. Because the POSIX functions call
     the native ones, it is also necessary to add -lpcre.

     As I am pretty ignorant about POSIX, these functions must be
     considered experimental. I have implemented only those
     option bits that can reasonably be mapped to PCRE native
     options. Other POSIX options are not even defined. It may be
     useful to define, but ignore, other options. Feedback from
     more knowledgeable folk may cause this kind of detail to
     change.

     When PCRE is called via these functions, it is only the API
     that is POSIX-like in style. The syntax and semantics of the
     regular expressions themselves are still those of Perl,
     subject to the setting of various PCRE options, as described
     below.

     The header for these functions is supplied as pcreposix.h to
     avoid any potential clash with other POSIX libraries. It
     can, of course, be renamed or aliased as regex.h, which is
     the "correct" name. It provides two structure types: regex_t
     for compiled internal forms, and regmatch_t for returning
     captured substrings. It also defines some constants whose
     names start with "REG_"; these are used for setting options
     and identifying error codes.

COMPILING A PATTERN
     The function regcomp() is called to compile a pattern into
     an internal form. The pattern is a C string terminated by a
     binary zero, and is passed in the argument pattern. The preg
     argument is a pointer to a regex_t structure which is used
     as a base for storing information about the compiled
     expression.

     The argument cflags is either zero, or contains one or more
     of the bits defined by the following macros:

       REG_ICASE

     The PCRE_CASELESS option is set when the expression is
     passed for compilation to the native function.

       REG_NEWLINE

     The PCRE_MULTILINE option is set when the expression is
     passed for compilation to the native function.

     The yield of regcomp() is zero on success, and non-zero
     otherwise. The preg structure is filled in on success, and
     one member of the structure is publicized: re_nsub contains
     the number of capturing subpatterns in the regular
     expression. Various error codes are defined in the header
     file.

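     As a minimal sketch of the compilation step described above,
     the following helper compiles a pattern and reports re_nsub.
     The helper name count_subpatterns and the build macro
     USE_PCREPOSIX are assumptions made for illustration and are
     not part of the library. Without that macro the sketch falls
     back to the system <regex.h>, which needs REG_EXTENDED to
     accept Perl-like grouping; REG_EXTENDED is defined as zero
     when the header does not supply it.

```c
#include <stddef.h>

/* Assumption: USE_PCREPOSIX is a hypothetical build macro. Define it (and
   link with -lpcreposix -lpcre) to use the wrapper described here; without
   it the system <regex.h> is used so that the sketch still compiles. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>
#endif
#ifndef REG_EXTENDED
#define REG_EXTENDED 0  /* not defined by the pcreposix.h described here */
#endif

/* Compile `pattern` with the given cflags (zero, REG_ICASE, REG_NEWLINE,
   or both) and return the number of capturing subpatterns, or -1 if
   regcomp() reports an error. */
long count_subpatterns(const char *pattern, int cflags)
{
    regex_t re;
    long nsub;

    if (regcomp(&re, pattern, cflags | REG_EXTENDED) != 0)
        return -1;             /* non-zero yield: compilation failed */

    nsub = (long)re.re_nsub;   /* the one publicized member of regex_t */
    regfree(&re);
    return nsub;
}
```

     For example, count_subpatterns("(a)(b)c", REG_ICASE) should
     yield 2, while an unbalanced pattern such as "(abc" should
     yield -1.
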
MATCHING A PATTERN
     The function regexec() is called to match a pre-compiled
     pattern preg against a given string, which is terminated by
     a zero byte, subject to the options in eflags. These can be:

       REG_NOTBOL

     The PCRE_NOTBOL option is set when calling the underlying
     PCRE matching function.

       REG_NOTEOL

     The PCRE_NOTEOL option is set when calling the underlying
     PCRE matching function.

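     The effect of the two eflags bits can be illustrated with a
     small sketch. The helper name match_with_flags and the build
     macro USE_PCREPOSIX are hypothetical; without the macro the
     system <regex.h> is used instead, with REG_EXTENDED defined
     as zero where the header lacks it.

```c
#include <stddef.h>

/* Assumption: USE_PCREPOSIX is a hypothetical build macro. Define it (and
   link with -lpcreposix -lpcre) to use the wrapper described here; without
   it the system <regex.h> is used so that the sketch still compiles. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>
#endif
#ifndef REG_EXTENDED
#define REG_EXTENDED 0  /* not defined by the pcreposix.h described here */
#endif

/* Match `subject` against `pattern` with the given eflags.
   Returns 1 on a match, 0 on REG_NOMATCH, and -1 on any other failure. */
int match_with_flags(const char *pattern, const char *subject, int eflags)
{
    regex_t re;
    regmatch_t whole[1];   /* only the overall match is requested */
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    rc = regexec(&re, subject, 1, whole, eflags);
    regfree(&re);

    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

     With eflags set to zero, "^abc" should match "abcdef". With
     REG_NOTBOL the start of the string no longer counts as the
     beginning of a line, so the same call should report no
     match. REG_NOTEOL has the corresponding effect on "def$".
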
     The portion of the string that was matched, and also any
     captured substrings, are returned via the pmatch argument,
     which points to an array of nmatch structures of type
     regmatch_t, containing the members rm_so and rm_eo. These
     contain the offset to the first character of each substring
     and the offset to the first character after the end of each
     substring, respectively. The 0th element of the vector
     relates to the entire portion of string that was matched;
     subsequent elements relate to the capturing subpatterns of
     the regular expression. Unused entries in the array have
     both structure members set to -1.

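     The use of rm_so and rm_eo can be sketched as follows. The
     helper name copy_group, the MAX_GROUPS limit, and the build
     macro USE_PCREPOSIX are assumptions made for this example;
     without the macro the system <regex.h> is used, with
     REG_EXTENDED defined as zero where the header lacks it.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: USE_PCREPOSIX is a hypothetical build macro. Define it (and
   link with -lpcreposix -lpcre) to use the wrapper described here; without
   it the system <regex.h> is used so that the sketch still compiles. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>
#endif
#ifndef REG_EXTENDED
#define REG_EXTENDED 0  /* not defined by the pcreposix.h described here */
#endif

#define MAX_GROUPS 10   /* hypothetical limit chosen for this sketch */

/* Copy substring number `group` of the first match into `out`
   (group 0 is the whole match). Returns the copied length, or -1 if the
   pattern fails to compile, nothing matches, or the group is unused. */
int copy_group(const char *pattern, const char *subject,
               size_t group, char *out, size_t outsize)
{
    regex_t re;
    regmatch_t pmatch[MAX_GROUPS];
    int len = -1;

    if (group >= MAX_GROUPS || outsize == 0)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, subject, MAX_GROUPS, pmatch, 0) == 0 &&
        pmatch[group].rm_so != -1) {
        /* rm_so is the offset of the first character of the substring;
           rm_eo is the offset of the first character after it. */
        size_t n = (size_t)(pmatch[group].rm_eo - pmatch[group].rm_so);
        if (n >= outsize)
            n = outsize - 1;   /* truncate to fit the caller's buffer */
        memcpy(out, subject + pmatch[group].rm_so, n);
        out[n] = '\0';
        len = (int)n;
    }

    regfree(&re);
    return len;
}
```

     Matching "([0-9]+)-([0-9]+)" against "tel 555-1234" should
     give "555-1234" for group 0 and "1234" for group 2. For the
     pattern "(a)|(b)" and the string "b", group 1 is unused, so
     the helper should return -1 for it.
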
     A successful match yields a zero return; various error codes
     are defined in the header file, of which REG_NOMATCH is the
     "expected" failure code.

ERROR MESSAGES
     The regerror() function maps a non-zero error code from
     either regcomp() or regexec() to a printable message. If
     preg is not NULL, the error should have arisen from the use
     of that structure. A message terminated by a binary zero is
     placed in errbuf. The length of the message, including the
     zero, is limited to errbuf_size. The yield of the function
     is the size of buffer needed to hold the whole message.

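     Because the yield of regerror() is the buffer size needed
     for the whole message, one possible approach is to call it
     twice: once with a tiny buffer to learn the size, and again
     with a buffer of that size. The sketch below does this. The
     helper name check_pattern and the build macro USE_PCREPOSIX
     are hypothetical; without the macro the system <regex.h> is
     used, with REG_EXTENDED defined as zero where the header
     lacks it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption: USE_PCREPOSIX is a hypothetical build macro. Define it (and
   link with -lpcreposix -lpcre) to use the wrapper described here; without
   it the system <regex.h> is used so that the sketch still compiles. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>
#endif
#ifndef REG_EXTENDED
#define REG_EXTENDED 0  /* not defined by the pcreposix.h described here */
#endif

/* Returns 0 and sets *msg to NULL if `pattern` compiles. Otherwise returns
   the regcomp() error code and stores a malloc()ed message in *msg, which
   the caller must free(); *msg is NULL if memory could not be allocated. */
int check_pattern(const char *pattern, char **msg)
{
    regex_t re;
    char probe[1];
    size_t needed;
    int rc;

    *msg = NULL;
    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc == 0) {
        regfree(&re);
        return 0;
    }

    /* First call: the message is cut to fit the one-byte buffer, but the
       yield is the size needed for the whole message. */
    needed = regerror(rc, &re, probe, sizeof probe);

    *msg = malloc(needed);
    if (*msg != NULL)
        regerror(rc, &re, *msg, needed);   /* second call: everything fits */
    return rc;
}
```

     The exact wording of the message depends on the library in
     use, so callers should treat it as text for display only.
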
STORAGE
     Compiling a regular expression causes memory to be allocated
     and associated with the preg structure. The function
     regfree() frees all such memory, after which preg may no
     longer be used as a compiled expression.

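     A typical lifetime, sketched under the same assumptions as
     the rest of this page, is to compile once, match as many
     strings as required, and then call regfree() exactly once.
     The helper name count_matches and the build macro
     USE_PCREPOSIX are hypothetical; without the macro the system
     <regex.h> is used, with REG_EXTENDED defined as zero where
     the header lacks it.

```c
#include <stddef.h>

/* Assumption: USE_PCREPOSIX is a hypothetical build macro. Define it (and
   link with -lpcreposix -lpcre) to use the wrapper described here; without
   it the system <regex.h> is used so that the sketch still compiles. */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>
#endif
#ifndef REG_EXTENDED
#define REG_EXTENDED 0  /* not defined by the pcreposix.h described here */
#endif

/* Count how many of the `n` strings match `pattern`, compiling it once.
   Returns -1 if the pattern does not compile. */
int count_matches(const char *pattern, const char *const strings[], size_t n)
{
    regex_t re;
    regmatch_t whole[1];
    size_t i;
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;      /* compilation failed: no compiled form to free */

    for (i = 0; i < n; i++)
        if (regexec(&re, strings[i], 1, whole, 0) == 0)
            count++;

    regfree(&re);       /* re must not be used as a compiled form after this */
    return count;
}
```

     Given the strings "abc", "xab" and "ab", the pattern "^ab"
     should be counted twice.
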
AUTHOR
     Philip Hazel <ph10@cam.ac.uk>
     University Computing Service,
     New Museums Site,
     Cambridge CB2 3QG, England.
     Phone: +44 1223 334714

     Copyright (c) 1997-1999 University of Cambridge.