| 8 |
.br |
.br |
| 9 |
.B pcre *pcre_compile(const char *\fIpattern\fR, int \fIoptions\fR, |
.B pcre *pcre_compile(const char *\fIpattern\fR, int \fIoptions\fR, |
| 10 |
.ti +5n |
.ti +5n |
| 11 |
.B const char **\fIerrptr\fR, int *\fIerroffset\fR); |
.B const char **\fIerrptr\fR, int *\fIerroffset\fR, |
| 12 |
|
.ti +5n |
| 13 |
|
.B const unsigned char *\fItableptr\fR); |
| 14 |
|
.PP |
| 15 |
|
.br |
| 16 |
|
.B const unsigned char *pcre_maketables(void); |
| 17 |
.PP |
.PP |
| 18 |
.br |
.br |
| 19 |
.B pcre_extra *pcre_study(const pcre *\fIcode\fR, int \fIoptions\fR, |
.B pcre_extra *pcre_study(const pcre *\fIcode\fR, int \fIoptions\fR, |
| 39 |
.PP |
.PP |
| 40 |
.br |
.br |
| 41 |
.B void (*pcre_free)(void *); |
.B void (*pcre_free)(void *); |
|
.PP |
|
|
.br |
|
|
.B unsigned char *pcre_cbits[128]; |
|
|
.PP |
|
|
.br |
|
|
.B unsigned char *pcre_ctypes[256]; |
|
|
.PP |
|
|
.br |
|
|
.B unsigned char *pcre_fcc[256]; |
|
|
.PP |
|
|
.br |
|
|
.B unsigned char *pcre_lcc[256]; |
|
| 42 |
|
|
| 43 |
|
|
| 44 |
|
|
| 53 |
|
|
| 54 |
The three functions \fBpcre_compile()\fR, \fBpcre_study()\fR, and |
The three functions \fBpcre_compile()\fR, \fBpcre_study()\fR, and |
| 55 |
\fBpcre_exec()\fR are used for compiling and matching regular expressions. The |
\fBpcre_exec()\fR are used for compiling and matching regular expressions. The |
| 56 |
function \fBpcre_info()\fR is used to find out information about a compiled |
function \fBpcre_maketables()\fR is used (optionally) to build a set of |
| 57 |
|
character tables in the current locale for passing to \fBpcre_compile()\fR. |
| 58 |
|
|
| 59 |
|
The function \fBpcre_info()\fR is used to find out information about a compiled |
| 60 |
pattern, while the function \fBpcre_version()\fR returns a pointer to a string |
pattern, while the function \fBpcre_version()\fR returns a pointer to a string |
| 61 |
containing the version of PCRE and its date of release. |
containing the version of PCRE and its date of release. |
| 62 |
|
|
| 66 |
so a calling program can replace them if it wishes to intercept the calls. This |
so a calling program can replace them if it wishes to intercept the calls. This |
| 67 |
should be done before calling any PCRE functions. |
should be done before calling any PCRE functions. |
| 68 |
|
|
|
The other global variables are character tables. They are initialized when PCRE |
|
|
is compiled, from source that is generated by reference to the C character type |
|
|
functions, but which a user of PCRE is free to modify. In principle the tables |
|
|
could also be modified at run time. See PCRE's README file for more details. |
|
|
|
|
| 69 |
|
|
| 70 |
.SH MULTI-THREADING |
.SH MULTI-THREADING |
| 71 |
The PCRE functions can be used in multi-threading applications, with the |
The PCRE functions can be used in multi-threading applications, with the |
| 72 |
proviso that the character tables and the memory management functions pointed |
proviso that the memory management functions pointed to by \fBpcre_malloc\fR |
| 73 |
to by \fBpcre_malloc\fR and \fBpcre_free\fR are shared by all threads. |
and \fBpcre_free\fR are shared by all threads. |
| 74 |
|
|
| 75 |
The compiled form of a regular expression is not altered during matching, so |
The compiled form of a regular expression is not altered during matching, so |
| 76 |
the same compiled pattern can safely be used by several threads at once. |
the same compiled pattern can safely be used by several threads at once. |
| 79 |
.SH COMPILING A PATTERN |
.SH COMPILING A PATTERN |
| 80 |
The function \fBpcre_compile()\fR is called to compile a pattern into an |
The function \fBpcre_compile()\fR is called to compile a pattern into an |
| 81 |
internal form. The pattern is a C string terminated by a binary zero, and |
internal form. The pattern is a C string terminated by a binary zero, and |
| 82 |
is passed in the argument \fIpattern\fR. A pointer to the compiled code block |
is passed in the argument \fIpattern\fR. A pointer to a single block of memory |
| 83 |
is returned. The \fBpcre\fR type is defined for this for convenience, but in |
that is obtained via \fBpcre_malloc\fR is returned. This contains the |
| 84 |
fact \fBpcre\fR is just a typedef for \fBvoid\fR, since the contents of the |
compiled code and related data. The \fBpcre\fR type is defined for this for |
| 85 |
block are not defined. |
convenience, but in fact \fBpcre\fR is just a typedef for \fBvoid\fR, since the |
| 86 |
|
contents of the block are not externally defined. It is up to the caller to |
| 87 |
|
free the memory when it is no longer required. |
| 88 |
.PP |
.PP |
| 89 |
The size of a compiled pattern is roughly proportional to the length of the |
The size of a compiled pattern is roughly proportional to the length of the |
| 90 |
pattern string, except that each character class (other than those containing |
pattern string, except that each character class (other than those containing |
| 104 |
If \fIerrptr\fR is NULL, \fBpcre_compile()\fR returns NULL immediately. |
If \fIerrptr\fR is NULL, \fBpcre_compile()\fR returns NULL immediately. |
| 105 |
Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fR returns |
Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fR returns |
| 106 |
NULL, and sets the variable pointed to by \fIerrptr\fR to point to a textual |
NULL, and sets the variable pointed to by \fIerrptr\fR to point to a textual |
| 107 |
error message. |
error message. The offset from the start of the pattern to the character where |
| 108 |
|
the error was discovered is placed in the variable pointed to by |
| 109 |
The offset from the start of the pattern to the character where the error was |
\fIerroffset\fR, which must not be NULL. If it is, an immediate error is given. |
| 110 |
discovered is placed in the variable pointed to by \fIerroffset\fR, which must |
.PP |
| 111 |
not be NULL. If it is, an immediate error is given. |
If the final argument, \fItableptr\fR, is NULL, PCRE uses a default set of |
| 112 |
|
character tables which are built when it is compiled, using the default C |
| 113 |
|
locale. Otherwise, \fItableptr\fR must be the result of a call to |
| 114 |
|
\fBpcre_maketables()\fR. See the section on locale support below. |
| 115 |
.PP |
.PP |
| 116 |
The following option bits are defined in the header file: |
The following option bits are defined in the header file: |
| 117 |
|
|
| 206 |
characters is created. |
characters is created. |
| 207 |
|
|
| 208 |
|
|
| 209 |
|
.SH LOCALE SUPPORT |
| 210 |
|
PCRE handles caseless matching, and determines whether characters are letters, |
| 211 |
|
digits, or whatever, by reference to a set of tables. The library contains a |
| 212 |
|
default set of tables which is created in the default C locale when PCRE is |
| 213 |
|
compiled. This is used when the final argument of \fBpcre_compile()\fR is NULL, |
| 214 |
|
and is sufficient for many applications. |
| 215 |
|
|
| 216 |
|
An alternative set of tables can, however, be supplied. Such tables are built |
| 217 |
|
by calling the \fBpcre_maketables()\fR function, which has no arguments, in the |
| 218 |
|
relevant locale. The result can then be passed to \fBpcre_compile()\fR as often |
| 219 |
|
as necessary. For example, to build and use tables that are appropriate for the |
| 220 |
|
French locale (where accented characters with codes greater than 128 are |
| 221 |
|
treated as letters), the following code could be used: |
| 222 |
|
|
| 223 |
|
setlocale(LC_CTYPE, "fr"); |
| 224 |
|
tables = pcre_maketables(); |
| 225 |
|
re = pcre_compile(..., tables); |
| 226 |
|
|
| 227 |
|
The tables are built in memory that is obtained via \fBpcre_malloc\fR. The |
| 228 |
|
pointer that is passed to \fBpcre_compile()\fR is saved with the compiled |
| 229 |
|
pattern, and the same tables are used via this pointer by \fBpcre_study()\fR |
| 230 |
|
and \fBpcre_exec()\fR. Thus for any single pattern, compilation, studying and |
| 231 |
|
matching all happen in the same locale, but different patterns can be compiled |
| 232 |
|
in different locales. It is the caller's responsibility to ensure that the |
| 233 |
|
memory containing the tables remains available for as long as it is needed. |
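
The idea of building tables in the current locale can be sketched in standard C. This is only an illustration under stated assumptions: the names sketch_maketables() and sketch_is_word_char(), the SK_ flags, and the 512-byte layout are hypothetical and are not PCRE's actual table format. The sketch records, for each of the 256 character codes, the lower case equivalent and a few type flags as reported by the C character type functions for whatever locale is in force at the time of the call.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical type flags; PCRE's real tables use their own layout. */
enum { SK_LETTER = 0x01, SK_DIGIT = 0x02, SK_SPACE = 0x04, SK_WORD = 0x08 };

/* Hypothetical layout: 256 lower-casing bytes, then 256 flag bytes. */
#define SK_LCC_OFFSET   0
#define SK_FLAGS_OFFSET 256
#define SK_TABLE_SIZE   512

/* Build one block of tables from the C character type functions,
   using whatever locale is current when this function is called. */
unsigned char *sketch_maketables(void)
{
    unsigned char *tables = (unsigned char *)malloc(SK_TABLE_SIZE);
    int c;

    if (tables == NULL)
        return NULL;

    for (c = 0; c < 256; c++) {
        unsigned char flags = 0;
        if (isalpha(c)) flags |= SK_LETTER;
        if (isdigit(c)) flags |= SK_DIGIT;
        if (isspace(c)) flags |= SK_SPACE;
        if (isalnum(c) || c == '_') flags |= SK_WORD;
        tables[SK_LCC_OFFSET + c] = (unsigned char)tolower(c);
        tables[SK_FLAGS_OFFSET + c] = flags;
    }
    return tables;
}

/* A "word" character is a letter, a digit, or the underscore. */
int sketch_is_word_char(const unsigned char *tables, unsigned char c)
{
    assert(tables != NULL);
    return (tables[SK_FLAGS_OFFSET + c] & SK_WORD) != 0;
}
```

A caller would set the locale with setlocale() before calling such a function, and would have to keep the returned memory available for as long as anything that refers to it is in use.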
| 234 |
|
|
| 235 |
|
|
| 236 |
.SH MATCHING A PATTERN |
.SH MATCHING A PATTERN |
| 237 |
The function \fBpcre_exec()\fR is called to match a subject string against a |
The function \fBpcre_exec()\fR is called to match a subject string against a |
| 238 |
pre-compiled pattern, which is passed in the \fIcode\fR argument. If the |
pre-compiled pattern, which is passed in the \fIcode\fR argument. If the |
| 602 |
two disjoint sets. Any given character matches one, and only one, of each pair. |
two disjoint sets. Any given character matches one, and only one, of each pair. |
| 603 |
|
|
| 604 |
A "word" character is any letter or digit or the underscore character, that is, |
A "word" character is any letter or digit or the underscore character, that is, |
| 605 |
any character which can be part of a Perl "word". These character type |
any character which can be part of a Perl "word". The definition of letters and |
| 606 |
sequences can appear both inside and outside character classes. They each match |
digits is controlled by PCRE's character tables, and may vary if |
| 607 |
one character of the appropriate type. If the current matching point is at the |
locale-specific matching is taking place (see "Locale support" above). For example, in |
| 608 |
end of the subject string, all of them fail, since there is no character to |
the "fr" (French) locale, some character codes greater than 128 are used for |
| 609 |
match. |
accented letters, and these are matched by \\w. |
| 610 |
|
|
| 611 |
|
These character type sequences can appear both inside and outside character |
| 612 |
|
classes. They each match one character of the appropriate type. If the current |
| 613 |
|
matching point is at the end of the subject string, all of them fail, since |
| 614 |
|
there is no character to match. |
| 615 |
|
|
| 616 |
The fourth use of backslash is for certain simple assertions. An assertion |
The fourth use of backslash is for certain simple assertions. An assertion |
| 617 |
specifies a condition that has to be met at a particular point in a match, |
specifies a condition that has to be met at a particular point in a match, |
| 710 |
still consumes a character from the subject string, and fails if the current |
still consumes a character from the subject string, and fails if the current |
| 711 |
pointer is at the end of the string. |
pointer is at the end of the string. |
| 712 |
|
|
| 713 |
When PCRE_CASELESS is set, any letters in a class represent both their upper |
When caseless matching is set, any letters in a class represent both their |
| 714 |
case and lower case versions, so for example, a caseless [aeiou] matches "A" as |
upper case and lower case versions, so for example, a caseless [aeiou] matches |
| 715 |
well as "a", and a caseless [^aeiou] does not match "A", whereas a caseful |
"A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a |
| 716 |
version would. |
caseful version would. |
| 717 |
|
|
| 718 |
The newline character is never treated in any special way in character classes, |
The newline character is never treated in any special way in character classes, |
| 719 |
whatever the setting of the PCRE_DOTALL or PCRE_MULTILINE options is. A class |
whatever the setting of the PCRE_DOTALL or PCRE_MULTILINE options is. A class |
| 730 |
range. |
range. |
| 731 |
|
|
| 732 |
Ranges operate in ASCII collating sequence. They can also be used for |
Ranges operate in ASCII collating sequence. They can also be used for |
| 733 |
characters specified numerically, for example [\\000-\\037]. If a range such as |
characters specified numerically, for example [\\000-\\037]. If a range that |
| 734 |
[W-c] is used when PCRE_CASELESS is set, it matches the letters involved in |
includes letters is used when caseless matching is set, it matches the letters |
| 735 |
either case, so is equivalent to [][\\^_`wxyzabc], matched caselessly. |
in either case. For example, [W-c] is equivalent to [][\\^_`wxyzabc], matched |
| 736 |
|
caselessly, and if character tables for the "fr" locale are in use, |
| 737 |
|
[\\xc8-\\xcb] matches accented E characters in both cases. |
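
The effect of a caseless range can be checked with a few lines of standard C. This is a minimal sketch, not PCRE's implementation: the function name sketch_in_caseless_range() is hypothetical, and it relies on tolower() and toupper(), so the result depends on the current C locale in the same way that PCRE's result depends on its character tables.

```c
#include <ctype.h>

/* Return 1 if c, or its other-case form, lies in the range lo..hi. */
int sketch_in_caseless_range(unsigned char lo, unsigned char hi,
                             unsigned char c)
{
    int lower = tolower(c);
    int upper = toupper(c);

    return (c >= lo && c <= hi) ||
           (lower >= lo && lower <= hi) ||
           (upper >= lo && upper <= hi);
}
```

With the range W to c in the default C locale, this accepts "w" and "A" (because "W" and "a" are in the range) as well as the punctuation between "Z" and "a", and rejects "d" and "V".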
| 738 |
|
|
| 739 |
The character types \\d, \\D, \\s, \\S, \\w, and \\W may also appear in a |
The character types \\d, \\D, \\s, \\S, \\w, and \\W may also appear in a |
| 740 |
character class, and add the characters that they match to the class. For |
character class, and add the characters that they match to the class. For |