| 74 |
releases. |
releases. |
| 75 |
|
|
| 76 |
The functions pcre_compile(), pcre_study(), and pcre_exec() |
The functions pcre_compile(), pcre_study(), and pcre_exec() |
| 77 |
are used for compiling and matching regular expressions. |
are used for compiling and matching regular expressions. A |
| 78 |
|
sample program that demonstrates the simplest way of using |
| 79 |
|
them is given in the file pcredemo.c. The last section of |
| 80 |
|
this man page describes how to run it. |
| 81 |
|
|
| 82 |
The functions pcre_copy_substring(), pcre_get_substring(), |
The functions pcre_copy_substring(), pcre_get_substring(), |
| 83 |
and pcre_get_substring_list() are convenience functions for |
and pcre_get_substring_list() are convenience functions for |
| 107 |
|
|
| 108 |
|
|
| 109 |
MULTI-THREADING |
MULTI-THREADING |
| 110 |
The PCRE functions can be used in multi-threading |
The PCRE functions can be used in multi-threading applica- |
| 111 |
|
tions, with the proviso that the memory management functions |
| 112 |
|
pointed to by pcre_malloc and pcre_free are shared by all |
| 113 |
|
threads. |
|
|
|
|
|
|
|
SunOS 5.8 Last change: 2 |
|
|
|
|
|
|
|
|
|
|
|
applications, with the proviso that the memory management |
|
|
functions pointed to by pcre_malloc and pcre_free are shared |
|
|
by all threads. |
|
| 114 |
|
|
| 115 |
The compiled form of a regular expression is not altered |
The compiled form of a regular expression is not altered |
| 116 |
during matching, so the same compiled pattern can safely be |
during matching, so the same compiled pattern can safely be |
| 124 |
by a binary zero, and is passed in the argument pattern. A |
by a binary zero, and is passed in the argument pattern. A |
| 125 |
pointer to a single block of memory that is obtained via |
pointer to a single block of memory that is obtained via |
| 126 |
pcre_malloc is returned. This contains the compiled code and |
pcre_malloc is returned. This contains the compiled code and |
| 127 |
related data. The pcre type is defined for this for conveni- |
related data. The pcre type is defined for the returned |
| 128 |
ence, but in fact pcre is just a typedef for void, since the |
block; this is a typedef for a structure whose contents are |
| 129 |
contents of the block are not externally defined. It is up |
not externally defined. It is up to the caller to free the |
| 130 |
to the caller to free the memory when it is no longer |
memory when it is no longer required. |
| 131 |
required. |
|
| 132 |
|
Although the compiled code of a PCRE regex is relocatable, |
| 133 |
|
that is, it does not depend on memory location, the complete |
| 134 |
|
pcre data block is not fully relocatable, because it con- |
| 135 |
|
tains a copy of the tableptr argument, which is an address |
| 136 |
|
(see below). |
| 137 |
|
|
| 138 |
The size of a compiled pattern is roughly proportional to |
The size of a compiled pattern is roughly proportional to |
| 139 |
the length of the pattern string, except that each character |
the length of the pattern string, except that each character |
| 168 |
must be the result of a call to pcre_maketables(). See the |
must be the result of a call to pcre_maketables(). See the |
| 169 |
section on locale support below. |
section on locale support below. |
| 170 |
|
|
| 171 |
|
This code fragment shows a typical straightforward call to |
| 172 |
|
pcre_compile(): |
| 173 |
|
|
| 174 |
|
pcre *re; |
| 175 |
|
const char *error; |
| 176 |
|
int erroffset; |
| 177 |
|
re = pcre_compile( |
| 178 |
|
"^A.*Z", /* the pattern */ |
| 179 |
|
0, /* default options */ |
| 180 |
|
&error, /* for error message */ |
| 181 |
|
&erroffset, /* for error offset */ |
| 182 |
|
NULL); /* use default character tables */ |
| 183 |
|
|
| 184 |
The following option bits are defined in the header file: |
The following option bits are defined in the header file: |
| 185 |
|
|
| 186 |
PCRE_ANCHORED |
PCRE_ANCHORED |
| 283 |
When a pattern is going to be used several times, it is |
When a pattern is going to be used several times, it is |
| 284 |
worth spending more time analyzing it in order to speed up |
worth spending more time analyzing it in order to speed up |
| 285 |
the time taken for matching. The function pcre_study() takes |
the time taken for matching. The function pcre_study() takes |
|
|
|
| 286 |
a pointer to a compiled pattern as its first argument, and |
a pointer to a compiled pattern as its first argument, and |
| 287 |
returns a pointer to a pcre_extra block (another void |
returns a pointer to a pcre_extra block (another typedef for |
| 288 |
typedef) containing additional information about the pat- |
a structure with hidden contents) containing additional |
| 289 |
tern; this can be passed to pcre_exec(). If no additional |
information about the pattern; this can be passed to |
| 290 |
information is available, NULL is returned. |
pcre_exec(). If no additional information is available, NULL |
| 291 |
|
is returned. |
| 292 |
|
|
| 293 |
The second argument contains option bits. At present, no |
The second argument contains option bits. At present, no |
| 294 |
options are defined for pcre_study(), and this argument |
options are defined for pcre_study(), and this argument |
| 299 |
the variable it points to is set to NULL. Otherwise it |
the variable it points to is set to NULL. Otherwise it |
| 300 |
points to a textual error message. |
points to a textual error message. |
| 301 |
|
|
| 302 |
|
This is a typical call to pcre_study(): |
| 303 |
|
|
| 304 |
|
pcre_extra *pe; |
| 305 |
|
pe = pcre_study( |
| 306 |
|
re, /* result of pcre_compile() */ |
| 307 |
|
0, /* no options exist */ |
| 308 |
|
&error); /* set to NULL or points to a message */ |
| 309 |
|
|
| 310 |
At present, studying a pattern is useful only for non- |
At present, studying a pattern is useful only for non- |
| 311 |
anchored patterns that do not have a single fixed starting |
anchored patterns that do not have a single fixed starting |
| 312 |
character. A bitmap of possible starting characters is |
character. A bitmap of possible starting characters is |
| 367 |
PCRE_ERROR_BADMAGIC the "magic number" was not found |
PCRE_ERROR_BADMAGIC the "magic number" was not found |
| 368 |
PCRE_ERROR_BADOPTION the value of what was invalid |
PCRE_ERROR_BADOPTION the value of what was invalid |
| 369 |
|
|
| 370 |
|
Here is a typical call of pcre_fullinfo(), to obtain the |
| 371 |
|
length of the compiled pattern: |
| 372 |
|
|
| 373 |
|
int rc; |
| 374 |
|
unsigned long int length; |
| 375 |
|
rc = pcre_fullinfo( |
| 376 |
|
re, /* result of pcre_compile() */ |
| 377 |
|
pe, /* result of pcre_study(), or NULL */ |
| 378 |
|
PCRE_INFO_SIZE, /* what is required */ |
| 379 |
|
&length); /* where to put the data */ |
| 380 |
|
|
| 381 |
The possible values for the third argument are defined in |
The possible values for the third argument are defined in |
| 382 |
pcre.h, and are as follows: |
pcre.h, and are as follows: |
| 383 |
|
|
| 384 |
PCRE_INFO_OPTIONS |
PCRE_INFO_OPTIONS |
| 385 |
|
|
| 386 |
Return a copy of the options with which the pattern was com- |
Return a copy of the options with which the pattern was com- |
| 387 |
piled. The fourth argument should point to au unsigned long |
piled. The fourth argument should point to an unsigned long |
| 388 |
int variable. These option bits are those specified in the |
int variable. These option bits are those specified in the |
| 389 |
call to pcre_compile(), modified by any top-level option |
call to pcre_compile(), modified by any top-level option |
| 390 |
settings within the pattern itself, and with the |
settings within the pattern itself, and with the |
| 406 |
|
|
| 407 |
PCRE_INFO_BACKREFMAX |
PCRE_INFO_BACKREFMAX |
| 408 |
|
|
| 409 |
Return the number of the highest back reference in the |
Return the number of the highest back reference in the pat- |
| 410 |
pattern. The fourth argument should point to an int vari- |
tern. The fourth argument should point to an int variable. |
| 411 |
able. Zero is returned if there are no back references. |
Zero is returned if there are no back references. |
| 412 |
|
|
| 413 |
PCRE_INFO_FIRSTCHAR |
PCRE_INFO_FIRSTCHAR |
| 414 |
|
|
| 471 |
|
|
| 472 |
MATCHING A PATTERN |
MATCHING A PATTERN |
| 473 |
The function pcre_exec() is called to match a subject string |
The function pcre_exec() is called to match a subject string |
| 483 |
against a pre-compiled pattern, which is passed in the code |
against a pre-compiled pattern, which is passed in the code |
| 484 |
argument. If the pattern has been studied, the result of the |
argument. If the pattern has been studied, the result of the |
| 485 |
study should be passed in the extra argument. Otherwise this |
study should be passed in the extra argument. Otherwise this |
| 486 |
must be NULL. |
must be NULL. |
| 487 |
|
|
| 488 |
|
Here is an example of a simple call to pcre_exec(): |
| 489 |
|
|
| 490 |
|
int rc; |
| 491 |
|
int ovector[30]; |
| 492 |
|
rc = pcre_exec( |
| 493 |
|
re, /* result of pcre_compile() */ |
| 494 |
|
NULL, /* we didn't study the pattern */ |
| 495 |
|
"some string", /* the subject string */ |
| 496 |
|
11, /* the length of the subject string */ |
| 497 |
|
0, /* start at offset 0 in the subject */ |
| 498 |
|
0, /* default options */ |
| 499 |
|
ovector, /* vector for substring information */ |
| 500 |
|
30); /* number of elements in the vector */ |
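
The following is a minimal sketch of how the values that pcre_exec() leaves in ovector can be read back. It assumes the layout used by the sample program in this page: element 2*n holds the start offset of substring n and element 2*n+1 holds the offset just past its end. The function copy_capture() is a hypothetical helper written for this illustration. It is not part of PCRE and is not a substitute for pcre_copy_substring(). Treating a negative start offset as "group not set" is also an assumption of the sketch.

```c
#include <string.h>

/* Hypothetical helper, not part of PCRE. Copies substring number n out of
   subject into buf, using the offset pairs stored in ovector. rc is the
   (positive) value returned by pcre_exec(). Returns the length of the
   substring, or -1 if n is out of range, the group is unset (assumed to
   be marked by a negative offset), or the buffer is too small. */
static int copy_capture(const char *subject, const int *ovector, int rc,
                        int n, char *buf, int bufsize)
{
  int start, len;

  if (n < 0 || n >= rc) return -1;      /* no such substring */
  start = ovector[2*n];
  if (start < 0) return -1;             /* group did not take part */
  len = ovector[2*n+1] - start;
  if (len < 0 || len >= bufsize) return -1;

  memcpy(buf, subject + start, (size_t)len);
  buf[len] = '\0';                      /* always zero-terminate */
  return len;
}
```

For instance, if a match of "(c)at" against "the cat sat" returned 2 and left the offsets 4, 7, 4, 5 in ovector, then substring 0 would be copied as "cat" and substring 1 as "c".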
| 501 |
|
|
| 502 |
The PCRE_ANCHORED option can be passed in the options argu- |
The PCRE_ANCHORED option can be passed in the options argu- |
| 503 |
ment, whose unused bits must be zero. However, if a pattern |
ment, whose unused bits must be zero. However, if a pattern |
| 504 |
was compiled with PCRE_ANCHORED, or turned out to be |
was compiled with PCRE_ANCHORED, or turned out to be |
| 549 |
|
|
| 550 |
The subject string is passed as a pointer in subject, a |
The subject string is passed as a pointer in subject, a |
| 551 |
length in length, and a starting offset in startoffset. |
length in length, and a starting offset in startoffset. |
| 552 |
Unlike the pattern string, it may contain binary zero char- |
Unlike the pattern string, the subject may contain binary |
| 553 |
acters. When the starting offset is zero, the search for a |
zero characters. When the starting offset is zero, the |
| 554 |
match starts at the beginning of the subject, and this is by |
search for a match starts at the beginning of the subject, |
| 555 |
far the most common case. |
and this is by far the most common case. |
| 556 |
|
|
| 557 |
A non-zero starting offset is useful when searching for |
A non-zero starting offset is useful when searching for |
| 558 |
another match in the same subject by calling pcre_exec() |
another match in the same subject by calling pcre_exec() |
| 688 |
|
|
| 689 |
|
|
| 690 |
|
|
| 691 |
|
|
| 692 |
EXTRACTING CAPTURED SUBSTRINGS |
EXTRACTING CAPTURED SUBSTRINGS |
| 693 |
Captured substrings can be accessed directly by using the |
Captured substrings can be accessed directly by using the |
| 694 |
offsets returned by pcre_exec() in ovector. For convenience, |
offsets returned by pcre_exec() in ovector. For convenience, |
| 695 |
the functions pcre_copy_substring(), pcre_get_substring(), |
the functions pcre_copy_substring(), pcre_get_substring(), |
| 696 |
and pcre_get_substring_list() are provided for extracting |
and pcre_get_substring_list() are provided for extracting |
| 768 |
There are some size limitations in PCRE but it is hoped that |
There are some size limitations in PCRE but it is hoped that |
| 769 |
they will never in practice be relevant. The maximum length |
they will never in practice be relevant. The maximum length |
| 770 |
of a compiled pattern is 65539 (sic) bytes. All values in |
of a compiled pattern is 65539 (sic) bytes. All values in |
| 771 |
repeating quantifiers must be less than 65536. The maximum |
repeating quantifiers must be less than 65536. The max- |
| 772 |
number of capturing subpatterns is 99. The maximum number |
imum number of capturing subpatterns is 65535. There is no |
| 773 |
of all parenthesized subpatterns, including capturing sub- |
limit to the number of non-capturing subpatterns, but the |
| 774 |
patterns, assertions, and other types of subpattern, is 200. |
maximum depth of nesting of all kinds of parenthesized sub- |
| 775 |
|
pattern, including capturing subpatterns, assertions, and |
| 776 |
|
other types of subpattern, is 200. |
| 777 |
|
|
| 778 |
The maximum length of a subject string is the largest posi- |
The maximum length of a subject string is the largest posi- |
| 779 |
tive number that an integer variable can hold. However, PCRE |
tive number that an integer variable can hold. However, PCRE |
| 949 |
The backslash character has several uses. Firstly, if it is |
The backslash character has several uses. Firstly, if it is |
| 950 |
followed by a non-alphameric character, it takes away any |
followed by a non-alphameric character, it takes away any |
| 951 |
special meaning that character may have. This use of |
special meaning that character may have. This use of |
| 952 |
|
|
| 953 |
backslash as an escape character applies both inside and |
backslash as an escape character applies both inside and |
| 954 |
outside character classes. |
outside character classes. |
| 955 |
|
|
| 1110 |
Outside a character class, in the default matching mode, the |
Outside a character class, in the default matching mode, the |
| 1111 |
circumflex character is an assertion which is true only if |
circumflex character is an assertion which is true only if |
| 1112 |
the current matching point is at the start of the subject |
the current matching point is at the start of the subject |
|
|
|
| 1113 |
string. If the startoffset argument of pcre_exec() is non- |
string. If the startoffset argument of pcre_exec() is non- |
| 1114 |
zero, circumflex can never match. Inside a character class, |
zero, circumflex can never match. Inside a character class, |
| 1115 |
circumflex has an entirely different meaning (see below). |
circumflex has an entirely different meaning (see below). |
| 1153 |
|
|
| 1154 |
Note that the sequences \A, \Z, and \z can be used to match |
Note that the sequences \A, \Z, and \z can be used to match |
| 1155 |
the start and end of the subject in both modes, and if all |
the start and end of the subject in both modes, and if all |
| 1156 |
branches of a pattern start with \A is it always anchored, |
branches of a pattern start with \A it is always anchored, |
| 1157 |
whether PCRE_MULTILINE is set or not. |
whether PCRE_MULTILINE is set or not. |
| 1158 |
|
|
| 1159 |
|
|
| 1162 |
Outside a character class, a dot in the pattern matches any |
Outside a character class, a dot in the pattern matches any |
| 1163 |
one character in the subject, including a non-printing char- |
one character in the subject, including a non-printing char- |
| 1164 |
acter, but not (by default) newline. If the PCRE_DOTALL |
acter, but not (by default) newline. If the PCRE_DOTALL |
|
|
|
| 1165 |
option is set, dots match newlines as well. The handling of |
option is set, dots match newlines as well. The handling of |
| 1166 |
dot is entirely independent of the handling of circumflex |
dot is entirely independent of the handling of circumflex |
| 1167 |
and dollar, the only relationship being that they both |
and dollar, the only relationship being that they both |
| 1280 |
[12[:^digit:]] |
[12[:^digit:]] |
| 1281 |
|
|
| 1282 |
matches "1", "2", or any non-digit. PCRE (and Perl) also |
matches "1", "2", or any non-digit. PCRE (and Perl) also |
| 1283 |
recogize the POSIX syntax [.ch.] and [=ch=] where "ch" is a |
recognize the POSIX syntax [.ch.] and [=ch=] where "ch" is a |
| 1284 |
"collating element", but these are not supported, and an |
"collating element", but these are not supported, and an |
| 1285 |
error is given if they are encountered. |
error is given if they are encountered. |
| 1286 |
|
|
| 1399 |
the ((red|white) (king|queen)) |
the ((red|white) (king|queen)) |
| 1400 |
|
|
| 1401 |
the captured substrings are "red king", "red", and "king", |
the captured substrings are "red king", "red", and "king", |
| 1402 |
and are numbered 1, 2, and 3. |
and are numbered 1, 2, and 3, respectively. |
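
As a rough illustration of this numbering, the sketch below counts capturing groups by scanning a pattern for opening parentheses from left to right. The function count_capturing_groups() is a hypothetical helper, not a PCRE function. It assumes that any group opening with "(?" is non-capturing, skips characters escaped with a backslash, and makes no attempt to handle parentheses inside character classes, so it is a simplification rather than a description of how PCRE itself parses patterns.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE. Counts opening parentheses that
   start a capturing group: unescaped "(" not followed by "?".
   Simplification: parentheses inside [...] are not treated specially. */
static int count_capturing_groups(const char *pattern)
{
  int count = 0;
  size_t i;

  for (i = 0; pattern[i] != '\0'; i++)
    {
    if (pattern[i] == '\\')
      {
      if (pattern[i+1] != '\0') i++;    /* skip the escaped character */
      }
    else if (pattern[i] == '(' && pattern[i+1] != '?')
      {
      count++;                          /* this group gets number 'count' */
      }
    }
  return count;
}
```

Applied to "the ((red|white) (king|queen))" the function returns 3, matching the three numbered substrings described above.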
| 1403 |
|
|
| 1404 |
The fact that plain parentheses fulfil two functions is not |
The fact that plain parentheses fulfil two functions is not |
| 1405 |
always helpful. There are often times when a grouping sub- |
always helpful. There are often times when a grouping sub- |
| 1470 |
one that does not match the syntax of a quantifier, is taken |
one that does not match the syntax of a quantifier, is taken |
| 1471 |
as a literal character. For example, {,6} is not a quantif- |
as a literal character. For example, {,6} is not a quantif- |
| 1472 |
ier, but a literal string of four characters. |
ier, but a literal string of four characters. |
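
The rule can be made concrete with a small check. The sketch below assumes that the accepted quantifier forms are {n}, {n,} and {n,m}, where n and m are strings of decimal digits. The function looks_like_quantifier() is a hypothetical helper for illustration only. It does not check that the values are below 65536 or that the minimum is no greater than the maximum.

```c
#include <ctype.h>

/* Hypothetical helper, not part of PCRE. Returns 1 if the text starting
   at p has the form {n}, {n,} or {n,m} with decimal digits, otherwise 0.
   Value limits and min <= max are deliberately not checked. */
static int looks_like_quantifier(const char *p)
{
  if (*p++ != '{') return 0;
  if (!isdigit((unsigned char)*p)) return 0;    /* a minimum is required */
  while (isdigit((unsigned char)*p)) p++;
  if (*p == '}') return 1;                      /* the {n} form */
  if (*p++ != ',') return 0;
  while (isdigit((unsigned char)*p)) p++;       /* optional maximum */
  return *p == '}';
}
```

With this check, "{3}", "{2,}" and "{2,5}" are accepted, while "{,6}" is rejected because no digit follows the opening brace.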
|
|
|
| 1473 |
The quantifier {0} is permitted, causing the expression to |
The quantifier {0} is permitted, causing the expression to |
| 1474 |
behave as if the previous item and the quantifier were not |
behave as if the previous item and the quantifier were not |
| 1475 |
present. |
present. |
| 1574 |
BACK REFERENCES |
BACK REFERENCES |
| 1575 |
Outside a character class, a backslash followed by a digit |
Outside a character class, a backslash followed by a digit |
| 1576 |
greater than 0 (and possibly further digits) is a back |
greater than 0 (and possibly further digits) is a back |
| 1585 |
reference to a capturing subpattern earlier (i.e. to its |
reference to a capturing subpattern earlier (i.e. to its |
| 1586 |
left) in the pattern, provided there have been that many |
left) in the pattern, provided there have been that many |
| 1587 |
previous capturing left parentheses. |
previous capturing left parentheses. |
| 1637 |
|
|
| 1638 |
matches any number of "a"s and also "aba", "ababbaa" etc. At |
matches any number of "a"s and also "aba", "ababbaa" etc. At |
| 1639 |
each iteration of the subpattern, the back reference matches |
each iteration of the subpattern, the back reference matches |
| 1640 |
the character string corresponding to the previous |
the character string corresponding to the previous itera- |
| 1641 |
iteration. In order for this to work, the pattern must be |
tion. In order for this to work, the pattern must be such |
| 1642 |
such that the first iteration does not need to match the |
that the first iteration does not need to match the back |
| 1643 |
back reference. This can be done using alternation, as in |
reference. This can be done using alternation, as in the |
| 1644 |
the example above, or by a quantifier with a minimum of |
example above, or by a quantifier with a minimum of zero. |
|
zero. |
|
| 1645 |
|
|
| 1646 |
|
|
| 1647 |
|
|
| 1794 |
|
|
| 1795 |
This kind of parenthesis "locks up" the part of the pattern |
This kind of parenthesis "locks up" the part of the pattern |
| 1796 |
it contains once it has matched, and a failure further into |
it contains once it has matched, and a failure further into |
| 1797 |
the pattern is prevented from backtracking into it. |
the pattern is prevented from backtracking into it. Back- |
| 1798 |
Backtracking past it to previous items, however, works as |
tracking past it to previous items, however, works as nor- |
| 1799 |
normal. |
mal. |
| 1800 |
|
|
| 1801 |
An alternative description is that a subpattern of this type |
An alternative description is that a subpattern of this type |
| 1802 |
matches the string of characters that an identical stan- |
matches the string of characters that an identical stan- |
| 2104 |
Running with PCRE_UTF8 set causes these changes in the way |
Running with PCRE_UTF8 set causes these changes in the way |
| 2105 |
PCRE works: |
PCRE works: |
| 2106 |
|
|
| 2107 |
1. In a pattern, the escape sequence \x{...}, where the con- |
1. In a pattern, the escape sequence \x{...}, where the |
| 2108 |
tents of the braces is a string of hexadecimal digits, is |
contents of the braces is a string of hexadecimal digits, is |
| 2109 |
interpreted as a UTF-8 character whose code number is the |
interpreted as a UTF-8 character whose code number is the |
| 2110 |
given hexadecimal number, for example: \x{1234}. This |
given hexadecimal number, for example: \x{1234}. This |
| 2111 |
inserts from one to six literal bytes into the pattern, |
inserts from one to six literal bytes into the pattern, |
| 2159 |
|
|
| 2160 |
The following UTF-8 features of Perl 5.6 are not imple- |
The following UTF-8 features of Perl 5.6 are not imple- |
| 2161 |
mented: |
mented: |
| 2162 |
|
|
| 2163 |
1. The escape sequence \C to match a single byte. |
1. The escape sequence \C to match a single byte. |
| 2164 |
|
|
| 2165 |
2. The use of Unicode tables and properties and escapes \p, |
2. The use of Unicode tables and properties and escapes \p, |
| 2167 |
|
|
| 2168 |
|
|
| 2169 |
|
|
| 2170 |
|
SAMPLE PROGRAM |
| 2171 |
|
The code below is a simple, complete demonstration program, |
| 2172 |
|
to get you started with using PCRE. This code is also sup- |
| 2173 |
|
plied in the file pcredemo.c in the PCRE distribution. |
| 2174 |
|
|
| 2175 |
|
The program compiles the regular expression that is its |
| 2176 |
|
first argument, and matches it against the subject string in |
| 2177 |
|
its second argument. No options are set, and default charac- |
| 2178 |
|
ter tables are used. If matching succeeds, the program out- |
| 2179 |
|
puts the portion of the subject that matched, together with |
| 2180 |
|
the contents of any captured substrings. |
| 2181 |
|
|
| 2182 |
|
On a Unix system that has PCRE installed in /usr/local, you |
| 2183 |
|
can compile the demonstration program using a command like |
| 2184 |
|
this: |
| 2185 |
|
|
| 2186 |
|
gcc -o pcredemo pcredemo.c -I/usr/local/include |
| 2187 |
|
-L/usr/local/lib -lpcre |
| 2188 |
|
|
| 2189 |
|
Then you can run simple tests like this: |
| 2190 |
|
|
| 2191 |
|
./pcredemo 'cat|dog' 'the cat sat on the mat' |
| 2192 |
|
|
| 2193 |
|
Note that there is a much more comprehensive test program, |
| 2194 |
|
called pcretest, which supports many more facilities for |
| 2195 |
|
testing regular expressions. The pcredemo program is pro- |
| 2196 |
|
vided as a simple coding example. |
| 2197 |
|
|
| 2198 |
|
On some operating systems (e.g. Solaris) you may get an |
| 2199 |
|
error like this when you try to run pcredemo: |
| 2200 |
|
|
| 2201 |
|
ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such |
| 2202 |
|
file or directory |
| 2203 |
|
|
| 2204 |
|
This is caused by the way shared library support works on |
| 2205 |
|
those systems. You need to add |
| 2206 |
|
|
| 2207 |
|
-R/usr/local/lib |
| 2208 |
|
|
| 2209 |
|
to the compile command to get round this problem. Here's the |
| 2210 |
|
code: |
| 2211 |
|
|
| 2212 |
|
#include <stdio.h> |
| 2213 |
|
#include <string.h> |
| 2214 |
|
#include <pcre.h> |
| 2215 |
|
|
| 2216 |
|
#define OVECCOUNT 30 /* should be a multiple of 3 */ |
| 2217 |
|
|
| 2218 |
|
int main(int argc, char **argv) |
| 2219 |
|
{ |
| 2220 |
|
pcre *re; |
| 2221 |
|
const char *error; |
| 2222 |
|
int erroffset; |
| 2223 |
|
int ovector[OVECCOUNT]; |
| 2224 |
|
int rc, i; |
| 2225 |
|
|
| 2226 |
|
if (argc != 3) |
| 2227 |
|
{ |
| 2228 |
|
printf("Two arguments required: a regex and a " |
| 2229 |
|
"subject string\n"); |
| 2230 |
|
return 1; |
| 2231 |
|
} |
| 2232 |
|
|
| 2233 |
|
/* Compile the regular expression in the first argument */ |
| 2234 |
|
|
| 2235 |
|
re = pcre_compile( |
| 2236 |
|
argv[1], /* the pattern */ |
| 2237 |
|
0, /* default options */ |
| 2238 |
|
&error, /* for error message */ |
| 2239 |
|
&erroffset, /* for error offset */ |
| 2240 |
|
NULL); /* use default character tables */ |
| 2241 |
|
|
| 2242 |
|
/* Compilation failed: print the error message and exit */ |
| 2243 |
|
|
| 2244 |
|
if (re == NULL) |
| 2245 |
|
{ |
| 2246 |
|
printf("PCRE compilation failed at offset %d: %s\n", |
| 2247 |
|
erroffset, error); |
| 2248 |
|
return 1; |
| 2249 |
|
} |
| 2250 |
|
|
| 2251 |
|
/* Compilation succeeded: match the subject in the second |
| 2252 |
|
argument */ |
| 2253 |
|
|
| 2254 |
|
rc = pcre_exec( |
| 2255 |
|
re, /* the compiled pattern */ |
| 2256 |
|
NULL, /* we didn't study the pattern */ |
| 2257 |
|
argv[2], /* the subject string */ |
| 2258 |
|
(int)strlen(argv[2]), /* the length of the subject */ |
| 2259 |
|
0, /* start at offset 0 in the subject */ |
| 2260 |
|
0, /* default options */ |
| 2261 |
|
ovector, /* vector for substring information */ |
| 2262 |
|
OVECCOUNT); /* number of elements in the vector */ |
| 2263 |
|
|
| 2264 |
|
/* Matching failed: handle error cases */ |
| 2265 |
|
|
| 2266 |
|
if (rc < 0) |
| 2267 |
|
{ |
| 2268 |
|
switch(rc) |
| 2269 |
|
{ |
| 2270 |
|
case PCRE_ERROR_NOMATCH: printf("No match\n"); break; |
| 2271 |
|
/* |
| 2272 |
|
Handle other special cases if you like |
| 2273 |
|
*/ |
| 2274 |
|
default: printf("Matching error %d\n", rc); break; |
| 2275 |
|
} |
| 2276 |
|
return 1; |
| 2277 |
|
} |
| 2278 |
|
|
| 2279 |
|
/* Match succeeded */ |
| 2280 |
|
|
| 2281 |
|
printf("Match succeeded\n"); |
| 2282 |
|
|
| 2283 |
|
/* The output vector wasn't big enough */ |
| 2284 |
|
|
| 2285 |
|
if (rc == 0) |
| 2286 |
|
{ |
| 2287 |
|
rc = OVECCOUNT/3; |
| 2288 |
|
printf("ovector only has room for %d captured " |
| 2289 |
|
"substrings\n", rc - 1); |
| 2290 |
|
} |
| 2291 |
|
|
| 2292 |
|
/* Show substrings stored in the output vector */ |
| 2293 |
|
|
| 2294 |
|
for (i = 0; i < rc; i++) |
| 2295 |
|
{ |
| 2296 |
|
char *substring_start = argv[2] + ovector[2*i]; |
| 2297 |
|
int substring_length = ovector[2*i+1] - ovector[2*i]; |
| 2298 |
|
printf("%2d: %.*s\n", i, substring_length, |
| 2299 |
|
substring_start); |
| 2300 |
|
} |
| 2301 |
|
|
| 2302 |
|
return 0; |
| 2303 |
|
} |
| 2304 |
|
|
| 2305 |
|
|
| 2306 |
|
|
| 2307 |
AUTHOR |
AUTHOR |
| 2308 |
Philip Hazel <ph10@cam.ac.uk> |
Philip Hazel <ph10@cam.ac.uk> |
| 2309 |
University Computing Service, |
University Computing Service, |
| 2311 |
Cambridge CB2 3QG, England. |
Cambridge CB2 3QG, England. |
| 2312 |
Phone: +44 1223 334714 |
Phone: +44 1223 334714 |
| 2313 |
|
|
| 2314 |
Last updated: 28 August 2000, |
Last updated: 15 August 2001 |
| 2315 |
the 250th anniversary of the death of J.S. Bach. |
Copyright (c) 1997-2001 University of Cambridge. |
|
Copyright (c) 1997-2000 University of Cambridge. |
|