Contents of /code/trunk/doc/pcrecpp.3

Revision 77 - (show annotations) (download)
Sat Feb 24 21:40:45 2007 UTC (7 years, 6 months ago) by nigel
File size: 7429 byte(s)
Load pcre-6.0 into code/trunk.

.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions.
.SH "SYNOPSIS OF C++ WRAPPER"
.rs
.sp
.B #include <pcrecpp.h>
.PP
.SM
.br
.SH DESCRIPTION
.rs
.sp
The C++ wrapper for PCRE was provided by Google Inc. This brief man page was
constructed from the notes in the \fIpcrecpp.h\fP file, which should be
consulted for further details.
.
.
.SH "MATCHING INTERFACE"
.rs
.sp
The "FullMatch" operation checks that the supplied text matches the supplied
pattern exactly. If pointer arguments are supplied, it copies the sub-strings
that match sub-patterns into them.
.sp
  Example: successful match
     pcrecpp::RE re("h.*o");
     re.FullMatch("hello");
.sp
  Example: unsuccessful match (requires full match):
     pcrecpp::RE re("e");
     !re.FullMatch("hello");
.sp
  Example: creating a temporary RE object:
     pcrecpp::RE("h.*o").FullMatch("hello");
.sp
You can pass in a "const char*" or a "string" for "text". The examples below
tend to use a const char*. You can, as in the examples above, store the RE
object explicitly in a variable or use a temporary RE object. The examples
below use one form or the other arbitrarily; either could correctly be used
for any of them.
.P
You must supply extra pointer arguments to extract matched subpieces.
.sp
  Example: extracts "ruby" into "s" and 1234 into "i"
     int i;
     string s;
     pcrecpp::RE re("(\e\ew+):(\e\ed+)");
     re.FullMatch("ruby:1234", &s, &i);
.sp
  Example: does not try to extract any extra sub-patterns
     re.FullMatch("ruby:1234", &s);
.sp
  Example: does not try to extract into NULL
     re.FullMatch("ruby:1234", NULL, &i);
.sp
  Example: integer overflow causes failure
     !re.FullMatch("ruby:1234567891234", NULL, &i);
.sp
  Example: fails because there aren't enough sub-patterns:
     !pcrecpp::RE("\e\ew+:\e\ed+").FullMatch("ruby:1234", &s);
.sp
  Example: fails because string cannot be stored in integer
     !pcrecpp::RE("(.*)").FullMatch("ruby", &i);
.sp
The provided pointer arguments can be pointers to any scalar numeric
type, or one of:
.sp
   string        (matched piece is copied to string)
   StringPiece   (StringPiece is mutated to point to matched piece)
   T             (where "bool T::ParseFrom(const char*, int)" exists)
   NULL          (the corresponding matched sub-pattern is not copied)
.sp
The function returns true iff all of the following conditions are satisfied:
.sp
  a. "text" matches "pattern" exactly;
.sp
  b. The number of matched sub-patterns is >= the number of supplied
     pointers;
.sp
  c. The "i"th argument has a suitable type for holding the
     string captured as the "i"th sub-pattern. If you pass in
     NULL for the "i"th argument, or pass fewer arguments than
     the number of sub-patterns, the "i"th captured sub-pattern
     is ignored.
.sp
The matching interface supports at most 16 arguments per call.
If you need more, consider using the more general interface
\fBpcrecpp::RE::DoMatch\fP. See \fBpcrecpp.h\fP for the signature of
\fBDoMatch\fP.
.
.SH "PARTIAL MATCHES"
.rs
.sp
You can use the "PartialMatch" operation when you want the pattern
to match any substring of the text.
.sp
  Example: simple search for a string:
     pcrecpp::RE("ell").PartialMatch("hello");
.sp
  Example: find first number in a string:
     int number;
     pcrecpp::RE re("(\e\ed+)");
     re.PartialMatch("x*100 + 20", &number);
     assert(number == 100);
.
.
.SH "UTF-8 AND THE MATCHING INTERFACE"
.rs
.sp
By default, pattern and text are plain text, one byte per character. The UTF8
flag, passed to the constructor, causes both pattern and string to be treated
as UTF-8 text, still a byte stream but potentially multiple bytes per
character. In practice, the text is likelier to be UTF-8 than the pattern, but
the match returned may depend on the UTF8 flag, so always use it when matching
UTF-8 text. For example, "." will match one byte normally but with UTF8 set may
match all the bytes of one multi-byte character.
.sp
  Example:
     pcrecpp::RE_Options options;
     options.set_utf8();
     pcrecpp::RE re(utf8_pattern, options);
     re.FullMatch(utf8_string);
.sp
  Example: using the convenience function UTF8():
     pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
     re.FullMatch(utf8_string);
.sp
NOTE: The UTF8 flag is ignored if pcre was not configured with the
      --enable-utf8 flag.
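.P
The difference between bytes and characters described above can be seen
without PCRE at all. The following is a minimal standard-library sketch, not
part of pcrecpp: the helper name utf8_length is hypothetical, and it assumes
its input is already valid UTF-8.
.sp
```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a pcrecpp function): count UTF-8 characters by
// skipping continuation bytes, which always have the form 10xxxxxx.
// Assumes the input is valid UTF-8.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        if ((byte & 0xC0) != 0x80) {
            ++count;  // a lead byte or an ASCII byte starts a new character
        }
    }
    return count;
}

// Small demonstration: e-acute is encoded as the two bytes C3 A9.
void utf8_length_demo() {
    const std::string text = "h\xC3\xA9llo";
    assert(text.size() == 6);         // six bytes in the byte stream
    assert(utf8_length(text) == 5);   // five characters
}
```
.sp
A byte-oriented "." steps through such a string one byte at a time, which is
why the UTF8 flag matters whenever the text may contain multi-byte characters.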
.
.
.SH "SCANNING TEXT INCREMENTALLY"
.rs
.sp
The "Consume" operation may be useful if you want to repeatedly
match regular expressions at the front of a string and skip over
them as they match. This requires use of the "StringPiece" type,
which represents a sub-range of a real string. Like RE, StringPiece
is defined in the pcrecpp namespace.
.sp
  Example: read lines of the form "var = value" from a string.
     string contents = ...;                 // Fill string somehow
     pcrecpp::StringPiece input(contents);  // Wrap in a StringPiece

     string var;
     int value;
     pcrecpp::RE re("(\e\ew+) = (\e\ed+)\en");
     while (re.Consume(&input, &var, &value)) {
       ...;
     }
.sp
Each successful call to "Consume" will set "var" and "value", and also
advance "input" so it points past the matched text.
.P
The "FindAndConsume" operation is similar to "Consume" but does not
anchor your match at the beginning of the string. For example, you
could extract all words from a string by repeatedly calling
.sp
     pcrecpp::RE("(\e\ew+)").FindAndConsume(&input, &word)
.
.
.SH "PARSING HEX/OCTAL/C-RADIX NUMBERS"
.rs
.sp
By default, if you pass a pointer to a numeric value, the
corresponding text is interpreted as a base-10 number. You can
instead wrap the pointer with a call to one of the operators Hex(),
Octal(), or CRadix() to interpret the text in another base. The
CRadix operator interprets C-style "0" (base-8) and "0x" (base-16)
prefixes, but defaults to base-10.
.sp
  Example:
     int a, b, c, d;
     pcrecpp::RE re("(.*) (.*) (.*) (.*)");
     re.FullMatch("100 40 0100 0x40",
                  pcrecpp::Octal(&a), pcrecpp::Hex(&b),
                  pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
.sp
will leave 64 in a, b, c, and d.
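.P
The prefix rules described for CRadix are the same ones the C library applies
when strtol is called with base 0. The following sketch uses strtol only as an
analogy for the arithmetic; it makes no claim about how pcrecpp converts
numbers, and the helper name parse_in_base is hypothetical.
.sp
```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper (not a pcrecpp function): parse "text" in the given
// base. Base 0 asks strtol to honor C-style prefixes: "0x" selects base 16,
// a leading "0" selects base 8, and anything else is read as base 10.
long parse_in_base(const char* text, int base) {
    assert(text != NULL);
    return std::strtol(text, NULL, base);
}
```
.sp
With this helper, parse_in_base("100", 8), parse_in_base("40", 16),
parse_in_base("0100", 0), and parse_in_base("0x40", 0) all yield 64, the
same values as in the example above.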
.
.
.SH "REPLACING PARTS OF STRINGS"
.rs
.sp
You can replace the first match of "pattern" in "str" with "rewrite".
Within "rewrite", backslash-escaped digits (\e1 to \e9) can be
used to insert the text matching the corresponding parenthesized group
from the pattern. \e0 in "rewrite" refers to the entire matching
text. For example:
.sp
  string s = "yabba dabba doo";
  pcrecpp::RE("b+").Replace("d", &s);
.sp
will leave "s" containing "yada dabba doo". The result is true if the pattern
matches and a replacement occurs, false otherwise.
.P
\fBGlobalReplace\fP is like \fBReplace\fP except that it replaces all
occurrences of the pattern in the string with the rewrite. Replacements are
not subject to re-matching. For example:
.sp
  string s = "yabba dabba doo";
  pcrecpp::RE("b+").GlobalReplace("d", &s);
.sp
will leave "s" containing "yada dada doo". It returns the number of
replacements made.
.P
\fBExtract\fP is like \fBReplace\fP, except that if the pattern matches,
"rewrite" is copied into "out" (an additional argument) with substitutions.
The non-matching portions of "text" are ignored. It returns true iff a match
occurred and the extraction happened successfully; if no match occurs, "out"
is left unaffected.
.
.
.SH AUTHOR
.rs
.sp
The C++ wrapper was contributed by Google Inc.
.br
Copyright (c) 2005 Google Inc.
