
Contents of /code/trunk/doc/html/pcresyntax.html



Revision 345
Mon Apr 28 15:10:02 2008 UTC by ph10
File MIME type: text/html
File size: 13184 byte(s)
Tidies for the 7.7-RC1 distribution.

1 <html>
2 <head>
3 <title>pcresyntax specification</title>
4 </head>
5 <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
6 <h1>pcresyntax man page</h1>
7 <p>
8 Return to the <a href="index.html">PCRE index page</a>.
9 </p>
10 <p>
11 This page is part of the PCRE HTML documentation. It was generated automatically
12 from the original man page. If there is any nonsense in it, please consult the
13 man page, in case the conversion went wrong.
14 <br>
15 <ul>
16 <li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a>
17 <li><a name="TOC2" href="#SEC2">QUOTING</a>
18 <li><a name="TOC3" href="#SEC3">CHARACTERS</a>
19 <li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
20 <li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a>
21 <li><a name="TOC6" href="#SEC6">SCRIPT NAMES FOR \p AND \P</a>
22 <li><a name="TOC7" href="#SEC7">CHARACTER CLASSES</a>
23 <li><a name="TOC8" href="#SEC8">QUANTIFIERS</a>
24 <li><a name="TOC9" href="#SEC9">ANCHORS AND SIMPLE ASSERTIONS</a>
25 <li><a name="TOC10" href="#SEC10">MATCH POINT RESET</a>
26 <li><a name="TOC11" href="#SEC11">ALTERNATION</a>
27 <li><a name="TOC12" href="#SEC12">CAPTURING</a>
28 <li><a name="TOC13" href="#SEC13">ATOMIC GROUPS</a>
29 <li><a name="TOC14" href="#SEC14">COMMENT</a>
30 <li><a name="TOC15" href="#SEC15">OPTION SETTING</a>
31 <li><a name="TOC16" href="#SEC16">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
32 <li><a name="TOC17" href="#SEC17">BACKREFERENCES</a>
33 <li><a name="TOC18" href="#SEC18">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
34 <li><a name="TOC19" href="#SEC19">CONDITIONAL PATTERNS</a>
35 <li><a name="TOC20" href="#SEC20">BACKTRACKING CONTROL</a>
36 <li><a name="TOC21" href="#SEC21">NEWLINE CONVENTIONS</a>
37 <li><a name="TOC22" href="#SEC22">WHAT \R MATCHES</a>
38 <li><a name="TOC23" href="#SEC23">CALLOUTS</a>
39 <li><a name="TOC24" href="#SEC24">SEE ALSO</a>
40 <li><a name="TOC25" href="#SEC25">AUTHOR</a>
41 <li><a name="TOC26" href="#SEC26">REVISION</a>
42 </ul>
43 <br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
44 <P>
45 The full syntax and semantics of the regular expressions that are supported by
46 PCRE are described in the
47 <a href="pcrepattern.html"><b>pcrepattern</b></a>
48 documentation. This document contains just a quick-reference summary of the
49 syntax.
50 </P>
51 <br><a name="SEC2" href="#TOC1">QUOTING</a><br>
52 <P>
53 <pre>
54 \x where x is non-alphanumeric is a literal x
55 \Q...\E treat enclosed characters as literal
56 </PRE>
57 </P>
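<P>
The quoting rule can be tried out with the short sketch below. It uses Python's
<b>re</b> module, which is a different engine from PCRE but treats a backslash
before a non-alphanumeric character in the same way. Python's <b>re</b> has no
\Q...\E, so <b>re.escape</b> stands in for it here; that substitution and the
sample strings are assumptions of this example, not part of PCRE.
</P>

```python
import re

# A backslash before a non-alphanumeric character makes it a literal.
literal_dot = re.compile(r"1\.5")
dot_matches_literal = literal_dot.fullmatch("1.5") is not None
dot_rejects_other = literal_dot.fullmatch("1x5") is None

# Stand-in for \Q...\E (an assumption of this sketch): re.escape quotes
# every metacharacter in a fixed string so it is matched literally.
quoted = re.escape("a.b*c")
quoted_matches = re.fullmatch(quoted, "a.b*c") is not None
# Unquoted, "a.b*c" would match "aXbbc"; the quoted form does not.
unquoted_matches_other = re.fullmatch("a.b*c", "aXbbc") is not None
quoted_rejects_other = re.fullmatch(quoted, "aXbbc") is None
```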
58 <br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
59 <P>
60 <pre>
61 \a alarm, that is, the BEL character (hex 07)
62 \cx "control-x", where x is any character
63 \e escape (hex 1B)
64 \f formfeed (hex 0C)
65 \n newline (hex 0A)
66 \r carriage return (hex 0D)
67 \t tab (hex 09)
68 \ddd character with octal code ddd, or backreference
69 \xhh character with hex code hh
70 \x{hhh..} character with hex code hhh..
71 </PRE>
72 </P>
73 <br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
74 <P>
75 <pre>
76 . any character except newline;
77     in dotall mode, any character whatsoever
78 \C one byte, even in UTF-8 mode (best avoided)
79 \d a decimal digit
80 \D a character that is not a decimal digit
81 \h a horizontal whitespace character
82 \H a character that is not a horizontal whitespace character
83 \p{<i>xx</i>} a character with the <i>xx</i> property
84 \P{<i>xx</i>} a character without the <i>xx</i> property
85 \R a newline sequence
86 \s a whitespace character
87 \S a character that is not a whitespace character
88 \v a vertical whitespace character
89 \V a character that is not a vertical whitespace character
90 \w a "word" character
91 \W a "non-word" character
92 \X an extended Unicode sequence
93 </pre>
94 In PCRE, \d, \D, \s, \S, \w, and \W recognize only ASCII characters.
95 </P>
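<P>
The ASCII-only behaviour of \d, \s, and \w can be illustrated with a small
sketch. It uses Python's <b>re</b> module rather than PCRE itself; the
<b>re.ASCII</b> flag is assumed here to be a reasonable stand-in for the PCRE
behaviour described above, and the sample strings are made up.
</P>

```python
import re

arabic_indic_three = "\u0663"  # a Unicode decimal digit outside ASCII

# re.ASCII restricts \d, \s and \w to ASCII characters.
ascii_digit = re.compile(r"\d", re.ASCII)
ascii_hit = ascii_digit.fullmatch("7") is not None
ascii_miss = ascii_digit.fullmatch(arabic_indic_three) is None

# Without the flag, Python's str patterns are Unicode-aware, so the same
# character is accepted. This is where Python's default differs.
unicode_hit = re.fullmatch(r"\d", arabic_indic_three) is not None

# \w+ picks out "word" runs; \S picks up any other non-whitespace character.
tokens = re.findall(r"\w+|\S", "x1 = y_2 + 3;", re.ASCII)
```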
96 <br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a><br>
97 <P>
98 <pre>
99 C Other
100 Cc Control
101 Cf Format
102 Cn Unassigned
103 Co Private use
104 Cs Surrogate
105
106 L Letter
107 Ll Lower case letter
108 Lm Modifier letter
109 Lo Other letter
110 Lt Title case letter
111 Lu Upper case letter
112 L& Ll, Lu, or Lt
113
114 M Mark
115 Mc Spacing mark
116 Me Enclosing mark
117 Mn Non-spacing mark
118
119 N Number
120 Nd Decimal number
121 Nl Letter number
122 No Other number
123
124 P Punctuation
125 Pc Connector punctuation
126 Pd Dash punctuation
127 Pe Close punctuation
128 Pf Final punctuation
129 Pi Initial punctuation
130 Po Other punctuation
131 Ps Open punctuation
132
133 S Symbol
134 Sc Currency symbol
135 Sk Modifier symbol
136 Sm Mathematical symbol
137 So Other symbol
138
139 Z Separator
140 Zl Line separator
141 Zp Paragraph separator
142 Zs Space separator
143 </PRE>
144 </P>
145 <br><a name="SEC6" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
146 <P>
147 Arabic,
148 Armenian,
149 Balinese,
150 Bengali,
151 Bopomofo,
152 Braille,
153 Buginese,
154 Buhid,
155 Canadian_Aboriginal,
156 Cherokee,
157 Common,
158 Coptic,
159 Cuneiform,
160 Cypriot,
161 Cyrillic,
162 Deseret,
163 Devanagari,
164 Ethiopic,
165 Georgian,
166 Glagolitic,
167 Gothic,
168 Greek,
169 Gujarati,
170 Gurmukhi,
171 Han,
172 Hangul,
173 Hanunoo,
174 Hebrew,
175 Hiragana,
176 Inherited,
177 Kannada,
178 Katakana,
179 Kharoshthi,
180 Khmer,
181 Lao,
182 Latin,
183 Limbu,
184 Linear_B,
185 Malayalam,
186 Mongolian,
187 Myanmar,
188 New_Tai_Lue,
189 Nko,
190 Ogham,
191 Old_Italic,
192 Old_Persian,
193 Oriya,
194 Osmanya,
195 Phags_Pa,
196 Phoenician,
197 Runic,
198 Shavian,
199 Sinhala,
200 Syloti_Nagri,
201 Syriac,
202 Tagalog,
203 Tagbanwa,
204 Tai_Le,
205 Tamil,
206 Telugu,
207 Thaana,
208 Thai,
209 Tibetan,
210 Tifinagh,
211 Ugaritic,
212 Yi.
213 </P>
214 <br><a name="SEC7" href="#TOC1">CHARACTER CLASSES</a><br>
215 <P>
216 <pre>
217 [...] positive character class
218 [^...] negative character class
219 [x-y] range (can be used for hex characters)
220 [[:xxx:]] positive POSIX named set
221 [[:^xxx:]] negative POSIX named set
222
223 alnum alphanumeric
224 alpha alphabetic
225 ascii 0-127
226 blank space or tab
227 cntrl control character
228 digit decimal digit
229 graph printing, excluding space
230 lower lower case letter
231 print printing, including space
232 punct printing, excluding alphanumeric
233 space whitespace
234 upper upper case letter
235 word same as \w
236 xdigit hexadecimal digit
237 </pre>
238 In PCRE, POSIX character set names recognize only ASCII characters. You can use
239 \Q...\E inside a character class.
240 </P>
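<P>
A minimal sketch of positive classes, negative classes, and ranges follows. It
uses Python's <b>re</b> module as a stand-in for PCRE. POSIX named sets are not
available in that module, so they are not shown; the sample strings are
illustrative only.
</P>

```python
import re

# Positive class: any one of the listed characters.
vowels = re.findall(r"[aeiou]", "regular")

# Negative class: any one character that is not listed.
consonants = re.findall(r"[^aeiou]", "regular")

# A range whose end points are written as hex escapes: \x41-\x5A is A-Z.
upper_only = re.fullmatch(r"[\x41-\x5A]+", "PCRE") is not None
mixed_case = re.fullmatch(r"[\x41-\x5A]+", "Pcre") is None
```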
241 <br><a name="SEC8" href="#TOC1">QUANTIFIERS</a><br>
242 <P>
243 <pre>
244 ? 0 or 1, greedy
245 ?+ 0 or 1, possessive
246 ?? 0 or 1, lazy
247 * 0 or more, greedy
248 *+ 0 or more, possessive
249 *? 0 or more, lazy
250 + 1 or more, greedy
251 ++ 1 or more, possessive
252 +? 1 or more, lazy
253 {n} exactly n
254 {n,m} at least n, no more than m, greedy
255 {n,m}+ at least n, no more than m, possessive
256 {n,m}? at least n, no more than m, lazy
257 {n,} n or more, greedy
258 {n,}+ n or more, possessive
259 {n,}? n or more, lazy
260 </PRE>
261 </P>
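<P>
The difference between greedy and lazy quantifiers is easiest to see on a
concrete subject. The sketch below uses Python's <b>re</b> module as a stand-in
for PCRE. Possessive quantifiers are left out because Python accepts them only
from version 3.11; the sample strings are made up.
</P>

```python
import re

subject = "<b>bold</b>"

# Greedy: .+ takes as much as it can, then gives back only what it must.
greedy = re.match(r"<.+>", subject).group()

# Lazy: .+? takes as little as it can, extending only when needed.
lazy = re.match(r"<.+?>", subject).group()

# {n,m} is greedy by default and lazy with a trailing "?".
greedy_counts = re.findall(r"a{2,3}", "a aa aaa aaaa")
lazy_counts = re.findall(r"a{2,3}?", "a aa aaa aaaa")

# {n,m} is an upper bound as well as a lower bound.
too_long = re.fullmatch(r"\d{2,4}", "12345") is None
```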
262 <br><a name="SEC9" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
263 <P>
264 <pre>
265 \b word boundary
266 \B not a word boundary
267 ^ start of subject
268     also after internal newline in multiline mode
269 \A start of subject
270 $ end of subject
271     also before newline at end of subject
272     also before internal newline in multiline mode
273 \Z end of subject
274     also before newline at end of subject
275 \z end of subject
276 \G first matching position in subject
277 </PRE>
278 </P>
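<P>
The effect of multiline mode on ^ and $ can be sketched as follows, using
Python's <b>re</b> module as a stand-in for PCRE. \Z and \z are not shown,
because Python's \Z matches only at the very end of the subject and so does not
correspond to the \Z listed above. The sample strings are made up.
</P>

```python
import re

subject = "one\ntwo\n"

# Without multiline mode, ^ matches only at the start of the subject.
first_only = re.findall(r"^\w+", subject)

# In multiline mode, ^ also matches after each internal newline.
each_line = re.findall(r"^\w+", subject, re.MULTILINE)

# $ matches before a newline that ends the subject...
dollar_before_final_newline = re.search(r"two$", subject) is not None
# ...but before an internal newline only in multiline mode.
dollar_internal_default = re.search(r"one$", subject) is None
dollar_internal_multiline = re.search(r"one$", subject, re.MULTILINE) is not None

# \b matches between a word character and a non-word character.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")
```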
279 <br><a name="SEC10" href="#TOC1">MATCH POINT RESET</a><br>
280 <P>
281 <pre>
282 \K reset start of match
283 </PRE>
284 </P>
285 <br><a name="SEC11" href="#TOC1">ALTERNATION</a><br>
286 <P>
287 <pre>
288 expr|expr|expr...
289 </PRE>
290 </P>
291 <br><a name="SEC12" href="#TOC1">CAPTURING</a><br>
292 <P>
293 <pre>
294 (...) capturing group
295 (?&#60;name&#62;...) named capturing group (Perl)
296 (?'name'...) named capturing group (Perl)
297 (?P&#60;name&#62;...) named capturing group (Python)
298 (?:...) non-capturing group
299 (?|...) non-capturing group; reset group numbers for
300     capturing groups in each alternative
301 </PRE>
302 </P>
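<P>
The following sketch shows how numbered, named, and non-capturing groups
interact. It uses Python's <b>re</b> module as a stand-in for PCRE, so only the
Python-style (?P&#60;name&#62;...) form of named group appears; the pattern and
the subject string are hypothetical.
</P>

```python
import re

# Group 1 is numbered, group 2 is named as well as numbered, and the
# trailing (?:...) group is matched but not captured.
pattern = re.compile(r"(\w+)@(?P<domain>[\w.]+)(?::\d+)?")

m = pattern.fullmatch("user@example.org:25")

captured = m.groups()              # only the two capturing groups
by_name = m.group("domain")        # named access
by_number = m.group(2)             # the same group, by number
group_count = pattern.groups       # non-capturing groups are not counted
```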
303 <br><a name="SEC13" href="#TOC1">ATOMIC GROUPS</a><br>
304 <P>
305 <pre>
306 (?&#62;...) atomic, non-capturing group
307 </PRE>
308 </P>
309 <br><a name="SEC14" href="#TOC1">COMMENT</a><br>
310 <P>
311 <pre>
312 (?#....) comment (not nestable)
313 </PRE>
314 </P>
315 <br><a name="SEC15" href="#TOC1">OPTION SETTING</a><br>
316 <P>
317 <pre>
318 (?i) caseless
319 (?J) allow duplicate names
320 (?m) multiline
321 (?s) single line (dotall)
322 (?U) default ungreedy (lazy)
323 (?x) extended (ignore white space)
324 (?-...) unset option(s)
325 </PRE>
326 </P>
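<P>
Inline option settings can be sketched as below, using Python's <b>re</b>
module as a stand-in for PCRE. Only (?i), (?m), (?s), and (?x) are shown, each
at the start of the pattern, since (?J) and (?U) are not accepted by that
module. The sample strings are made up.
</P>

```python
import re

# (?i): caseless matching.
caseless = re.fullmatch(r"(?i)pcre", "PCRE") is not None

# (?s): dotall, so that . also matches a newline.
dot_default = re.fullmatch(r"a.b", "a\nb") is None
dot_all = re.fullmatch(r"(?s)a.b", "a\nb") is not None

# (?m): multiline, so that ^ also matches after an internal newline.
line_starts = re.findall(r"(?m)^\w", "ab\ncd")

# (?x): extended, so that white space in the pattern is ignored and
# "#" starts a comment.
extended = re.fullmatch(r"(?x) \d{3} - \d{4}  # local number", "555-0199") is not None
```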
327 <br><a name="SEC16" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
328 <P>
329 <pre>
330 (?=...) positive look ahead
331 (?!...) negative look ahead
332 (?&#60;=...) positive look behind
333 (?&#60;!...) negative look behind
334 </pre>
335 Each top-level branch of a look behind must be of a fixed length.
336 </P>
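<P>
The four assertions, and the fixed-length restriction on look behind, can be
sketched as follows. The example uses Python's <b>re</b> module as a stand-in
for PCRE; that module enforces a fixed-width rule of its own on look behind,
which is assumed here to be close enough to illustrate the point. The subject
string is made up.
</P>

```python
import re

subject = "price $42, qty 17, fee $5"

# Positive look behind: digits immediately preceded by "$".
after_dollar = re.findall(r"(?<=\$)\d+", subject)

# Negative look behind: digits not preceded by "$" or by another digit.
not_after_dollar = re.findall(r"(?<![\$\d])\d+", subject)

# Positive look ahead: word characters immediately followed by ",".
before_comma = re.findall(r"\w+(?=,)", subject)

# Negative look ahead: whole words not followed by ",".
not_before_comma = re.findall(r"\b\w+\b(?!,)", subject)

# A look behind with no fixed length is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=a+)b")
    variable_lookbehind_rejected = False
except re.error:
    variable_lookbehind_rejected = True
```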
337 <br><a name="SEC17" href="#TOC1">BACKREFERENCES</a><br>
338 <P>
339 <pre>
340 \n reference by number (can be ambiguous)
341 \gn reference by number
342 \g{n} reference by number
343 \g{-n} relative reference by number
344 \k&#60;name&#62; reference by name (Perl)
345 \k'name' reference by name (Perl)
346 \g{name} reference by name (Perl)
347 \k{name} reference by name (.NET)
348 (?P=name) reference by name (Python)
349 </PRE>
350 </P>
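<P>
Two of the backreference forms listed above, \n and (?P=name), are also
accepted by Python's <b>re</b> module, which makes a quick sketch possible. The
\g and \k forms are not used here because that module does not accept them in
patterns. The sample strings are made up.
</P>

```python
import re

# \1 must match the same text that group 1 matched: a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group()

# (?P=q) refers back to the named group q, so the closing quote has to be
# the same character as the opening quote.
quoted = re.compile(r"(?P<q>['\"]).*?(?P=q)")
m = quoted.search("say 'it\"s' now")
quoted_text = m.group()
opening_quote = m.group("q")
```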
351 <br><a name="SEC18" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
352 <P>
353 <pre>
354 (?R) recurse whole pattern
355 (?n) call subpattern by absolute number
356 (?+n) call subpattern by relative number
357 (?-n) call subpattern by relative number
358 (?&#38;name) call subpattern by name (Perl)
359 (?P&#62;name) call subpattern by name (Python)
360 \g&#60;name&#62; call subpattern by name (Oniguruma)
361 \g'name' call subpattern by name (Oniguruma)
362 \g&#60;n&#62; call subpattern by absolute number (Oniguruma)
363 \g'n' call subpattern by absolute number (Oniguruma)
364 \g&#60;+n&#62; call subpattern by relative number (PCRE extension)
365 \g'+n' call subpattern by relative number (PCRE extension)
366 \g&#60;-n&#62; call subpattern by relative number (PCRE extension)
367 \g'-n' call subpattern by relative number (PCRE extension)
368 </PRE>
369 </P>
370 <br><a name="SEC19" href="#TOC1">CONDITIONAL PATTERNS</a><br>
371 <P>
372 <pre>
373 (?(condition)yes-pattern)
374 (?(condition)yes-pattern|no-pattern)
375
376 (?(n)... absolute reference condition
377 (?(+n)... relative reference condition
378 (?(-n)... relative reference condition
379 (?(&#60;name&#62;)... named reference condition (Perl)
380 (?('name')... named reference condition (Perl)
381 (?(name)... named reference condition (PCRE)
382 (?(R)... overall recursion condition
383 (?(Rn)... specific group recursion condition
384 (?(R&#38;name)... specific recursion condition
385 (?(DEFINE)... define subpattern for reference
386 (?(assert)... assertion condition
387 </PRE>
388 </P>
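<P>
The absolute and named reference conditions can be sketched with Python's
<b>re</b> module, which accepts the (?(n)...) and (?(name)...) forms. The
relative, recursion, DEFINE, and assertion conditions are PCRE features that
this sketch does not cover. Both patterns are hypothetical examples.
</P>

```python
import re

# (?(1)>) requires a closing ">" only if group 1 matched an opening "<".
bracketed = re.compile(r"^(<)?\w+(?(1)>)$")
with_brackets = bracketed.match("<abc>") is not None
without_brackets = bracketed.match("abc") is not None
unclosed = bracketed.match("<abc") is None
unopened = bracketed.match("abc>") is None

# Yes-pattern and no-pattern: digits after a sign, letters otherwise.
signed = re.compile(r"^(?P<sign>-)?(?(sign)\d+|[a-z]+)$")
negative_number = signed.match("-12") is not None
plain_word = signed.match("abc") is not None
unsigned_number = signed.match("12") is None
signed_word = signed.match("-abc") is None
```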
389 <br><a name="SEC20" href="#TOC1">BACKTRACKING CONTROL</a><br>
390 <P>
391 The following act as soon as they are reached:
392 <pre>
393 (*ACCEPT) force successful match
394 (*FAIL) force backtrack; synonym (*F)
395 </pre>
396 The following act only when a subsequent match failure causes a backtrack to
397 reach them. They all force a match failure, but they differ in what happens
398 afterwards. Those that advance the start-of-match point do so only if the
399 pattern is not anchored.
400 <pre>
401 (*COMMIT) overall failure, no advance of starting point
402 (*PRUNE) advance to next starting character
403 (*SKIP) advance start to current matching position
404 (*THEN) local failure, backtrack to next alternation
405 </PRE>
406 </P>
407 <br><a name="SEC21" href="#TOC1">NEWLINE CONVENTIONS</a><br>
408 <P>
409 These are recognized only at the very start of the pattern or after a
410 (*BSR_...) option.
411 <pre>
412 (*CR)
413 (*LF)
414 (*CRLF)
415 (*ANYCRLF)
416 (*ANY)
417 </PRE>
418 </P>
419 <br><a name="SEC22" href="#TOC1">WHAT \R MATCHES</a><br>
420 <P>
421 These are recognized only at the very start of the pattern or after a
422 (*...) option that sets the newline convention.
423 <pre>
424 (*BSR_ANYCRLF)
425 (*BSR_UNICODE)
426 </PRE>
427 </P>
428 <br><a name="SEC23" href="#TOC1">CALLOUTS</a><br>
429 <P>
430 <pre>
431 (?C) callout
432 (?Cn) callout with data n
433 </PRE>
434 </P>
435 <br><a name="SEC24" href="#TOC1">SEE ALSO</a><br>
436 <P>
437 <b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
438 <b>pcrematching</b>(3), <b>pcre</b>(3).
439 </P>
440 <br><a name="SEC25" href="#TOC1">AUTHOR</a><br>
441 <P>
442 Philip Hazel
443 <br>
444 University Computing Service
445 <br>
446 Cambridge CB2 3QH, England.
447 <br>
448 </P>
449 <br><a name="SEC26" href="#TOC1">REVISION</a><br>
450 <P>
451 Last updated: 09 April 2008
452 <br>
453 Copyright &copy; 1997-2008 University of Cambridge.
454 <br>
455 <p>
456 Return to the <a href="index.html">PCRE index page</a>.
457 </p>
