.\" Revision 903, Sat Jan 21 16:37:17 2012 UTC, by ph10
.\" Source file tidies for 8.30-RC1 release; fix Makefile.am bugs for building
.\" symbolic links to man pages.
.TH PCRESYNTAX 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
.rs
.sp
The full syntax and semantics of the regular expressions that are supported by
PCRE are described in the
.\" HREF
\fBpcrepattern\fP
.\"
documentation. This document contains a quick-reference summary of the syntax.
.
.
.SH "QUOTING"
.rs
.sp
  \ex         where x is non-alphanumeric is a literal x
  \eQ...\eE    treat enclosed characters as literal
.
.
.SH "CHARACTERS"
.rs
.sp
  \ea         alarm, that is, the BEL character (hex 07)
  \ecx        "control-x", where x is any ASCII character
  \ee         escape (hex 1B)
  \ef         formfeed (hex 0C)
  \en         newline (hex 0A)
  \er         carriage return (hex 0D)
  \et         tab (hex 09)
  \eddd       character with octal code ddd, or backreference
  \exhh       character with hex code hh
  \ex{hhh..}  character with hex code hhh..
.
.
.SH "CHARACTER TYPES"
.rs
.sp
  .          any character except newline;
               in dotall mode, any character whatsoever
  \eC         one data unit, even in UTF mode (best avoided)
  \ed         a decimal digit
  \eD         a character that is not a decimal digit
  \eh         a horizontal whitespace character
  \eH         a character that is not a horizontal whitespace character
  \eN         a character that is not a newline
  \ep{\fIxx\fP}     a character with the \fIxx\fP property
  \eP{\fIxx\fP}     a character without the \fIxx\fP property
  \eR         a newline sequence
  \es         a whitespace character
  \eS         a character that is not a whitespace character
  \ev         a vertical whitespace character
  \eV         a character that is not a vertical whitespace character
  \ew         a "word" character
  \eW         a "non-word" character
  \eX         an extended Unicode sequence
.sp
In PCRE, by default, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII
characters, even in a UTF mode. However, this can be changed by setting the
PCRE_UCP option.
.
.
.SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP"
.rs
.sp
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate
.sp
  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  L&         Ll, Lu, or Lt
.sp
  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark
.sp
  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number
.sp
  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation
.sp
  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol
.sp
  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
.
.
.SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep and \eP"
.rs
.sp
  Xan        Alphanumeric: union of properties L and N
  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
  Xsp        Perl space: property Z or tab, NL, FF, CR
  Xwd        Perl word: property Xan or underscore
.
.
.SH "SCRIPT NAMES FOR \ep AND \eP"
.rs
.sp
Arabic,
Armenian,
Avestan,
Balinese,
Bamum,
Bengali,
Bopomofo,
Braille,
Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Cham,
Cherokee,
Common,
Coptic,
Cuneiform,
Cypriot,
Cyrillic,
Deseret,
Devanagari,
Egyptian_Hieroglyphs,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Greek,
Gujarati,
Gurmukhi,
Han,
Hangul,
Hanunoo,
Hebrew,
Hiragana,
Imperial_Aramaic,
Inherited,
Inscriptional_Pahlavi,
Inscriptional_Parthian,
Javanese,
Kaithi,
Kannada,
Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Lao,
Latin,
Lepcha,
Limbu,
Linear_B,
Lisu,
Lycian,
Lydian,
Malayalam,
Meetei_Mayek,
Mongolian,
Myanmar,
New_Tai_Lue,
Nko,
Ogham,
Old_Italic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Ol_Chiki,
Oriya,
Osmanya,
Phags_Pa,
Phoenician,
Rejang,
Runic,
Samaritan,
Saurashtra,
Shavian,
Sinhala,
Sundanese,
Syloti_Nagri,
Syriac,
Tagalog,
Tagbanwa,
Tai_Le,
Tai_Tham,
Tai_Viet,
Tamil,
Telugu,
Thaana,
Thai,
Tibetan,
Tifinagh,
Ugaritic,
Vai,
Yi.
.
.
.SH "CHARACTER CLASSES"
.rs
.sp
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set
.sp
  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  space       whitespace
  upper       upper case letter
  word        same as \ew
  xdigit      hexadecimal digit
.sp
In PCRE, POSIX character set names recognize only ASCII characters by default,
but some of them use Unicode properties if PCRE_UCP is set. You can use
\eQ...\eE inside a character class.
.
.
.SH "QUANTIFIERS"
.rs
.sp
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
.
.
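The difference between the greedy and lazy forms listed above can be seen in a minimal sketch. It assumes Python's standard re module as a stand-in, since it shares this part of the syntax with PCRE; it is not the PCRE library itself, and the subject strings are made up for illustration. The possessive forms are left out because older Python versions do not accept them.

```python
import re

subject = "<a><b>"  # hypothetical subject string

# Greedy: .+ takes as much as it can, then backs off to the last '>'.
greedy = re.search(r"<.+>", subject).group(0)

# Lazy: .+? takes as little as it can, so it stops at the first '>'.
lazy = re.search(r"<.+?>", subject).group(0)

# {n,m} is greedy by default and becomes lazy with a trailing '?'.
bounded_greedy = re.search(r"\d{2,4}", "123456").group(0)
bounded_lazy = re.search(r"\d{2,4}?", "123456").group(0)

print(greedy, lazy, bounded_greedy, bounded_lazy)
```

Both forms match the same set of subjects here; only the extent of the match differs.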
.SH "ANCHORS AND SIMPLE ASSERTIONS"
.rs
.sp
  \eb          word boundary
  \eB          not a word boundary
  ^           start of subject
               also after internal newline in multiline mode
  \eA          start of subject
  $           end of subject
               also before newline at end of subject
               also before internal newline in multiline mode
  \eZ          end of subject
               also before newline at end of subject
  \ez          end of subject
  \eG          first matching position in subject
.
.
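As a rough illustration of how ^, $ and \b behave, the following sketch again assumes Python's re module as a stand-in for PCRE, with re.MULTILINE playing the role of multiline mode. The end-of-subject escapes and \G are deliberately left out, because Python's handling of them does not line up exactly with the list above.

```python
import re

text = "one\ntwo\n"  # hypothetical two-line subject ending in a newline

# Without multiline mode, ^ matches only at the start of the subject.
default_starts = re.findall(r"^\w+", text)

# In multiline mode, ^ also matches after an internal newline.
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)

# $ matches before a newline that ends the subject.
dollar_before_final_newline = re.search(r"two$", text) is not None

# \b keeps "cat" from matching inside "concat".
word_hits = re.findall(r"\bcat\b", "cat concat cat.")

print(default_starts, multiline_starts, dollar_before_final_newline, word_hits)
```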
.SH "MATCH POINT RESET"
.rs
.sp
  \eK          reset start of match
.
.
.SH "ALTERNATION"
.rs
.sp
  expr|expr|expr...
.
.
.SH "CAPTURING"
.rs
.sp
  (...)           capturing group
  (?<name>...)    named capturing group (Perl)
  (?'name'...)    named capturing group (Perl)
  (?P<name>...)   named capturing group (Python)
  (?:...)         non-capturing group
  (?|...)         non-capturing group; reset group numbers for
                    capturing groups in each alternative
.
.
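The group forms above differ in whether they take a group number. A small sketch, assuming Python's re module and therefore only the Python-style (?P<name>...) spelling; the date string is a made-up example:

```python
import re

# One named group, one non-capturing group, one plain numbered group.
m = re.match(r"(?P<year>\d{4})-(?:\d{2})-(\d{2})", "2012-01-21")

# (?:...) takes no number, so only two groups are reported.
numbered = m.groups()

# A named group is still numbered, and can also be read by name.
by_name = m.group("year")
by_number = m.group(1)

print(numbered, by_name, by_number)
```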
.SH "ATOMIC GROUPS"
.rs
.sp
  (?>...)         atomic, non-capturing group
.
.
.
.
.SH "COMMENT"
.rs
.sp
  (?#....)        comment (not nestable)
.
.
.SH "OPTION SETTING"
.rs
.sp
  (?i)            caseless
  (?J)            allow duplicate names
  (?m)            multiline
  (?s)            single line (dotall)
  (?U)            default ungreedy (lazy)
  (?x)            extended (ignore white space)
  (?-...)         unset option(s)
.sp
The following are recognized only at the start of a pattern or after one of the
newline-setting options with similar syntax:
.sp
  (*NO_START_OPT)  no start-match optimization (PCRE_NO_START_OPTIMIZE)
  (*UTF8)          set UTF-8 mode: 8-bit library (PCRE_UTF8)
  (*UTF16)         set UTF-16 mode: 16-bit library (PCRE_UTF16)
  (*UCP)           set PCRE_UCP (use Unicode properties for \ed etc)
.
.
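The (?i), (?s) and (?x) settings from the first list above can be tried with a short sketch. It assumes Python's re module, which accepts these three letters at the start of a pattern; (?J), (?U) and the (*...) items are PCRE-specific and are not shown. The subject strings are hypothetical.

```python
import re

# (?i): caseless matching.
caseless = re.search(r"(?i)pcre", "Uses PCRE internally").group(0)

# By default the dot does not match a newline; (?s) makes it do so.
dot_default = re.search(r"a.b", "a\nb")
dot_all = re.search(r"(?s)a.b", "a\nb")

# (?x): white space in the pattern is ignored.
extended = re.search(r"(?x) \d+  \s*  kg", "12 kg").group(0)

print(caseless, dot_default, dot_all is not None, extended)
```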
.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
.rs
.sp
  (?=...)         positive look ahead
  (?!...)         negative look ahead
  (?<=...)        positive look behind
  (?<!...)        negative look behind
.sp
Each top-level branch of a look behind must be of a fixed length.
.
.
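A brief sketch of the assertions above, assuming Python's re module as a stand-in for PCRE and a made-up subject string. Python applies its own fixed-width rule to look behinds, so the last step only shows that a variable-length look behind such as (?<=a+) is rejected at compile time; it does not reproduce PCRE's per-branch rule.

```python
import re

text = "price: $40, qty: 50"  # hypothetical subject

# Positive look ahead: words that are followed by a colon.
keys = re.findall(r"\w+(?=:)", text)

# Positive look behind: digits that are preceded by a dollar sign.
dollars = re.findall(r"(?<=\$)\d+", text)

# Negative look behind: digit runs not preceded by '$' or another digit.
not_dollars = re.findall(r"(?<![$\d])\d+", text)

# A look behind of variable length is refused when the pattern is compiled.
try:
    re.compile(r"(?<=a+)b")
    variable_length_rejected = False
except re.error:
    variable_length_rejected = True

print(keys, dollars, not_dollars, variable_length_rejected)
```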
.SH "BACKREFERENCES"
.rs
.sp
  \en            reference by number (can be ambiguous)
  \egn           reference by number
  \eg{n}         reference by number
  \eg{-n}        relative reference by number
  \ek<name>      reference by name (Perl)
  \ek'name'      reference by name (Perl)
  \eg{name}      reference by name (Perl)
  \ek{name}      reference by name (.NET)
  (?P=name)     reference by name (Python)
.
.
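Two of the forms above, the numbered reference and the Python-style named reference, can be sketched as follows. This assumes Python's re module, which does not accept the \g{...} or \k<...> spellings inside a pattern; the subject strings are illustrative only.

```python
import re

# Reference by number: a word character followed by the same character again.
doubled_letter = re.search(r"(\w)\1", "hello").group(0)

# Reference by name: a whole word followed by a space and the same word.
m = re.search(r"\b(?P<word>\w+) (?P=word)\b", "this is is a test")
repeated = m.group(0)
captured = m.group("word")

print(doubled_letter, repeated, captured)
```

A backreference matches the text that the group actually captured, not the group's pattern.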
.SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
.rs
.sp
  (?R)          recurse whole pattern
  (?n)          call subpattern by absolute number
  (?+n)         call subpattern by relative number
  (?-n)         call subpattern by relative number
  (?&name)      call subpattern by name (Perl)
  (?P>name)     call subpattern by name (Python)
  \eg<name>      call subpattern by name (Oniguruma)
  \eg'name'      call subpattern by name (Oniguruma)
  \eg<n>         call subpattern by absolute number (Oniguruma)
  \eg'n'         call subpattern by absolute number (Oniguruma)
  \eg<+n>        call subpattern by relative number (PCRE extension)
  \eg'+n'        call subpattern by relative number (PCRE extension)
  \eg<-n>        call subpattern by relative number (PCRE extension)
  \eg'-n'        call subpattern by relative number (PCRE extension)
.
.
.SH "CONDITIONAL PATTERNS"
.rs
.sp
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)
.sp
  (?(n)...         absolute reference condition
  (?(+n)...        relative reference condition
  (?(-n)...        relative reference condition
  (?(<name>)...    named reference condition (Perl)
  (?('name')...    named reference condition (Perl)
  (?(name)...      named reference condition (PCRE)
  (?(R)...         overall recursion condition
  (?(Rn)...        specific group recursion condition
  (?(R&name)...    specific recursion condition
  (?(DEFINE)...    define subpattern for reference
  (?(assert)...    assertion condition
.
.
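The absolute reference condition (?(n)... can be illustrated with a pattern that requires a closing parenthesis only when an opening one was matched. The sketch assumes Python's re module, which supports this one condition type with the same syntax; the recursion, DEFINE and assertion conditions are PCRE features and are not shown. The test strings are hypothetical.

```python
import re

# Group 1 optionally matches "("; the condition (?(1)\)) then demands ")"
# only if group 1 took part in the match. The no-pattern is empty.
pattern = re.compile(r"^(\()?\d+(?(1)\))$")

results = {s: bool(pattern.match(s)) for s in ["42", "(42)", "(42", "42)"]}

print(results)
```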
.SH "BACKTRACKING CONTROL"
.rs
.sp
The following act as soon as they are reached:
.sp
  (*ACCEPT)        force successful match
  (*FAIL)          force backtrack; synonym (*F)
  (*MARK:NAME)     set name to be passed back; synonym (*:NAME)
.sp
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
.sp
  (*COMMIT)        overall failure, no advance of starting point
  (*PRUNE)         advance to next starting character
  (*PRUNE:NAME)    equivalent to (*MARK:NAME)(*PRUNE)
  (*SKIP)          advance to current matching position
  (*SKIP:NAME)     advance to position corresponding to an earlier
                     (*MARK:NAME); if not found, the (*SKIP) is ignored
  (*THEN)          local failure, backtrack to next alternation
  (*THEN:NAME)     equivalent to (*MARK:NAME)(*THEN)
.
.
.SH "NEWLINE CONVENTIONS"
.rs
.sp
These are recognized only at the very start of the pattern or after a
(*BSR_...), (*UTF8), (*UTF16) or (*UCP) option.
.sp
  (*CR)         carriage return only
  (*LF)         linefeed only
  (*CRLF)       carriage return followed by linefeed
  (*ANYCRLF)    all three of the above
  (*ANY)        any Unicode newline sequence
.
.
.SH "WHAT \eR MATCHES"
.rs
.sp
These are recognized only at the very start of the pattern or after a
(*...) option that sets the newline convention or a UTF or UCP mode.
.sp
  (*BSR_ANYCRLF)   CR, LF, or CRLF
  (*BSR_UNICODE)   any Unicode newline sequence
.
.
.SH "CALLOUTS"
.rs
.sp
  (?C)      callout
  (?Cn)     callout with data n
.
.
.SH "SEE ALSO"
.rs
.sp
\fBpcrepattern\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3),
\fBpcrematching\fP(3), \fBpcre\fP(3).
.
.
.SH AUTHOR
.rs
.sp
.nf
Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.
.fi
.
.
.SH REVISION
.rs
.sp
.nf
Last updated: 10 January 2012
Copyright (c) 1997-2012 University of Cambridge.
.fi
