.\" Revision 954, Sat Mar 31 18:09:26 2012 UTC, by ph10:
.\" Add date and PCRE version to .TH macros of all man pages.
.TH PCRESYNTAX 3 "10 January 2012" "PCRE 8.30"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
.rs
.sp
The full syntax and semantics of the regular expressions that are supported by
PCRE are described in the
.\" HREF
\fBpcrepattern\fP
.\"
documentation. This document contains a quick-reference summary of the syntax.
.
.
.SH "QUOTING"
.rs
.sp
  \ex            where x is non-alphanumeric is a literal x
  \eQ...\eE       treat enclosed characters as literal
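.sp
As a rough illustration only, the Python sketch below uses Python's re module,
which is not PCRE but shares much of this syntax. Python's re has no
\eQ...\eE; re.escape() serves a similar purpose by backslash-escaping every
special character in a literal string. A backslash before a non-alphanumeric
character such as $ works as it does in PCRE.
.sp
```python
import re

# A backslash before a non-alphanumeric character makes it literal,
# so \$ matches a dollar sign and \. matches a full stop.
price = re.search(r"\$\d+\.\d\d", "total: $12.50")
print(price.group())

# Python's re has no \Q...\E; re.escape() quotes a literal string instead.
literal = re.escape("a.b*c")
print(re.fullmatch(literal, "a.b*c") is not None)  # the exact text matches
print(re.fullmatch(literal, "axbbc") is None)      # "." and "*" are no longer special
```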
.
.
.SH "CHARACTERS"
.rs
.sp
  \ea            alarm, that is, the BEL character (hex 07)
  \ecx           "control-x", where x is any ASCII character
  \ee            escape (hex 1B)
  \ef            formfeed (hex 0C)
  \en            newline (hex 0A)
  \er            carriage return (hex 0D)
  \et            tab (hex 09)
  \eddd          character with octal code ddd, or backreference
  \exhh          character with hex code hh
  \ex{hhh..}     character with hex code hhh..
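.sp
Python's re module, used here only as an illustration (it is not PCRE),
understands \ea, \ef, \en, \er, \et, \exhh, and three-digit octal escapes, but
not \ee, \ecx, or \ex{hhh..}; for code points above hex FF it uses \eu followed
by four hex digits. The sketch also shows that a single digit after a
backslash is read as a backreference.
.sp
```python
import re

# \x41 (hex) and \101 (three octal digits) both denote "A"; \t is a tab.
print(re.fullmatch(r"\x41\t\101", "A\tA") is not None)

# A single digit after a backslash is a backreference, not octal:
# (a)\1 matches "aa" but not "ab".
print(re.fullmatch(r"(a)\1", "aa") is not None)
print(re.fullmatch(r"(a)\1", "ab") is None)

# Python writes wider code points as \uhhhh rather than PCRE's \x{hhh..}.
print(re.fullmatch(r"\u00e9", "\u00e9") is not None)
```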
.
.
.SH "CHARACTER TYPES"
.rs
.sp
  .             any character except newline;
                  in dotall mode, any character whatsoever
  \eC            one data unit, even in UTF mode (best avoided)
  \ed            a decimal digit
  \eD            a character that is not a decimal digit
  \eh            a horizontal whitespace character
  \eH            a character that is not a horizontal whitespace character
  \eN            a character that is not a newline
  \ep{\fIxx\fP}        a character with the \fIxx\fP property
  \eP{\fIxx\fP}        a character without the \fIxx\fP property
  \eR            a newline sequence
  \es            a whitespace character
  \eS            a character that is not a whitespace character
  \ev            a vertical whitespace character
  \eV            a character that is not a vertical whitespace character
  \ew            a "word" character
  \eW            a "non-word" character
  \eX            an extended Unicode sequence
.sp
In PCRE, by default, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII
characters, even in a UTF mode. However, this can be changed by setting the
PCRE_UCP option.
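.sp
Python's re module (an illustration only; it is not PCRE) has the opposite
default for str patterns: \ed, \es, and \ew are Unicode-aware unless the
re.ASCII flag is given, so re.ASCII approximates PCRE's default and the plain
pattern approximates PCRE_UCP. The sketch also shows how dotall mode changes
what a dot matches.
.sp
```python
import re

text = "7\u0663"  # ASCII "7" followed by ARABIC-INDIC DIGIT THREE (category Nd)

# ASCII-only \d, like PCRE's default without PCRE_UCP.
ascii_digits = re.findall(r"\d", text, flags=re.ASCII)
# Unicode-aware \d, comparable to PCRE with PCRE_UCP set.
unicode_digits = re.findall(r"\d", text)
print(ascii_digits, unicode_digits)

# "." does not match a newline unless dotall mode (re.DOTALL, PCRE's (?s)) is on.
print(re.match(r"a.b", "a\nb") is None)
print(re.match(r"a.b", "a\nb", flags=re.DOTALL) is not None)
```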
.
.
.SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP"
.rs
.sp
  C     Other
  Cc    Control
  Cf    Format
  Cn    Unassigned
  Co    Private use
  Cs    Surrogate
.sp
  L     Letter
  Ll    Lower case letter
  Lm    Modifier letter
  Lo    Other letter
  Lt    Title case letter
  Lu    Upper case letter
  L&    Ll, Lu, or Lt
.sp
  M     Mark
  Mc    Spacing mark
  Me    Enclosing mark
  Mn    Non-spacing mark
.sp
  N     Number
  Nd    Decimal number
  Nl    Letter number
  No    Other number
.sp
  P     Punctuation
  Pc    Connector punctuation
  Pd    Dash punctuation
  Pe    Close punctuation
  Pf    Final punctuation
  Pi    Initial punctuation
  Po    Other punctuation
  Ps    Open punctuation
.sp
  S     Symbol
  Sc    Currency symbol
  Sk    Modifier symbol
  Sm    Mathematical symbol
  So    Other symbol
.sp
  Z     Separator
  Zl    Line separator
  Zp    Paragraph separator
  Zs    Space separator
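.sp
Python's re module (not PCRE) has no \ep, but unicodedata.category() in the
Python standard library reports the same two-letter general category codes.
The hypothetical helper has_property() below is a minimal sketch of how a
one-letter name covers every category that begins with that letter, and how
L& combines Ll, Lu, and Lt.
.sp
```python
import unicodedata


def has_property(ch: str, prop: str) -> bool:
    """Approximate PCRE's \\p{prop} for a general category name.

    Hypothetical helper, not part of PCRE or Python: a one-letter name
    matches every category that starts with that letter, and "L&" means
    Ll, Lu, or Lt.
    """
    cat = unicodedata.category(ch)
    if prop == "L&":
        return cat in ("Ll", "Lu", "Lt")
    if len(prop) == 1:
        return cat.startswith(prop)
    return cat == prop


# unicodedata reports the two-letter codes used in the table.
for ch in ["a", "A", "1", " ", "$"]:
    print(repr(ch), unicodedata.category(ch))
```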
.
.
.SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep and \eP"
.rs
.sp
  Xan   Alphanumeric: union of properties L and N
  Xps   POSIX space: property Z or tab, NL, VT, FF, CR
  Xsp   Perl space: property Z or tab, NL, FF, CR
  Xwd   Perl word: property Xan or underscore
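.sp
The special properties above are built from general categories plus a few
control characters. As a sketch only, the hypothetical Python predicates
below restate those definitions with unicodedata.category(); they follow the
table as written for this PCRE version (so Xsp excludes VT) and are not PCRE's
own code.
.sp
```python
import unicodedata

# Hypothetical predicates restating the table; not PCRE's implementation.


def is_xan(ch: str) -> bool:
    """Xan: property L or N."""
    return unicodedata.category(ch)[0] in ("L", "N")


def is_xps(ch: str) -> bool:
    """Xps: property Z, or tab, NL, VT, FF, CR."""
    return unicodedata.category(ch)[0] == "Z" or ch in "\t\n\v\f\r"


def is_xsp(ch: str) -> bool:
    """Xsp: property Z, or tab, NL, FF, CR (VT excluded, as listed)."""
    return unicodedata.category(ch)[0] == "Z" or ch in "\t\n\f\r"


def is_xwd(ch: str) -> bool:
    """Xwd: property Xan, or underscore."""
    return is_xan(ch) or ch == "_"


print(is_xps("\v"), is_xsp("\v"))  # VT is POSIX space but not Perl space here
print(is_xwd("_"), is_xan("_"))    # underscore is a word character only
```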
.
.
.SH "SCRIPT NAMES FOR \ep AND \eP"
.rs
.sp
Arabic,
Armenian,
Avestan,
Balinese,
Bamum,
Batak,
Bengali,
Bopomofo,
Brahmi,
Braille,
Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Chakma,
Cham,
Cherokee,
Common,
Coptic,
Cuneiform,
Cypriot,
Cyrillic,
Deseret,
Devanagari,
Egyptian_Hieroglyphs,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Greek,
Gujarati,
Gurmukhi,
Han,
Hangul,
Hanunoo,
Hebrew,
Hiragana,
Imperial_Aramaic,
Inherited,
Inscriptional_Pahlavi,
Inscriptional_Parthian,
Javanese,
Kaithi,
Kannada,
Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Lao,
Latin,
Lepcha,
Limbu,
Linear_B,
Lisu,
Lycian,
Lydian,
Malayalam,
Mandaic,
Meetei_Mayek,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
Mongolian,
Myanmar,
New_Tai_Lue,
Nko,
Ogham,
Old_Italic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Ol_Chiki,
Oriya,
Osmanya,
Phags_Pa,
Phoenician,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
Sinhala,
Sora_Sompeng,
Sundanese,
Syloti_Nagri,
Syriac,
Tagalog,
Tagbanwa,
Tai_Le,
Tai_Tham,
Tai_Viet,
Takri,
Tamil,
Telugu,
Thaana,
Thai,
Tibetan,
Tifinagh,
Ugaritic,
Vai,
Yi.
.
.
.SH "CHARACTER CLASSES"
.rs
.sp
  [...]         positive character class
  [^...]        negative character class
  [x-y]         range (can be used for hex characters)
  [[:xxx:]]     positive POSIX named set
  [[:^xxx:]]    negative POSIX named set
.sp
  alnum         alphanumeric
  alpha         alphabetic
  ascii         0-127
  blank         space or tab
  cntrl         control character
  digit         decimal digit
  graph         printing, excluding space
  lower         lower case letter
  print         printing, including space
  punct         printing, excluding alphanumeric
  space         whitespace
  upper         upper case letter
  word          same as \ew
  xdigit        hexadecimal digit
.sp
In PCRE, POSIX character set names recognize only ASCII characters by default,
but some of them use Unicode properties if PCRE_UCP is set. You can use
\eQ...\eE inside a character class.
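.sp
Python's re module (not PCRE) does not accept the [[:name:]] syntax. Assuming
PCRE's default ASCII-only behavior (PCRE_UCP not set), each POSIX name
corresponds to an ordinary character class. The hypothetical POSIX_ASCII
table below spells these out, and members() lists the ASCII characters each
one matches so they can be checked against Python's string module.
.sp
```python
import re
import string

# Hypothetical ASCII-only equivalents of the POSIX names (no PCRE_UCP).
POSIX_ASCII = {
    "alnum": r"[A-Za-z0-9]",
    "alpha": r"[A-Za-z]",
    "ascii": r"[\x00-\x7f]",
    "blank": r"[ \t]",
    "cntrl": r"[\x00-\x1f\x7f]",
    "digit": r"[0-9]",
    "graph": r"[!-~]",              # printing, excluding space
    "lower": r"[a-z]",
    "print": r"[ -~]",              # printing, including space
    "punct": r"[!-/:-@\[-`{-~]",
    "space": r"[ \t\n\v\f\r]",
    "upper": r"[A-Z]",
    "word": r"[A-Za-z0-9_]",        # same as \w in ASCII mode
    "xdigit": r"[0-9A-Fa-f]",
}


def members(name: str) -> str:
    """Return every ASCII character that the named class matches."""
    cls = POSIX_ASCII[name]
    return "".join(chr(c) for c in range(128) if re.fullmatch(cls, chr(c)))


print(set(members("punct")) == set(string.punctuation))
print(set(members("space")) == set(string.whitespace))
```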
.
.
.SH "QUANTIFIERS"
.rs
.sp
  ?             0 or 1, greedy
  ?+            0 or 1, possessive
  ??            0 or 1, lazy
  *             0 or more, greedy
  *+            0 or more, possessive
  *?            0 or more, lazy
  +             1 or more, greedy
  ++            1 or more, possessive
  +?            1 or more, lazy
  {n}           exactly n
  {n,m}         at least n, no more than m, greedy
  {n,m}+        at least n, no more than m, possessive
  {n,m}?        at least n, no more than m, lazy
  {n,}          n or more, greedy
  {n,}+         n or more, possessive
  {n,}?         n or more, lazy
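.sp
A short sketch with Python's re module (not PCRE) shows greedy versus lazy
repetition and the upper bound of {n,m}. Possessive quantifiers were added to
Python's re only in version 3.11, so that part is guarded by a version check.
.sp
```python
import re
import sys

html = "<a><b>"
print(re.search(r"<.+>", html).group())   # greedy: takes as much as possible
print(re.search(r"<.+?>", html).group())  # lazy: takes as little as possible
print(re.fullmatch(r"\d{2,3}", "1234"))   # more than m repetitions: no match

# Possessive quantifiers (and atomic groups) exist in Python's re from 3.11.
if sys.version_info >= (3, 11):
    # a*+ keeps every "a" it takes, so the final "a" can never match.
    print(re.search(r"a*+a", "aaa"))
```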
.
.
.SH "ANCHORS AND SIMPLE ASSERTIONS"
.rs
.sp
  \eb            word boundary
  \eB            not a word boundary
  ^             start of subject
                  also after internal newline in multiline mode
  \eA            start of subject
  $             end of subject
                  also before newline at end of subject
                  also before internal newline in multiline mode
  \eZ            end of subject
                  also before newline at end of subject
  \ez            end of subject
  \eG            first matching position in subject
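.sp
The sketch below uses Python's re module, not PCRE. One naming trap: Python's
\eZ matches only at the very end of the subject, so it corresponds to PCRE's
\ez; Python has no separate escape for PCRE's \eZ, but $ without multiline
mode behaves the same way.
.sp
```python
import re

# $ matches at the end, or before a newline at the end, as in PCRE.
print(re.search(r"end$", "the end\n") is not None)
# Python's \Z matches only at the very end: PCRE's \z, not PCRE's \Z.
print(re.search(r"end\Z", "the end\n") is None)
# In multiline mode, ^ also matches after each internal newline.
print(re.findall(r"^\w+", "one\ntwo\nthree", flags=re.MULTILINE))
# \A still means the start of the subject only.
print(re.findall(r"\A\w+", "one\ntwo", flags=re.MULTILINE))
# \b lies between a word character and a non-word character (or an end).
print(re.findall(r"\bcat\b", "cat concat cat."))
```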
.
.
.SH "MATCH POINT RESET"
.rs
.sp
  \eK            reset start of match
.
.
.SH "ALTERNATION"
.rs
.sp
  expr|expr|expr...
.
.
.SH "CAPTURING"
.rs
.sp
  (...)           capturing group
  (?<name>...)    named capturing group (Perl)
  (?'name'...)    named capturing group (Perl)
  (?P<name>...)   named capturing group (Python)
  (?:...)         non-capturing group
  (?|...)         non-capturing group; reset group numbers for
                    capturing groups in each alternative
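.sp
Python's re module (not PCRE) accepts the (?P<name>...) form listed above
and (?:...) groups, but not the (?|...) group. The sketch shows named
captures, a part that is matched but not captured, and alternation inside a
non-capturing group.
.sp
```python
import re

m = re.match(r"(?P<year>\d{4})-(?P<month>\d\d)(?:-\d\d)?", "2012-01-10")
print(m.group("year"), m.group("month"))
print(m.groups())  # the (?:...) part is matched but not captured

# Alternation inside a non-capturing group.
print(re.fullmatch(r"\w+\.(?:jpe?g|png)", "photo.jpeg") is not None)
```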
.
.
.SH "ATOMIC GROUPS"
.rs
.sp
  (?>...)       atomic, non-capturing group
.
.
.
.
.SH "COMMENT"
.rs
.sp
  (?#....)      comment (not nestable)
.
.
.SH "OPTION SETTING"
.rs
.sp
  (?i)          caseless
  (?J)          allow duplicate names
  (?m)          multiline
  (?s)          single line (dotall)
  (?U)          default ungreedy (lazy)
  (?x)          extended (ignore white space)
  (?-...)       unset option(s)
.sp
The following are recognized only at the start of a pattern or after one of the
newline-setting options with similar syntax:
.sp
  (*NO_START_OPT)   no start-match optimization (PCRE_NO_START_OPTIMIZE)
  (*UTF8)           set UTF-8 mode: 8-bit library (PCRE_UTF8)
  (*UTF16)          set UTF-16 mode: 16-bit library (PCRE_UTF16)
  (*UCP)            set PCRE_UCP (use Unicode properties for \ed etc.)
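.sp
The sketch below uses Python's re module (version 3.6 or later), not PCRE.
Python accepts (?i), (?m), (?s), and (?x) at the start of a pattern, but it
can unset an option only inside a scoped group such as (?-i:...), and it has
no equivalent of (?J) or (?U).
.sp
```python
import re

# (?i) at the start of the pattern: caseless matching.
print(re.fullmatch(r"(?i)pcre", "PCRE") is not None)

# Python can unset an option only in a scoped group, (?-i:...).
pattern = r"(?i)a(?-i:b)c"
print(re.fullmatch(pattern, "AbC") is not None)  # b must be lower case
print(re.fullmatch(pattern, "ABC") is None)

# (?x): white space in the pattern is ignored.
print(re.fullmatch(r"(?x) \d+ - \d+", "10-20") is not None)

# (?#...) is a comment and matches nothing.
print(re.fullmatch(r"ab(?#not nestable)c", "abc") is not None)
```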
.
.
.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
.rs
.sp
  (?=...)       positive look ahead
  (?!...)       negative look ahead
  (?<=...)      positive look behind
  (?<!...)      negative look behind
.sp
Each top-level branch of a look behind must be of a fixed length.
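.sp
A sketch with Python's re module (not PCRE) follows. Python is stricter than
the rule above: a whole lookbehind must have one fixed width, so
(?<=ab|c)d, which PCRE accepts because each branch has a fixed length, is
rejected. The hypothetical helper compiles() reports whether Python accepts a
pattern.
.sp
```python
import re


def compiles(pattern: str) -> bool:
    """Hypothetical helper: True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(re.findall(r"\d+(?=px)", "10px 20em 30px"))        # positive lookahead
print(re.findall(r"\b\d+\b(?!px)", "10px 20 30px"))      # negative lookahead
print(re.findall(r"(?<=\$)\d+", "$10 and 20 and $30"))   # positive lookbehind
print(re.findall(r"(?<!\$)\b\d+", "$10 and 20 and $30")) # negative lookbehind

print(compiles(r"(?<=ab)c"))    # fixed length: accepted
print(compiles(r"(?<=a+)b"))    # variable length: rejected
print(compiles(r"(?<=ab|c)d"))  # branches of different lengths: rejected by Python
```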
.
.
.SH "BACKREFERENCES"
.rs
.sp
  \en            reference by number (can be ambiguous)
  \egn           reference by number
  \eg{n}         reference by number
  \eg{-n}        relative reference by number
  \ek<name>      reference by name (Perl)
  \ek'name'      reference by name (Perl)
  \eg{name}      reference by name (Perl)
  \ek{name}      reference by name (.NET)
  (?P=name)     reference by name (Python)
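.sp
Python's re module (not PCRE) supports numbered references such as \e1 and
the Python-style (?P=name); the \eg and \ek forms are not available in its
patterns. The sketch finds a doubled word and a quoted string whose closing
quote matches the opening one.
.sp
```python
import re

# \1 refers back to whatever group 1 captured: find a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a typo")
print(doubled.group(1))

# (?P=name) must repeat exactly the quote character captured by group q.
quoted = re.search(r"(?P<q>['\"]).*?(?P=q)", "say \"hi\" or 'bye'")
print(quoted.group())
```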
.
.
.SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
.rs
.sp
  (?R)          recurse whole pattern
  (?n)          call subpattern by absolute number
  (?+n)         call subpattern by relative number
  (?-n)         call subpattern by relative number
  (?&name)      call subpattern by name (Perl)
  (?P>name)     call subpattern by name (Python)
  \eg<name>      call subpattern by name (Oniguruma)
  \eg'name'      call subpattern by name (Oniguruma)
  \eg<n>         call subpattern by absolute number (Oniguruma)
  \eg'n'         call subpattern by absolute number (Oniguruma)
  \eg<+n>        call subpattern by relative number (PCRE extension)
  \eg'+n'        call subpattern by relative number (PCRE extension)
  \eg<-n>        call subpattern by relative number (PCRE extension)
  \eg'-n'        call subpattern by relative number (PCRE extension)
.
.
.SH "CONDITIONAL PATTERNS"
.rs
.sp
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)
.sp
  (?(n)...        absolute reference condition
  (?(+n)...       relative reference condition
  (?(-n)...       relative reference condition
  (?(<name>)...   named reference condition (Perl)
  (?('name')...   named reference condition (Perl)
  (?(name)...     named reference condition (PCRE)
  (?(R)...        overall recursion condition
  (?(Rn)...       specific group recursion condition
  (?(R&name)...   specific recursion condition
  (?(DEFINE)...   define subpattern for reference
  (?(assert)...   assertion condition
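.sp
Python's re module (not PCRE) supports only the simplest condition, a group
number or name, as in (?(1)...). The sketch uses it to require a closing >
only when an opening < was matched; the other condition types above have no
Python equivalent.
.sp
```python
import re

# (?(1)>) requires ">" only if group 1 (the optional "<") took part in the match.
pattern = r"(<)?\w+@\w+\.com(?(1)>)"

for s in ["<user@example.com>", "user@example.com", "<user@example.com"]:
    print(s, re.fullmatch(pattern, s) is not None)
```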
.
.
.SH "BACKTRACKING CONTROL"
.rs
.sp
The following act as soon as they are reached:
.sp
  (*ACCEPT)       force successful match
  (*FAIL)         force backtrack; synonym (*F)
  (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
.sp
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
.sp
  (*COMMIT)       overall failure, no advance of starting point
  (*PRUNE)        advance to next starting character
  (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
  (*SKIP)         advance to current matching position
  (*SKIP:NAME)    advance to position corresponding to an earlier
                    (*MARK:NAME); if not found, the (*SKIP) is ignored
  (*THEN)         local failure, backtrack to next alternation
  (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
.
.
.SH "NEWLINE CONVENTIONS"
.rs
.sp
These are recognized only at the very start of the pattern or after a
(*BSR_...), (*UTF8), (*UTF16) or (*UCP) option.
.sp
  (*CR)         carriage return only
  (*LF)         linefeed only
  (*CRLF)       carriage return followed by linefeed
  (*ANYCRLF)    all three of the above
  (*ANY)        any Unicode newline sequence
.
.
.SH "WHAT \eR MATCHES"
.rs
.sp
These are recognized only at the very start of the pattern or after a
(*...) option that sets the newline convention or a UTF or UCP mode.
.sp
  (*BSR_ANYCRLF)    CR, LF, or CRLF
  (*BSR_UNICODE)    any Unicode newline sequence
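.sp
Python's re module (not PCRE) has no \eR. Assuming the usual PCRE definition
(CR, LF, CRLF, VT, FF, and NEL, plus U+2028 and U+2029 in UTF modes), a
minimal emulation is an alternation with CRLF tried first, so that a CRLF
pair counts as one line break. BSR_UNICODE and BSR_ANYCRLF are hypothetical
names for the two variants.
.sp
```python
import re

# Hypothetical emulations of \R; CRLF is listed first so it is taken as one unit.
BSR_UNICODE = r"(?:\r\n|[\n\v\f\r\x85\u2028\u2029])"
BSR_ANYCRLF = r"(?:\r\n|[\r\n])"

text = "a\r\nb\x85c\nd"
print(re.split(BSR_UNICODE, text))  # NEL (U+0085) also counts as a newline
print(re.split(BSR_ANYCRLF, text))  # only CR, LF, and CRLF count
```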
.
.
.SH "CALLOUTS"
.rs
.sp
  (?C)          callout
  (?Cn)         callout with data n
.
.
.SH "SEE ALSO"
.rs
.sp
\fBpcrepattern\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3),
\fBpcrematching\fP(3), \fBpcre\fP(3).
.
.
.SH AUTHOR
.rs
.sp
.nf
Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.
.fi
.
.
.SH REVISION
.rs
.sp
.nf
Last updated: 10 January 2012
Copyright (c) 1997-2012 University of Cambridge.
.fi