/[pcre]/code/trunk/doc/pcresyntax.3
ViewVC logotype

Contents of /code/trunk/doc/pcresyntax.3

Parent Directory Parent Directory | Revision Log Revision Log


Revision 260 - (hide annotations) (download)
Thu Sep 20 10:19:16 2007 UTC (6 years, 6 months ago) by ph10
File size: 9637 byte(s)
Documentation tidies and clarifications.

1 ph10 208 .TH PCRESYNTAX 3
2     .SH NAME
3     PCRE - Perl-compatible regular expressions
4     .SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
5     .rs
6     .sp
7     The full syntax and semantics of the regular expressions that are supported by
8     PCRE are described in the
9     .\" HREF
10     \fBpcrepattern\fP
11     .\"
12     documentation. This document contains just a quick-reference summary of the
13     syntax.
14     .
15     .
16     .SH "QUOTING"
17     .rs
18     .sp
19     \ex where x is non-alphanumeric is a literal x
20     \eQ...\eE treat enclosed characters as literal
21     .
22     .
23     .SH "CHARACTERS"
24     .rs
25     .sp
26     \ea alarm, that is, the BEL character (hex 07)
27     \ecx "control-x", where x is any character
28     \ee escape (hex 1B)
29     \ef formfeed (hex 0C)
30     \en newline (hex 0A)
31     \er carriage return (hex 0D)
32     \et tab (hex 09)
33     \eddd character with octal code ddd, or backreference
34     \exhh character with hex code hh
35     \ex{hhh..} character with hex code hhh..
36     .
37     .
38     .SH "CHARACTER TYPES"
39     .rs
40     .sp
41     . any character except newline;
42     in dotall mode, any character whatsoever
43     \eC one byte, even in UTF-8 mode (best avoided)
44     \ed a decimal digit
45     \eD a character that is not a decimal digit
46     \eh a horizontal whitespace character
47     \eH a character that is not a horizontal whitespace character
48     \ep{\fIxx\fP} a character with the \fIxx\fP property
49     \eP{\fIxx\fP} a character without the \fIxx\fP property
50     \eR a newline sequence
51     \es a whitespace character
52     \eS a character that is not a whitespace character
53     \ev a vertical whitespace character
54     \eV a character that is not a vertical whitespace character
55     \ew a "word" character
56     \eW a "non-word" character
57     \eX an extended Unicode sequence
58     .sp
59     In PCRE, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII characters.
60     .
61     .
62     .SH "GENERAL CATEGORY PROPERTY CODES FOR \ep and \eP"
63     .rs
64     .sp
65     C Other
66     Cc Control
67     Cf Format
68     Cn Unassigned
69     Co Private use
70     Cs Surrogate
71     .sp
72     L Letter
73     Ll Lower case letter
74     Lm Modifier letter
75     Lo Other letter
76     Lt Title case letter
77     Lu Upper case letter
78     L& Ll, Lu, or Lt
79     .sp
80     M Mark
81     Mc Spacing mark
82     Me Enclosing mark
83     Mn Non-spacing mark
84     .sp
85     N Number
86     Nd Decimal number
87     Nl Letter number
88     No Other number
89     .sp
90     P Punctuation
91     Pc Connector punctuation
92     Pd Dash punctuation
93     Pe Close punctuation
94     Pf Final punctuation
95     Pi Initial punctuation
96     Po Other punctuation
97     Ps Open punctuation
98     .sp
99     S Symbol
100     Sc Currency symbol
101     Sk Modifier symbol
102     Sm Mathematical symbol
103     So Other symbol
104     .sp
105     Z Separator
106     Zl Line separator
107     Zp Paragraph separator
108     Zs Space separator
109     .
110     .
111     .SH "SCRIPT NAMES FOR \ep AND \eP"
112     .rs
113     .sp
114     Arabic,
115     Armenian,
116     Balinese,
117     Bengali,
118     Bopomofo,
119     Braille,
120     Buginese,
121     Buhid,
122     Canadian_Aboriginal,
123     Cherokee,
124     Common,
125     Coptic,
126     Cuneiform,
127     Cypriot,
128     Cyrillic,
129     Deseret,
130     Devanagari,
131     Ethiopic,
132     Georgian,
133     Glagolitic,
134     Gothic,
135     Greek,
136     Gujarati,
137     Gurmukhi,
138     Han,
139     Hangul,
140     Hanunoo,
141     Hebrew,
142     Hiragana,
143     Inherited,
144     Kannada,
145     Katakana,
146     Kharoshthi,
147     Khmer,
148     Lao,
149     Latin,
150     Limbu,
151     Linear_B,
152     Malayalam,
153     Mongolian,
154     Myanmar,
155     New_Tai_Lue,
156     Nko,
157     Ogham,
158     Old_Italic,
159     Old_Persian,
160     Oriya,
161     Osmanya,
162     Phags_Pa,
163     Phoenician,
164     Runic,
165     Shavian,
166     Sinhala,
167     Syloti_Nagri,
168     Syriac,
169     Tagalog,
170     Tagbanwa,
171     Tai_Le,
172     Tamil,
173     Telugu,
174     Thaana,
175     Thai,
176     Tibetan,
177     Tifinagh,
178     Ugaritic,
179     Yi.
180     .
181     .
182     .SH "CHARACTER CLASSES"
183     .rs
184     .sp
185     [...] positive character class
186     [^...] negative character class
187     [x-y] range (can be used for hex characters)
188     [[:xxx:]] positive POSIX named set
189     [[^:xxx:]] negative POSIX named set
190     .sp
191     alnum alphanumeric
192     alpha alphabetic
193     ascii 0-127
194     blank space or tab
195     cntrl control character
196     digit decimal digit
197     graph printing, excluding space
198     lower lower case letter
199     print printing, including space
200     punct printing, excluding alphanumeric
201     space whitespace
202     upper upper case letter
203     word same as \ew
204     xdigit hexadecimal digit
205     .sp
206     In PCRE, POSIX character set names recognize only ASCII characters. You can use
207     \eQ...\eE inside a character class.
208     .
209     .
210     .SH "QUANTIFIERS"
211     .rs
212     .sp
213     ? 0 or 1, greedy
214     ?+ 0 or 1, possessive
215     ?? 0 or 1, lazy
216     * 0 or more, greedy
217     *+ 0 or more, possessive
218     *? 0 or more, lazy
219     + 1 or more, greedy
220     ++ 1 or more, possessive
221     +? 1 or more, lazy
222     {n} exactly n
223     {n,m} at least n, no more than m, greedy
224     {n,m}+ at least n, no more than m, possessive
225     {n,m}? at least n, no more than m, lazy
226     {n,} n or more, greedy
227     {n,}+ n or more, possessive
228     {n,}? n or more, lazy
229     .
230     .
231     .SH "ANCHORS AND SIMPLE ASSERTIONS"
232     .rs
233     .sp
234     \eb word boundary
235     \eB not a word boundary
236     ^ start of subject
237     also after internal newline in multiline mode
238     \eA start of subject
239     $ end of subject
240     also before newline at end of subject
241     also before internal newline in multiline mode
242     \eZ end of subject
243     also before newline at end of subject
244     \ez end of subject
245     \eG first matching position in subject
246     .
247     .
248     .SH "MATCH POINT RESET"
249     .rs
250     .sp
251     \eK reset start of match
252     .
253     .
254     .SH "ALTERNATION"
255     .rs
256     .sp
257     expr|expr|expr...
258     .
259     .
260     .SH "CAPTURING"
261     .rs
262     .sp
263     (...) capturing group
264     (?<name>...) named capturing group (Perl)
265     (?'name'...) named capturing group (Perl)
266     (?P<name>...) named capturing group (Python)
267     (?:...) non-capturing group
268     (?|...) non-capturing group; reset group numbers for
269     capturing groups in each alternative
270     .
271     .
272     .SH "ATOMIC GROUPS"
273     .rs
274     .sp
275     (?>...) atomic, non-capturing group
276     .
277     .
278     .
279     .
280     .SH "COMMENT"
281     .rs
282     .sp
283     (?#....) comment (not nestable)
284     .
285     .
286     .SH "OPTION SETTING"
287     .rs
288     .sp
289     (?i) caseless
290     (?J) allow duplicate names
291     (?m) multiline
292     (?s) single line (dotall)
293     (?U) default ungreedy (lazy)
294     (?x) extended (ignore white space)
295     (?-...) unset option(s)
296     .
297     .
298     .SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
299     .rs
300     .sp
301     (?=...) positive look ahead
302     (?!...) negative look ahead
303     (?<=...) positive look behind
304     (?<!...) negative look behind
305     .sp
306     Each top-level branch of a look behind must be of a fixed length.
307     .SH "BACKREFERENCES"
308     .rs
309     .sp
310     \en reference by number (can be ambiguous)
311     \egn reference by number
312     \eg{n} reference by number
313     \eg{-n} relative reference by number
314     \ek<name> reference by name (Perl)
315     \ek'name' reference by name (Perl)
316     \eg{name} reference by name (Perl)
317     \ek{name} reference by name (.NET)
318     (?P=name) reference by name (Python)
319     .
320     .
321     .SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
322     .rs
323     .sp
324     (?R) recurse whole pattern
325     (?n) call subpattern by absolute number
326     (?+n) call subpattern by relative number
327     (?-n) call subpattern by relative number
328     (?&name) call subpattern by name (Perl)
329     (?P>name) call subpattern by name (Python)
330     .
331     .
332     .SH "CONDITIONAL PATTERNS"
333     .rs
334     .sp
335     (?(condition)yes-pattern)
336     (?(condition)yes-pattern|no-pattern)
337     .sp
338     (?(n)... absolute reference condition
339     (?(+n)... relative reference condition
340     (?(-n)... relative reference condition
341     (?(<name>)... named reference condition (Perl)
342     (?('name')... named reference condition (Perl)
343     (?(name)... named reference condition (PCRE)
344     (?(R)... overall recursion condition
345     (?(Rn)... specific group recursion condition
346     (?(R&name)... specific recursion condition
347     (?(DEFINE)... define subpattern for reference
348     (?(assert)... assertion condition
349     .
350     .
351 ph10 210 .SH "BACKTRACKING CONTROL"
352     .rs
353     .sp
354 ph10 211 The following act immediately they are reached:
355     .sp
356 ph10 210 (*ACCEPT) force successful match
357     (*FAIL) force backtrack; synonym (*F)
358     .sp
359 ph10 211 The following act only when a subsequent match failure causes a backtrack to
360     reach them. They all force a match failure, but they differ in what happens
361     afterwards. Those that advance the start-of-match point do so only if the
362 ph10 210 pattern is not anchored.
363     .sp
364     (*COMMIT) overall failure, no advance of starting point
365     (*PRUNE) advance to next starting character
366     (*SKIP) advance start to current matching position
367     (*THEN) local failure, backtrack to next alternation
368     .
369     .
370 ph10 227 .SH "NEWLINE CONVENTIONS"
371     .rs
372     .sp
373 ph10 260 These are recognized only at the very start of the pattern or after a
374     (*BSR_...) option.
375 ph10 227 .sp
376     (*CR)
377     (*LF)
378     (*CRLF)
379     (*ANYCRLF)
380     (*ANY)
381     .
382     .
383 ph10 231 .SH "WHAT \eR MATCHES"
384     .rs
385     .sp
386 ph10 260 These are recognized only at the very start of the pattern or after a
387     (*...) option that sets the newline convention.
388 ph10 231 .sp
389     (*BSR_ANYCRLF)
390     (*BSR_UNICODE)
391     .
392     .
393 ph10 208 .SH "CALLOUTS"
394     .rs
395     .sp
396     (?C) callout
397     (?Cn) callout with data n
398     .
399     .
400     .SH "SEE ALSO"
401     .rs
402     .sp
403     \fBpcrepattern\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3),
404     \fBpcrematching\fP(3), \fBpcre\fP(3).
405     .
406     .
407     .SH AUTHOR
408     .rs
409     .sp
410     .nf
411     Philip Hazel
412     University Computing Service
413     Cambridge CB2 3QH, England.
414     .fi
415     .
416     .
417     .SH REVISION
418     .rs
419     .sp
420     .nf
421 ph10 260 Last updated: 21 September 2007
422 ph10 208 Copyright (c) 1997-2007 University of Cambridge.
423     .fi

webmaster@exim.org
ViewVC Help
Powered by ViewVC 1.1.12