Contents of /code/trunk/doc/html/pcresyntax.html

Revision 286
Mon Dec 17 14:46:11 2007 UTC by ph10
File MIME type: text/html
File size: 12645 byte(s)
Add .gz and .bz2 optional support to pcregrep.

<html>
<head>
<title>pcresyntax specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcresyntax man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a>
<li><a name="TOC2" href="#SEC2">QUOTING</a>
<li><a name="TOC3" href="#SEC3">CHARACTERS</a>
<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a>
<li><a name="TOC6" href="#SEC6">SCRIPT NAMES FOR \p AND \P</a>
<li><a name="TOC7" href="#SEC7">CHARACTER CLASSES</a>
<li><a name="TOC8" href="#SEC8">QUANTIFIERS</a>
<li><a name="TOC9" href="#SEC9">ANCHORS AND SIMPLE ASSERTIONS</a>
<li><a name="TOC10" href="#SEC10">MATCH POINT RESET</a>
<li><a name="TOC11" href="#SEC11">ALTERNATION</a>
<li><a name="TOC12" href="#SEC12">CAPTURING</a>
<li><a name="TOC13" href="#SEC13">ATOMIC GROUPS</a>
<li><a name="TOC14" href="#SEC14">COMMENT</a>
<li><a name="TOC15" href="#SEC15">OPTION SETTING</a>
<li><a name="TOC16" href="#SEC16">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
<li><a name="TOC17" href="#SEC17">BACKREFERENCES</a>
<li><a name="TOC18" href="#SEC18">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
<li><a name="TOC19" href="#SEC19">CONDITIONAL PATTERNS</a>
<li><a name="TOC20" href="#SEC20">BACKTRACKING CONTROL</a>
<li><a name="TOC21" href="#SEC21">NEWLINE CONVENTIONS</a>
<li><a name="TOC22" href="#SEC22">WHAT \R MATCHES</a>
<li><a name="TOC23" href="#SEC23">CALLOUTS</a>
<li><a name="TOC24" href="#SEC24">SEE ALSO</a>
<li><a name="TOC25" href="#SEC25">AUTHOR</a>
<li><a name="TOC26" href="#SEC26">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
<P>
The full syntax and semantics of the regular expressions that are supported by
PCRE are described in the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation. This document contains just a quick-reference summary of the
syntax.
</P>
<br><a name="SEC2" href="#TOC1">QUOTING</a><br>
<P>
<pre>
  \x         where x is non-alphanumeric is a literal x
  \Q...\E    treat enclosed characters as literal
</PRE>
</P>
<br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
<P>
<pre>
  \a         alarm, that is, the BEL character (hex 07)
  \cx        "control-x", where x is any character
  \e         escape (hex 1B)
  \f         formfeed (hex 0C)
  \n         newline (hex 0A)
  \r         carriage return (hex 0D)
  \t         tab (hex 09)
  \ddd       character with octal code ddd, or backreference
  \xhh       character with hex code hh
  \x{hhh..}  character with hex code hhh..
</PRE>
</P>
<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
<P>
<pre>
  .          any character except newline;
               in dotall mode, any character whatsoever
  \C         one byte, even in UTF-8 mode (best avoided)
  \d         a decimal digit
  \D         a character that is not a decimal digit
  \h         a horizontal whitespace character
  \H         a character that is not a horizontal whitespace character
  \p{<i>xx</i>}     a character with the <i>xx</i> property
  \P{<i>xx</i>}     a character without the <i>xx</i> property
  \R         a newline sequence
  \s         a whitespace character
  \S         a character that is not a whitespace character
  \v         a vertical whitespace character
  \V         a character that is not a vertical whitespace character
  \w         a "word" character
  \W         a "non-word" character
  \X         an extended Unicode sequence
</pre>
In PCRE, \d, \D, \s, \S, \w, and \W recognize only ASCII characters.
</P>
<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a><br>
<P>
<pre>
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate

  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  L&         Ll, Lu, or Lt

  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark

  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number

  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation

  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol

  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
</PRE>
</P>
<br><a name="SEC6" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
<P>
Arabic,
Armenian,
Balinese,
Bengali,
Bopomofo,
Braille,
Buginese,
Buhid,
Canadian_Aboriginal,
Cherokee,
Common,
Coptic,
Cuneiform,
Cypriot,
Cyrillic,
Deseret,
Devanagari,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Greek,
Gujarati,
Gurmukhi,
Han,
Hangul,
Hanunoo,
Hebrew,
Hiragana,
Inherited,
Kannada,
Katakana,
Kharoshthi,
Khmer,
Lao,
Latin,
Limbu,
Linear_B,
Malayalam,
Mongolian,
Myanmar,
New_Tai_Lue,
Nko,
Ogham,
Old_Italic,
Old_Persian,
Oriya,
Osmanya,
Phags_Pa,
Phoenician,
Runic,
Shavian,
Sinhala,
Syloti_Nagri,
Syriac,
Tagalog,
Tagbanwa,
Tai_Le,
Tamil,
Telugu,
Thaana,
Thai,
Tibetan,
Tifinagh,
Ugaritic,
Yi.
</P>
<br><a name="SEC7" href="#TOC1">CHARACTER CLASSES</a><br>
<P>
<pre>
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set

  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  space       whitespace
  upper       upper case letter
  word        same as \w
  xdigit      hexadecimal digit
</pre>
In PCRE, POSIX character set names recognize only ASCII characters. You can use
\Q...\E inside a character class.
</P>
<br><a name="SEC8" href="#TOC1">QUANTIFIERS</a><br>
<P>
<pre>
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
</PRE>
</P>
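<P>
The difference between the greedy and lazy forms listed above is easiest to see
on a small subject string. The following is only a sketch: it uses Python's
standard <b>re</b> module as a stand-in for PCRE (an assumption made for
illustration, not part of PCRE itself), the sample strings are invented, and the
possessive forms are left out because not every Python version accepts them.
</P>

```python
import re

text = "[one][two]"

# Greedy: .* takes as much as it can, so the match runs to the last "]".
greedy = re.search(r"\[.*\]", text).group(0)

# Lazy: .*? takes as little as it can, so the match stops at the first "]".
lazy = re.search(r"\[.*?\]", text).group(0)

# {n,m} is greedy and takes up to m repeats; {n,m}? is lazy and stops at n.
greedy_counts = re.findall(r"\d{2,3}", "1 22 333 4444")
lazy_counts = re.findall(r"\d{2,3}?", "1 22 333 4444")

print(greedy, lazy, greedy_counts, lazy_counts)
```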
<br><a name="SEC9" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
<P>
<pre>
  \b          word boundary
  \B          not a word boundary
  ^           start of subject
               also after internal newline in multiline mode
  \A          start of subject
  $           end of subject
               also before newline at end of subject
               also before internal newline in multiline mode
  \Z          end of subject
               also before newline at end of subject
  \z          end of subject
  \G          first matching position in subject
</PRE>
</P>
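<P>
As a rough illustration of the anchors above, the sketch below again uses
Python's <b>re</b> module as a stand-in for PCRE (an assumption; the subject
strings are made up). It covers only ^, $, \A, \b and \B, which behave as
described in the table. \Z, \z and \G are deliberately omitted because Python
either lacks them or gives them a different meaning.
</P>

```python
import re

subject = "one\ntwo\n"

# Without multiline mode, ^ matches only at the start of the subject.
first_only = re.findall(r"^\w+", subject)

# In multiline mode, ^ also matches after each internal newline.
every_line = re.findall(r"^\w+", subject, re.MULTILINE)

# \A matches only at the start of the subject, even in multiline mode.
start_only = re.findall(r"\A\w+", subject, re.MULTILINE)

# $ also matches before a newline at the end of the subject.
before_final_newline = re.search(r"two$", subject) is not None

# \b requires a word boundary; \B requires that there is none.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")
inside_word = re.findall(r"\Bcat", "cat concat cat.")
```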
<br><a name="SEC10" href="#TOC1">MATCH POINT RESET</a><br>
<P>
<pre>
  \K          reset start of match
</PRE>
</P>
<br><a name="SEC11" href="#TOC1">ALTERNATION</a><br>
<P>
<pre>
  expr|expr|expr...
</PRE>
</P>
<br><a name="SEC12" href="#TOC1">CAPTURING</a><br>
<P>
<pre>
  (...)              capturing group
  (?&#60;name&#62;...)       named capturing group (Perl)
  (?'name'...)       named capturing group (Perl)
  (?P&#60;name&#62;...)      named capturing group (Python)
  (?:...)            non-capturing group
  (?|...)            non-capturing group; reset group numbers for
                      capturing groups in each alternative
</PRE>
</P>
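<P>
A short sketch of numbered, named and non-capturing groups follows. It uses
Python's <b>re</b> module as a stand-in for PCRE (an assumption made for
illustration), so only the Python-style (?P&#60;name&#62;...) spelling appears;
the Perl spellings and (?|...) are not shown because this example engine may
not accept them. The date string is invented.
</P>

```python
import re

# One named group and one plain numbered group.
m = re.match(r"(?P<year>\d{4})-(\d{2})", "2007-11")
year_by_name = m.group("year")   # a named group can be read by name...
year_by_number = m.group(1)      # ...and still has a number
month = m.group(2)

# (?:...) groups without capturing, so only (c) receives a number.
groups = re.match(r"(?:ab)+(c)", "ababc").groups()
```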
<br><a name="SEC13" href="#TOC1">ATOMIC GROUPS</a><br>
<P>
<pre>
  (?&#62;...)            atomic, non-capturing group
</PRE>
</P>
<br><a name="SEC14" href="#TOC1">COMMENT</a><br>
<P>
<pre>
  (?#....)           comment (not nestable)
</PRE>
</P>
<br><a name="SEC15" href="#TOC1">OPTION SETTING</a><br>
<P>
<pre>
  (?i)            caseless
  (?J)            allow duplicate names
  (?m)            multiline
  (?s)            single line (dotall)
  (?U)            default ungreedy (lazy)
  (?x)            extended (ignore white space)
  (?-...)         unset option(s)
</PRE>
</P>
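<P>
The effect of the i, m, s and x options can be sketched as follows, using
Python's <b>re</b> module as a stand-in for PCRE (an assumption made for
illustration). Python has no (?J) or (?U), and it expects option letters at the
very start of the pattern, so unsetting is shown with its scoped (?-i:...) form
rather than a bare (?-i). The subject strings are invented.
</P>

```python
import re

# (?i) caseless matching.
caseless = re.fullmatch(r"(?i)pcre", "PCRE") is not None

# (?m) multiline: ^ matches after an internal newline as well.
multiline = re.findall(r"(?m)^\w+", "one\ntwo")

# (?s) dotall: "." matches a newline only when the option is set.
dot_default = re.fullmatch(r"a.b", "a\nb") is not None
dot_all = re.fullmatch(r"(?s)a.b", "a\nb") is not None

# (?x) extended: white space and #-comments in the pattern are ignored.
extended = re.fullmatch(r"(?x) a \s b  # comment", "a b") is not None

# Unsetting an option for part of the pattern (Python's scoped spelling).
unset_ok = re.fullmatch(r"(?i)a(?-i:b)c", "AbC") is not None
unset_fail = re.fullmatch(r"(?i)a(?-i:b)c", "ABC") is not None
```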
<br><a name="SEC16" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
<P>
<pre>
  (?=...)         positive look ahead
  (?!...)         negative look ahead
  (?&#60;=...)        positive look behind
  (?&#60;!...)        negative look behind
</pre>
Each top-level branch of a look behind must be of a fixed length.
</P>
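<P>
The four assertions above can be tried out with the sketch below, which uses
Python's <b>re</b> module as a stand-in for PCRE (an assumption made for
illustration; the subject strings are invented). The last step shows the
fixed-length rule for look behind being enforced at compile time. Note that
Python's rule is, if anything, stricter than the one stated above, so this
demonstrates the idea rather than PCRE's exact behaviour.
</P>

```python
import re

# Positive look ahead: a word that is followed by ":" (the ":" is not consumed).
keys = re.findall(r"\w+(?=:)", "key: value")

# Negative look ahead: words starting with "q" that is not followed by "u".
no_u = re.findall(r"\bq(?!u)\w*", "qatar quit qi")

# Positive look behind: digits that are preceded by "$".
priced = re.findall(r"(?<=\$)\d+", "cost $42, qty 7")

# Negative look behind: numbers that are not preceded by "$".
unpriced = re.findall(r"(?<!\$)\b\d+", "cost $42, qty 7")

# A look behind of variable length is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=a+)b")
    variable_width_rejected = False
except re.error:
    variable_width_rejected = True
```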
<br><a name="SEC17" href="#TOC1">BACKREFERENCES</a><br>
<P>
<pre>
  \n              reference by number (can be ambiguous)
  \gn             reference by number
  \g{n}           reference by number
  \g{-n}          relative reference by number
  \k&#60;name&#62;        reference by name (Perl)
  \k'name'        reference by name (Perl)
  \g{name}        reference by name (Perl)
  \k{name}        reference by name (.NET)
  (?P=name)       reference by name (Python)
</PRE>
</P>
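<P>
A backreference matches the same text that the referenced group matched earlier.
The sketch below shows the \n and (?P=name) forms, using Python's <b>re</b>
module as a stand-in for PCRE (an assumption made for illustration). The \g and
\k forms are not shown because this example engine does not accept them inside
a pattern. The subject strings are invented.
</P>

```python
import re

# \1 must repeat exactly what group 1 matched: this finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)

# (?P=q) must match the same quote character that the group named q matched,
# so each string is closed by the kind of quote that opened it.
quoted = [m.group(0)
          for m in re.finditer(r"(?P<q>['\"]).*?(?P=q)", "say \"hi\" and 'yo'")]
```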
<br><a name="SEC18" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
<P>
<pre>
  (?R)            recurse whole pattern
  (?n)            call subpattern by absolute number
  (?+n)           call subpattern by relative number
  (?-n)           call subpattern by relative number
  (?&name)        call subpattern by name (Perl)
  (?P&#62;name)       call subpattern by name (Python)
</PRE>
</P>
<br><a name="SEC19" href="#TOC1">CONDITIONAL PATTERNS</a><br>
<P>
<pre>
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)

  (?(n)...        absolute reference condition
  (?(+n)...       relative reference condition
  (?(-n)...       relative reference condition
  (?(&#60;name&#62;)...   named reference condition (Perl)
  (?('name')...   named reference condition (Perl)
  (?(name)...     named reference condition (PCRE)
  (?(R)...        overall recursion condition
  (?(Rn)...       specific group recursion condition
  (?(R&name)...   specific recursion condition
  (?(DEFINE)...   define subpattern for reference
  (?(assert)...   assertion condition
</PRE>
</P>
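<P>
The absolute reference condition (?(n)... chooses between the yes-pattern and
the no-pattern according to whether group n took part in the match. The sketch
below uses Python's <b>re</b> module as a stand-in for PCRE (an assumption made
for illustration; only the numbered form is shown, and the subject strings are
invented).
</P>

```python
import re

# Group 1 captures an optional "(". The condition (?(1)\)) then demands a
# closing ")" only when group 1 actually matched.
balanced = re.compile(r"(\()?\d+(?(1)\))")
results = {s: balanced.fullmatch(s) is not None
           for s in ["42", "(42)", "(42", "42)"]}

# With both branches: after an "a", require "b"; otherwise require "c".
either = re.compile(r"(a)?(?(1)b|c)")
branches = {s: either.fullmatch(s) is not None
            for s in ["ab", "c", "ac", "b"]}
```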
<br><a name="SEC20" href="#TOC1">BACKTRACKING CONTROL</a><br>
<P>
The following act as soon as they are reached:
<pre>
  (*ACCEPT)       force successful match
  (*FAIL)         force backtrack; synonym (*F)
</pre>
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
<pre>
  (*COMMIT)       overall failure, no advance of starting point
  (*PRUNE)        advance to next starting character
  (*SKIP)         advance start to current matching position
  (*THEN)         local failure, backtrack to next alternation
</PRE>
</P>
<br><a name="SEC21" href="#TOC1">NEWLINE CONVENTIONS</a><br>
<P>
These are recognized only at the very start of the pattern or after a
(*BSR_...) option.
<pre>
  (*CR)
  (*LF)
  (*CRLF)
  (*ANYCRLF)
  (*ANY)
</PRE>
</P>
<br><a name="SEC22" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
These are recognized only at the very start of the pattern or after a
(*...) option that sets the newline convention.
<pre>
  (*BSR_ANYCRLF)
  (*BSR_UNICODE)
</PRE>
</P>
<br><a name="SEC23" href="#TOC1">CALLOUTS</a><br>
<P>
<pre>
  (?C)      callout
  (?Cn)     callout with data n
</PRE>
</P>
<br><a name="SEC24" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
<b>pcrematching</b>(3), <b>pcre</b>(3).
</P>
<br><a name="SEC25" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QH, England.
<br>
</P>
<br><a name="SEC26" href="#TOC1">REVISION</a><br>
<P>
Last updated: 14 November 2007
<br>
Copyright &copy; 1997-2007 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
