<html>
<head>
<title>pcresyntax specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcresyntax man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a>
<li><a name="TOC2" href="#SEC2">QUOTING</a>
<li><a name="TOC3" href="#SEC3">CHARACTERS</a>
<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
<li><a name="TOC6" href="#SEC6">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
<li><a name="TOC7" href="#SEC7">SCRIPT NAMES FOR \p AND \P</a>
<li><a name="TOC8" href="#SEC8">CHARACTER CLASSES</a>
<li><a name="TOC9" href="#SEC9">QUANTIFIERS</a>
<li><a name="TOC10" href="#SEC10">ANCHORS AND SIMPLE ASSERTIONS</a>
<li><a name="TOC11" href="#SEC11">MATCH POINT RESET</a>
<li><a name="TOC12" href="#SEC12">ALTERNATION</a>
<li><a name="TOC13" href="#SEC13">CAPTURING</a>
<li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
<li><a name="TOC15" href="#SEC15">COMMENT</a>
<li><a name="TOC16" href="#SEC16">OPTION SETTING</a>
<li><a name="TOC17" href="#SEC17">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
<li><a name="TOC18" href="#SEC18">BACKREFERENCES</a>
<li><a name="TOC19" href="#SEC19">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
<li><a name="TOC20" href="#SEC20">CONDITIONAL PATTERNS</a>
<li><a name="TOC21" href="#SEC21">BACKTRACKING CONTROL</a>
<li><a name="TOC22" href="#SEC22">NEWLINE CONVENTIONS</a>
<li><a name="TOC23" href="#SEC23">WHAT \R MATCHES</a>
<li><a name="TOC24" href="#SEC24">CALLOUTS</a>
<li><a name="TOC25" href="#SEC25">SEE ALSO</a>
<li><a name="TOC26" href="#SEC26">AUTHOR</a>
<li><a name="TOC27" href="#SEC27">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
<P>
The full syntax and semantics of the regular expressions that are supported by
PCRE are described in the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation. This document contains a quick-reference summary of the syntax.
</P>
<br><a name="SEC2" href="#TOC1">QUOTING</a><br>
<P>
<pre>
  \x         where x is non-alphanumeric is a literal x
  \Q...\E    treat enclosed characters as literal
</PRE>
</P>
<br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
<P>
<pre>
  \a         alarm, that is, the BEL character (hex 07)
  \cx        "control-x", where x is any ASCII character
  \e         escape (hex 1B)
  \f         form feed (hex 0C)
  \n         newline (hex 0A)
  \r         carriage return (hex 0D)
  \t         tab (hex 09)
  \ddd       character with octal code ddd, or backreference
  \xhh       character with hex code hh
  \x{hhh..}  character with hex code hhh..
</PRE>
</P>
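<P>
A few of these escapes can be tried out as follows. The sketch is written in
Python purely for convenience: Python's <b>re</b> module is a different engine
from PCRE, so it uses only escapes that both accept (\t, \n, \xhh and
three-digit octal \ddd). The variable names are illustrative.
</P>

```python
import re

# \xhh: character given by a two-digit hex code (0x41 is "A", 0x42 is "B").
hex_match = re.fullmatch(r"\x41\x42", "AB")

# \ddd: three octal digits give a character code (octal 101 is "A").
octal_match = re.fullmatch(r"\101", "A")

# \t and \n in the pattern match a literal tab and newline in the subject.
fields = re.split(r"\t", "name\tvalue")
lines = re.split(r"\r?\n", "one\r\ntwo\nthree")

print(bool(hex_match), bool(octal_match), fields, lines)
```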
<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
<P>
<pre>
  .          any character except newline;
               in dotall mode, any character whatsoever
  \C         one data unit, even in UTF mode (best avoided)
  \d         a decimal digit
  \D         a character that is not a decimal digit
  \h         a horizontal white space character
  \H         a character that is not a horizontal white space character
  \N         a character that is not a newline
  \p{<i>xx</i>}     a character with the <i>xx</i> property
  \P{<i>xx</i>}     a character without the <i>xx</i> property
  \R         a newline sequence
  \s         a white space character
  \S         a character that is not a white space character
  \v         a vertical white space character
  \V         a character that is not a vertical white space character
  \w         a "word" character
  \W         a "non-word" character
  \X         a Unicode extended grapheme cluster
</pre>
In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII
characters, even in a UTF mode. However, this can be changed by setting the
PCRE_UCP option.
</P>
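<P>
The difference between ASCII-only and Unicode-aware matching of \d can be
illustrated with a short Python sketch. This is an analogy rather than PCRE
itself: it assumes that Python's <b>re.ASCII</b> flag stands in for the PCRE
default and that omitting the flag stands in for PCRE_UCP. The sample strings
are made up.
</P>

```python
import re

text = "order 66 shipped to room_12B"

# With re.ASCII, \d, \w and \s are limited to ASCII characters.
digits = re.findall(r"\d+", text, re.ASCII)
words = re.findall(r"\w+", text, re.ASCII)
non_space_runs = re.findall(r"\S+", "a  b\tc")

# U+0663 is ARABIC-INDIC DIGIT THREE, a decimal digit outside ASCII.
arabic_three = "\u0663"
ascii_only = re.fullmatch(r"\d", arabic_three, re.ASCII)  # no match
unicode_aware = re.fullmatch(r"\d", arabic_three)         # matches

print(digits, words, non_space_runs, ascii_only, bool(unicode_aware))
```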
<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br>
<P>
<pre>
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate

  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  L&amp;         Ll, Lu, or Lt

  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark

  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number

  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation

  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol

  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
</PRE>
</P>
<br><a name="SEC6" href="#TOC1">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a><br>
<P>
<pre>
  Xan        Alphanumeric: union of properties L and N
  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
  Xsp        Perl space: property Z or tab, NL, FF, CR
  Xuc        Universally-named character: one that can be
               represented by a Universal Character Name
  Xwd        Perl word: property Xan or underscore
</PRE>
</P>
<br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
<P>
Arabic,
Armenian,
Avestan,
Balinese,
Bamum,
Batak,
Bengali,
Bopomofo,
Brahmi,
Braille,
Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Chakma,
Cham,
Cherokee,
Common,
Coptic,
Cuneiform,
Cypriot,
Cyrillic,
Deseret,
Devanagari,
Egyptian_Hieroglyphs,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Greek,
Gujarati,
Gurmukhi,
Han,
Hangul,
Hanunoo,
Hebrew,
Hiragana,
Imperial_Aramaic,
Inherited,
Inscriptional_Pahlavi,
Inscriptional_Parthian,
Javanese,
Kaithi,
Kannada,
Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Lao,
Latin,
Lepcha,
Limbu,
Linear_B,
Lisu,
Lycian,
Lydian,
Malayalam,
Mandaic,
Meetei_Mayek,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
Mongolian,
Myanmar,
New_Tai_Lue,
Nko,
Ogham,
Old_Italic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Ol_Chiki,
Oriya,
Osmanya,
Phags_Pa,
Phoenician,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
Sinhala,
Sora_Sompeng,
Sundanese,
Syloti_Nagri,
Syriac,
Tagalog,
Tagbanwa,
Tai_Le,
Tai_Tham,
Tai_Viet,
Takri,
Tamil,
Telugu,
Thaana,
Thai,
Tibetan,
Tifinagh,
Ugaritic,
Vai,
Yi.
</P>
<br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
<P>
<pre>
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set

  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  space       white space
  upper       upper case letter
  word        same as \w
  xdigit      hexadecimal digit
</pre>
In PCRE, POSIX character set names recognize only ASCII characters by default,
but some of them use Unicode properties if PCRE_UCP is set. You can use
\Q...\E inside a character class.
</P>
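<P>
The three basic class forms behave as in this small Python sketch. Python's
<b>re</b> module is used only as a convenient stand-in for PCRE; POSIX named
sets and \Q...\E are left out because Python's <b>re</b> does not implement
them. The sample strings are invented.
</P>

```python
import re

# [...]: positive class, any one of the listed characters.
vowels = re.findall(r"[aeiou]", "regular")

# [^...]: negative class, any character not listed.
digits_only = re.sub(r"[^0-9]", "", "+44 (0)1223 334733")

# [x-y]: ranges, here combined to pick out upper case hex digits.
hex_runs = re.findall(r"[0-9A-F]+", "7F:00:C4")

print(vowels, digits_only, hex_runs)
```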
<br><a name="SEC9" href="#TOC1">QUANTIFIERS</a><br>
<P>
<pre>
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
</PRE>
</P>
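<P>
The greedy and lazy forms can be compared side by side. The following is a
sketch in Python, whose <b>re</b> module treats these particular quantifiers
the same way; it is not a PCRE program. Possessive forms are omitted because
Python supports them only from version 3.11.
</P>

```python
import re

subject = "<a><b>"

# Greedy: .+ takes as much as possible, then gives back only what it must.
greedy = re.search(r"<.+>", subject).group(0)

# Lazy: .+? takes as little as possible, growing only when needed.
lazy = re.search(r"<.+?>", subject).group(0)

# {n,m} prefers the upper bound when greedy and the lower bound when lazy.
bounded_greedy = re.findall(r"a{2,3}", "aaaaa")
bounded_lazy = re.findall(r"a{2,3}?", "aaaaa")

print(greedy, lazy, bounded_greedy, bounded_lazy)
```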
<br><a name="SEC10" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
<P>
<pre>
  \b          word boundary
  \B          not a word boundary
  ^           start of subject
               also after internal newline in multiline mode
  \A          start of subject
  $           end of subject
               also before newline at end of subject
               also before internal newline in multiline mode
  \Z          end of subject
               also before newline at end of subject
  \z          end of subject
  \G          first matching position in subject
</PRE>
</P>
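<P>
The effect of multiline mode on ^ and the behaviour of $ before a final
newline can be seen in the Python sketch below. Python's <b>re</b> module is
only an approximation of PCRE here: \Z is deliberately not shown because
Python gives it a different meaning (it makes no allowance for a final
newline), and \G is not available.
</P>

```python
import re

subject = "first line\nsecond line\n"

# Without multiline mode, ^ matches only at the start of the subject.
default_starts = re.findall(r"^\w+", subject)
# In multiline mode, ^ also matches after each internal newline.
multiline_starts = re.findall(r"^\w+", subject, re.MULTILINE)

# $ matches before a newline that is the last character of the subject.
final_line = re.search(r"line$", subject)

# \b requires a word boundary; \B requires that there is none.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")
inner_positions = [m.start() for m in re.finditer(r"\Bcat", "cat concat")]

print(default_starts, multiline_starts, final_line.start(),
      whole_words, inner_positions)
```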
<br><a name="SEC11" href="#TOC1">MATCH POINT RESET</a><br>
<P>
<pre>
  \K          reset start of match
</PRE>
</P>
<br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
<P>
<pre>
  expr|expr|expr...
</PRE>
</P>
<br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
<P>
<pre>
  (...)           capturing group
  (?&lt;name&gt;...)    named capturing group (Perl)
  (?'name'...)    named capturing group (Perl)
  (?P&lt;name&gt;...)   named capturing group (Python)
  (?:...)         non-capturing group
  (?|...)         non-capturing group; reset group numbers for
                   capturing groups in each alternative
</PRE>
</P>
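<P>
Since the (?P&lt;name&gt;...) form is the one borrowed from Python, it can be
tried directly in Python's <b>re</b> module, which is used here as a stand-in
for PCRE. The sketch below shows that a named group is also numbered, and that
(?:...) groups without capturing. The date string is just sample data.
</P>

```python
import re

# Named groups are still numbered from left to right.
date = re.fullmatch(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", "2013-04-26"
)
by_name = date.group("year")
by_number = date.group(2)

# (?:...) lets a quantifier apply to "ab" without creating a group,
# so the only captured group is (c).
repeated = re.fullmatch(r"(?:ab)+(c)", "ababc")
captured = repeated.groups()

print(by_name, by_number, captured)
```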
<br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
<P>
<pre>
  (?&gt;...)         atomic, non-capturing group
</PRE>
</P>
<br><a name="SEC15" href="#TOC1">COMMENT</a><br>
<P>
<pre>
  (?#....)        comment (not nestable)
</PRE>
</P>
<br><a name="SEC16" href="#TOC1">OPTION SETTING</a><br>
<P>
<pre>
  (?i)            caseless
  (?J)            allow duplicate names
  (?m)            multiline
  (?s)            single line (dotall)
  (?U)            default ungreedy (lazy)
  (?x)            extended (ignore white space)
  (?-...)         unset option(s)
</pre>
The following are recognized only at the start of a pattern or after one of the
newline-setting options with similar syntax:
<pre>
  (*LIMIT_MATCH=d)      set the match limit to d (decimal number)
  (*LIMIT_RECURSION=d)  set the recursion limit to d (decimal number)
  (*NO_START_OPT)       no start-match optimization (PCRE_NO_START_OPTIMIZE)
  (*UTF8)               set UTF-8 mode: 8-bit library (PCRE_UTF8)
  (*UTF16)              set UTF-16 mode: 16-bit library (PCRE_UTF16)
  (*UTF32)              set UTF-32 mode: 32-bit library (PCRE_UTF32)
  (*UTF)                set appropriate UTF mode for the library in use
  (*UCP)                set PCRE_UCP (use Unicode properties for \d etc)
</PRE>
</P>
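<P>
The four letter options that Python's <b>re</b> module shares with PCRE, namely
(?i), (?m), (?s) and (?x), can be demonstrated as below. This is a sketch in
Python rather than PCRE: each option is placed at the very start of the
pattern, which is where Python requires it, and (?J), (?U), (?-...) and the
(*...) items are not shown because Python does not provide them.
</P>

```python
import re

# (?i): letters match regardless of case.
caseless = re.search(r"(?i)pcre", "Uses PCRE internally")

# (?s): dot also matches a newline; without it, it does not.
dotall = re.search(r"(?s)begin.end", "begin\nend")
no_dotall = re.search(r"begin.end", "begin\nend")

# (?m): ^ and $ also match at internal line starts and ends.
multiline = re.findall(r"(?m)^\d+$", "10\nabc\n20")

# (?x): unescaped white space in the pattern is ignored.
extended = re.fullmatch(r"(?x) \d{3} - \d{4} ", "555-1234")

print(bool(caseless), bool(dotall), no_dotall, multiline, bool(extended))
```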
<br><a name="SEC17" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
<P>
<pre>
  (?=...)         positive look ahead
  (?!...)         negative look ahead
  (?&lt;=...)        positive look behind
  (?&lt;!...)        negative look behind
</pre>
Each top-level branch of a look behind must be of a fixed length.
</P>
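<P>
Each of the four assertions checks the text around the current position without
consuming it. The Python sketch below illustrates this with one small case per
form; Python's <b>re</b> module accepts the same syntax for these simple
fixed-length cases, but it is only a stand-in for PCRE, and the sample strings
are invented.
</P>

```python
import re

# (?=...): digits must be followed by " USD", which is not part of the match.
usd_amounts = re.findall(r"\d+(?= USD)", "10 USD, 20 EUR, 30 USD")

# (?!...): "foo" must not be followed by "bar".
not_foobar = re.findall(r"foo(?!bar)\w*", "foobar foobaz")

# (?<=...): digits must be preceded by a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", "cost $15 or 20")

# (?<!...): a number must not be preceded by a minus sign.
not_negative = re.findall(r"(?<!-)\b\d+", "5 -7 9")

print(usd_amounts, not_foobar, after_dollar, not_negative)
```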
<br><a name="SEC18" href="#TOC1">BACKREFERENCES</a><br>
<P>
<pre>
  \n              reference by number (can be ambiguous)
  \gn             reference by number
  \g{n}           reference by number
  \g{-n}          relative reference by number
  \k&lt;name&gt;        reference by name (Perl)
  \k'name'        reference by name (Perl)
  \g{name}        reference by name (Perl)
  \k{name}        reference by name (.NET)
  (?P=name)       reference by name (Python)
</PRE>
</P>
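<P>
A backreference matches the same text that the group it refers to actually
matched, not the group's pattern. The sketch below uses the \n and (?P=name)
forms, which Python's <b>re</b> module also accepts; the other forms listed
above are not available there, so Python serves only as an approximation of
PCRE. The sample strings are invented.
</P>

```python
import re

# \1 must repeat exactly what group 1 matched: this finds a doubled word.
doubled_word = re.search(r"\b(\w+) \1\b", "this is is a test")

# (?P=ch) refers back to the named group "ch": this finds a doubled letter.
doubled_letter = re.search(r"(?P<ch>\w)(?P=ch)", "bookkeeper")

print(doubled_word.group(0), doubled_letter.group(0))
```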
<br><a name="SEC19" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
<P>
<pre>
  (?R)            recurse whole pattern
  (?n)            call subpattern by absolute number
  (?+n)           call subpattern by relative number
  (?-n)           call subpattern by relative number
  (?&amp;name)        call subpattern by name (Perl)
  (?P&gt;name)       call subpattern by name (Python)
  \g&lt;name&gt;        call subpattern by name (Oniguruma)
  \g'name'        call subpattern by name (Oniguruma)
  \g&lt;n&gt;           call subpattern by absolute number (Oniguruma)
  \g'n'           call subpattern by absolute number (Oniguruma)
  \g&lt;+n&gt;          call subpattern by relative number (PCRE extension)
  \g'+n'          call subpattern by relative number (PCRE extension)
  \g&lt;-n&gt;          call subpattern by relative number (PCRE extension)
  \g'-n'          call subpattern by relative number (PCRE extension)
</PRE>
</P>
<br><a name="SEC20" href="#TOC1">CONDITIONAL PATTERNS</a><br>
<P>
<pre>
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)

  (?(n)...        absolute reference condition
  (?(+n)...       relative reference condition
  (?(-n)...       relative reference condition
  (?(&lt;name&gt;)...   named reference condition (Perl)
  (?('name')...   named reference condition (Perl)
  (?(name)...     named reference condition (PCRE)
  (?(R)...        overall recursion condition
  (?(Rn)...       specific group recursion condition
  (?(R&amp;name)...   specific recursion condition
  (?(DEFINE)...   define subpattern for reference
  (?(assert)...   assertion condition
</PRE>
</P>
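<P>
The (?(n)...) and (?(name)...) reference conditions test whether a group has
taken part in the match so far. The following Python sketch illustrates both;
Python's <b>re</b> module accepts these two forms but none of the others
listed above, so it is only an approximation of PCRE. The patterns and test
strings are made up for the example.
</P>

```python
import re

# If group 1 (an opening "<") matched, a closing ">" is required;
# otherwise nothing more is required.
bracketed = re.compile(r"(<)?\w+(?(1)>)")
bracket_results = {
    s: bool(bracketed.fullmatch(s)) for s in ["<abc>", "abc", "<abc", "abc>"]
}

# With a no-pattern: a number is closed by ")" if it was opened by "(",
# and must otherwise be followed by ";".
terminated = re.compile(r"(?P<open>\()?\d+(?(open)\)|;)")
terminator_results = {
    s: bool(terminated.fullmatch(s)) for s in ["(42)", "42;", "(42;", "42)"]
}

print(bracket_results)
print(terminator_results)
```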
<br><a name="SEC21" href="#TOC1">BACKTRACKING CONTROL</a><br>
<P>
The following act as soon as they are reached:
<pre>
  (*ACCEPT)       force successful match
  (*FAIL)         force backtrack; synonym (*F)
  (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
</pre>
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
<pre>
  (*COMMIT)       overall failure, no advance of starting point
  (*PRUNE)        advance to next starting character
  (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
  (*SKIP)         advance to current matching position
  (*SKIP:NAME)    advance to position corresponding to an earlier
                  (*MARK:NAME); if not found, the (*SKIP) is ignored
  (*THEN)         local failure, backtrack to next alternation
  (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
</PRE>
</P>
<br><a name="SEC22" href="#TOC1">NEWLINE CONVENTIONS</a><br>
<P>
These are recognized only at the very start of the pattern or after a
(*BSR_...), (*UTF8), (*UTF16), (*UTF32) or (*UCP) option.
<pre>
  (*CR)           carriage return only
  (*LF)           linefeed only
  (*CRLF)         carriage return followed by linefeed
  (*ANYCRLF)      all three of the above
  (*ANY)          any Unicode newline sequence
</PRE>
</P>
<br><a name="SEC23" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
These are recognized only at the very start of the pattern or after a
(*...) option that sets the newline convention or a UTF or UCP mode.
<pre>
  (*BSR_ANYCRLF)  CR, LF, or CRLF
  (*BSR_UNICODE)  any Unicode newline sequence
</PRE>
</P>
<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
<P>
<pre>
  (?C)      callout
  (?Cn)     callout with data n
</PRE>
</P>
<br><a name="SEC25" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
<b>pcrematching</b>(3), <b>pcre</b>(3).
</P>
<br><a name="SEC26" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QH, England.
<br>
</P>
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
<P>
Last updated: 26 April 2013
<br>
Copyright © 1997-2013 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>