| 165 |
.sp |
.sp |
| 166 |
/caseless/i |
/caseless/i |
| 167 |
.sp |
.sp |
| 168 |
The following table shows additional modifiers for setting PCRE options that do |
The following table shows additional modifiers for setting PCRE compile-time |
| 169 |
not correspond to anything in Perl: |
options that do not correspond to anything in Perl: |
| 170 |
.sp |
.sp |
| 171 |
|
\fB/8\fP PCRE_UTF8 |
| 172 |
|
\fB/?\fP PCRE_NO_UTF8_CHECK |
| 173 |
\fB/A\fP PCRE_ANCHORED |
\fB/A\fP PCRE_ANCHORED |
| 174 |
\fB/C\fP PCRE_AUTO_CALLOUT |
\fB/C\fP PCRE_AUTO_CALLOUT |
| 175 |
\fB/E\fP PCRE_DOLLAR_ENDONLY |
\fB/E\fP PCRE_DOLLAR_ENDONLY |
| 177 |
\fB/J\fP PCRE_DUPNAMES |
\fB/J\fP PCRE_DUPNAMES |
| 178 |
\fB/N\fP PCRE_NO_AUTO_CAPTURE |
\fB/N\fP PCRE_NO_AUTO_CAPTURE |
| 179 |
\fB/U\fP PCRE_UNGREEDY |
\fB/U\fP PCRE_UNGREEDY |
| 180 |
|
\fB/W\fP PCRE_UCP |
| 181 |
\fB/X\fP PCRE_EXTRA |
\fB/X\fP PCRE_EXTRA |
| 182 |
\fB/<JS>\fP PCRE_JAVASCRIPT_COMPAT |
\fB/<JS>\fP PCRE_JAVASCRIPT_COMPAT |
| 183 |
\fB/<cr>\fP PCRE_NEWLINE_CR |
\fB/<cr>\fP PCRE_NEWLINE_CR |
| 188 |
\fB/<bsr_anycrlf>\fP PCRE_BSR_ANYCRLF |
\fB/<bsr_anycrlf>\fP PCRE_BSR_ANYCRLF |
| 189 |
\fB/<bsr_unicode>\fP PCRE_BSR_UNICODE |
\fB/<bsr_unicode>\fP PCRE_BSR_UNICODE |
| 190 |
.sp |
.sp |
| 191 |
Those specifying line ending sequences are literal strings as shown, but the |
The modifiers that are enclosed in angle brackets are literal strings as shown, |
| 192 |
letters can be in either case. This example sets multiline matching with CRLF |
including the angle brackets, but the letters can be in either case. This |
| 193 |
as the line ending sequence: |
example sets multiline matching with CRLF as the line ending sequence: |
| 194 |
.sp |
.sp |
| 195 |
/^abc/m<crlf> |
/^abc/m<crlf> |
| 196 |
.sp |
.sp |
| 197 |
Details of the meanings of these PCRE options are given in the |
As well as turning on the PCRE_UTF8 option, the \fB/8\fP modifier also causes |
| 198 |
|
any non-printing characters in output strings to be printed using the |
| 199 |
|
\ex{hh...} notation if they are valid UTF-8 sequences. Full details of the PCRE |
| 200 |
|
options are given in the |
| 201 |
.\" HREF |
.\" HREF |
| 202 |
\fBpcreapi\fP |
\fBpcreapi\fP |
| 203 |
.\" |
.\" |
| 204 |
documentation. |
documentation. |
| 205 |
. |
. |
| 206 |
. |
. |
| 207 |
.SS "Finding all matches in a string" |
.SS "Finding all matches in a string" |
| 230 |
There are yet more modifiers for controlling the way \fBpcretest\fP |
There are yet more modifiers for controlling the way \fBpcretest\fP |
| 231 |
operates. |
operates. |
| 232 |
.P |
.P |
|
The \fB/8\fP modifier causes \fBpcretest\fP to call PCRE with the PCRE_UTF8 |
|
|
option set. This turns on support for UTF-8 character handling in PCRE, |
|
|
provided that it was compiled with this support enabled. This modifier also |
|
|
causes any non-printing characters in output strings to be printed using the |
|
|
\ex{hh...} notation if they are valid UTF-8 sequences. |
|
|
.P |
|
|
If the \fB/?\fP modifier is used with \fB/8\fP, it causes \fBpcretest\fP to |
|
|
call \fBpcre_compile()\fP with the PCRE_NO_UTF8_CHECK option, to suppress the |
|
|
checking of the string for UTF-8 validity. |
|
|
.P |
|
| 233 |
The \fB/+\fP modifier requests that as well as outputting the substring that |
The \fB/+\fP modifier requests that as well as outputting the substring that |
| 234 |
matched the entire pattern, pcretest should in addition output the remainder of |
matched the entire pattern, pcretest should in addition output the remainder of |
| 235 |
the subject string. This is useful for tests where the subject contains |
the subject string. This is useful for tests where the subject contains |
| 282 |
The \fB/M\fP modifier causes the size of the memory block used to hold the compiled |
The \fB/M\fP modifier causes the size of the memory block used to hold the compiled |
| 283 |
pattern to be output. |
pattern to be output. |
| 284 |
.P |
.P |
|
The \fB/P\fP modifier causes \fBpcretest\fP to call PCRE via the POSIX wrapper |
|
|
API rather than its native API. When this is done, all other modifiers except |
|
|
\fB/i\fP, \fB/m\fP, and \fB/+\fP are ignored. REG_ICASE is set if \fB/i\fP is |
|
|
present, and REG_NEWLINE is set if \fB/m\fP is present. The wrapper functions |
|
|
force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set. |
|
|
.P |
|
| 285 |
The \fB/S\fP modifier causes \fBpcre_study()\fP to be called after the |
The \fB/S\fP modifier causes \fBpcre_study()\fP to be called after the |
| 286 |
expression has been compiled, and the results used when the expression is |
expression has been compiled, and the results used when the expression is |
| 287 |
matched. |
matched. |
| 288 |
. |
. |
| 289 |
. |
. |
| 290 |
|
.SS "Using the POSIX wrapper API" |
| 291 |
|
.rs |
| 292 |
|
.sp |
| 293 |
|
The \fB/P\fP modifier causes \fBpcretest\fP to call PCRE via the POSIX wrapper |
| 294 |
|
API rather than its native API. When \fB/P\fP is set, the following modifiers |
| 295 |
|
set options for the \fBregcomp()\fP function: |
| 296 |
|
.sp |
| 297 |
|
/i REG_ICASE |
| 298 |
|
/m REG_NEWLINE |
| 299 |
|
/N REG_NOSUB |
| 300 |
|
/s REG_DOTALL ) |
| 301 |
|
/U REG_UNGREEDY ) These options are not part of |
| 302 |
|
/W REG_UCP ) the POSIX standard |
| 303 |
|
/8 REG_UTF8 ) |
| 304 |
|
.sp |
| 305 |
|
The \fB/+\fP modifier works as described above. All other modifiers are |
| 306 |
|
ignored. |
| 307 |
|
. |
| 308 |
|
. |
| 309 |
.SH "DATA LINES" |
.SH "DATA LINES" |
| 310 |
.rs |
.rs |
| 311 |
.sp |
.sp |
| 443 |
the call of \fBpcre_exec()\fP for the line in which it appears. |
the call of \fBpcre_exec()\fP for the line in which it appears. |
| 444 |
.P |
.P |
| 445 |
If the \fB/P\fP modifier was present on the pattern, causing the POSIX wrapper |
If the \fB/P\fP modifier was present on the pattern, causing the POSIX wrapper |
| 446 |
API to be used, the only option-setting sequences that have any effect are \eB |
API to be used, the only option-setting sequences that have any effect are \eB, |
| 447 |
and \eZ, causing REG_NOTBOL and REG_NOTEOL, respectively, to be passed to |
\eN, and \eZ, causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, |
| 448 |
\fBregexec()\fP. |
to be passed to \fBregexec()\fP. |
| 449 |
.P |
.P |
| 450 |
The use of \ex{hh...} to represent UTF-8 characters is not dependent on the use |
The use of \ex{hh...} to represent UTF-8 characters is not dependent on the use |
| 451 |
of the \fB/8\fP modifier on the pattern. It is always recognized. There may be |
of the \fB/8\fP modifier on the pattern. It is always recognized. There may be |
| 750 |
.rs |
.rs |
| 751 |
.sp |
.sp |
| 752 |
.nf |
.nf |
| 753 |
Last updated: 26 March 2010 |
Last updated: 16 May 2010 |
| 754 |
Copyright (c) 1997-2010 University of Cambridge. |
Copyright (c) 1997-2010 University of Cambridge. |
| 755 |
.fi |
.fi |