Contents of /code/trunk/doc/html/pcrecompat.html

Revision 77
Sat Feb 24 21:40:45 2007 UTC by nigel
File MIME type: text/html
File size: 5602 byte(s)
Load pcre-6.0 into code/trunk.

<html>
<head>
<title>pcrecompat specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrecompat man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
DIFFERENCES BETWEEN PCRE AND PERL
</b><br>
<P>
This document describes the differences in the ways that PCRE and Perl handle
regular expressions. The differences described here are with respect to Perl
5.8.
</P>
<P>
1. PCRE does not have full UTF-8 support. Details of what it does have are
given in the
<a href="pcre.html#utf8support">section on UTF-8 support</a>
in the main
<a href="pcre.html"><b>pcre</b></a>
page.
</P>
<P>
2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl permits
them, but they do not mean what you might think. For example, (?!a){3} does
not assert that the next three characters are not "a". Because an assertion
consumes no characters, it just asserts three times that the next character is
not "a".
</P>
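<P>
As a rough illustration of the point above, the sketch below uses Python's
<b>re</b> module as a stand-in. This is an assumption made for convenience: it
is a different engine from both PCRE and Perl, chosen only because it is easy
to run. It shows that a negative lookahead consumes nothing, and one way to
write the check that each of the next three characters is not "a".
</P>

```python
import re

# A negative lookahead is zero-width: it succeeds at position 0 of "bad"
# because the next character is "b", and it consumes nothing.
m = re.match(r"(?!a)", "bad")
zero_width_span = m.span()

# To require that each of the next three characters is not "a", the
# assertion has to be paired with something that consumes a character.
three_non_a = re.compile(r"(?:(?!a).){3}")

ok = three_non_a.match("bcd") is not None    # no "a" in the first three
bad = three_non_a.match("bad") is not None   # "a" is the second character

print(zero_width_span, ok, bad)
```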
<P>
3. Capturing subpatterns that occur inside negative lookahead assertions are
counted, but their entries in the offsets vector are never set. Perl sets its
numerical variables from any such patterns that are matched before the
assertion fails to match something (thereby succeeding), but only if the
negative lookahead assertion contains just one branch.
</P>
<P>
4. Though binary zero characters are supported in the subject string, they are
not allowed in a pattern string because it is passed as a normal C string,
terminated by zero. The escape sequence \0 can be used in the pattern to
represent a binary zero.
</P>
<P>
5. The following Perl escape sequences are not supported: \l, \u, \L,
\U, and \N. In fact these are implemented by Perl's general string-handling
and are not part of its pattern matching engine. If any of these are
encountered by PCRE, an error is generated.
</P>
<P>
6. The Perl escape sequences \p, \P, and \X are supported only if PCRE is
built with Unicode character property support. The properties that can be
tested with \p and \P are limited to the general category properties such as
Lu and Nd.
</P>
<P>
7. PCRE does support the \Q...\E escape for quoting substrings. Characters in
between are treated as literals. This is slightly different from Perl in that $
and @ are also handled as literals inside the quotes. In Perl, they cause
variable interpolation (but of course PCRE does not have variables). Note the
following examples:
<pre>
  Pattern            PCRE matches   Perl matches

  \Qabc$xyz\E        abc$xyz        abc followed by the contents of $xyz
  \Qabc\$xyz\E       abc\$xyz       abc\$xyz
  \Qabc\E\$\Qxyz\E   abc$xyz        abc$xyz
</pre>
The \Q...\E sequence is recognized both inside and outside character classes.
</P>
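<P>
The "PCRE matches" column for the first two rows can be approximated outside
PCRE. The sketch below is only an analogy: Python's <b>re</b> module does not
recognize \Q...\E, so <b>re.escape()</b> is used in its place to get the same
"everything is a literal" effect, including for $ and backslash. It does not
exercise PCRE itself.
</P>

```python
import re

# re.escape() stands in for \Q...\E here: every character of the input,
# including "$" and "\", is matched literally.
row1 = re.compile(re.escape("abc$xyz"))      # analogue of \Qabc$xyz\E
row2 = re.compile(re.escape(r"abc\$xyz"))    # analogue of \Qabc\$xyz\E

row1_literal = row1.fullmatch("abc$xyz") is not None
row2_literal = row2.fullmatch(r"abc\$xyz") is not None
row1_no_dollar = row1.fullmatch("abcxyz") is not None

print(row1_literal, row2_literal, row1_no_dollar)
```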
<P>
8. Fairly obviously, PCRE does not support the (?{code}) and (?p{code})
constructions. However, there is support for recursive patterns using the
non-Perl items (?R), (?number), and (?P&#62;name). Also, the PCRE "callout" feature
allows an external function to be called during pattern matching. See the
<a href="pcrecallout.html"><b>pcrecallout</b></a>
documentation for details.
</P>
<P>
9. There are some differences that are concerned with the settings of captured
strings when part of a pattern is repeated. For example, matching "aba" against
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
</P>
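<P>
The same probe can be run against any other engine to see which behaviour it
follows. The sketch below does this with Python's <b>re</b> module, which is
neither Perl nor PCRE, so its output is only an observation about that engine.
The value to watch is group 2, which takes part in the first iteration of the
outer group but not the second: an engine that behaves like PCRE reports "b",
while one that behaves like Perl reports it as unset (None in Python).
</P>

```python
import re

m = re.match(r"^(a(b)?)+$", "aba")

whole = m.group(0)   # the entire match
outer = m.group(1)   # text captured by the last iteration of the outer group
inner = m.group(2)   # None if unset, "b" if kept from the first iteration

print(repr(whole), repr(outer), repr(inner))
```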
<P>
10. PCRE provides some extensions to the Perl regular expression facilities:
<br>
<br>
(a) Although lookbehind assertions must match fixed length strings, each
alternative branch of a lookbehind assertion can match a different length of
string. Perl requires them all to have the same length.
<br>
<br>
(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
meta-character matches only at the very end of the string.
<br>
<br>
(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special
meaning is faulted.
<br>
<br>
(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is
inverted, that is, by default they are not greedy, but if followed by a
question mark they are.
<br>
<br>
(e) PCRE_ANCHORED can be used at matching time to force a pattern to be tried
only at the first matching position in the subject string.
<br>
<br>
(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NO_AUTO_CAPTURE
options for <b>pcre_exec()</b> have no Perl equivalents.
<br>
<br>
(g) The (?R), (?number), and (?P&#62;name) constructs allow for recursive pattern
matching (Perl can do this using the (?p{code}) construct, which PCRE cannot
support).
<br>
<br>
(h) PCRE supports named capturing substrings, using the Python syntax.
<br>
<br>
(i) PCRE supports the possessive quantifier "++" syntax, taken from Sun's Java
package.
<br>
<br>
(j) The (R) condition, for testing recursion, is a PCRE extension.
<br>
<br>
(k) The callout facility is PCRE-specific.
<br>
<br>
(l) The partial matching facility is PCRE-specific.
<br>
<br>
(m) Patterns compiled by PCRE can be saved and re-used at a later time, even on
different hosts that have the other endianness.
<br>
<br>
(n) The alternative matching function (<b>pcre_dfa_exec()</b>) matches in a
different way and is not Perl-compatible.
</P>
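<P>
Three of the extensions above, (b), (d) and (h), have close analogues in
Python's <b>re</b> module, so they can be tried without building PCRE. The
sketch below is written under that assumption only: it does not call PCRE, the
PCRE_* options do not exist in Python, and the sample strings are made up for
illustration. The named-group syntax is the one PCRE is said above to have
borrowed.
</P>

```python
import re

# (h) Named capturing groups, written in the Python (?P<name>...) syntax.
date = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2005-02")
fields = date.groupdict()

# (d) Default greedy versus "?"-suffixed lazy repetition. PCRE_UNGREEDY is
# described above as swapping the two meanings; Python has no such flag,
# so both forms are spelled out explicitly.
greedy = re.match(r"a.*b", "aXbXb").group(0)
lazy = re.match(r"a.*?b", "aXbXb").group(0)

# (b) Without multiline mode, "$" also matches just before a trailing
# newline. Python's \Z matches only at the very end of the string, which is
# the behaviour the text above gives for "$" under PCRE_DOLLAR_ENDONLY.
dollar = re.search(r"abc$", "abc\n") is not None
end_only = re.search(r"abc\Z", "abc\n") is not None

print(fields, greedy, lazy, dollar, end_only)
```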
<P>
Last updated: 28 February 2005
<br>
Copyright &copy; 1997-2005 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
