Contents of /code/trunk/doc/html/pcrecompat.html

Revision 87
Sat Feb 24 21:41:21 2007 UTC by nigel
File MIME type: text/html
File size: 5696 byte(s)
Load pcre-6.5 into code/trunk.

<html>
<head>
<title>pcrecompat specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrecompat man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
</b><br>
<P>
This document describes the differences in the ways that PCRE and Perl handle
regular expressions. The differences described here are with respect to Perl
5.8.
</P>
<P>
1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details of what
it does have are given in the
<a href="pcre.html#utf8support">section on UTF-8 support</a>
in the main
<a href="pcre.html"><b>pcre</b></a>
page.
</P>
<P>
2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl permits
them, but they do not mean what you might think. For example, (?!a){3} does
not assert that the next three characters are not "a". It just asserts three
times that the next character is not "a".
</P>
<P>
3. Capturing subpatterns that occur inside negative lookahead assertions are
counted, but their entries in the offsets vector are never set. Perl sets its
numeric variables from any such patterns that are matched before the
assertion fails to match something (thereby succeeding), but only if the
negative lookahead assertion contains just one branch.
</P>
<P>
4. Though binary zero characters are supported in the subject string, they are
not allowed in a pattern string because it is passed as a normal C string,
terminated by zero. The escape sequence \0 can be used in the pattern to
represent a binary zero.
</P>
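<P>
The following is a minimal sketch of why a binary zero cannot appear literally
in the pattern. It is written in Python and uses <b>ctypes</b> to mimic how a
zero-terminated C string is read. Python's own <b>re</b> module stands in for
the matcher here and is not PCRE, so treat the matching step as an illustration
only.
</P>

```python
import ctypes
import re

# A pattern that contains a real zero byte between "ab" and "cd".
raw_pattern = b"ab\x00cd"

# Reading it back as a zero-terminated C string stops at the zero byte,
# which is what a C API taking a plain char* pattern would see.
seen_by_c = ctypes.c_char_p(raw_pattern).value

# The escaped form holds a backslash and the digit 0, not a zero byte,
# so it survives the trip through a C string intact.
escaped_pattern = rb"ab\0cd"
survives = ctypes.c_char_p(escaped_pattern).value == escaped_pattern

# The subject may contain a zero byte; the \0 escape matches it.
# (Assumption: Python's re is used here as a stand-in for PCRE.)
subject = b"ab\x00cd"
matched = re.fullmatch(escaped_pattern, subject) is not None

print(seen_by_c, survives, matched)
```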
<P>
5. The following Perl escape sequences are not supported: \l, \u, \L,
\U, and \N. In fact these are implemented by Perl's general string-handling
and are not part of its pattern matching engine. If any of these are
encountered by PCRE, an error is generated.
</P>
<P>
6. The Perl escape sequences \p, \P, and \X are supported only if PCRE is
built with Unicode character property support. The properties that can be
tested with \p and \P are limited to the general category properties such as
Lu and Nd, script names such as Greek or Han, and the derived properties Any
and L&.
</P>
<P>
7. PCRE does support the \Q...\E escape for quoting substrings. Characters in
between are treated as literals. This is slightly different from Perl in that $
and @ are also handled as literals inside the quotes. In Perl, they cause
variable interpolation (but of course PCRE does not have variables). Note the
following examples:
<pre>
  Pattern            PCRE matches   Perl matches

  \Qabc$xyz\E        abc$xyz        abc followed by the contents of $xyz
  \Qabc\$xyz\E       abc\$xyz       abc\$xyz
  \Qabc\E\$\Qxyz\E   abc$xyz        abc$xyz
</pre>
The \Q...\E sequence is recognized both inside and outside character classes.
</P>
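<P>
The "PCRE matches" column of the table above can be reproduced with a small
sketch. Python's <b>re</b> module has no \Q...\E, so the hypothetical helper
<b>expand_quoted()</b> below rewrites each quoted span as an escaped literal
before compiling. The helper name and its handling of an unterminated \Q are
assumptions made for illustration, not part of PCRE.
</P>

```python
import re


def expand_quoted(pattern: str) -> str:
    # Hypothetical helper: rewrite each \Q...\E span as an escaped literal.
    # Limitation of this sketch: it does not notice a \Q that is itself
    # preceded by an escaping backslash.
    out = []
    pos = 0
    while True:
        start = pattern.find(r"\Q", pos)
        if start == -1:
            out.append(pattern[pos:])  # no more quoted spans
            break
        out.append(pattern[pos:start])
        end = pattern.find(r"\E", start + 2)
        if end == -1:
            # Assumption: an unterminated \Q quotes to the end of the pattern.
            out.append(re.escape(pattern[start + 2:]))
            break
        out.append(re.escape(pattern[start + 2:end]))
        pos = end + 2
    return "".join(out)


# The three rows of the table, with the text each should match under PCRE rules.
rows = [
    (r"\Qabc$xyz\E", "abc$xyz"),
    (r"\Qabc\$xyz\E", r"abc\$xyz"),
    (r"\Qabc\E\$\Qxyz\E", "abc$xyz"),
]
results = [re.fullmatch(expand_quoted(p), s) is not None for p, s in rows]
print(results)
```

<P>
Inside the quotes both $ and the backslash are escaped as ordinary characters,
which is the literal treatment described for PCRE. Perl's interpolation of $xyz
is not modelled.
</P>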
<P>
8. Fairly obviously, PCRE does not support the (?{code}) and (?p{code})
constructions. However, there is support for recursive patterns using the
non-Perl items (?R), (?number), and (?P&#62;name). Also, the PCRE "callout" feature
allows an external function to be called during pattern matching. See the
<a href="pcrecallout.html"><b>pcrecallout</b></a>
documentation for details.
</P>
<P>
9. There are some differences in the settings of captured strings when part of
a pattern is repeated. For example, matching "aba" against the pattern
/^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
</P>
<P>
10. PCRE provides some extensions to the Perl regular expression facilities:
<br>
<br>
(a) Although lookbehind assertions must match fixed length strings, each
alternative branch of a lookbehind assertion can match a different length of
string. Perl requires them all to have the same length.
<br>
<br>
(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
meta-character matches only at the very end of the string.
<br>
<br>
(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special
meaning is faulted.
<br>
<br>
(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is
inverted, that is, by default they are not greedy, but if followed by a
question mark they are.
<br>
<br>
(e) PCRE_ANCHORED can be used at matching time to force a pattern to be tried
only at the first matching position in the subject string.
<br>
<br>
(f) Certain options for <b>pcre_exec()</b> have no Perl equivalents.
<br>
<br>
(g) The (?R), (?number), and (?P&#62;name) constructs allow for recursive pattern
matching. (Perl can do this using the (?p{code}) construct, which PCRE cannot
support.)
<br>
<br>
(h) PCRE supports named capturing substrings, using the Python syntax.
<br>
<br>
(i) PCRE supports the possessive quantifier "++" syntax, taken from Sun's Java
package.
<br>
<br>
(j) The (R) condition, for testing recursion, is a PCRE extension.
<br>
<br>
(k) The callout facility is PCRE-specific.
<br>
<br>
(l) The partial matching facility is PCRE-specific.
<br>
<br>
(m) Patterns compiled by PCRE can be saved and re-used at a later time, even on
different hosts that have the other endianness.
<br>
<br>
(n) The alternative matching function (<b>pcre_dfa_exec()</b>) matches in a
different way and is not Perl-compatible.
</P>
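<P>
Item (h) refers to the Python syntax for named capturing substrings. As a brief
illustration of that syntax, the sketch below uses Python's own <b>re</b>
module rather than PCRE. The pattern and sample strings are made up for the
example.
</P>

```python
import re

# (?P<name>...) defines a named group; (?P=name) refers back to it.
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
m = date.fullmatch("2006-01-24")
fields = m.groupdict()  # group name -> captured text

# A named back reference: the same word twice, separated by a space.
doubled = re.compile(r"(?P<word>\w+) (?P=word)")
repeat = doubled.search("the the cat")

print(fields, repeat.group("word"))
```

<P>
Note that the recursive form (?P&#62;name) listed in item (g) is not accepted by
Python's <b>re</b> module, so it is not shown here.
</P>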
<P>
Last updated: 24 January 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
