Diff of /code/trunk/doc/pcre.3

revision 74 by nigel, Sat Feb 24 21:40:24 2007 UTC | revision 75 by nigel, Sat Feb 24 21:40:37 2007 UTC
# Line 1  Line 1 
1  .TH PCRE 3  .TH PCRE 3
2  .SH NAME  .SH NAME
3  PCRE - Perl-compatible regular expressions  PCRE - Perl-compatible regular expressions
4  .SH DESCRIPTION  .SH INTRODUCTION
5  .rs  .rs
6  .sp  .sp
7  The PCRE library is a set of functions that implement regular expression  The PCRE library is a set of functions that implement regular expression
8  pattern matching using the same syntax and semantics as Perl, with just a few  pattern matching using the same syntax and semantics as Perl, with just a few
9  differences. The current implementation of PCRE (release 4.x) corresponds  differences. The current implementation of PCRE (release 5.x) corresponds
10  approximately with Perl 5.8, including support for UTF-8 encoded strings.  approximately with Perl 5.8, including support for UTF-8 encoded strings and
11  However, this support has to be explicitly enabled; it is not the default.  Unicode general category properties. However, this support has to be explicitly
12    enabled; it is not the default.
13  PCRE is written in C and released as a C library. However, a number of people  .P
14  have written wrappers and interfaces of various kinds. A C++ class is included  PCRE is written in C and released as a C library. A number of people have
15  in these contributions, which can be found in the \fIContrib\fR directory at  written wrappers and interfaces of various kinds. A C++ class is included in
16  the primary FTP site, which is:  these contributions, which can be found in the \fIContrib\fR directory at the
17    primary FTP site, which is:
18    .sp
19  .\" HTML <a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">  .\" HTML <a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">
20  .\" </a>  .\" </a>
21  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre
22    .P
23  Details of exactly which Perl regular expression features are and are not  Details of exactly which Perl regular expression features are and are not
24  supported by PCRE are given in separate documents. See the  supported by PCRE are given in separate documents. See the
25  .\" HREF  .\" HREF
# Line 29  and Line 30  and
30  \fBpcrecompat\fR  \fBpcrecompat\fR
31  .\"  .\"
32  pages.  pages.
33    .P
34  Some features of PCRE can be included, excluded, or changed when the library is  Some features of PCRE can be included, excluded, or changed when the library is
35  built. The  built. The
36  .\" HREF  .\" HREF
37  \fBpcre_config()\fR  \fBpcre_config()\fR
38  .\"  .\"
39  function makes it possible for a client to discover which features are  function makes it possible for a client to discover which features are
40  available. Documentation about building PCRE for various operating systems can  available. The features themselves are described in the
41  be found in the \fBREADME\fR file in the source distribution.  .\" HREF
42    \fBpcrebuild\fP
43  .SH USER DOCUMENTATION  .\"
44    page. Documentation about building PCRE for various operating systems can be
45    found in the \fBREADME\fP file in the source distribution.
46    .
47    .
48    .SH "USER DOCUMENTATION"
49  .rs  .rs
50  .sp  .sp
51  The user documentation for PCRE has been split up into a number of different  The user documentation for PCRE comprises a number of different sections. In
52  sections. In the "man" format, each of these is a separate "man page". In the  the "man" format, each of these is a separate "man page". In the HTML format,
53  HTML format, each is a separate page, linked from the index page. In the plain  each is a separate page, linked from the index page. In the plain text format,
54  text format, all the sections are concatenated, for ease of searching. The  all the sections are concatenated, for ease of searching. The sections are as
55  sections are as follows:  follows:
56    .sp
57    pcre              this document    pcre              this document
58    pcreapi           details of PCRE's native API    pcreapi           details of PCRE's native API
59    pcrebuild         options for building PCRE    pcrebuild         options for building PCRE
60    pcrecallout       details of the callout feature    pcrecallout       details of the callout feature
61    pcrecompat        discussion of Perl compatibility    pcrecompat        discussion of Perl compatibility
62    pcregrep          description of the \fBpcregrep\fR command    pcregrep          description of the \fBpcregrep\fP command
63      pcrepartial       details of the partial matching facility
64    .\" JOIN
65    pcrepattern       syntax and semantics of supported    pcrepattern       syntax and semantics of supported
66                        regular expressions                        regular expressions
67    pcreperform       discussion of performance issues    pcreperform       discussion of performance issues
68    pcreposix         the POSIX-compatible API    pcreposix         the POSIX-compatible API
69      pcreprecompile    details of saving and re-using precompiled patterns
70    pcresample        discussion of the sample program    pcresample        discussion of the sample program
71    pcretest          the \fBpcretest\fR testing command    pcretest          description of the \fBpcretest\fP testing command
72    .sp
73  In addition, in the "man" and HTML formats, there is a short page for each  In addition, in the "man" and HTML formats, there is a short page for each
74  library function, listing its arguments and results.  library function, listing its arguments and results.
75    .
76    .
77  .SH LIMITATIONS  .SH LIMITATIONS
78  .rs  .rs
79  .sp  .sp
80  There are some size limitations in PCRE but it is hoped that they will never in  There are some size limitations in PCRE but it is hoped that they will never in
81  practice be relevant.  practice be relevant.
82    .P
83  The maximum length of a compiled pattern is 65539 (sic) bytes if PCRE is  The maximum length of a compiled pattern is 65539 (sic) bytes if PCRE is
84  compiled with the default internal linkage size of 2. If you want to process  compiled with the default internal linkage size of 2. If you want to process
85  regular expressions that are truly enormous, you can compile PCRE with an  regular expressions that are truly enormous, you can compile PCRE with an
86  internal linkage size of 3 or 4 (see the \fBREADME\fR file in the source  internal linkage size of 3 or 4 (see the \fBREADME\fP file in the source
87  distribution and the  distribution and the
88  .\" HREF  .\" HREF
89  \fBpcrebuild\fR  \fBpcrebuild\fP
90  .\"  .\"
91  documentation for details). If these cases the limit is substantially larger.  documentation for details). In these cases the limit is substantially larger.
92  However, the speed of execution will be slower.  However, the speed of execution will be slower.
93    .P
94  All values in repeating quantifiers must be less than 65536.  All values in repeating quantifiers must be less than 65536.
95  The maximum number of capturing subpatterns is 65535.  The maximum number of capturing subpatterns is 65535.
96    .P
97  There is no limit to the number of non-capturing subpatterns, but the maximum  There is no limit to the number of non-capturing subpatterns, but the maximum
98  depth of nesting of all kinds of parenthesized subpattern, including capturing  depth of nesting of all kinds of parenthesized subpattern, including capturing
99  subpatterns, assertions, and other types of subpattern, is 200.  subpatterns, assertions, and other types of subpattern, is 200.
100    .P
101  The maximum length of a subject string is the largest positive number that an  The maximum length of a subject string is the largest positive number that an
102  integer variable can hold. However, PCRE uses recursion to handle subpatterns  integer variable can hold. However, PCRE uses recursion to handle subpatterns
103  and indefinite repetition. This means that the available stack space may limit  and indefinite repetition. This means that the available stack space may limit
104  the size of a subject string that can be processed by certain patterns.  the size of a subject string that can be processed by certain patterns.
105    .sp
106  .\" HTML <a name="utf8support"></a>  .\" HTML <a name="utf8support"></a>
107  .SH UTF-8 SUPPORT  .
108    .
109    .SH "UTF-8 AND UNICODE PROPERTY SUPPORT"
110  .rs  .rs
111  .sp  .sp
112  Starting at release 3.3, PCRE has had some support for character strings  From release 3.3, PCRE has had some support for character strings encoded in
113  encoded in the UTF-8 format. For release 4.0 this has been greatly extended to  the UTF-8 format. For release 4.0 this was greatly extended to cover most
114  cover most common requirements.  common requirements, and in release 5.0 additional support for Unicode general
115    category properties was added.
116    .P
117  In order process UTF-8 strings, you must build PCRE to include UTF-8 support in  In order process UTF-8 strings, you must build PCRE to include UTF-8 support in
118  the code, and, in addition, you must call  the code, and, in addition, you must call
119  .\" HREF  .\" HREF
120  \fBpcre_compile()\fR  \fBpcre_compile()\fP
121  .\"  .\"
122  with the PCRE_UTF8 option flag. When you do this, both the pattern and any  with the PCRE_UTF8 option flag. When you do this, both the pattern and any
123  subject strings that are matched against it are treated as UTF-8 strings  subject strings that are matched against it are treated as UTF-8 strings
124  instead of just strings of bytes.  instead of just strings of bytes.
125    .P
126  If you compile PCRE with UTF-8 support, but do not use it at run time, the  If you compile PCRE with UTF-8 support, but do not use it at run time, the
127  library will be a bit bigger, but the additional run time overhead is limited  library will be a bit bigger, but the additional run time overhead is limited
128  to testing the PCRE_UTF8 flag in several places, so should not be very large.  to testing the PCRE_UTF8 flag in several places, so should not be very large.
129    .P
130    If PCRE is built with Unicode character property support (which implies UTF-8
131    support), the escape sequences \ep{..}, \eP{..}, and \eX are supported.
132    The available properties that can be tested are limited to the general
133    category properties such as Lu for an upper case letter or Nd for a decimal
134    number. A full list is given in the
135    .\" HREF
136    \fBpcrepattern\fP
137    .\"
138    documentation. The PCRE library is increased in size by about 90K when Unicode
139    property support is included.
140    .P
141  The following comments apply when PCRE is running in UTF-8 mode:  The following comments apply when PCRE is running in UTF-8 mode:
142    .P
143  1. When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects  1. When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects
144  are checked for validity on entry to the relevant functions. If an invalid  are checked for validity on entry to the relevant functions. If an invalid
145  UTF-8 string is passed, an error return is given. In some situations, you may  UTF-8 string is passed, an error return is given. In some situations, you may
# Line 126  is given (respectively) contains only va Line 150  is given (respectively) contains only va
150  not diagnose an invalid UTF-8 string. If you pass an invalid UTF-8 string to  not diagnose an invalid UTF-8 string. If you pass an invalid UTF-8 string to
151  PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined. Your program  PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined. Your program
152  may crash.  may crash.
153    .P
154  2. In a pattern, the escape sequence \\x{...}, where the contents of the braces  2. In a pattern, the escape sequence \ex{...}, where the contents of the braces
155  is a string of hexadecimal digits, is interpreted as a UTF-8 character whose  is a string of hexadecimal digits, is interpreted as a UTF-8 character whose
156  code number is the given hexadecimal number, for example: \\x{1234}. If a  code number is the given hexadecimal number, for example: \ex{1234}. If a
157  non-hexadecimal digit appears between the braces, the item is not recognized.  non-hexadecimal digit appears between the braces, the item is not recognized.
158  This escape sequence can be used either as a literal, or within a character  This escape sequence can be used either as a literal, or within a character
159  class.  class.
160    .P
161  3. The original hexadecimal escape sequence, \\xhh, matches a two-byte UTF-8  3. The original hexadecimal escape sequence, \exhh, matches a two-byte UTF-8
162  character if the value is greater than 127.  character if the value is greater than 127.
163    .P
164  4. Repeat quantifiers apply to complete UTF-8 characters, not to individual  4. Repeat quantifiers apply to complete UTF-8 characters, not to individual
165  bytes, for example: \\x{100}{3}.  bytes, for example: \ex{100}{3}.
166    .P
167  5. The dot metacharacter matches one UTF-8 character instead of a single byte.  5. The dot metacharacter matches one UTF-8 character instead of a single byte.
168    .P
169  6. The escape sequence \\C can be used to match a single byte in UTF-8 mode,  6. The escape sequence \eC can be used to match a single byte in UTF-8 mode,
170  but its use can lead to some strange effects.  but its use can lead to some strange effects.
171    .P
172  7. The character escapes \\b, \\B, \\d, \\D, \\s, \\S, \\w, and \\W correctly  7. The character escapes \eb, \eB, \ed, \eD, \es, \eS, \ew, and \eW correctly
173  test characters of any code value, but the characters that PCRE recognizes as  test characters of any code value, but the characters that PCRE recognizes as
174  digits, spaces, or word characters remain the same set as before, all with  digits, spaces, or word characters remain the same set as before, all with
175  values less than 256.  values less than 256. This remains true even when PCRE includes Unicode
176    property support, because to do otherwise would slow down PCRE in many common
177  8. Case-insensitive matching applies only to characters whose values are less  cases. If you really want to test for a wider sense of, say, "digit", you
178  than 256. PCRE does not support the notion of "case" for higher-valued  must use Unicode property tests such as \ep{Nd}.
179  characters.  .P
180    8. Similarly, characters that match the POSIX named character classes are all
181  9. PCRE does not support the use of Unicode tables and properties or the Perl  low-valued characters.
182  escapes \\p, \\P, and \\X.  .P
183    9. Case-insensitive matching applies only to characters whose values are less
184    than 128, unless PCRE is built with Unicode property support. Even when Unicode
185    property support is available, PCRE still uses its own character tables when
186    checking the case of low-valued characters, so as not to degrade performance.
187    The Unicode property information is used only for characters with higher
188    values.
189    .
190  .SH AUTHOR  .SH AUTHOR
191  .rs  .rs
192  .sp  .sp
# Line 167  University Computing Service, Line 197  University Computing Service,
197  Cambridge CB2 3QG, England.  Cambridge CB2 3QG, England.
198  .br  .br
199  Phone: +44 1223 334714  Phone: +44 1223 334714
200    .sp
201  .in 0  .in 0
202  Last updated: 20 August 2003  Last updated: 09 September 2004
203  .br  .br
204  Copyright (c) 1997-2003 University of Cambridge.  Copyright (c) 1997-2004 University of Cambridge.
