Diff of /code/trunk/doc/html/pcre.html


revision 71 by nigel, Sat Feb 24 21:40:24 2007 UTC  revision 75 by nigel, Sat Feb 24 21:40:37 2007 UTC
# Line 3  Line 3 
3  <title>pcre specification</title>  <title>pcre specification</title>
4  </head>  </head>
5  <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">  <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
6  This HTML document has been generated automatically from the original man page.  <h1>pcre man page</h1>
7  If there is any nonsense in it, please consult the man page, in case the  <p>
8  conversion went wrong.<br>  Return to the <a href="index.html">PCRE index page</a>.
9    </p>
10    <p>
11    This page is part of the PCRE HTML documentation. It was generated automatically
12    from the original man page. If there is any nonsense in it, please consult the
13    man page, in case the conversion went wrong.
14    <br>
15  <ul>  <ul>
16  <li><a name="TOC1" href="#SEC1">DESCRIPTION</a>  <li><a name="TOC1" href="#SEC1">INTRODUCTION</a>
17  <li><a name="TOC2" href="#SEC2">USER DOCUMENTATION</a>  <li><a name="TOC2" href="#SEC2">USER DOCUMENTATION</a>
18  <li><a name="TOC3" href="#SEC3">LIMITATIONS</a>  <li><a name="TOC3" href="#SEC3">LIMITATIONS</a>
19  <li><a name="TOC4" href="#SEC4">UTF-8 SUPPORT</a>  <li><a name="TOC4" href="#SEC4">UTF-8 AND UNICODE PROPERTY SUPPORT</a>
20  <li><a name="TOC5" href="#SEC5">AUTHOR</a>  <li><a name="TOC5" href="#SEC5">AUTHOR</a>
21  </ul>  </ul>
22  <br><a name="SEC1" href="#TOC1">DESCRIPTION</a><br>  <br><a name="SEC1" href="#TOC1">INTRODUCTION</a><br>
23  <P>  <P>
24  The PCRE library is a set of functions that implement regular expression  The PCRE library is a set of functions that implement regular expression
25  pattern matching using the same syntax and semantics as Perl, with just a few  pattern matching using the same syntax and semantics as Perl, with just a few
26  differences. The current implementation of PCRE (release 4.x) corresponds  differences. The current implementation of PCRE (release 5.x) corresponds
27  approximately with Perl 5.8, including support for UTF-8 encoded strings.  approximately with Perl 5.8, including support for UTF-8 encoded strings and
28  However, this support has to be explicitly enabled; it is not the default.  Unicode general category properties. However, this support has to be explicitly
29    enabled; it is not the default.
30  </P>  </P>
31  <P>  <P>
32  PCRE is written in C and released as a C library. However, a number of people  PCRE is written in C and released as a C library. A number of people have
33  have written wrappers and interfaces of various kinds. A C++ class is included  written wrappers and interfaces of various kinds. A C++ class is included in
34  in these contributions, which can be found in the <i>Contrib</i> directory at  these contributions, which can be found in the <i>Contrib</i> directory at the
35  the primary FTP site, which is:  primary FTP site, which is:
 </P>  
36  <a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre</a>  <a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre</a>
37    </P>
38  <P>  <P>
39  Details of exactly which Perl regular expression features are and are not  Details of exactly which Perl regular expression features are and are not
40  supported by PCRE are given in separate documents. See the  supported by PCRE are given in separate documents. See the
# Line 41  Some features of PCRE can be included, e Line 48  Some features of PCRE can be included, e
48  built. The  built. The
49  <a href="pcre_config.html"><b>pcre_config()</b></a>  <a href="pcre_config.html"><b>pcre_config()</b></a>
50  function makes it possible for a client to discover which features are  function makes it possible for a client to discover which features are
51  available. Documentation about building PCRE for various operating systems can  available. The features themselves are described in the
52  be found in the <b>README</b> file in the source distribution.  <a href="pcrebuild.html"><b>pcrebuild</b></a>
53    page. Documentation about building PCRE for various operating systems can be
54    found in the <b>README</b> file in the source distribution.
55  </P>  </P>
56  <br><a name="SEC2" href="#TOC1">USER DOCUMENTATION</a><br>  <br><a name="SEC2" href="#TOC1">USER DOCUMENTATION</a><br>
57  <P>  <P>
58  The user documentation for PCRE has been split up into a number of different  The user documentation for PCRE comprises a number of different sections. In
59  sections. In the "man" format, each of these is a separate "man page". In the  the "man" format, each of these is a separate "man page". In the HTML format,
60  HTML format, each is a separate page, linked from the index page. In the plain  each is a separate page, linked from the index page. In the plain text format,
61  text format, all the sections are concatenated, for ease of searching. The  all the sections are concatenated, for ease of searching. The sections are as
62  sections are as follows:  follows:
 </P>  
 <P>  
63  <pre>  <pre>
64    pcre              this document    pcre              this document
65    pcreapi           details of PCRE's native API    pcreapi           details of PCRE's native API
# Line 60  sections are as follows: Line 67  sections are as follows:
67    pcrecallout       details of the callout feature    pcrecallout       details of the callout feature
68    pcrecompat        discussion of Perl compatibility    pcrecompat        discussion of Perl compatibility
69    pcregrep          description of the <b>pcregrep</b> command    pcregrep          description of the <b>pcregrep</b> command
70    pcrepattern       syntax and semantics of supported    pcrepartial       details of the partial matching facility
71                        regular expressions    pcrepattern       syntax and semantics of supported regular expressions
72    pcreperform       discussion of performance issues    pcreperform       discussion of performance issues
73    pcreposix         the POSIX-compatible API    pcreposix         the POSIX-compatible API
74      pcreprecompile    details of saving and re-using precompiled patterns
75    pcresample        discussion of the sample program    pcresample        discussion of the sample program
76    pcretest          the <b>pcretest</b> testing command    pcretest          description of the <b>pcretest</b> testing command
77  </PRE>  </pre>
 </P>  
 <P>  
78  In addition, in the "man" and HTML formats, there is a short page for each  In addition, in the "man" and HTML formats, there is a short page for each
79  library function, listing its arguments and results.  library function, listing its arguments and results.
80  </P>  </P>
# Line 84  regular expressions that are truly enorm Line 90  regular expressions that are truly enorm
90  internal linkage size of 3 or 4 (see the <b>README</b> file in the source  internal linkage size of 3 or 4 (see the <b>README</b> file in the source
91  distribution and the  distribution and the
92  <a href="pcrebuild.html"><b>pcrebuild</b></a>  <a href="pcrebuild.html"><b>pcrebuild</b></a>
93  documentation for details). If these cases the limit is substantially larger.  documentation for details). In these cases the limit is substantially larger.
94  However, the speed of execution will be slower.  However, the speed of execution will be slower.
95  </P>  </P>
96  <P>  <P>
# Line 101  The maximum length of a subject string i Line 107  The maximum length of a subject string i
107  integer variable can hold. However, PCRE uses recursion to handle subpatterns  integer variable can hold. However, PCRE uses recursion to handle subpatterns
108  and indefinite repetition. This means that the available stack space may limit  and indefinite repetition. This means that the available stack space may limit
109  the size of a subject string that can be processed by certain patterns.  the size of a subject string that can be processed by certain patterns.
110  </P>  <a name="utf8support"></a></P>
111  <a name="utf8support"></a><br><a name="SEC4" href="#TOC1">UTF-8 SUPPORT</a><br>  <br><a name="SEC4" href="#TOC1">UTF-8 AND UNICODE PROPERTY SUPPORT</a><br>
112  <P>  <P>
113  Starting at release 3.3, PCRE has had some support for character strings  From release 3.3, PCRE has had some support for character strings encoded in
114  encoded in the UTF-8 format. For release 4.0 this has been greatly extended to  the UTF-8 format. For release 4.0 this was greatly extended to cover most
115  cover most common requirements.  common requirements, and in release 5.0 additional support for Unicode general
116    category properties was added.
117  </P>  </P>
118  <P>  <P>
119  In order process UTF-8 strings, you must build PCRE to include UTF-8 support in  In order process UTF-8 strings, you must build PCRE to include UTF-8 support in
# Line 122  library will be a bit bigger, but the ad Line 129  library will be a bit bigger, but the ad
129  to testing the PCRE_UTF8 flag in several places, so should not be very large.  to testing the PCRE_UTF8 flag in several places, so should not be very large.
130  </P>  </P>
131  <P>  <P>
132    If PCRE is built with Unicode character property support (which implies UTF-8
133    support), the escape sequences \p{..}, \P{..}, and \X are supported.
134    The available properties that can be tested are limited to the general
135    category properties such as Lu for an upper case letter or Nd for a decimal
136    number. A full list is given in the
137    <a href="pcrepattern.html"><b>pcrepattern</b></a>
138    documentation. The PCRE library is increased in size by about 90K when Unicode
139    property support is included.
140    </P>
141    <P>
142  The following comments apply when PCRE is running in UTF-8 mode:  The following comments apply when PCRE is running in UTF-8 mode:
143  </P>  </P>
144  <P>  <P>
# Line 163  but its use can lead to some strange eff Line 180  but its use can lead to some strange eff
180  7. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly  7. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
181  test characters of any code value, but the characters that PCRE recognizes as  test characters of any code value, but the characters that PCRE recognizes as
182  digits, spaces, or word characters remain the same set as before, all with  digits, spaces, or word characters remain the same set as before, all with
183  values less than 256.  values less than 256. This remains true even when PCRE includes Unicode
184  </P>  property support, because to do otherwise would slow down PCRE in many common
185  <P>  cases. If you really want to test for a wider sense of, say, "digit", you
186  8. Case-insensitive matching applies only to characters whose values are less  must use Unicode property tests such as \p{Nd}.
187  than 256. PCRE does not support the notion of "case" for higher-valued  </P>
188  characters.  <P>
189  </P>  8. Similarly, characters that match the POSIX named character classes are all
190  <P>  low-valued characters.
191  9. PCRE does not support the use of Unicode tables and properties or the Perl  </P>
192  escapes \p, \P, and \X.  <P>
193    9. Case-insensitive matching applies only to characters whose values are less
194    than 128, unless PCRE is built with Unicode property support. Even when Unicode
195    property support is available, PCRE still uses its own character tables when
196    checking the case of low-valued characters, so as not to degrade performance.
197    The Unicode property information is used only for characters with higher
198    values.
199  </P>  </P>
200  <br><a name="SEC5" href="#TOC1">AUTHOR</a><br>  <br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
201  <P>  <P>
# Line 183  University Computing Service, Line 206  University Computing Service,
206  Cambridge CB2 3QG, England.  Cambridge CB2 3QG, England.
207  <br>  <br>
208  Phone: +44 1223 334714  Phone: +44 1223 334714
209  </P>  Last updated: 09 September 2004
 <P>  
 Last updated: 20 August 2003  
210  <br>  <br>
211  Copyright &copy; 1997-2003 University of Cambridge.  Copyright &copy; 1997-2004 University of Cambridge.
212    <p>
213    Return to the <a href="index.html">PCRE index page</a>.
214    </p>
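
Note on item 9 as reworded in revision 75: the rule it states can be modelled in a few lines of C. The sketch below is an illustration only, not PCRE source code. The names utf8_decode, caseless_source, and enum case_source are hypothetical, the threshold of 128 is taken from the revised text, and the decoder is deliberately minimal (it does not reject overlong encodings).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for where case information comes from, per item 9. */
enum case_source {
    CASE_TABLES,        /* PCRE's own character tables (low-valued characters) */
    CASE_UNICODE_PROPS, /* Unicode property data (higher-valued characters)    */
    CASE_NONE           /* no caseless matching for this character             */
};

/*
 * Decode one UTF-8 sequence from s (at most len bytes) into *cp.
 * Returns the number of bytes consumed, or 0 if the sequence is truncated
 * or malformed. Minimal by design: overlong forms are not rejected.
 */
size_t utf8_decode(const unsigned char *s, size_t len, unsigned long *cp)
{
    size_t need, i;
    unsigned long value;

    if (len == 0) return 0;

    if (s[0] < 0x80)                { value = s[0];        need = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { value = s[0] & 0x1F; need = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { value = s[0] & 0x0F; need = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { value = s[0] & 0x07; need = 4; }
    else return 0;                  /* stray continuation or invalid lead byte */

    if (len < need) return 0;

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        value = (value << 6) | (s[i] & 0x3F);
    }

    *cp = value;
    return need;
}

/*
 * Model of item 9: characters below 128 always use the built-in tables;
 * higher values get case information only if Unicode property support
 * was compiled in (has_unicode_props is an assumed build-time fact).
 */
enum case_source caseless_source(unsigned long cp, bool has_unicode_props)
{
    if (cp < 128) return CASE_TABLES;
    return has_unicode_props ? CASE_UNICODE_PROPS : CASE_NONE;
}
```

For example, the two bytes 0xC3 0xA9 decode to the code point 0xE9. Under this model that character gets no caseless matching in a build without Unicode property support, and is handled through the Unicode property data in a build that has it, while 'A' is resolved from the built-in tables in either build.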

