
Diff of /code/trunk/doc/html/pcrebuild.html


revision 691 by ph10, Sun Sep 11 14:31:21 2011 UTC revision 869 by ph10, Sat Jan 14 11:16:23 2012 UTC
# Line 14  man page, in case the conversion went wr Line 14  man page, in case the conversion went wr
14  <br>  <br>
15  <ul>  <ul>
16  <li><a name="TOC1" href="#SEC1">PCRE BUILD-TIME OPTIONS</a>  <li><a name="TOC1" href="#SEC1">PCRE BUILD-TIME OPTIONS</a>
17  <li><a name="TOC2" href="#SEC2">BUILDING SHARED AND STATIC LIBRARIES</a>  <li><a name="TOC2" href="#SEC2">BUILDING 8-BIT and 16-BIT LIBRARIES</a>
18  <li><a name="TOC3" href="#SEC3">C++ SUPPORT</a>  <li><a name="TOC3" href="#SEC3">BUILDING SHARED AND STATIC LIBRARIES</a>
19  <li><a name="TOC4" href="#SEC4">UTF-8 SUPPORT</a>  <li><a name="TOC4" href="#SEC4">C++ SUPPORT</a>
20  <li><a name="TOC5" href="#SEC5">UNICODE CHARACTER PROPERTY SUPPORT</a>  <li><a name="TOC5" href="#SEC5">UTF-8 and UTF-16 SUPPORT</a>
21  <li><a name="TOC6" href="#SEC6">JUST-IN-TIME COMPILER SUPPORT</a>  <li><a name="TOC6" href="#SEC6">UNICODE CHARACTER PROPERTY SUPPORT</a>
22  <li><a name="TOC7" href="#SEC7">CODE VALUE OF NEWLINE</a>  <li><a name="TOC7" href="#SEC7">JUST-IN-TIME COMPILER SUPPORT</a>
23  <li><a name="TOC8" href="#SEC8">WHAT \R MATCHES</a>  <li><a name="TOC8" href="#SEC8">CODE VALUE OF NEWLINE</a>
24  <li><a name="TOC9" href="#SEC9">POSIX MALLOC USAGE</a>  <li><a name="TOC9" href="#SEC9">WHAT \R MATCHES</a>
25  <li><a name="TOC10" href="#SEC10">HANDLING VERY LARGE PATTERNS</a>  <li><a name="TOC10" href="#SEC10">POSIX MALLOC USAGE</a>
26  <li><a name="TOC11" href="#SEC11">AVOIDING EXCESSIVE STACK USAGE</a>  <li><a name="TOC11" href="#SEC11">HANDLING VERY LARGE PATTERNS</a>
27  <li><a name="TOC12" href="#SEC12">LIMITING PCRE RESOURCE USAGE</a>  <li><a name="TOC12" href="#SEC12">AVOIDING EXCESSIVE STACK USAGE</a>
28  <li><a name="TOC13" href="#SEC13">CREATING CHARACTER TABLES AT BUILD TIME</a>  <li><a name="TOC13" href="#SEC13">LIMITING PCRE RESOURCE USAGE</a>
29  <li><a name="TOC14" href="#SEC14">USING EBCDIC CODE</a>  <li><a name="TOC14" href="#SEC14">CREATING CHARACTER TABLES AT BUILD TIME</a>
30  <li><a name="TOC15" href="#SEC15">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>  <li><a name="TOC15" href="#SEC15">USING EBCDIC CODE</a>
31  <li><a name="TOC16" href="#SEC16">PCREGREP BUFFER SIZE</a>  <li><a name="TOC16" href="#SEC16">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
32  <li><a name="TOC17" href="#SEC17">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a>  <li><a name="TOC17" href="#SEC17">PCREGREP BUFFER SIZE</a>
33  <li><a name="TOC18" href="#SEC18">SEE ALSO</a>  <li><a name="TOC18" href="#SEC18">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a>
34  <li><a name="TOC19" href="#SEC19">AUTHOR</a>  <li><a name="TOC19" href="#SEC19">SEE ALSO</a>
35  <li><a name="TOC20" href="#SEC20">REVISION</a>  <li><a name="TOC20" href="#SEC20">AUTHOR</a>
36    <li><a name="TOC21" href="#SEC21">REVISION</a>
37  </ul>  </ul>
38  <br><a name="SEC1" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>  <br><a name="SEC1" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>
39  <P>  <P>
# Line 63  The following sections include descripti Line 64  The following sections include descripti
64  --enable and --disable always come in pairs, so the complementary option always  --enable and --disable always come in pairs, so the complementary option always
65  exists as well, but as it specifies the default, it is not described.  exists as well, but as it specifies the default, it is not described.
66  </P>  </P>
67  <br><a name="SEC2" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>  <br><a name="SEC2" href="#TOC1">BUILDING 8-BIT and 16-BIT LIBRARIES</a><br>
68    <P>
69    By default, a library called <b>libpcre</b> is built, containing functions that
70    take string arguments contained in vectors of bytes, either as single-byte
71    characters, or interpreted as UTF-8 strings. You can also build a separate
72    library, called <b>libpcre16</b>, in which strings are contained in vectors of
73    16-bit data units and interpreted either as single-unit characters or UTF-16
74    strings, by adding
75    <pre>
76      --enable-pcre16
77    </pre>
78    to the <b>configure</b> command. If you do not want the 8-bit library, add
79    <pre>
80      --disable-pcre8
81    </pre>
82    as well. At least one of the two libraries must be built. Note that the C++ and
83    POSIX wrappers are for the 8-bit library only, and that <b>pcregrep</b> is an
84    8-bit program. None of these are built if you select only the 16-bit library.
85    </P>
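As a rough sketch of the choices described above, the following shell fragment prints the configure options for each combination of libraries. The function name pcre_width_flags and its argument values (8, 16, both) are hypothetical, introduced only for illustration; the option names are the ones given in the text.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCRE): print the configure options that
# select which library widths are built, following the description above.
pcre_width_flags() {
    case "$1" in
        8)    printf '%s\n' "" ;;                                # default: libpcre only
        16)   printf '%s\n' "--enable-pcre16 --disable-pcre8" ;; # libpcre16 only
        both) printf '%s\n' "--enable-pcre16" ;;                 # libpcre and libpcre16
        *)    echo "unknown selection: $1" >&2; return 1 ;;
    esac
}

# Show the command line for building both libraries.
echo "./configure $(pcre_width_flags both)"
```

Selecting only the 16-bit library also leaves out the C++ and POSIX wrappers and pcregrep, as the paragraph above notes.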
86    <br><a name="SEC3" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
87  <P>  <P>
88  The PCRE building process uses <b>libtool</b> to build both shared and static  The PCRE building process uses <b>libtool</b> to build both shared and static
89  Unix libraries by default. You can suppress one of these by adding one of  Unix libraries by default. You can suppress one of these by adding one of
# Line 73  Unix libraries by default. You can suppr Line 93  Unix libraries by default. You can suppr
93  </pre>  </pre>
94  to the <b>configure</b> command, as required.  to the <b>configure</b> command, as required.
95  </P>  </P>
96  <br><a name="SEC3" href="#TOC1">C++ SUPPORT</a><br>  <br><a name="SEC4" href="#TOC1">C++ SUPPORT</a><br>
97  <P>  <P>
98  By default, the <b>configure</b> script will search for a C++ compiler and C++  By default, if the 8-bit library is being built, the <b>configure</b> script
99  header files. If it finds them, it automatically builds the C++ wrapper library  will search for a C++ compiler and C++ header files. If it finds them, it
100  for PCRE. You can disable this by adding  automatically builds the C++ wrapper library (which supports only 8-bit
101    strings). You can disable this by adding
102  <pre>  <pre>
103    --disable-cpp    --disable-cpp
104  </pre>  </pre>
105  to the <b>configure</b> command.  to the <b>configure</b> command.
106  </P>  </P>
107  <br><a name="SEC4" href="#TOC1">UTF-8 SUPPORT</a><br>  <br><a name="SEC5" href="#TOC1">UTF-8 and UTF-16 SUPPORT</a><br>
108  <P>  <P>
109  To build PCRE with support for UTF-8 Unicode character strings, add  To build PCRE with support for UTF Unicode character strings, add
110  <pre>  <pre>
111    --enable-utf8    --enable-utf
112  </pre>  </pre>
113  to the <b>configure</b> command. Of itself, this does not make PCRE treat  to the <b>configure</b> command. This setting applies to both libraries, adding
114  strings as UTF-8. As well as compiling PCRE with this option, you also have  support for UTF-8 to the 8-bit library and support for UTF-16 to the 16-bit
115  to set the PCRE_UTF8 option when you call the <b>pcre_compile()</b>  library. It is not possible to build one library with UTF support and the other
116  or <b>pcre_compile2()</b> functions.  without in the same configuration. (For backwards compatibility, --enable-utf8
117    is a synonym of --enable-utf.)
118    </P>
119    <P>
120    Of itself, this setting does not make PCRE treat strings as UTF-8 or UTF-16. As
121    well as compiling PCRE with this option, you also have to set the
122    PCRE_UTF8 or PCRE_UTF16 option when you call one of the pattern compiling
123    functions.
124  </P>  </P>
125  <P>  <P>
126  If you set --enable-utf8 when compiling in an EBCDIC environment, PCRE expects  If you set --enable-utf when compiling in an EBCDIC environment, PCRE expects
127  its input to be either ASCII or UTF-8 (depending on the runtime option). It is  its input to be either ASCII or UTF-8 (depending on the runtime option). It is
128  not possible to support both EBCDIC and UTF-8 codes in the same version of the  not possible to support both EBCDIC and UTF-8 codes in the same version of the
129  library. Consequently, --enable-utf8 and --enable-ebcdic are mutually  library. Consequently, --enable-utf and --enable-ebcdic are mutually
130  exclusive.  exclusive.
131  </P>  </P>
132  <br><a name="SEC5" href="#TOC1">UNICODE CHARACTER PROPERTY SUPPORT</a><br>  <br><a name="SEC6" href="#TOC1">UNICODE CHARACTER PROPERTY SUPPORT</a><br>
133  <P>  <P>
134  UTF-8 support allows PCRE to process character values greater than 255 in the  UTF support allows the libraries to process character codepoints up to 0x10ffff
135  strings that it handles. On its own, however, it does not provide any  in the strings that they handle. On its own, however, it does not provide any
136  facilities for accessing the properties of such characters. If you want to be  facilities for accessing the properties of such characters. If you want to be
137  able to use the pattern escapes \P, \p, and \X, which refer to Unicode  able to use the pattern escapes \P, \p, and \X, which refer to Unicode
138  character properties, you must add  character properties, you must add
139  <pre>  <pre>
140    --enable-unicode-properties    --enable-unicode-properties
141  </pre>  </pre>
142  to the <b>configure</b> command. This implies UTF-8 support, even if you have  to the <b>configure</b> command. This implies UTF support, even if you have
143  not explicitly requested it.  not explicitly requested it.
144  </P>  </P>
145  <P>  <P>
# Line 121  supported. Details are given in the Line 149  supported. Details are given in the
149  <a href="pcrepattern.html"><b>pcrepattern</b></a>  <a href="pcrepattern.html"><b>pcrepattern</b></a>
150  documentation.  documentation.
151  </P>  </P>
152  <br><a name="SEC6" href="#TOC1">JUST-IN-TIME COMPILER SUPPORT</a><br>  <br><a name="SEC7" href="#TOC1">JUST-IN-TIME COMPILER SUPPORT</a><br>
153  <P>  <P>
154  Just-in-time compiler support is included in the build by specifying  Just-in-time compiler support is included in the build by specifying
155  <pre>  <pre>
# Line 138  pcregrep automatically makes use of it, Line 166  pcregrep automatically makes use of it,
166  </pre>  </pre>
167  to the "configure" command.  to the "configure" command.
168  </P>  </P>
169  <br><a name="SEC7" href="#TOC1">CODE VALUE OF NEWLINE</a><br>  <br><a name="SEC8" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
170  <P>  <P>
171  By default, PCRE interprets the linefeed (LF) character as indicating the end  By default, PCRE interprets the linefeed (LF) character as indicating the end
172  of a line. This is the normal newline character on Unix-like systems. You can  of a line. This is the normal newline character on Unix-like systems. You can
# Line 171  Whatever line ending convention is selec Line 199  Whatever line ending convention is selec
199  overridden when the library functions are called. At build time it is  overridden when the library functions are called. At build time it is
200  conventional to use the standard for your operating system.  conventional to use the standard for your operating system.
201  </P>  </P>
202  <br><a name="SEC8" href="#TOC1">WHAT \R MATCHES</a><br>  <br><a name="SEC9" href="#TOC1">WHAT \R MATCHES</a><br>
203  <P>  <P>
204  By default, the sequence \R in a pattern matches any Unicode newline sequence,  By default, the sequence \R in a pattern matches any Unicode newline sequence,
205  whatever has been selected as the line ending sequence. If you specify  whatever has been selected as the line ending sequence. If you specify
# Line 182  the default is changed so that \R matche Line 210  the default is changed so that \R matche
210  selected when PCRE is built can be overridden when the library functions are  selected when PCRE is built can be overridden when the library functions are
211  called.  called.
212  </P>  </P>
213  <br><a name="SEC9" href="#TOC1">POSIX MALLOC USAGE</a><br>  <br><a name="SEC10" href="#TOC1">POSIX MALLOC USAGE</a><br>
214  <P>  <P>
215  When PCRE is called through the POSIX interface (see the  When the 8-bit library is called through the POSIX interface (see the
216  <a href="pcreposix.html"><b>pcreposix</b></a>  <a href="pcreposix.html"><b>pcreposix</b></a>
217  documentation), additional working storage is required for holding the pointers  documentation), additional working storage is required for holding the pointers
218  to capturing substrings, because PCRE requires three integers per substring,  to capturing substrings, because PCRE requires three integers per substring,
# Line 198  such as Line 226  such as
226  </pre>  </pre>
227  to the <b>configure</b> command.  to the <b>configure</b> command.
228  </P>  </P>
229  <br><a name="SEC10" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>  <br><a name="SEC11" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
230  <P>  <P>
231  Within a compiled pattern, offset values are used to point from one part to  Within a compiled pattern, offset values are used to point from one part to
232  another (for example, from an opening parenthesis to an alternation  another (for example, from an opening parenthesis to an alternation
233  metacharacter). By default, two-byte values are used for these offsets, leading  metacharacter). By default, two-byte values are used for these offsets, leading
234  to a maximum size for a compiled pattern of around 64K. This is sufficient to  to a maximum size for a compiled pattern of around 64K. This is sufficient to
235  handle all but the most gigantic patterns. Nevertheless, some people do want to  handle all but the most gigantic patterns. Nevertheless, some people do want to
236  process truyl enormous patterns, so it is possible to compile PCRE to use  process truly enormous patterns, so it is possible to compile PCRE to use
237  three-byte or four-byte offsets by adding a setting such as  three-byte or four-byte offsets by adding a setting such as
238  <pre>  <pre>
239    --with-link-size=3    --with-link-size=3
240  </pre>  </pre>
241  to the <b>configure</b> command. The value given must be 2, 3, or 4. Using  to the <b>configure</b> command. The value given must be 2, 3, or 4. For the
242  longer offsets slows down the operation of PCRE because it has to load  16-bit library, a value of 3 is rounded up to 4. Using longer offsets slows
243  additional bytes when handling them.  down the operation of PCRE because it has to load additional data when handling
244    them.
245  </P>  </P>
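The arithmetic behind the link-size setting can be sketched as follows. Both function names are hypothetical and exist only for illustration. offset_range gives the raw range of an offset field of the given width, which is where the "around 64K" figure for two-byte offsets comes from; the limit PCRE actually enforces on a compiled pattern may be lower.

```shell
#!/bin/sh
# Hypothetical helpers (not part of PCRE) illustrating --with-link-size.

# Number of distinct values an offset of $1 bytes can hold: 2^(8 * bytes).
# This is only the raw range of the offset field, not a limit taken from
# the PCRE source.
offset_range() {
    echo $(( 1 << (8 * $1) ))
}

# Link size actually used: $1 = requested value, $2 = library width (8 or 16).
effective_link_size() {
    case "$1" in
        2|3|4) ;;
        *) echo "link size must be 2, 3, or 4" >&2; return 1 ;;
    esac
    if [ "$2" = 16 ] && [ "$1" = 3 ]; then
        echo 4      # the 16-bit library rounds 3 up to 4
    else
        echo "$1"
    fi
}

echo "2-byte offsets: $(offset_range 2) values"
echo "link size 3, 16-bit library: $(effective_link_size 3 16)"
```

Evaluating offset_range 4 assumes the shell performs arithmetic in at least 64 bits, which is typical on current systems but is an assumption here.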
246  <br><a name="SEC11" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>  <br><a name="SEC12" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
247  <P>  <P>
248  When matching with the <b>pcre_exec()</b> function, PCRE implements backtracking  When matching with the <b>pcre_exec()</b> function, PCRE implements backtracking
249  by making recursive calls to an internal function called <b>match()</b>. In  by making recursive calls to an internal function called <b>match()</b>. In
# Line 245  perform better than malloc() and Line 274  perform better than malloc() and
274  slowly when built in this way. This option affects only the <b>pcre_exec()</b>  slowly when built in this way. This option affects only the <b>pcre_exec()</b>
275  function; it is not relevant for <b>pcre_dfa_exec()</b>.  function; it is not relevant for <b>pcre_dfa_exec()</b>.
276  </P>  </P>
277  <br><a name="SEC12" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>  <br><a name="SEC13" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>
278  <P>  <P>
279  Internally, PCRE has a function called <b>match()</b>, which it calls repeatedly  Internally, PCRE has a function called <b>match()</b>, which it calls repeatedly
280  (sometimes recursively) when matching a pattern with the <b>pcre_exec()</b>  (sometimes recursively) when matching a pattern with the <b>pcre_exec()</b>
# Line 274  constraints. However, you can set a lowe Line 303  constraints. However, you can set a lowe
303  </pre>  </pre>
304  to the <b>configure</b> command. This value can also be overridden at run time.  to the <b>configure</b> command. This value can also be overridden at run time.
305  </P>  </P>
306  <br><a name="SEC13" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>  <br><a name="SEC14" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
307  <P>  <P>
308  PCRE uses fixed tables for processing characters whose code values are less  PCRE uses fixed tables for processing characters whose code values are less
309  than 256. By default, PCRE is built with a set of tables that are distributed  than 256. By default, PCRE is built with a set of tables that are distributed
# Line 291  compiling, because dftables is ru Line 320  compiling, because dftables is ru
320  create alternative tables when cross compiling, you will have to do so "by  create alternative tables when cross compiling, you will have to do so "by
321  hand".)  hand".)
322  </P>  </P>
323  <br><a name="SEC14" href="#TOC1">USING EBCDIC CODE</a><br>  <br><a name="SEC15" href="#TOC1">USING EBCDIC CODE</a><br>
324  <P>  <P>
325  PCRE assumes by default that it will run in an environment where the character  PCRE assumes by default that it will run in an environment where the character
326  code is ASCII (or Unicode, which is a superset of ASCII). This is the case for  code is ASCII (or Unicode, which is a superset of ASCII). This is the case for
# Line 303  EBCDIC environment by adding Line 332  EBCDIC environment by adding
332  to the <b>configure</b> command. This setting implies  to the <b>configure</b> command. This setting implies
333  --enable-rebuild-chartables. You should only use it if you know that you are in  --enable-rebuild-chartables. You should only use it if you know that you are in
334  an EBCDIC environment (for example, an IBM mainframe operating system). The  an EBCDIC environment (for example, an IBM mainframe operating system). The
335  --enable-ebcdic option is incompatible with --enable-utf8.  --enable-ebcdic option is incompatible with --enable-utf.
336  </P>  </P>
337  <br><a name="SEC15" href="#TOC1">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a><br>  <br><a name="SEC16" href="#TOC1">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a><br>
338  <P>  <P>
339  By default, <b>pcregrep</b> reads all files as plain text. You can build it so  By default, <b>pcregrep</b> reads all files as plain text. You can build it so
340  that it recognizes files whose names end in <b>.gz</b> or <b>.bz2</b>, and reads  that it recognizes files whose names end in <b>.gz</b> or <b>.bz2</b>, and reads
# Line 318  to the configure command. These o Line 347  to the configure command. These o
347  relevant libraries are installed on your system. Configuration will fail if  relevant libraries are installed on your system. Configuration will fail if
348  they are not.  they are not.
349  </P>  </P>
350  <br><a name="SEC16" href="#TOC1">PCREGREP BUFFER SIZE</a><br>  <br><a name="SEC17" href="#TOC1">PCREGREP BUFFER SIZE</a><br>
351  <P>  <P>
352  <b>pcregrep</b> uses an internal buffer to hold a "window" on the file it is  <b>pcregrep</b> uses an internal buffer to hold a "window" on the file it is
353  scanning, in order to be able to output "before" and "after" lines when it  scanning, in order to be able to output "before" and "after" lines when it
# Line 333  parameter value by adding, for example, Line 362  parameter value by adding, for example,
362  to the <b>configure</b> command. The caller of <b>pcregrep</b> can, however,  to the <b>configure</b> command. The caller of <b>pcregrep</b> can, however,
363  override this value by specifying a run-time option.  override this value by specifying a run-time option.
364  </P>  </P>
365  <br><a name="SEC17" href="#TOC1">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a><br>  <br><a name="SEC18" href="#TOC1">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a><br>
366  <P>  <P>
367  If you add  If you add
368  <pre>  <pre>
# Line 364  automatically included, you may need to Line 393  automatically included, you may need to
393  </pre>  </pre>
394  immediately before the <b>configure</b> command.  immediately before the <b>configure</b> command.
395  </P>  </P>
396  <br><a name="SEC18" href="#TOC1">SEE ALSO</a><br>  <br><a name="SEC19" href="#TOC1">SEE ALSO</a><br>
397  <P>  <P>
398  <b>pcreapi</b>(3), <b>pcre_config</b>(3).  <b>pcreapi</b>(3), <b>pcre16</b>, <b>pcre_config</b>(3).
399  </P>  </P>
400  <br><a name="SEC19" href="#TOC1">AUTHOR</a><br>  <br><a name="SEC20" href="#TOC1">AUTHOR</a><br>
401  <P>  <P>
402  Philip Hazel  Philip Hazel
403  <br>  <br>
# Line 377  University Computing Service Line 406  University Computing Service
406  Cambridge CB2 3QH, England.  Cambridge CB2 3QH, England.
407  <br>  <br>
408  </P>  </P>
409  <br><a name="SEC20" href="#TOC1">REVISION</a><br>  <br><a name="SEC21" href="#TOC1">REVISION</a><br>
410  <P>  <P>
411  Last updated: 06 September 2011  Last updated: 07 January 2012
412  <br>  <br>
413  Copyright &copy; 1997-2011 University of Cambridge.  Copyright &copy; 1997-2012 University of Cambridge.
414  <br>  <br>
415  <p>  <p>
416  Return to the <a href="index.html">PCRE index page</a>.  Return to the <a href="index.html">PCRE index page</a>.

