Contents of /code/trunk/doc/html/pcrebuild.html

Revision 878
Sun Jan 15 15:44:47 2012 UTC (16 months ago) by ph10
File MIME type: text/html
File size: 18387 byte(s)
Fix HTML documentation and rebuild.

1 nigel 63 <html>
2     <head>
3     <title>pcrebuild specification</title>
4     </head>
5     <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
6 nigel 75 <h1>pcrebuild man page</h1>
7     <p>
8     Return to the <a href="index.html">PCRE index page</a>.
9     </p>
10 ph10 111 <p>
11 nigel 75 This page is part of the PCRE HTML documentation. It was generated automatically
12     from the original man page. If there is any nonsense in it, please consult the
13     man page, in case the conversion went wrong.
14 ph10 111 <br>
15 nigel 63 <ul>
16     <li><a name="TOC1" href="#SEC1">PCRE BUILD-TIME OPTIONS</a>
17 ph10 869 <li><a name="TOC2" href="#SEC2">BUILDING 8-BIT and 16-BIT LIBRARIES</a>
18     <li><a name="TOC3" href="#SEC3">BUILDING SHARED AND STATIC LIBRARIES</a>
19     <li><a name="TOC4" href="#SEC4">C++ SUPPORT</a>
20     <li><a name="TOC5" href="#SEC5">UTF-8 and UTF-16 SUPPORT</a>
21     <li><a name="TOC6" href="#SEC6">UNICODE CHARACTER PROPERTY SUPPORT</a>
22     <li><a name="TOC7" href="#SEC7">JUST-IN-TIME COMPILER SUPPORT</a>
23     <li><a name="TOC8" href="#SEC8">CODE VALUE OF NEWLINE</a>
24     <li><a name="TOC9" href="#SEC9">WHAT \R MATCHES</a>
25     <li><a name="TOC10" href="#SEC10">POSIX MALLOC USAGE</a>
26     <li><a name="TOC11" href="#SEC11">HANDLING VERY LARGE PATTERNS</a>
27     <li><a name="TOC12" href="#SEC12">AVOIDING EXCESSIVE STACK USAGE</a>
28     <li><a name="TOC13" href="#SEC13">LIMITING PCRE RESOURCE USAGE</a>
29     <li><a name="TOC14" href="#SEC14">CREATING CHARACTER TABLES AT BUILD TIME</a>
30     <li><a name="TOC15" href="#SEC15">USING EBCDIC CODE</a>
31     <li><a name="TOC16" href="#SEC16">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
32     <li><a name="TOC17" href="#SEC17">PCREGREP BUFFER SIZE</a>
33     <li><a name="TOC18" href="#SEC18">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a>
34     <li><a name="TOC19" href="#SEC19">SEE ALSO</a>
35     <li><a name="TOC20" href="#SEC20">AUTHOR</a>
36     <li><a name="TOC21" href="#SEC21">REVISION</a>
37 nigel 63 </ul>
38     <br><a name="SEC1" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>
39     <P>
40     This document describes the optional features of PCRE that can be selected when
41 ph10 261 the library is compiled. It assumes use of the <b>configure</b> script, where
42     the optional features are selected or deselected by providing options to
43     <b>configure</b> before running the <b>make</b> command. However, the same
44     options can be selected in both Unix-like and non-Unix-like environments using
45 ph10 453 the GUI facility of <b>cmake-gui</b> if you are using <b>CMake</b> instead of
46 ph10 461 <b>configure</b> to build PCRE.
47 ph10 261 </P>
48     <P>
49 ph10 461 There is a lot more information about building PCRE in non-Unix-like
50     environments in the file called <i>NON_UNIX_USE</i>, which is part of the PCRE
51     distribution. You should consult this file as well as the <i>README</i> file if
52 ph10 453 you are building in a non-Unix-like environment.
53     </P>
54     <P>
55 ph10 261 The complete list of options for <b>configure</b> (which includes the standard
56     ones such as the selection of the installation directory) can be obtained by
57     running
58 nigel 63 <pre>
59     ./configure --help
60 nigel 75 </pre>
61 ph10 128 The following sections include descriptions of options whose names begin with
62     --enable or --disable. These settings specify changes to the defaults for the
63 nigel 63 <b>configure</b> command. Because of the way that <b>configure</b> works,
64     --enable and --disable always come in pairs, so the complementary option always
65     exists as well, but as it specifies the default, it is not described.
66     </P>
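<P>
As a rough sketch of how several of the options described on this page can be
combined, the following shell fragment assembles a <b>configure</b> command
line and prints it for inspection. The helper name <i>pcre_configure_cmd</i>
and the particular selection of options are illustrative assumptions, not a
recommended configuration.
</P>

```shell
# Hypothetical helper: print a configure command line built from the
# options passed as arguments. Nothing is executed.
pcre_configure_cmd() {
    printf './configure'
    for opt in "$@"; do
        printf ' %s' "$opt"
    done
    printf '\n'
}

# Example selection of options taken from the sections of this page.
pcre_configure_cmd --enable-utf --enable-unicode-properties --enable-jit
```

<P>
Once the printed command looks right, it can be run from the top-level
directory of the PCRE source tree, followed by <b>make</b>.
</P>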
67 ph10 869 <br><a name="SEC2" href="#TOC1">BUILDING 8-BIT and 16-BIT LIBRARIES</a><br>
68 nigel 63 <P>
69 ph10 869 By default, a library called <b>libpcre</b> is built, containing functions that
70     take string arguments contained in vectors of bytes, either as single-byte
71     characters, or interpreted as UTF-8 strings. You can also build a separate
72     library, called <b>libpcre16</b>, in which strings are contained in vectors of
73     16-bit data units and interpreted either as single-unit characters or UTF-16
74     strings, by adding
75     <pre>
76     --enable-pcre16
77     </pre>
78     to the <b>configure</b> command. If you do not want the 8-bit library, add
79     <pre>
80     --disable-pcre8
81     </pre>
82     as well. At least one of the two libraries must be built. Note that the C++ and
83     POSIX wrappers are for the 8-bit library only, and that <b>pcregrep</b> is an
84     8-bit program. None of these are built if you select only the 16-bit library.
85     </P>
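<P>
The three permitted combinations of libraries can be summarized in a small
shell sketch. The helper name <i>pcre_width_opts</i> and its argument values
("8", "16", "both") are assumptions made for this example only. Because at
least one library must be built, the helper offers no way to disable both.
</P>

```shell
# Hypothetical helper: map a choice of library widths to configure options.
# Arguments: "8" (default build), "16" (16-bit only), or "both".
pcre_width_opts() {
    case "$1" in
        8)    printf '\n' ;;                              # default: no extra options
        16)   echo "--enable-pcre16 --disable-pcre8" ;;   # 16-bit library only
        both) echo "--enable-pcre16" ;;                   # 8-bit and 16-bit
        *)    echo "unknown selection: $1" >&2; return 1 ;;
    esac
}

pcre_width_opts 16
```

<P>
With the "16" selection, the C++ and POSIX wrappers and <b>pcregrep</b> are not
built, as noted above.
</P>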
86     <br><a name="SEC3" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
87     <P>
88 ph10 654 The PCRE building process uses <b>libtool</b> to build both shared and static
89     Unix libraries by default. You can suppress one of these by adding one of
90     <pre>
91     --disable-shared
92     --disable-static
93     </pre>
94     to the <b>configure</b> command, as required.
95     </P>
96 ph10 869 <br><a name="SEC4" href="#TOC1">C++ SUPPORT</a><br>
97 ph10 654 <P>
98 ph10 869 By default, if the 8-bit library is being built, the <b>configure</b> script
99     will search for a C++ compiler and C++ header files. If it finds them, it
100     automatically builds the C++ wrapper library (which supports only 8-bit
101     strings). You can disable this by adding
102 nigel 83 <pre>
103     --disable-cpp
104     </pre>
105     to the <b>configure</b> command.
106     </P>
107 ph10 869 <br><a name="SEC5" href="#TOC1">UTF-8 and UTF-16 SUPPORT</a><br>
108 nigel 83 <P>
109 ph10 869 To build PCRE with support for UTF Unicode character strings, add
110 nigel 63 <pre>
111 ph10 869 --enable-utf
112 nigel 75 </pre>
113 ph10 878 to the <b>configure</b> command. This setting applies to both libraries, adding
114 ph10 869 support for UTF-8 to the 8-bit library and support for UTF-16 to the 16-bit
115 ph10 878 library. There are no separate options for enabling UTF-8 and UTF-16
116     independently because that would allow ridiculous settings such as requesting
117     UTF-16 support while building only the 8-bit library. It is not possible to
118     build one library with UTF support and the other without in the same
119     configuration. (For backwards compatibility, --enable-utf8 is a synonym of
120     --enable-utf.)
121 nigel 63 </P>
122 ph10 392 <P>
123 ph10 869 Of itself, this setting does not make PCRE treat strings as UTF-8 or UTF-16. As
124     well as compiling PCRE with this option, you also have to set the
125     PCRE_UTF8 or PCRE_UTF16 option when you call one of the pattern compiling
126     functions.
127     </P>
128     <P>
129     If you set --enable-utf when compiling in an EBCDIC environment, PCRE expects
130 ph10 392 its input to be either ASCII or UTF-8 (depending on the runtime option). It is
131     not possible to support both EBCDIC and UTF-8 codes in the same version of the
132 ph10 869 library. Consequently, --enable-utf and --enable-ebcdic are mutually
133 ph10 392 exclusive.
134     </P>
135 ph10 869 <br><a name="SEC6" href="#TOC1">UNICODE CHARACTER PROPERTY SUPPORT</a><br>
136 nigel 63 <P>
137 ph10 869 UTF support allows the libraries to process character codepoints up to 0x10ffff
138     in the strings that they handle. On its own, however, it does not provide any
139 nigel 75 facilities for accessing the properties of such characters. If you want to be
140     able to use the pattern escapes \P, \p, and \X, which refer to Unicode
141     character properties, you must add
142     <pre>
143     --enable-unicode-properties
144     </pre>
145 ph10 869 to the <b>configure</b> command. This implies UTF support, even if you have
146 nigel 75 not explicitly requested it.
147     </P>
148     <P>
149 ph10 128 Including Unicode property support adds around 30K of tables to the PCRE
150     library. Only the general category properties such as <i>Lu</i> and <i>Nd</i> are
151     supported. Details are given in the
152 nigel 75 <a href="pcrepattern.html"><b>pcrepattern</b></a>
153     documentation.
154     </P>
155 ph10 869 <br><a name="SEC7" href="#TOC1">JUST-IN-TIME COMPILER SUPPORT</a><br>
156 nigel 75 <P>
157 ph10 691 Just-in-time compiler support is included in the build by specifying
158     <pre>
159     --enable-jit
160     </pre>
161     This support is available only for certain hardware architectures. If this
162     option is set for an unsupported architecture, a compile time error occurs.
163     See the
164     <a href="pcrejit.html"><b>pcrejit</b></a>
165     documentation for a discussion of JIT usage. When JIT support is enabled,
166     <b>pcregrep</b> automatically makes use of it, unless you add
167     <pre>
168     --disable-pcregrep-jit
169     </pre>
170     to the <b>configure</b> command.
171     </P>
172 ph10 869 <br><a name="SEC8" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
173 ph10 691 <P>
174 ph10 392 By default, PCRE interprets the linefeed (LF) character as indicating the end
175 nigel 91 of a line. This is the normal newline character on Unix-like systems. You can
176 ph10 392 compile PCRE to use carriage return (CR) instead, by adding
177 nigel 63 <pre>
178     --enable-newline-is-cr
179 nigel 75 </pre>
180 nigel 91 to the <b>configure</b> command. There is also a --enable-newline-is-lf option,
181     which explicitly specifies linefeed as the newline character.
182     <br>
183     <br>
184     Alternatively, you can specify that line endings are to be indicated by the two
185     character sequence CRLF. If you want this, add
186     <pre>
187     --enable-newline-is-crlf
188     </pre>
189 nigel 93 to the <b>configure</b> command. There is a fourth option, specified by
190     <pre>
191 ph10 150 --enable-newline-is-anycrlf
192     </pre>
193     which causes PCRE to recognize any of the three sequences CR, LF, or CRLF as
194     indicating a line ending. Finally, a fifth option, specified by
195     <pre>
196 nigel 93 --enable-newline-is-any
197     </pre>
198 ph10 150 causes PCRE to recognize any Unicode newline sequence.
199 nigel 63 </P>
200 nigel 93 <P>
201     Whatever line ending convention is selected when PCRE is built can be
202     overridden when the library functions are called. At build time it is
203     conventional to use the standard for your operating system.
204     </P>
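<P>
The five newline settings differ only in the suffix of the option name, which
the following sketch makes explicit. The helper name <i>pcre_newline_opt</i>
is an assumption for this example. The accepted names are simply the option
suffixes listed above.
</P>

```shell
# Hypothetical helper: turn a newline convention name into the matching
# configure option, rejecting names that this page does not describe.
pcre_newline_opt() {
    case "$1" in
        lf|cr|crlf|anycrlf|any) echo "--enable-newline-is-$1" ;;
        *) echo "unsupported newline convention: $1" >&2; return 1 ;;
    esac
}

pcre_newline_opt crlf
```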
205 ph10 869 <br><a name="SEC9" href="#TOC1">WHAT \R MATCHES</a><br>
206 nigel 63 <P>
207 ph10 231 By default, the sequence \R in a pattern matches any Unicode newline sequence,
208     whatever has been selected as the line ending sequence. If you specify
209     <pre>
210     --enable-bsr-anycrlf
211     </pre>
212     the default is changed so that \R matches only CR, LF, or CRLF. Whatever is
213     selected when PCRE is built can be overridden when the library functions are
214     called.
215     </P>
216 ph10 869 <br><a name="SEC10" href="#TOC1">POSIX MALLOC USAGE</a><br>
217 nigel 63 <P>
218 ph10 869 When the 8-bit library is called through the POSIX interface (see the
219 nigel 75 <a href="pcreposix.html"><b>pcreposix</b></a>
220 nigel 63 documentation), additional working storage is required for holding the pointers
221 nigel 75 to capturing substrings, because PCRE requires three integers per substring,
222 nigel 63 whereas the POSIX interface provides only two. If the number of expected
223     substrings is small, the wrapper function uses space on the stack, because this
224     is faster than using <b>malloc()</b> for each call. The default threshold above
225     which the stack is no longer used is 10; it can be changed by adding a setting
226     such as
227     <pre>
228     --with-posix-malloc-threshold=20
229 nigel 75 </pre>
230 nigel 63 to the <b>configure</b> command.
231     </P>
232 ph10 869 <br><a name="SEC11" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
233 nigel 63 <P>
234     Within a compiled pattern, offset values are used to point from one part to
235     another (for example, from an opening parenthesis to an alternation
236 nigel 75 metacharacter). By default, two-byte values are used for these offsets, leading
237 nigel 63 to a maximum size for a compiled pattern of around 64K. This is sufficient to
238     handle all but the most gigantic patterns. Nevertheless, some people do want to
239 ph10 869 process truly enormous patterns, so it is possible to compile PCRE to use
240 ph10 461 three-byte or four-byte offsets by adding a setting such as
241 nigel 63 <pre>
242     --with-link-size=3
243 nigel 75 </pre>
244 ph10 869 to the <b>configure</b> command. The value given must be 2, 3, or 4. For the
245     16-bit library, a value of 3 is rounded up to 4. Using longer offsets slows
246     down the operation of PCRE because it has to load additional data when handling
247     them.
248 nigel 63 </P>
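<P>
The effect of the link size can be checked with shell arithmetic: an offset of
N bytes can hold 256 to the power N distinct values, which for the default of 2
gives the figure of around 64K quoted above. The sketch below assumes the 8-bit
library and treats the result as an upper bound only. The helper name
<i>pcre_link_range</i> is hypothetical, and the value for a link size of 4
requires a shell with 64-bit arithmetic.
</P>

```shell
# Hypothetical helper: number of distinct values an N-byte offset can hold.
pcre_link_range() {
    case "$1" in
        2|3|4) echo $(( 1 << (8 * $1) )) ;;
        *) echo "link size must be 2, 3, or 4" >&2; return 1 ;;
    esac
}

for size in 2 3 4; do
    echo "link size $size: offsets below $(pcre_link_range "$size")"
done
```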
249 ph10 869 <br><a name="SEC12" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
250 nigel 63 <P>
251 nigel 77 When matching with the <b>pcre_exec()</b> function, PCRE implements backtracking
252     by making recursive calls to an internal function called <b>match()</b>. In
253     environments where the size of the stack is limited, this can severely limit
254     PCRE's operation. (The Unix environment does not usually suffer from this
255 nigel 91 problem, but it may sometimes be necessary to increase the maximum stack size.
256     There is a discussion in the
257     <a href="pcrestack.html"><b>pcrestack</b></a>
258     documentation.) An alternative approach to recursion that uses memory from the
259     heap to remember data, instead of using recursive function calls, has been
260     implemented to work round the problem of limited stack size. If you want to
261     build a version of PCRE that works this way, add
262 nigel 73 <pre>
263     --disable-stack-for-recursion
264 nigel 75 </pre>
265 nigel 73 to the <b>configure</b> command. With this configuration, PCRE will use the
266     <b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> variables to call memory
267 ph10 182 management functions. By default these point to <b>malloc()</b> and
268     <b>free()</b>, but you can replace the pointers so that your own functions are
269 ph10 461 used instead.
270 nigel 73 </P>
271 ph10 182 <P>
272     Separate functions are provided rather than using <b>pcre_malloc</b> and
273     <b>pcre_free</b> because the usage is very predictable: the block sizes
274     requested are always the same, and the blocks are always freed in reverse
275     order. A calling program might be able to implement optimized functions that
276     perform better than <b>malloc()</b> and <b>free()</b>. PCRE runs noticeably more
277     slowly when built in this way. This option affects only the <b>pcre_exec()</b>
278 ph10 461 function; it is not relevant for <b>pcre_dfa_exec()</b>.
279 ph10 182 </P>
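<P>
Before choosing --disable-stack-for-recursion, it may be worth checking how
much stack the environment actually provides. The sketch below assumes a shell
that supports <b>ulimit -s</b>, which common shells such as bash and dash do,
although POSIX does not require it. The value is normally reported in
kilobytes.
</P>

```shell
# Report the soft stack size limit of the current shell.
stack_limit=$(ulimit -s)
case "$stack_limit" in
    unlimited) echo "stack size is unlimited" ;;
    *)         echo "stack size limit: ${stack_limit} KB" ;;
esac
```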
280 ph10 869 <br><a name="SEC13" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>
281 nigel 91 <P>
282     Internally, PCRE has a function called <b>match()</b>, which it calls repeatedly
283     (sometimes recursively) when matching a pattern with the <b>pcre_exec()</b>
284     function. By controlling the maximum number of times this function may be
285     called during a single matching operation, a limit can be placed on the
286     resources used by a single call to <b>pcre_exec()</b>. The limit can be changed
287     at run time, as described in the
288     <a href="pcreapi.html"><b>pcreapi</b></a>
289     documentation. The default is 10 million, but this can be changed by adding a
290     setting such as
291     <pre>
292     --with-match-limit=500000
293     </pre>
294     to the <b>configure</b> command. This setting has no effect on the
295     <b>pcre_dfa_exec()</b> matching function.
296     </P>
297     <P>
298     In some environments it is desirable to limit the depth of recursive calls of
299     <b>match()</b> more strictly than the total number of calls, in order to
300     restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
301     is specified) that is used. A second limit controls this; it defaults to the
302     value that is set for --with-match-limit, which imposes no additional
303     constraints. However, you can set a lower limit by adding, for example,
304     <pre>
305     --with-match-limit-recursion=10000
306     </pre>
307     to the <b>configure</b> command. This value can also be overridden at run time.
308     </P>
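<P>
The two limits can be set together. In the following sketch, the helper name
<i>pcre_limit_opts</i> is hypothetical, and its refusal of a recursion limit
above the match limit is a design choice of the example (such a value would add
no constraint), not a check that <b>configure</b> is known to perform.
</P>

```shell
# Hypothetical helper: print both limit options.
# $1 = match limit, $2 = recursion limit (must not exceed the match limit).
pcre_limit_opts() {
    match_limit=$1
    recursion_limit=$2
    if [ "$recursion_limit" -gt "$match_limit" ]; then
        echo "recursion limit exceeds match limit" >&2
        return 1
    fi
    echo "--with-match-limit=$match_limit --with-match-limit-recursion=$recursion_limit"
}

pcre_limit_opts 500000 10000
```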
309 ph10 869 <br><a name="SEC14" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
310 nigel 73 <P>
311 ph10 128 PCRE uses fixed tables for processing characters whose code values are less
312     than 256. By default, PCRE is built with a set of tables that are distributed
313     in the file <i>pcre_chartables.c.dist</i>. These tables are for ASCII codes
314     only. If you add
315     <pre>
316     --enable-rebuild-chartables
317     </pre>
318     to the <b>configure</b> command, the distributed tables are no longer used.
319     Instead, a program called <b>dftables</b> is compiled and run. This outputs the
320     source for a new set of tables, created in the default locale of your C runtime
321     system. (This method of replacing the tables does not work if you are cross
322     compiling, because <b>dftables</b> is run on the local host. If you need to
323     create alternative tables when cross compiling, you will have to do so "by
324     hand".)
325     </P>
326 ph10 869 <br><a name="SEC15" href="#TOC1">USING EBCDIC CODE</a><br>
327 ph10 128 <P>
328 nigel 73 PCRE assumes by default that it will run in an environment where the character
329 ph10 197 code is ASCII (or Unicode, which is a superset of ASCII). This is the case for
330     most computer operating systems. PCRE can, however, be compiled to run in an
331     EBCDIC environment by adding
332 nigel 73 <pre>
333     --enable-ebcdic
334 nigel 75 </pre>
335 ph10 128 to the <b>configure</b> command. This setting implies
336 ph10 197 --enable-rebuild-chartables. You should only use it if you know that you are in
337 ph10 392 an EBCDIC environment (for example, an IBM mainframe operating system). The
338 ph10 869 --enable-ebcdic option is incompatible with --enable-utf.
339 nigel 73 </P>
340 ph10 869 <br><a name="SEC16" href="#TOC1">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a><br>
341 nigel 73 <P>
342 ph10 286 By default, <b>pcregrep</b> reads all files as plain text. You can build it so
343     that it recognizes files whose names end in <b>.gz</b> or <b>.bz2</b>, and reads
344     them with <b>libz</b> or <b>libbz2</b>, respectively, by adding one or both of
345     <pre>
346     --enable-pcregrep-libz
347     --enable-pcregrep-libbz2
348     </pre>
349     to the <b>configure</b> command. These options naturally require that the
350     relevant libraries are installed on your system. Configuration will fail if
351     they are not.
352     </P>
353 ph10 869 <br><a name="SEC17" href="#TOC1">PCREGREP BUFFER SIZE</a><br>
354 ph10 286 <P>
355 ph10 654 <b>pcregrep</b> uses an internal buffer to hold a "window" on the file it is
356     scanning, in order to be able to output "before" and "after" lines when it
357     finds a match. The size of the buffer is controlled by a parameter whose
358     default value is 20K. The buffer itself is three times this size, but because
359     of the way it is used for holding "before" lines, the longest line that is
360     guaranteed to be processable is the parameter size. You can change the default
361     parameter value by adding, for example,
362     <pre>
363     --with-pcregrep-bufsize=50K
364     </pre>
365     to the <b>configure</b> command. The caller of <b>pcregrep</b> can, however,
366     override this value by specifying a run-time option.
367     </P>
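<P>
The relationship between the parameter, the actual buffer, and the longest
line that is guaranteed to be processable can be worked through as follows.
The helper name <i>pcregrep_buffer_info</i> is an assumption for this sketch,
which takes the parameter value in kilobytes.
</P>

```shell
# Hypothetical helper: given the buffer parameter in kilobytes, report the
# actual buffer size (three times the parameter) and the longest line that
# is guaranteed to be processable (the parameter itself).
pcregrep_buffer_info() {
    param_k=$1
    echo "parameter=${param_k}K buffer=$(( 3 * param_k ))K longest_guaranteed_line=${param_k}K"
}

pcregrep_buffer_info 20   # the default parameter value
pcregrep_buffer_info 50   # as set by --with-pcregrep-bufsize=50K
```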
368 ph10 869 <br><a name="SEC18" href="#TOC1">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a><br>
369 ph10 654 <P>
370 ph10 289 If you add
371     <pre>
372     --enable-pcretest-libreadline
373     </pre>
374     to the <b>configure</b> command, <b>pcretest</b> is linked with the
375     <b>libreadline</b> library, and when its input is from a terminal, it reads it
376     using the <b>readline()</b> function. This provides line-editing and history
377 ph10 461 facilities. Note that <b>libreadline</b> is GPL-licensed, so if you distribute a
378 ph10 289 binary of <b>pcretest</b> linked in this way, there may be licensing issues.
379     </P>
380 ph10 345 <P>
381     Setting this option causes the <b>-lreadline</b> option to be added to the
382     <b>pcretest</b> build. In many operating environments with a system-installed
383     <b>libreadline</b> this is sufficient. However, in some environments (e.g.
384     if an unmodified distribution version of readline is in use), some extra
385     configuration may be necessary. The INSTALL file for <b>libreadline</b> says
386     this:
387     <pre>
388     "Readline uses the termcap functions, but does not link with the
389     termcap or curses library itself, allowing applications which link
390     with readline the to choose an appropriate library."
391     </pre>
392     If your environment has not been set up so that an appropriate library is
393     automatically included, you may need to add something like
394     <pre>
395     LIBS="-lncurses"
396     </pre>
397     immediately before the <b>configure</b> command.
398     </P>
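<P>
Put together, the complete command might look like the output of the sketch
below. It assumes that the terminal library is ncurses, for which the linker
option is -lncurses. Another library name can be passed as an argument. The
helper name <i>pcretest_readline_cmd</i> is hypothetical.
</P>

```shell
# Hypothetical helper: print a configure command that links pcretest with
# readline plus an explicitly named terminal library (default: ncurses).
pcretest_readline_cmd() {
    termlib=${1:-ncurses}
    echo "LIBS=\"-l$termlib\" ./configure --enable-pcretest-libreadline"
}

pcretest_readline_cmd
pcretest_readline_cmd termcap
```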
399 ph10 869 <br><a name="SEC19" href="#TOC1">SEE ALSO</a><br>
400 ph10 289 <P>
401 ph10 869 <b>pcreapi</b>(3), <b>pcre16</b>(3), <b>pcre_config</b>(3).
402 nigel 93 </P>
403 ph10 869 <br><a name="SEC20" href="#TOC1">AUTHOR</a><br>
404 nigel 93 <P>
405 ph10 99 Philip Hazel
406 nigel 63 <br>
407 ph10 99 University Computing Service
408     <br>
409     Cambridge CB2 3QH, England.
410     <br>
411     </P>
412 ph10 869 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
413 ph10 99 <P>
414 ph10 869 Last updated: 07 January 2012
415 ph10 99 <br>
416 ph10 869 Copyright &copy; 1997-2012 University of Cambridge.
417 ph10 99 <br>
418 nigel 75 <p>
419     Return to the <a href="index.html">PCRE index page</a>.
420     </p>
