.TH PCREBUILD 3
.SH NAME
PCRE - Perl-compatible regular expressions
.
.
.SH "PCRE BUILD-TIME OPTIONS"
.rs
.sp
This document describes the optional features of PCRE that can be selected when
the library is compiled. It assumes use of the \fBconfigure\fP script, where
the optional features are selected or deselected by providing options to
\fBconfigure\fP before running the \fBmake\fP command. However, the same
options can be selected in both Unix-like and non-Unix-like environments using
the GUI facility of \fBcmake-gui\fP if you are using \fBCMake\fP instead of
\fBconfigure\fP to build PCRE.
.P
There is a lot more information about building PCRE in non-Unix-like
environments in the file called \fINON_UNIX_USE\fP, which is part of the PCRE
distribution. You should consult this file as well as the \fIREADME\fP file if
you are building in a non-Unix-like environment.
.P
The complete list of options for \fBconfigure\fP (which includes the standard
ones such as the selection of the installation directory) can be obtained by
running
.sp
  ./configure --help
.sp
The following sections include descriptions of options whose names begin with
--enable or --disable. These settings specify changes to the defaults for the
\fBconfigure\fP command. Because of the way that \fBconfigure\fP works,
--enable and --disable always come in pairs, so the complementary option always
exists as well, but as it specifies the default, it is not described.
.
.
.SH "BUILDING 8-BIT AND 16-BIT LIBRARIES"
.rs
.sp
By default, a library called \fBlibpcre\fP is built, containing functions that
take string arguments contained in vectors of bytes, either as single-byte
characters, or interpreted as UTF-8 strings. You can also build a separate
library, called \fBlibpcre16\fP, in which strings are contained in vectors of
16-bit data units and interpreted either as single-unit characters or UTF-16
strings, by adding
.sp
  --enable-pcre16
.sp
to the \fBconfigure\fP command. If you do not want the 8-bit library, add
.sp
  --disable-pcre8
.sp
as well. At least one of the two libraries must be built. Note that the C++ and
POSIX wrappers are for the 8-bit library only, and that \fBpcregrep\fP is an
8-bit program. None of these are built if you select only the 16-bit library.
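.P
As a rough illustration of the combinations described above, the following
shell sketch prints the options for each choice of library. The function name
\fBpcre_width_flags\fP is hypothetical and is not part of PCRE; only the option
names are taken from this document.
.sp
.nf
```shell
# Sketch only: print the configure options for a chosen set of libraries.
# pcre_width_flags is a hypothetical helper, not something PCRE provides.
pcre_width_flags() {
    case "$1" in
        8)    echo "" ;;                                 # default: libpcre only
        16)   echo "--enable-pcre16 --disable-pcre8" ;;  # libpcre16 only
        both) echo "--enable-pcre16" ;;                  # libpcre and libpcre16
        *)    echo "unknown selection: $1" >&2; return 1 ;;
    esac
}

# Show the command line for a 16-bit-only build without running it.
echo "./configure $(pcre_width_flags 16)"
```
.fi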
.
.
.SH "BUILDING SHARED AND STATIC LIBRARIES"
.rs
.sp
The PCRE building process uses \fBlibtool\fP to build both shared and static
Unix libraries by default. You can suppress one of these by adding one of
.sp
  --disable-shared
  --disable-static
.sp
to the \fBconfigure\fP command, as required.
.
.
.SH "C++ SUPPORT"
.rs
.sp
By default, if the 8-bit library is being built, the \fBconfigure\fP script
will search for a C++ compiler and C++ header files. If it finds them, it
automatically builds the C++ wrapper library (which supports only 8-bit
strings). You can disable this by adding
.sp
  --disable-cpp
.sp
to the \fBconfigure\fP command.
.
.
.SH "UTF-8 AND UTF-16 SUPPORT"
.rs
.sp
To build PCRE with support for UTF Unicode character strings, add
.sp
  --enable-utf
.sp
to the \fBconfigure\fP command. This setting applies to both libraries, adding
support for UTF-8 to the 8-bit library and support for UTF-16 to the 16-bit
library. It is not possible to build one library with UTF support and the other
without in the same configuration. (For backwards compatibility, --enable-utf8
is a synonym of --enable-utf.)
.P
Of itself, this setting does not make PCRE treat strings as UTF-8 or UTF-16. As
well as compiling PCRE with this option, you also have to set the
PCRE_UTF8 or PCRE_UTF16 option when you call one of the pattern compiling
functions.
.P
If you set --enable-utf when compiling in an EBCDIC environment, PCRE expects
its input to be either ASCII or UTF-8 (depending on the runtime option). It is
not possible to support both EBCDIC and UTF-8 codes in the same version of the
library. Consequently, --enable-utf and --enable-ebcdic are mutually
exclusive.
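.P
The mutual exclusion can be checked before \fBconfigure\fP is run. The
following is a minimal sketch of such a check in a wrapper script; the function
name \fBcheck_utf_ebcdic\fP is an assumption made for illustration and is not
something that PCRE provides.
.sp
.nf
```shell
# Hypothetical pre-flight check: returns non-zero when both --enable-utf
# (or its synonym --enable-utf8) and --enable-ebcdic are requested.
check_utf_ebcdic() {
    utf=no
    ebcdic=no
    for arg in "$@"; do
        case "$arg" in
            --enable-utf|--enable-utf8) utf=yes ;;
            --enable-ebcdic)            ebcdic=yes ;;
        esac
    done
    if [ "$utf" = yes ] && [ "$ebcdic" = yes ]; then
        echo "error: --enable-utf and --enable-ebcdic are mutually exclusive" >&2
        return 1
    fi
    return 0
}

check_utf_ebcdic --enable-utf --enable-jit && echo "options are compatible"
```
.fi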
.
.
.SH "UNICODE CHARACTER PROPERTY SUPPORT"
.rs
.sp
UTF support allows the libraries to process character codepoints up to 0x10ffff
in the strings that they handle. On its own, however, it does not provide any
facilities for accessing the properties of such characters. If you want to be
able to use the pattern escapes \eP, \ep, and \eX, which refer to Unicode
character properties, you must add
.sp
  --enable-unicode-properties
.sp
to the \fBconfigure\fP command. This implies UTF support, even if you have
not explicitly requested it.
.P
Including Unicode property support adds around 30K of tables to the PCRE
library. Only the general category properties such as \fILu\fP and \fINd\fP are
supported. Details are given in the
.\" HREF
\fBpcrepattern\fP
.\"
documentation.
.
.
.SH "JUST-IN-TIME COMPILER SUPPORT"
.rs
.sp
Just-in-time compiler support is included in the build by specifying
.sp
  --enable-jit
.sp
This support is available only for certain hardware architectures. If this
option is set for an unsupported architecture, a compile-time error occurs.
See the
.\" HREF
\fBpcrejit\fP
.\"
documentation for a discussion of JIT usage. When JIT support is enabled,
\fBpcregrep\fP automatically makes use of it, unless you add
.sp
  --disable-pcregrep-jit
.sp
to the \fBconfigure\fP command.
.
.
.SH "CODE VALUE OF NEWLINE"
.rs
.sp
By default, PCRE interprets the linefeed (LF) character as indicating the end
of a line. This is the normal newline character on Unix-like systems. You can
compile PCRE to use carriage return (CR) instead, by adding
.sp
  --enable-newline-is-cr
.sp
to the \fBconfigure\fP command. There is also a --enable-newline-is-lf option,
which explicitly specifies linefeed as the newline character.
.sp
Alternatively, you can specify that line endings are to be indicated by the two
character sequence CRLF. If you want this, add
.sp
  --enable-newline-is-crlf
.sp
to the \fBconfigure\fP command. There is a fourth option, specified by
.sp
  --enable-newline-is-anycrlf
.sp
which causes PCRE to recognize any of the three sequences CR, LF, or CRLF as
indicating a line ending. Finally, a fifth option, specified by
.sp
  --enable-newline-is-any
.sp
causes PCRE to recognize any Unicode newline sequence.
.P
Whatever line ending convention is selected when PCRE is built can be
overridden when the library functions are called. At build time it is
conventional to use the standard for your operating system.
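.P
The five conventions map directly onto option names, which the following shell
sketch makes explicit. The helper \fBnewline_flag\fP is a hypothetical name
used only for this illustration.
.sp
.nf
```shell
# Sketch: translate a newline convention into its configure option.
# newline_flag is a hypothetical helper; the option names are from the text.
newline_flag() {
    case "$1" in
        cr|lf|crlf|anycrlf|any) echo "--enable-newline-is-$1" ;;
        *) echo "unknown newline convention: $1" >&2; return 1 ;;
    esac
}

# Show the command line for a CRLF build without running it.
echo "./configure $(newline_flag crlf)"
```
.fi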
.
.
.SH "WHAT \eR MATCHES"
.rs
.sp
By default, the sequence \eR in a pattern matches any Unicode newline sequence,
whatever has been selected as the line ending sequence. If you specify
.sp
  --enable-bsr-anycrlf
.sp
the default is changed so that \eR matches only CR, LF, or CRLF. Whatever is
selected when PCRE is built can be overridden when the library functions are
called.
.
.
.SH "POSIX MALLOC USAGE"
.rs
.sp
When the 8-bit library is called through the POSIX interface (see the
.\" HREF
\fBpcreposix\fP
.\"
documentation), additional working storage is required for holding the pointers
to capturing substrings, because PCRE requires three integers per substring,
whereas the POSIX interface provides only two. If the number of expected
substrings is small, the wrapper function uses space on the stack, because this
is faster than using \fBmalloc()\fP for each call. The default threshold above
which the stack is no longer used is 10; it can be changed by adding a setting
such as
.sp
  --with-posix-malloc-threshold=20
.sp
to the \fBconfigure\fP command.
.
.
.SH "HANDLING VERY LARGE PATTERNS"
.rs
.sp
Within a compiled pattern, offset values are used to point from one part to
another (for example, from an opening parenthesis to an alternation
metacharacter). By default, two-byte values are used for these offsets, leading
to a maximum size for a compiled pattern of around 64K. This is sufficient to
handle all but the most gigantic patterns. Nevertheless, some people do want to
process truly enormous patterns, so it is possible to compile PCRE to use
three-byte or four-byte offsets by adding a setting such as
.sp
  --with-link-size=3
.sp
to the \fBconfigure\fP command. The value given must be 2, 3, or 4. For the
16-bit library, a value of 3 is rounded up to 4. Using longer offsets slows
down the operation of PCRE because it has to load additional data when handling
them.
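.P
The validation and rounding rules can be summarized in a few lines of shell.
This is only a sketch of the rules as stated here, not the logic that
\fBconfigure\fP itself uses, and \fBeffective_link_size\fP is a hypothetical
name.
.sp
.nf
```shell
# Sketch: report the link size that takes effect for a given library width.
# effective_link_size is a hypothetical helper, not part of PCRE.
effective_link_size() {
    width=$1   # library width in bits: 8 or 16
    size=$2    # requested --with-link-size value
    case "$size" in
        2|3|4) ;;
        *) echo "link size must be 2, 3, or 4" >&2; return 1 ;;
    esac
    # For the 16-bit library, a value of 3 is rounded up to 4.
    if [ "$width" = 16 ] && [ "$size" = 3 ]; then
        size=4
    fi
    echo "$size"
}

effective_link_size 8 3    # prints 3
effective_link_size 16 3   # prints 4
```
.fi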
.
.
.SH "AVOIDING EXCESSIVE STACK USAGE"
.rs
.sp
When matching with the \fBpcre_exec()\fP function, PCRE implements backtracking
by making recursive calls to an internal function called \fBmatch()\fP. In
environments where the size of the stack is limited, this can severely limit
PCRE's operation. (The Unix environment does not usually suffer from this
problem, but it may sometimes be necessary to increase the maximum stack size.
There is a discussion in the
.\" HREF
\fBpcrestack\fP
.\"
documentation.) An alternative approach to recursion that uses memory from the
heap to remember data, instead of using recursive function calls, has been
implemented to work round the problem of limited stack size. If you want to
build a version of PCRE that works this way, add
.sp
  --disable-stack-for-recursion
.sp
to the \fBconfigure\fP command. With this configuration, PCRE will use the
\fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP variables to call memory
management functions. By default these point to \fBmalloc()\fP and
\fBfree()\fP, but you can replace the pointers so that your own functions are
used instead.
.P
Separate functions are provided rather than using \fBpcre_malloc\fP and
\fBpcre_free\fP because the usage is very predictable: the block sizes
requested are always the same, and the blocks are always freed in reverse
order. A calling program might be able to implement optimized functions that
perform better than \fBmalloc()\fP and \fBfree()\fP. PCRE runs noticeably more
slowly when built in this way. This option affects only the \fBpcre_exec()\fP
function; it is not relevant for \fBpcre_dfa_exec()\fP.
.
.
.SH "LIMITING PCRE RESOURCE USAGE"
.rs
.sp
Internally, PCRE has a function called \fBmatch()\fP, which it calls repeatedly
(sometimes recursively) when matching a pattern with the \fBpcre_exec()\fP
function. By controlling the maximum number of times this function may be
called during a single matching operation, a limit can be placed on the
resources used by a single call to \fBpcre_exec()\fP. The limit can be changed
at run time, as described in the
.\" HREF
\fBpcreapi\fP
.\"
documentation. The default is 10 million, but this can be changed by adding a
setting such as
.sp
  --with-match-limit=500000
.sp
to the \fBconfigure\fP command. This setting has no effect on the
\fBpcre_dfa_exec()\fP matching function.
.P
In some environments it is desirable to limit the depth of recursive calls of
\fBmatch()\fP more strictly than the total number of calls, in order to
restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
is specified) that is used. A second limit controls this; it defaults to the
value that is set for --with-match-limit, which imposes no additional
constraints. However, you can set a lower limit by adding, for example,
.sp
  --with-match-limit-recursion=10000
.sp
to the \fBconfigure\fP command. This value can also be overridden at run time.
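.P
The relationship between the two defaults can be sketched in shell as follows.
The helper \fBmatch_limit_flags\fP is a hypothetical name; it only prints
options and mirrors the defaults stated above.
.sp
.nf
```shell
# Sketch: print both limit options, applying the documented defaults.
# match_limit_flags is a hypothetical helper, not part of PCRE.
match_limit_flags() {
    limit=${1:-10000000}      # default match limit: 10 million
    recursion=${2:-$limit}    # recursion limit defaults to the match limit
    echo "--with-match-limit=$limit --with-match-limit-recursion=$recursion"
}

# A lower match limit with a stricter recursion depth.
match_limit_flags 500000 10000
```
.fi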
.
.
.SH "CREATING CHARACTER TABLES AT BUILD TIME"
.rs
.sp
PCRE uses fixed tables for processing characters whose code values are less
than 256. By default, PCRE is built with a set of tables that are distributed
in the file \fIpcre_chartables.c.dist\fP. These tables are for ASCII codes
only. If you add
.sp
  --enable-rebuild-chartables
.sp
to the \fBconfigure\fP command, the distributed tables are no longer used.
Instead, a program called \fBdftables\fP is compiled and run. This outputs the
source for a new set of tables, created in the default locale of your C runtime
system. (This method of replacing the tables does not work if you are cross
compiling, because \fBdftables\fP is run on the local host. If you need to
create alternative tables when cross compiling, you will have to do so "by
hand".)
.
.
.SH "USING EBCDIC CODE"
.rs
.sp
PCRE assumes by default that it will run in an environment where the character
code is ASCII (or Unicode, which is a superset of ASCII). This is the case for
most computer operating systems. PCRE can, however, be compiled to run in an
EBCDIC environment by adding
.sp
  --enable-ebcdic
.sp
to the \fBconfigure\fP command. This setting implies
--enable-rebuild-chartables. You should only use it if you know that you are in
an EBCDIC environment (for example, an IBM mainframe operating system). The
--enable-ebcdic option is incompatible with --enable-utf.
.
.
.SH "PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT"
.rs
.sp
By default, \fBpcregrep\fP reads all files as plain text. You can build it so
that it recognizes files whose names end in \fB.gz\fP or \fB.bz2\fP, and reads
them with \fBlibz\fP or \fBlibbz2\fP, respectively, by adding one or both of
.sp
  --enable-pcregrep-libz
  --enable-pcregrep-libbz2
.sp
to the \fBconfigure\fP command. These options naturally require that the
relevant libraries are installed on your system. Configuration will fail if
they are not.
.
.
.SH "PCREGREP BUFFER SIZE"
.rs
.sp
\fBpcregrep\fP uses an internal buffer to hold a "window" on the file it is
scanning, in order to be able to output "before" and "after" lines when it
finds a match. The size of the buffer is controlled by a parameter whose
default value is 20K. The buffer itself is three times this size, but because
of the way it is used for holding "before" lines, the longest line that is
guaranteed to be processable is the parameter size. You can change the default
parameter value by adding, for example,
.sp
  --with-pcregrep-bufsize=50K
.sp
to the \fBconfigure\fP command. The caller of \fBpcregrep\fP can, however,
override this value by specifying a run-time option.
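.P
The arithmetic behind the buffer size can be worked through in shell. This
sketch assumes that "K" means 1024 bytes, which the text above does not state
explicitly, and \fBpcregrep_buffer_bytes\fP is a hypothetical helper name.
.sp
.nf
```shell
# Sketch: total buffer size in bytes for a given parameter value in K.
# Assumes 1K = 1024 bytes; pcregrep_buffer_bytes is a hypothetical helper.
pcregrep_buffer_bytes() {
    param_k=${1:-20}                 # default parameter value: 20K
    echo $(( param_k * 1024 * 3 ))   # the buffer is three times the parameter
}

pcregrep_buffer_bytes      # prints 61440
pcregrep_buffer_bytes 50   # prints 153600
```
.fi
.P
Under the same assumption, the longest line that is guaranteed to be
processable with the default setting is 20480 bytes.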
.
.
.SH "PCRETEST OPTION FOR LIBREADLINE SUPPORT"
.rs
.sp
If you add
.sp
  --enable-pcretest-libreadline
.sp
to the \fBconfigure\fP command, \fBpcretest\fP is linked with the
\fBlibreadline\fP library, and when its input is from a terminal, it reads it
using the \fBreadline()\fP function. This provides line-editing and history
facilities. Note that \fBlibreadline\fP is GPL-licensed, so if you distribute a
binary of \fBpcretest\fP linked in this way, there may be licensing issues.
.P
Setting this option causes the \fB-lreadline\fP option to be added to the
\fBpcretest\fP build. In many operating environments with a system-installed
\fBlibreadline\fP this is sufficient. However, in some environments (e.g.
if an unmodified distribution version of readline is in use), some extra
configuration may be necessary. The INSTALL file for \fBlibreadline\fP says
this:
.sp
  "Readline uses the termcap functions, but does not link with the
  termcap or curses library itself, allowing applications which link
  with readline the to choose an appropriate library."
.sp
If your environment has not been set up so that an appropriate library is
automatically included, you may need to add something like
.sp
  LIBS="-lncurses"
.sp
immediately before the \fBconfigure\fP command.
.
.
.SH "SEE ALSO"
.rs
.sp
\fBpcreapi\fP(3), \fBpcre16\fP(3), \fBpcre_config\fP(3).
.
.
.SH AUTHOR
.rs
.sp
.nf
Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.
.fi
.
.
.SH REVISION
.rs
.sp
.nf
Last updated: 07 January 2012
Copyright (c) 1997-2012 University of Cambridge.
.fi
