Contents of /code/trunk/doc/pcrebuild.3

Revision 954
Sat Mar 31 18:09:26 2012 UTC by ph10
File size: 15618 byte(s)
Add date and PCRE version to .TH macros of all man pages.

1 ph10 954 .TH PCREBUILD 3 "07 January 2012" "PCRE 8.30"
2 nigel 63 .SH NAME
3     PCRE - Perl-compatible regular expressions
4 ph10 456 .
5     .
6 nigel 75 .SH "PCRE BUILD-TIME OPTIONS"
7 nigel 63 .rs
8     .sp
9     This document describes the optional features of PCRE that can be selected when
10 ph10 260 the library is compiled. It assumes use of the \fBconfigure\fP script, where
11     the optional features are selected or deselected by providing options to
12     \fBconfigure\fP before running the \fBmake\fP command. However, the same
13     options can be selected in both Unix-like and non-Unix-like environments using
14 ph10 436 the GUI facility of \fBcmake-gui\fP if you are using \fBCMake\fP instead of
15 ph10 461 \fBconfigure\fP to build PCRE.
16 ph10 260 .P
17 ph10 461 There is a lot more information about building PCRE in non-Unix-like
18     environments in the file called \fINON_UNIX_USE\fP, which is part of the PCRE
19     distribution. You should consult this file as well as the \fIREADME\fP file if
20 ph10 436 you are building in a non-Unix-like environment.
21     .P
22 ph10 260 The complete list of options for \fBconfigure\fP (which includes the standard
23     ones such as the selection of the installation directory) can be obtained by
24     running
25 nigel 75 .sp
26 nigel 63 ./configure --help
27 nigel 75 .sp
28 ph10 128 The following sections include descriptions of options whose names begin with
29     --enable or --disable. These settings specify changes to the defaults for the
30 nigel 75 \fBconfigure\fP command. Because of the way that \fBconfigure\fP works,
31 nigel 63 --enable and --disable always come in pairs, so the complementary option always
32     exists as well, but as it specifies the default, it is not described.
33 nigel 75 .
34 ph10 456 .
35 ph10 857 .SH "BUILDING 8-BIT AND 16-BIT LIBRARIES"
36     .rs
37     .sp
38 ph10 903 By default, a library called \fBlibpcre\fP is built, containing functions that
39     take string arguments contained in vectors of bytes, either as single-byte
40 ph10 857 characters, or interpreted as UTF-8 strings. You can also build a separate
41 ph10 903 library, called \fBlibpcre16\fP, in which strings are contained in vectors of
42     16-bit data units and interpreted either as single-unit characters or UTF-16
43 ph10 857 strings, by adding
44     .sp
45     --enable-pcre16
46     .sp
47     to the \fBconfigure\fP command. If you do not want the 8-bit library, add
48     .sp
49     --disable-pcre8
50     .sp
51     as well. At least one of the two libraries must be built. Note that the C++ and
52     POSIX wrappers are for the 8-bit library only, and that \fBpcregrep\fP is an
53     8-bit program. None of these are built if you select only the 16-bit library.
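.P
As a minimal sketch of the rule that at least one library must be built, the
shell function below inspects a proposed \fBconfigure\fP argument list. The
name check_libs is hypothetical and not part of PCRE; it assumes the defaults
described above (8-bit library built, 16-bit library not built). The last line
is a dry run that only prints the command.

```shell
# Hypothetical helper (not part of PCRE): succeed only if the argument
# list leaves at least one of the 8-bit and 16-bit libraries enabled.
check_libs() {
    build8=1    # default: the 8-bit library is built
    build16=0   # default: the 16-bit library is not built
    for arg in "$@"; do
        case "$arg" in
            --disable-pcre8)  build8=0 ;;
            --enable-pcre8)   build8=1 ;;
            --enable-pcre16)  build16=1 ;;
            --disable-pcre16) build16=0 ;;
        esac
    done
    if [ "$build8" -eq 0 ] && [ "$build16" -eq 0 ]; then
        echo "error: at least one of the two libraries must be built" >&2
        return 1
    fi
    return 0
}

# Dry run: print the command only when the selection is valid.
args="--enable-pcre16 --disable-pcre8"
check_libs $args && echo "./configure $args"
```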
54     .
55     .
56 ph10 654 .SH "BUILDING SHARED AND STATIC LIBRARIES"
57     .rs
58     .sp
59     The PCRE building process uses \fBlibtool\fP to build both shared and static
60     Unix libraries by default. You can suppress one of these by adding one of
61     .sp
62     --disable-shared
63     --disable-static
64     .sp
65     to the \fBconfigure\fP command, as required.
66     .
67     .
68 nigel 83 .SH "C++ SUPPORT"
69     .rs
70     .sp
71 ph10 857 By default, if the 8-bit library is being built, the \fBconfigure\fP script
72     will search for a C++ compiler and C++ header files. If it finds them, it
73 ph10 903 automatically builds the C++ wrapper library (which supports only 8-bit
74 ph10 857 strings). You can disable this by adding
75 nigel 83 .sp
76     --disable-cpp
77     .sp
78     to the \fBconfigure\fP command.
79     .
80 ph10 456 .
81 ph10 857 .SH "UTF-8 AND UTF-16 SUPPORT"
82 nigel 63 .rs
83     .sp
84 ph10 857 To build PCRE with support for UTF Unicode character strings, add
85 nigel 75 .sp
86 ph10 857 --enable-utf
87 nigel 75 .sp
88 ph10 873 to the \fBconfigure\fP command. This setting applies to both libraries, adding
89 ph10 857 support for UTF-8 to the 8-bit library and support for UTF-16 to the 16-bit
90 ph10 873 library. There are no separate options for enabling UTF-8 and UTF-16
91     independently because that would allow ridiculous settings such as requesting
92     UTF-16 support while building only the 8-bit library. It is not possible to
93     build one library with UTF support and the other without in the same
94     configuration. (For backwards compatibility, --enable-utf8 is a synonym of
95     --enable-utf.)
96 ph10 391 .P
97 ph10 857 Of itself, this setting does not make PCRE treat strings as UTF-8 or UTF-16. As
98     well as compiling PCRE with this option, you also have to set the
99 ph10 903 PCRE_UTF8 or PCRE_UTF16 option when you call one of the pattern compiling
100 ph10 857 functions.
101     .P
102     If you set --enable-utf when compiling in an EBCDIC environment, PCRE expects
103 ph10 392 its input to be either ASCII or UTF-8 (depending on the runtime option). It is
104     not possible to support both EBCDIC and UTF-8 codes in the same version of the
105 ph10 857 library. Consequently, --enable-utf and --enable-ebcdic are mutually
106 ph10 391 exclusive.
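.P
The mutual exclusion can be checked before \fBconfigure\fP is run. The
following is a sketch only: check_utf_ebcdic is a hypothetical helper, not
something shipped with PCRE. It treats --enable-utf8 as a synonym of
--enable-utf, as stated above, and also counts --enable-unicode-properties,
which this document describes as implying UTF support.

```shell
# Hypothetical helper (not part of PCRE): reject argument lists that
# request both UTF support and EBCDIC support.
check_utf_ebcdic() {
    has_utf=0
    has_ebcdic=0
    for arg in "$@"; do
        case "$arg" in
            # --enable-unicode-properties implies UTF support.
            --enable-utf|--enable-utf8|--enable-unicode-properties) has_utf=1 ;;
            --enable-ebcdic) has_ebcdic=1 ;;
        esac
    done
    if [ "$has_utf" -eq 1 ] && [ "$has_ebcdic" -eq 1 ]; then
        echo "error: --enable-utf and --enable-ebcdic are mutually exclusive" >&2
        return 1
    fi
    return 0
}

check_utf_ebcdic --enable-utf --enable-pcre16 && echo "accepted"
check_utf_ebcdic --enable-utf --enable-ebcdic 2>/dev/null || echo "rejected"
```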
107 nigel 75 .
108 ph10 456 .
109 nigel 75 .SH "UNICODE CHARACTER PROPERTY SUPPORT"
110 nigel 63 .rs
111     .sp
112 ph10 857 UTF support allows the libraries to process character codepoints up to 0x10ffff
113     in the strings that they handle. On its own, however, it does not provide any
114 nigel 75 facilities for accessing the properties of such characters. If you want to be
115     able to use the pattern escapes \eP, \ep, and \eX, which refer to Unicode
116     character properties, you must add
117     .sp
118     --enable-unicode-properties
119     .sp
120 ph10 857 to the \fBconfigure\fP command. This implies UTF support, even if you have
121 nigel 75 not explicitly requested it.
122     .P
123 ph10 128 Including Unicode property support adds around 30K of tables to the PCRE
124     library. Only the general category properties such as \fILu\fP and \fINd\fP are
125     supported. Details are given in the
126 nigel 75 .\" HREF
127     \fBpcrepattern\fP
128     .\"
129     documentation.
130     .
131 ph10 456 .
132 ph10 678 .SH "JUST-IN-TIME COMPILER SUPPORT"
133     .rs
134     .sp
135     Just-in-time compiler support is included in the build by specifying
136     .sp
137     --enable-jit
138     .sp
139 ph10 691 This support is available only for certain hardware architectures. If this
140 ph10 678 option is set for an unsupported architecture, a compile-time error occurs.
141 ph10 691 See the
142 ph10 678 .\" HREF
143     \fBpcrejit\fP
144     .\"
145 ph10 685 documentation for a discussion of JIT usage. When JIT support is enabled,
146     \fBpcregrep\fP automatically makes use of it, unless you add
147     .sp
148 ph10 691 --disable-pcregrep-jit
149     .sp
150     to the \fBconfigure\fP command.
151 ph10 678 .
152     .
153 nigel 75 .SH "CODE VALUE OF NEWLINE"
154     .rs
155     .sp
156 ph10 391 By default, PCRE interprets the linefeed (LF) character as indicating the end
157 nigel 91 of a line. This is the normal newline character on Unix-like systems. You can
158 ph10 391 compile PCRE to use carriage return (CR) instead, by adding
159 nigel 75 .sp
160 nigel 63 --enable-newline-is-cr
161 nigel 75 .sp
162 nigel 91 to the \fBconfigure\fP command. There is also a --enable-newline-is-lf option,
163     which explicitly specifies linefeed as the newline character.
164     .sp
165     Alternatively, you can specify that line endings are to be indicated by the two
166     character sequence CRLF. If you want this, add
167     .sp
168     --enable-newline-is-crlf
169     .sp
170 nigel 93 to the \fBconfigure\fP command. There is a fourth option, specified by
171     .sp
172 ph10 149 --enable-newline-is-anycrlf
173     .sp
174     which causes PCRE to recognize any of the three sequences CR, LF, or CRLF as
175     indicating a line ending. Finally, a fifth option, specified by
176     .sp
177 nigel 93 --enable-newline-is-any
178     .sp
179 ph10 149 causes PCRE to recognize any Unicode newline sequence.
180 nigel 93 .P
181     Whatever line ending convention is selected when PCRE is built can be
182     overridden when the library functions are called. At build time it is
183     conventional to use the standard for your operating system.
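.P
The five conventions map directly onto option names, which a build script
might exploit as sketched here. The function name newline_option is
hypothetical; the option names are the ones listed above. The final line is a
dry run that only prints the command.

```shell
# Hypothetical helper (not part of PCRE): translate a newline convention
# name into the corresponding configure option from the text above.
newline_option() {
    case "$1" in
        lf|cr|crlf|anycrlf|any) echo "--enable-newline-is-$1" ;;
        *) echo "error: unknown newline convention: $1" >&2; return 1 ;;
    esac
}

# Dry run for a system whose text files use CRLF line endings.
echo "./configure $(newline_option crlf)"
```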
184 nigel 75 .
185 ph10 456 .
186 ph10 231 .SH "WHAT \eR MATCHES"
187     .rs
188     .sp
189     By default, the sequence \eR in a pattern matches any Unicode newline sequence,
190     whatever has been selected as the line ending sequence. If you specify
191     .sp
192     --enable-bsr-anycrlf
193     .sp
194     the default is changed so that \eR matches only CR, LF, or CRLF. Whatever is
195     selected when PCRE is built can be overridden when the library functions are
196     called.
197     .
198 ph10 456 .
199 nigel 75 .SH "POSIX MALLOC USAGE"
200 nigel 63 .rs
201     .sp
202 ph10 857 When the 8-bit library is called through the POSIX interface (see the
203 nigel 75 .\" HREF
204     \fBpcreposix\fP
205     .\"
206 nigel 63 documentation), additional working storage is required for holding the pointers
207 nigel 75 to capturing substrings, because PCRE requires three integers per substring,
208 nigel 63 whereas the POSIX interface provides only two. If the number of expected
209     substrings is small, the wrapper function uses space on the stack, because this
210 nigel 75 is faster than using \fBmalloc()\fP for each call. The default threshold above
211 nigel 63 which the stack is no longer used is 10; it can be changed by adding a setting
212     such as
213 nigel 75 .sp
214 nigel 63 --with-posix-malloc-threshold=20
215 nigel 75 .sp
216     to the \fBconfigure\fP command.
217     .
218 ph10 456 .
219 nigel 75 .SH "HANDLING VERY LARGE PATTERNS"
220 nigel 63 .rs
221     .sp
222     Within a compiled pattern, offset values are used to point from one part to
223     another (for example, from an opening parenthesis to an alternation
224 nigel 75 metacharacter). By default, two-byte values are used for these offsets, leading
225 nigel 63 to a maximum size for a compiled pattern of around 64K. This is sufficient to
226     handle all but the most gigantic patterns. Nevertheless, some people do want to
227 ph10 857 process truly enormous patterns, so it is possible to compile PCRE to use
228 ph10 456 three-byte or four-byte offsets by adding a setting such as
229 nigel 75 .sp
230 nigel 63 --with-link-size=3
231 nigel 75 .sp
232 ph10 857 to the \fBconfigure\fP command. The value given must be 2, 3, or 4. For the
233     16-bit library, a value of 3 is rounded up to 4. Using longer offsets slows
234     down the operation of PCRE because it has to load additional data when handling
235     them.
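.P
The rules for the link size can be summarized in a few lines of shell. This is
an illustrative sketch: effective_link_size is a hypothetical helper that
mirrors the rules stated above and is not how PCRE itself performs the check.
The closing arithmetic shows why two-byte offsets give a limit of around 64K.

```shell
# Hypothetical helper (not part of PCRE): the value must be 2, 3, or 4,
# and for the 16-bit library a value of 3 is rounded up to 4.
effective_link_size() {
    width=$1   # library data-unit width: 8 or 16
    size=$2    # requested --with-link-size value
    case "$size" in
        2|3|4) ;;
        *) echo "error: link size must be 2, 3, or 4" >&2; return 1 ;;
    esac
    if [ "$width" -eq 16 ] && [ "$size" -eq 3 ]; then
        size=4
    fi
    echo "$size"
}

effective_link_size 8 3     # prints 3
effective_link_size 16 3    # prints 4

# A two-byte offset can take 2^16 distinct values.
echo $(( 1 << 16 ))         # prints 65536
```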
236 nigel 75 .
237 ph10 456 .
238 nigel 75 .SH "AVOIDING EXCESSIVE STACK USAGE"
239 nigel 73 .rs
240     .sp
241 nigel 77 When matching with the \fBpcre_exec()\fP function, PCRE implements backtracking
242     by making recursive calls to an internal function called \fBmatch()\fP. In
243     environments where the size of the stack is limited, this can severely limit
244     PCRE's operation. (The Unix environment does not usually suffer from this
245 nigel 91 problem, but it may sometimes be necessary to increase the maximum stack size.
246     There is a discussion in the
247     .\" HREF
248     \fBpcrestack\fP
249     .\"
250     documentation.) An alternative approach to recursion that uses memory from the
251     heap to remember data, instead of using recursive function calls, has been
252     implemented to work round the problem of limited stack size. If you want to
253     build a version of PCRE that works this way, add
254 nigel 75 .sp
255 nigel 73 --disable-stack-for-recursion
256 nigel 75 .sp
257     to the \fBconfigure\fP command. With this configuration, PCRE will use the
258     \fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP variables to call memory
259 ph10 174 management functions. By default these point to \fBmalloc()\fP and
260     \fBfree()\fP, but you can replace the pointers so that your own functions are
261 ph10 456 used instead.
262 ph10 174 .P
263     Separate functions are provided rather than using \fBpcre_malloc\fP and
264     \fBpcre_free\fP because the usage is very predictable: the block sizes
265     requested are always the same, and the blocks are always freed in reverse
266     order. A calling program might be able to implement optimized functions that
267     perform better than \fBmalloc()\fP and \fBfree()\fP. PCRE runs noticeably more
268     slowly when built in this way. This option affects only the \fBpcre_exec()\fP
269 ph10 456 function; it is not relevant for \fBpcre_dfa_exec()\fP.
270 nigel 75 .
271 ph10 456 .
272 nigel 91 .SH "LIMITING PCRE RESOURCE USAGE"
273     .rs
274     .sp
275     Internally, PCRE has a function called \fBmatch()\fP, which it calls repeatedly
276     (sometimes recursively) when matching a pattern with the \fBpcre_exec()\fP
277     function. By controlling the maximum number of times this function may be
278     called during a single matching operation, a limit can be placed on the
279     resources used by a single call to \fBpcre_exec()\fP. The limit can be changed
280     at run time, as described in the
281     .\" HREF
282     \fBpcreapi\fP
283     .\"
284     documentation. The default is 10 million, but this can be changed by adding a
285     setting such as
286     .sp
287     --with-match-limit=500000
288     .sp
289     to the \fBconfigure\fP command. This setting has no effect on the
290     \fBpcre_dfa_exec()\fP matching function.
291     .P
292     In some environments it is desirable to limit the depth of recursive calls of
293     \fBmatch()\fP more strictly than the total number of calls, in order to
294     restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
295     is specified) that is used. A second limit controls this; it defaults to the
296     value that is set for --with-match-limit, which imposes no additional
297     constraints. However, you can set a lower limit by adding, for example,
298     .sp
299     --with-match-limit-recursion=10000
300     .sp
301     to the \fBconfigure\fP command. This value can also be overridden at run time.
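.P
Both limits can be given in one \fBconfigure\fP invocation. The sketch below
assembles the arguments from two shell variables whose names are chosen only
for illustration. The warning reflects the observation above that a recursion
limit no lower than the match limit imposes no additional constraint. The last
line is a dry run that only prints the command.

```shell
match_limit=500000        # value for --with-match-limit
recursion_limit=10000     # value for --with-match-limit-recursion

# A recursion limit above the match limit adds no constraint, so this
# sketch warns about it rather than passing it on silently.
if [ "$recursion_limit" -gt "$match_limit" ]; then
    echo "warning: recursion limit exceeds match limit" >&2
fi

limit_args="--with-match-limit=$match_limit --with-match-limit-recursion=$recursion_limit"
echo "./configure $limit_args"
```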
302     .
303 ph10 456 .
304 ph10 128 .SH "CREATING CHARACTER TABLES AT BUILD TIME"
305     .rs
306     .sp
307     PCRE uses fixed tables for processing characters whose code values are less
308     than 256. By default, PCRE is built with a set of tables that are distributed
309     in the file \fIpcre_chartables.c.dist\fP. These tables are for ASCII codes
310     only. If you add
311     .sp
312     --enable-rebuild-chartables
313     .sp
314     to the \fBconfigure\fP command, the distributed tables are no longer used.
315     Instead, a program called \fBdftables\fP is compiled and run. This outputs the
316     source for a new set of tables, created in the default locale of your C runtime
317     system. (This method of replacing the tables does not work if you are cross
318     compiling, because \fBdftables\fP is run on the local host. If you need to
319     create alternative tables when cross compiling, you will have to do so "by
320     hand".)
321     .
322 ph10 456 .
323 nigel 75 .SH "USING EBCDIC CODE"
324 nigel 73 .rs
325     .sp
326     PCRE assumes by default that it will run in an environment where the character
327 ph10 195 code is ASCII (or Unicode, which is a superset of ASCII). This is the case for
328     most computer operating systems. PCRE can, however, be compiled to run in an
329     EBCDIC environment by adding
330 nigel 75 .sp
331 nigel 73 --enable-ebcdic
332 nigel 75 .sp
333 ph10 128 to the \fBconfigure\fP command. This setting implies
334 ph10 197 --enable-rebuild-chartables. You should only use it if you know that you are in
335 ph10 392 an EBCDIC environment (for example, an IBM mainframe operating system). The
336 ph10 857 --enable-ebcdic option is incompatible with --enable-utf.
337 nigel 93 .
338 ph10 456 .
339 ph10 286 .SH "PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT"
340     .rs
341     .sp
342     By default, \fBpcregrep\fP reads all files as plain text. You can build it so
343     that it recognizes files whose names end in \fB.gz\fP or \fB.bz2\fP, and reads
344     them with \fBlibz\fP or \fBlibbz2\fP, respectively, by adding one or both of
345     .sp
346     --enable-pcregrep-libz
347     --enable-pcregrep-libbz2
348     .sp
349     to the \fBconfigure\fP command. These options naturally require that the
350     relevant libraries are installed on your system. Configuration will fail if
351     they are not.
352 nigel 93 .
353 ph10 456 .
354 ph10 654 .SH "PCREGREP BUFFER SIZE"
355     .rs
356     .sp
357     \fBpcregrep\fP uses an internal buffer to hold a "window" on the file it is
358     scanning, in order to be able to output "before" and "after" lines when it
359     finds a match. The size of the buffer is controlled by a parameter whose
360     default value is 20K. The buffer itself is three times this size, but because
361     of the way it is used for holding "before" lines, the longest line that is
362     guaranteed to be processable is the parameter size. You can change the default
363     parameter value by adding, for example,
364     .sp
365     --with-pcregrep-bufsize=50K
366     .sp
367     to the \fBconfigure\fP command. The caller of \fBpcregrep\fP can, however,
368     override this value by specifying a run-time option.
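.P
The relationship between the parameter and the buffer is simple arithmetic, as
this sketch shows. The helper to_bytes is hypothetical, and the sketch assumes
that K means 1024 bytes.

```shell
# Hypothetical helper (not part of PCRE): convert a size such as 20K to
# bytes. Assumption: K means 1024 bytes.
to_bytes() {
    case "$1" in
        *K) echo $(( ${1%K} * 1024 )) ;;
        *)  echo "$1" ;;
    esac
}

param=$(to_bytes 20K)        # default parameter value from the text
buffer=$(( param * 3 ))      # the buffer is three times the parameter
echo "parameter: $param bytes, buffer: $buffer bytes"
```

Under the same assumption, the value 50K in the example above corresponds to a
parameter of 51200 bytes.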
369     .
370     .
371 ph10 287 .SH "PCRETEST OPTION FOR LIBREADLINE SUPPORT"
372     .rs
373     .sp
374     If you add
375     .sp
376     --enable-pcretest-libreadline
377     .sp
378 ph10 289 to the \fBconfigure\fP command, \fBpcretest\fP is linked with the
379     \fBlibreadline\fP library, and when its input is from a terminal, it reads it
380     using the \fBreadline()\fP function. This provides line-editing and history
381 ph10 456 facilities. Note that \fBlibreadline\fP is GPL-licensed, so if you distribute a
382 ph10 287 binary of \fBpcretest\fP linked in this way, there may be licensing issues.
383 ph10 338 .P
384     Setting this option causes the \fB-lreadline\fP option to be added to the
385     \fBpcretest\fP build. In many operating environments with a system-installed
386     \fBlibreadline\fP this is sufficient. However, in some environments (e.g.
387     if an unmodified distribution version of readline is in use), some extra
388 ph10 345 configuration may be necessary. The INSTALL file for \fBlibreadline\fP says
389 ph10 338 this:
390     .sp
391 ph10 345 "Readline uses the termcap functions, but does not link with the
392     termcap or curses library itself, allowing applications which link
393 ph10 338 with readline the to choose an appropriate library."
394 ph10 345 .sp
395     If your environment has not been set up so that an appropriate library is
396 ph10 338 automatically included, you may need to add something like
397     .sp
398     LIBS="-lncurses"
399     .sp
400 ph10 345 immediately before the \fBconfigure\fP command.
401 ph10 286 .
402 ph10 287 .
403 nigel 93 .SH "SEE ALSO"
404     .rs
405     .sp
406 ph10 857 \fBpcreapi\fP(3), \fBpcre16\fP(3), \fBpcre_config\fP(3).
407 ph10 99 .
408     .
409     .SH AUTHOR
410     .rs
411     .sp
412     .nf
413     Philip Hazel
414     University Computing Service
415     Cambridge CB2 3QH, England.
416     .fi
417     .
418     .
419     .SH REVISION
420     .rs
421     .sp
422     .nf
423 ph10 857 Last updated: 07 January 2012
424     Copyright (c) 1997-2012 University of Cambridge.
425 ph10 99 .fi
