Contents of /code/trunk/doc/pcregrep.txt

Revision 1298
Fri Mar 22 16:13:13 2013 UTC by ph10
File MIME type: text/plain
File size: 42182 byte(s)
Fix COMMIT in recursion; document backtracking verbs in assertions and 
subroutines.

PCREGREP(1)                 General Commands Manual                PCREGREP(1)



NAME
       pcregrep - a grep with Perl-compatible regular expressions.

SYNOPSIS
       pcregrep [options] [long options] [pattern] [path1 path2 ...]


DESCRIPTION

       pcregrep searches files for character patterns, in the same way as
       other grep commands do, but it uses the PCRE regular expression
       library to support patterns that are compatible with the regular
       expressions of Perl 5. See pcrepattern(3) for a full description of
       syntax and semantics of the regular expressions that PCRE supports.

       Patterns, whether supplied on the command line or in a separate file,
       are given without delimiters. For example:

         pcregrep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a pattern
       with slashes, as is common in Perl scripts), they are interpreted as
       part of the pattern. Quotes can of course be used to delimit patterns
       on the command line because they are interpreted by the shell, and
       indeed quotes are required if a pattern contains white space or shell
       metacharacters.

       The first argument that follows any option settings is treated as the
       single pattern to be matched when neither -e nor -f is present.
       Conversely, when one or both of these options are used to specify
       patterns, all arguments are treated as path names. At least one of -e,
       -f, or an argument pattern must be provided.

       If no files are specified, pcregrep reads the standard input. The
       standard input can also be referenced by a name consisting of a single
       hyphen. For example:

         pcregrep some-pattern /file1 - /file3

       By default, each line that matches a pattern is copied to the standard
       output, and if there is more than one file, the file name is output at
       the start of each line, followed by a colon. However, there are options
       that can change how pcregrep behaves. In particular, the -M option
       makes it possible to search for patterns that span line boundaries.
       What defines a line boundary is controlled by the -N (--newline)
       option.

       The amount of memory used for buffering files that are being scanned is
       controlled by a parameter that can be set by the --buffer-size option.
       The default value for this parameter is specified when pcregrep is
       built; unless it is changed at build time, the default is 20K. A block
       of memory three times this size is used (to allow for buffering
       "before" and "after" lines). An error occurs if a line overflows the
       buffer.

       Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the
       greater. BUFSIZ is defined in <stdio.h>. When there is more than one
       pattern (specified by the use of -e and/or -f), each pattern is applied
       to each line in the order in which they are defined, except that all
       the -e patterns are tried before the -f patterns.

       By default, as soon as one pattern matches a line, no further patterns
       are considered. However, if --colour (or --color) is used to colour the
       matching substrings, or if --only-matching, --file-offsets, or
       --line-offsets is used to output only the part of the line that matched
       (either shown literally, or as an offset), scanning resumes immediately
       following the match, so that further matches on the same line can be
       found. If there are multiple patterns, they are all tried on the
       remainder of the line, but patterns that follow the one that matched
       are not tried on the earlier part of the line.

       This behaviour means that the order in which multiple patterns are
       specified can affect the output when one of the above options is used.
       This is no longer the same behaviour as GNU grep, which now manages to
       display earlier matches for later patterns (as long as there is no
       overlap).
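
       The scanning order described above can be modelled with a short Python
       sketch. It is an illustration only, not pcregrep's implementation: the
       function names are hypothetical, and Python's re module stands in for
       PCRE, so pattern syntax may differ in places.

```python
import re

def first_nonempty(rx, line, pos):
    """Return the first non-empty match of rx at or after pos, or None."""
    for m in rx.finditer(line, pos):
        if m.end() > m.start():
            return m
    return None

def only_matching(patterns, line):
    """Model the documented scan: patterns are tried in the order given,
    and scanning resumes immediately after each match."""
    compiled = [re.compile(p) for p in patterns]
    found, pos = [], 0
    while True:
        for rx in compiled:
            m = first_nonempty(rx, line, pos)
            if m:
                found.append(m.group(0))
                pos = m.end()  # later patterns never see the earlier text
                break
        else:
            return found  # no pattern matched the remainder of the line

if __name__ == "__main__":
    print(only_matching(["X", "Y"], "aYbXcY"))  # separate patterns, X first
    print(only_matching(["X|Y"], "aYbXcY"))     # one pattern with alternatives
```

       With the separate patterns X and Y, the Y that precedes the first X is
       not reported, whereas the single pattern X|Y reports every occurrence
       in line order.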

       Patterns that can match an empty string are accepted, but empty string
       matches are never recognized. An example is the pattern
       "(super)?(man)?", in which all components are optional. This pattern
       finds all occurrences of both "super" and "man"; the output differs
       from matching with "super|man" when only the matching substrings are
       being shown.
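
       The difference between the two patterns can be seen in a small Python
       sketch that discards empty-string matches. The function name is
       hypothetical and Python's re module is used in place of PCRE, so this
       is a model of the documented behaviour rather than pcregrep itself.

```python
import re

def shown_matches(pattern, line):
    """Return the matched substrings, skipping empty-string matches."""
    return [m.group(0) for m in re.finditer(pattern, line) if m.group(0)]

if __name__ == "__main__":
    line = "a superman and a man"
    print(shown_matches(r"(super)?(man)?", line))
    print(shown_matches(r"super|man", line))
```

       The optional-component pattern reports "superman" as a single
       substring, while the alternation reports "super" and "man" separately.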

       If the LC_ALL or LC_CTYPE environment variable is set, pcregrep uses
       the value to set a locale when calling the PCRE library. The --locale
       option can be used to override this.


SUPPORT FOR COMPRESSED FILES

       It is possible to compile pcregrep so that it uses libz or libbz2 to
       read files whose names end in .gz or .bz2, respectively. You can find
       out whether your binary has support for one or both of these file types
       by running it with the --help option. If the appropriate support is not
       present, files are treated as plain text. The standard input is always
       so treated.


BINARY FILES

       By default, a file that contains a binary zero byte within the first
       1024 bytes is identified as a binary file, and is processed specially.
       (GNU grep also identifies binary files in this manner.) See the
       --binary-files option for a means of changing the way binary files are
       handled.
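
       The detection rule can be written in a few lines of Python. This is a
       minimal sketch of the rule as stated above; the function name is
       hypothetical and the code is not taken from pcregrep.

```python
def looks_binary(data: bytes) -> bool:
    """True if a zero byte occurs within the first 1024 bytes of data."""
    return b"\x00" in data[:1024]

if __name__ == "__main__":
    print(looks_binary(b"plain text\n"))
    print(looks_binary(b"ELF\x00\x01\x02"))
```

       A zero byte that first appears after the initial 1024 bytes does not
       cause the data to be classified as binary under this rule.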


OPTIONS

       The order in which some of the options appear can affect the output.
       For example, both the -h and -l options affect the printing of file
       names. Whichever comes later in the command line will be the one that
       takes effect. Similarly, except where noted below, if an option is
       given twice, the later setting is used. Numerical values for options
       may be followed by K or M, to signify multiplication by 1024 or
       1024*1024 respectively.
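
       The K and M suffixes amount to the following arithmetic, shown here as
       a Python sketch. The function name is hypothetical, and only the
       upper-case suffixes named above are handled.

```python
def parse_number(text: str) -> int:
    """Convert an option value such as "20K" or "1M" to an integer."""
    multipliers = {"K": 1024, "M": 1024 * 1024}
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)

if __name__ == "__main__":
    for value in ("100", "20K", "1M"):
        print(value, parse_number(value))
```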

       --     This terminates the list of options. It is useful if the next
              item on the command line starts with a hyphen but is not an
              option. This allows for the processing of patterns and
              filenames that start with hyphens.

       -A number, --after-context=number
              Output number lines of context after each matching line. If
              filenames and/or line numbers are being output, a hyphen
              separator is used instead of a colon for the context lines. A
              line containing "--" is output between each group of lines,
              unless they are in fact contiguous in the input file. The
              value of number is expected to be relatively small. However,
              pcregrep guarantees to have up to 8K of following text
              available for context output.

       -a, --text
              Treat binary files as text. This is equivalent to
              --binary-files=text.

       -B number, --before-context=number
              Output number lines of context before each matching line. If
              filenames and/or line numbers are being output, a hyphen
              separator is used instead of a colon for the context lines. A
              line containing "--" is output between each group of lines,
              unless they are in fact contiguous in the input file. The
              value of number is expected to be relatively small. However,
              pcregrep guarantees to have up to 8K of preceding text
              available for context output.

       --binary-files=word
              Specify how binary files are to be processed. If the word is
              "binary" (the default), pattern matching is performed on
              binary files, but the only output is "Binary file <name>
              matches" when a match succeeds. If the word is "text", which
              is equivalent to the -a or --text option, binary files are
              processed in the same way as any other file. In this case,
              when a match succeeds, the output may be binary garbage,
              which can have nasty effects if sent to a terminal. If the
              word is "without-match", which is equivalent to the -I
              option, binary files are not processed at all; they are
              assumed not to be of interest.

       --buffer-size=number
              Set the parameter that controls how much memory is used for
              buffering files that are being scanned.

       -C number, --context=number
              Output number lines of context both before and after each
              matching line. This is equivalent to setting both -A and -B
              to the same value.
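
              The context rules for -A, -B, and -C can be sketched in Python
              as follows. This is a simplified model with line numbers always
              shown, not pcregrep's code; the function name is hypothetical
              and Python's re module stands in for PCRE.

```python
import re

def grep_context(lines, pattern, before=0, after=0):
    """Return numbered output lines with context, using ':' after the line
    number for matching lines and '-' for context lines."""
    rx = re.compile(pattern)
    hits = {i for i, text in enumerate(lines) if rx.search(text)}
    keep = sorted({j for i in hits
                   for j in range(max(0, i - before),
                                  min(len(lines), i + after + 1))})
    out, prev = [], None
    for j in keep:
        # "--" separates groups that are not contiguous in the input
        if (before or after) and prev is not None and j != prev + 1:
            out.append("--")
        sep = ":" if j in hits else "-"
        out.append(f"{j + 1}{sep}{lines[j]}")
        prev = j
    return out

if __name__ == "__main__":
    sample = ["a", "hit", "b", "c", "d", "e", "hit", "f"]
    print("\n".join(grep_context(sample, "hit", before=1, after=1)))
```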

       -c, --count
              Do not output individual lines from the files that are being
              scanned; instead output the number of lines that would
              otherwise have been shown. If no lines are selected, the number
              zero is output. If several files are being scanned, a count is
              output for each of them. However, if the --files-with-matches
              option is also used, only those files whose counts are greater
              than zero are listed. When -c is used, the -A, -B, and -C
              options are ignored.

       --colour, --color
              If this option is given without any data, it is equivalent to
              "--colour=auto". If data is required, it must be given in
              the same shell item, separated by an equals sign.

       --colour=value, --color=value
              This option specifies under what circumstances the parts of a
              line that matched a pattern should be coloured in the output.
              By default, the output is not coloured. The value (which is
              optional, see above) may be "never", "always", or "auto". In
              the latter case, colouring happens only if the standard output
              is connected to a terminal. More resources are used when
              colouring is enabled, because pcregrep has to search for all
              possible matches in a line, not just one, in order to colour
              them all.

              The colour that is used can be specified by setting the
              environment variable PCREGREP_COLOUR or PCREGREP_COLOR. The
              value of this variable should be a string of two numbers,
              separated by a semicolon. They are copied directly into the
              control string for setting colour on a terminal, so it is your
              responsibility to ensure that they make sense. If neither of
              the environment variables is set, the default is "1;31",
              which gives red.
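
              As an illustration of how such a value might end up in a
              terminal control string, here is a Python sketch. It assumes an
              ANSI SGR escape sequence and a reset sequence of ESC[0m, and it
              checks PCREGREP_COLOUR before PCREGREP_COLOR; none of these
              details is stated above, and the function name is hypothetical.

```python
import os

def colour_match(text: str, env=os.environ) -> str:
    """Wrap text in an ANSI colour sequence built from the environment."""
    # Assumption: PCREGREP_COLOUR is consulted first, and an empty value
    # is treated the same as an unset variable.
    codes = env.get("PCREGREP_COLOUR") or env.get("PCREGREP_COLOR") or "1;31"
    # Assumption: ESC[<codes>m starts the colour and ESC[0m resets it.
    return f"\x1b[{codes}m{text}\x1b[0m"

if __name__ == "__main__":
    print(colour_match("matched text"))
```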

       -D action, --devices=action
              If an input path is not a regular file or a directory,
              "action" specifies how it is to be processed. Valid values
              are "read" (the default) or "skip" (silently skip the path).

       -d action, --directories=action
              If an input path is a directory, "action" specifies how it is
              to be processed. Valid values are "read" (the default in
              non-Windows environments, for compatibility with GNU grep),
              "recurse" (equivalent to the -r option), or "skip" (silently
              skip the path, the default in Windows environments). In the
              "read" case, directories are read as if they were ordinary
              files. In some operating systems the effect of reading a
              directory like this is an immediate end-of-file; in others it
              may provoke an error.

       -e pattern, --regex=pattern, --regexp=pattern
              Specify a pattern to be matched. This option can be used
              multiple times in order to specify several patterns. It can
              also be used as a way of specifying a single pattern that
              starts with a hyphen. When -e is used, no argument pattern is
              taken from the command line; all arguments are treated as file
              names. There is no limit to the number of patterns. They are
              applied to each line in the order in which they are defined
              until one matches.

              If -f is used with -e, the command line patterns are matched
              first, followed by the patterns from the file(s), independent
              of the order in which these options are specified. Note that
              multiple use of -e is not the same as a single pattern with
              alternatives. For example, X|Y finds the first character in a
              line that is X or Y, whereas if the two patterns are given
              separately, with X first, pcregrep finds X if it is present,
              even if it follows Y in the line. It finds Y only if there is
              no X in the line. This matters only if you are using -o or
              --colo(u)r to show the part(s) of the line that matched.

       --exclude=pattern
              Files (but not directories) whose names match the pattern are
              skipped without being processed. This applies to all files,
              whether listed on the command line, obtained from --file-list,
              or by scanning a directory. The pattern is a PCRE regular
              expression, and is matched against the final component of
              the file name, not the entire path. The -F, -w, and -x
              options do not apply to this pattern. The option may be given
              any number of times in order to specify multiple patterns. If
              a file name matches both an --include and an --exclude
              pattern, it is excluded. There is no short form for this
              option.

       --exclude-from=filename
              Treat each non-empty line of the file as the data for an
              --exclude option. What constitutes a newline when reading the
              file is the operating system's default. The --newline option
              has no effect on this option. This option may be given more
              than once in order to specify a number of files to read.

       --exclude-dir=pattern
              Directories whose names match the pattern are skipped without
              being processed, whatever the setting of the --recursive
              option. This applies to all directories, whether listed on
              the command line, obtained from --file-list, or by scanning a
              parent directory. The pattern is a PCRE regular expression,
              and is matched against the final component of the directory
              name, not the entire path. The -F, -w, and -x options do not
              apply to this pattern. The option may be given any number of
              times in order to specify more than one pattern. If a
              directory matches both --include-dir and --exclude-dir, it is
              excluded. There is no short form for this option.

       -F, --fixed-strings
              Interpret each data-matching pattern as a list of fixed
              strings, separated by newlines, instead of as a regular
              expression. What constitutes a newline for this purpose is
              controlled by the --newline option. The -w (match as a word)
              and -x (match whole line) options can be used with -F. They
              apply to each of the fixed strings. A line is selected if any
              of the fixed strings are found in it (subject to -w or -x, if
              present). This option applies only to the patterns that are
              matched against the contents of files; it does not apply to
              patterns specified by any of the --include or --exclude
              options.

       -f filename, --file=filename
              Read patterns from the file, one per line, and match them
              against each line of input. What constitutes a newline when
              reading the file is the operating system's default. The
              --newline option has no effect on this option. Trailing white
              space is removed from each line, and blank lines are ignored.
              An empty file contains no patterns and therefore matches
              nothing. See also the comments about multiple patterns versus
              a single pattern with alternatives in the description of -e
              above.

              If this option is given more than once, all the specified
              files are read. A data line is output if any of the patterns
              match it. A filename can be given as "-" to refer to the
              standard input. When -f is used, patterns specified on the
              command line using -e may also be present; they are tested
              before the file's patterns. However, no other pattern is
              taken from the command line; all arguments are treated as the
              names of paths to be searched.

       --file-list=filename
              Read a list of files and/or directories that are to be
              scanned from the given file, one per line. Trailing white
              space is removed from each line, and blank lines are ignored.
              These paths are processed before any that are listed on the
              command line. The filename can be given as "-" to refer to
              the standard input. If --file and --file-list are both
              specified as "-", patterns are read first. This is useful only
              when the standard input is a terminal, from which further
              lines (the list of files) can be read after an end-of-file
              indication. If this option is given more than once, all the
              specified files are read.

       --file-offsets
              Instead of showing lines or parts of lines that match, show
              each match as an offset from the start of the file and a
              length, separated by a comma. In this mode, no context is
              shown. That is, the -A, -B, and -C options are ignored. If
              there is more than one match in a line, each of them is shown
              separately. This option is mutually exclusive with
              --line-offsets and --only-matching.

       -H, --with-filename
              Force the inclusion of the filename at the start of output
              lines when searching a single file. By default, the filename
              is not shown in this case. For matching lines, the filename
              is followed by a colon; for context lines, a hyphen separator
              is used. If a line number is also being output, it follows
              the file name.

       -h, --no-filename
              Suppress the output of filenames when searching multiple
              files. By default, filenames are shown when multiple files are
              searched. For matching lines, the filename is followed by a
              colon; for context lines, a hyphen separator is used. If a
              line number is also being output, it follows the file name.

       --help Output a help message, giving brief details of the command
              options and file type support, and then exit. Anything else
              on the command line is ignored.

       -I     Treat binary files as never matching. This is equivalent to
              --binary-files=without-match.

       -i, --ignore-case
              Ignore upper/lower case distinctions during comparisons.

       --include=pattern
              If any --include patterns are specified, the only files that
              are processed are those that match one of the patterns (and
              do not match an --exclude pattern). This option does not
              affect directories, but it applies to all files, whether
              listed on the command line, obtained from --file-list, or by
              scanning a directory. The pattern is a PCRE regular
              expression, and is matched against the final component of the
              file name, not the entire path. The -F, -w, and -x options do
              not apply to this pattern. The option may be given any number
              of times. If a file name matches both an --include and an
              --exclude pattern, it is excluded. There is no short form
              for this option.
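
              The interaction between --include and --exclude can be
              summarised in a Python sketch. The function name is
              hypothetical, Python's re module stands in for PCRE, and the
              code is a model of the rules above rather than pcregrep's own
              logic.

```python
import os
import re

def file_selected(path, includes=(), excludes=()):
    """Decide whether a file is processed, matching the patterns against
    the final component of the path only."""
    name = os.path.basename(path)
    if any(re.search(p, name) for p in excludes):
        return False  # an exclude match always wins
    if includes:
        return any(re.search(p, name) for p in includes)
    return True       # no include patterns: every file is eligible

if __name__ == "__main__":
    print(file_selected("src/main.c", includes=[r"\.c$"]))
    print(file_selected("src/test_main.c", includes=[r"\.c$"],
                        excludes=[r"^test_"]))
```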

       --include-from=filename
              Treat each non-empty line of the file as the data for an
              --include option. What constitutes a newline for this purpose
              is the operating system's default. The --newline option has
              no effect on this option. This option may be given any number
              of times; all the files are read.

       --include-dir=pattern
              If any --include-dir patterns are specified, the only
              directories that are processed are those that match one of the
              patterns (and do not match an --exclude-dir pattern). This
              applies to all directories, whether listed on the command
              line, obtained from --file-list, or by scanning a parent
              directory. The pattern is a PCRE regular expression, and is
              matched against the final component of the directory name,
              not the entire path. The -F, -w, and -x options do not apply
              to this pattern. The option may be given any number of times.
              If a directory matches both --include-dir and --exclude-dir,
              it is excluded. There is no short form for this option.

       -L, --files-without-match
              Instead of outputting lines from the files, just output the
              names of the files that do not contain any lines that would
              have been output. Each file name is output once, on a
              separate line.

       -l, --files-with-matches
              Instead of outputting lines from the files, just output the
              names of the files containing lines that would have been
              output. Each file name is output once, on a separate line.
              Searching normally stops as soon as a matching line is found
              in a file. However, if the -c (count) option is also used,
              matching continues in order to obtain the correct count, and
              those files that have at least one match are listed along
              with their counts. Using this option with -c is a way of
              suppressing the listing of files with no matches.

       --label=name
              This option supplies a name to be used for the standard input
              when file names are being output. If not supplied, "(standard
              input)" is used. There is no short form for this option.

       --line-buffered
              When this option is given, input is read and processed line
              by line, and the output is flushed after each write. By
              default, input is read in large chunks, unless pcregrep can
              determine that it is reading from a terminal (which is
              currently possible only in Unix-like environments). Output to
              a terminal is normally flushed automatically by the operating
              system. This option can be useful when the input or output is
              attached to a pipe and you do not want pcregrep to buffer up
              large amounts of data. However, its use will affect
              performance, and the -M (multiline) option ceases to work.

       --line-offsets
              Instead of showing lines or parts of lines that match, show
              each match as a line number, the offset from the start of the
              line, and a length. The line number is terminated by a colon
              (as usual; see the -n option), and the offset and length are
              separated by a comma. In this mode, no context is shown.
              That is, the -A, -B, and -C options are ignored. If there is
              more than one match in a line, each of them is shown
              separately. This option is mutually exclusive with
              --file-offsets and --only-matching.
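
              The two offset formats can be illustrated with a Python sketch.
              It assumes that offsets count from zero and that line numbers
              count from one, which the text above does not state; it also
              measures offsets in Python string positions and splits lines on
              LF only. The function names are hypothetical.

```python
import re

def file_offsets(pattern, text):
    """Return "offset,length" for each non-empty match in the whole text."""
    return [f"{m.start()},{m.end() - m.start()}"
            for m in re.finditer(pattern, text) if m.group(0)]

def line_offsets(pattern, text):
    """Return "line:offset,length" for each non-empty match, per line."""
    out = []
    for number, line in enumerate(text.split("\n"), start=1):
        for m in re.finditer(pattern, line):
            if m.group(0):
                out.append(f"{number}:{m.start()},{m.end() - m.start()}")
    return out

if __name__ == "__main__":
    data = "one cat\ntwo cats cat\n"
    print(file_offsets("cat", data))
    print(line_offsets("cat", data))
```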

       --locale=locale-name
              This option specifies a locale to be used for pattern
              matching. It overrides the value in the LC_ALL or LC_CTYPE
              environment variables. If no locale is specified, the PCRE
              library's default (usually the "C" locale) is used. There is
              no short form for this option.

       --match-limit=number
              Processing some regular expression patterns can require a
              very large amount of memory, leading in some cases to a
              program crash if not enough is available. Other patterns may
              take a very long time to search for all possible matching
              strings. The pcre_exec() function that is called by pcregrep
              to do the matching has two parameters that can limit the
              resources that it uses.

              The --match-limit option provides a means of limiting
              resource usage when processing patterns that are not going to
              match, but which have a very large number of possibilities in
              their search trees. The classic example is a pattern that
              uses nested unlimited repeats. Internally, PCRE uses a
              function called match() which it calls repeatedly (sometimes
              recursively). The limit set by --match-limit is imposed on
              the number of times this function is called during a match,
              which has the effect of limiting the amount of backtracking
              that can take place.

              The --recursion-limit option is similar to --match-limit, but
              instead of limiting the total number of times that match() is
              called, it limits the depth of recursive calls, which in turn
              limits the amount of memory that can be used. The recursion
              depth is a smaller number than the total number of calls,
              because not all calls to match() are recursive. This limit is
              of use only if it is set smaller than --match-limit.

              There are no short forms for these options. The default
              settings are specified when the PCRE library is compiled;
              unless they are changed at build time, both default to 10
              million.
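
              The distinction between the two limits can be seen in a toy
              backtracking matcher written in Python. It is not PCRE's
              match() function: the class and exception names are
              hypothetical, and the matcher understands only literal
              characters, ".", and "x*", anchored at the start of the text.
              It counts every call (the analogue of --match-limit) and tracks
              call depth (the analogue of --recursion-limit).

```python
class LimitExceeded(Exception):
    """Raised when either resource limit is hit."""

class ToyMatcher:
    def __init__(self, match_limit, recursion_limit):
        self.match_limit = match_limit
        self.recursion_limit = recursion_limit
        self.calls = 0

    def match(self, pat, text, depth=1):
        self.calls += 1
        if self.calls > self.match_limit:
            raise LimitExceeded("match limit")
        if depth > self.recursion_limit:
            raise LimitExceeded("recursion limit")
        if not pat:
            return True
        if len(pat) >= 2 and pat[1] == "*":
            i = 0
            while True:  # try the rest after 0, 1, 2, ... repetitions
                if self.match(pat[2:], text[i:], depth + 1):
                    return True
                if i < len(text) and pat[0] in (".", text[i]):
                    i += 1
                else:
                    return False
        if text and pat[0] in (".", text[0]):
            return self.match(pat[1:], text[1:], depth + 1)
        return False

if __name__ == "__main__":
    m = ToyMatcher(match_limit=10**6, recursion_limit=100)
    m.match("a*a*a*a*c", "a" * 20)  # cannot match; much backtracking
    print("calls needed without a tight limit:", m.calls)
```

              A pattern with several unlimited repeats that cannot match
              makes many calls while staying shallow, so it is the call
              count, not the depth, that grows quickly in this example.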

       -M, --multiline
              Allow patterns to match more than one line. When this option
              is given, patterns may usefully contain literal newline
              characters and internal occurrences of ^ and $ characters. The
              output for a successful match may consist of more than one
              line, the last of which is the one in which the match ended.
              If the matched string ends with a newline sequence the output
              ends at the end of that line.

              When this option is set, the PCRE library is called in
              "multiline" mode. There is a limit to the number of lines that
              can be matched, imposed by the way that pcregrep buffers the
              input file as it scans it. However, pcregrep ensures that at
              least 8K characters or the rest of the document (whichever is
              the shorter) are available for forward matching, and
              similarly the previous 8K characters (or all the previous
              characters, if fewer than 8K) are guaranteed to be available
              for lookbehind assertions. This option does not work when input
              is read line by line (see --line-buffered).

       -N newline-type, --newline=newline-type
              The PCRE library supports five different conventions for
              indicating the ends of lines. They are the single-character
              sequences CR (carriage return) and LF (linefeed), the
              two-character sequence CRLF, an "anycrlf" convention, which
              recognizes any of the preceding three types, and an "any"
              convention, in which any Unicode line ending sequence is
              assumed to end a line. The Unicode sequences are the three
              just mentioned, plus VT (vertical tab, U+000B), FF (form feed,
              U+000C), NEL (next line, U+0085), LS (line separator,
              U+2028), and PS (paragraph separator, U+2029).

              When the PCRE library is built, a default line-ending
              sequence is specified. This is normally the standard
              sequence for the operating system. Unless otherwise specified
              by this option, pcregrep uses the library's default. The
              possible values for this option are CR, LF, CRLF, ANYCRLF, or
              ANY. This makes it possible to use pcregrep to scan files
              that have come from other environments without having to
              modify their line endings. If the data that is being scanned
              does not agree with the convention set by this option,
              pcregrep may behave in strange ways. Note that this option does
              not apply to files specified by the -f, --exclude-from, or
              --include-from options, which are expected to use the
              operating system's standard newline sequence.
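
              The five conventions can be expressed as splitting rules in a
              Python sketch. The table and function names are hypothetical;
              the code models how each convention divides text into lines and
              is not how pcregrep or the PCRE library is implemented.

```python
import re

# One regular expression per newline convention. CRLF is listed before
# the single characters so that it is consumed as one line ending.
NEWLINE_PATTERNS = {
    "CR": "\r",
    "LF": "\n",
    "CRLF": "\r\n",
    "ANYCRLF": "\r\n|\r|\n",
    "ANY": "\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]",
}

def split_lines(text, newline_type):
    """Split text into lines according to the named convention."""
    return re.split(NEWLINE_PATTERNS[newline_type], text)

if __name__ == "__main__":
    mixed = "a\r\nb\nc"
    for kind in ("LF", "CRLF", "ANYCRLF", "ANY"):
        print(kind, split_lines(mixed, kind))
```

              Splitting the same mixed input under LF leaves a stray CR at
              the end of the first line, which is one way that data not
              matching the chosen convention can produce surprising results.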
520 nigel 93
521 nigel 87 -n, --line-number
522     Precede each output line by its line number in the file, fol-
523 ph10 654 lowed by a colon for matching lines or a hyphen for context
524     lines. If the filename is also being output, it precedes the
525 ph10 392 line number. This option is forced if --line-offsets is used.
526 nigel 49
527 ph10 691 --no-jit If the PCRE library is built with support for just-in-time
528     compiling (which speeds up matching), pcregrep automatically
529     makes use of this, unless it was explicitly disabled at build
530     time. This option can be used to disable the use of JIT at
531     run time. It is provided for testing and working round prob-
532     lems. It should never be needed in normal use.
533    
534 nigel 87 -o, --only-matching
535 ph10 567 Show only the part of the line that matched a pattern instead
536 ph10 691 of the whole line. In this mode, no context is shown. That
537     is, the -A, -B, and -C options are ignored. If there is more
538     than one match in a line, each of them is shown separately.
539     If -o is combined with -v (invert the sense of the match to
540     find non-matching lines), no output is generated, but the
541     return code is set appropriately. If the matched portion of
542     the line is empty, nothing is output unless the file name or
543     line number are being printed, in which case they are shown
544 ph10 567 on an otherwise empty line. This option is mutually exclusive
545     with --file-offsets and --line-offsets.
546 nigel 87
547 ph10 567 -onumber, --only-matching=number
548 ph10 691 Show only the part of the line that matched the capturing
549 ph10 567 parentheses of the given number. Up to 32 capturing parenthe-
550 ph10 1194 ses are supported, and -o0 is equivalent to -o without a num-
551     ber. Because these options can be given without an argument
552     (see above), if an argument is present, it must be given in
553     the same shell item, for example, -o3 or --only-matching=2.
554     The comments given for the non-argument case above also apply
555     to this case. If the specified capturing parentheses do not
556     exist in the pattern, or were not set in the match, nothing
557     is output unless the file name or line number are being
558     printed.
559 ph10 567
560 ph10 1194 If this option is given multiple times, multiple substrings
561     are output, in the order the options are given. For example,
562     -o3 -o1 -o3 causes the substrings matched by capturing paren-
563     theses 3 and 1 and then 3 again to be output. By default,
564     there is no separator (but see the next option).
565    
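The selection rule for repeated -o options can be sketched in Python. This is only an illustration, not pcregrep's code: it uses Python's re module instead of PCRE, and the function name only_matching and its arguments are hypothetical. The text above does not say how the separator is handled when a group is unset, so the sketch assumes that unset, non-existent, and empty groups are skipped.

```python
import re

def only_matching(pattern, line, groups, separator=""):
    """Sketch of -oN selection: one output string per match in the line."""
    regex = re.compile(pattern)
    results = []
    for match in regex.finditer(line):
        pieces = []
        for number in groups:
            # A group that does not exist in the pattern contributes nothing.
            if number > regex.groups:
                continue
            text = match.group(number)
            # Assumption: unset or empty groups are skipped entirely.
            if text:
                pieces.append(text)
        if pieces:
            results.append(separator.join(pieces))
    return results

# Roughly what -o3 -o1 -o3 --om-separator=, would select from this line.
print(only_matching(r"(\d+)-(\w+)-(\w+)", "12-ab-cd", [3, 1, 3], separator=","))
# prints ['cd,12,cd']
```

Groups are emitted in the order the options were given, not in the order they appear in the pattern, so group 3 is output twice here.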
566     --om-separator=text
567     Specify a separating string for multiple occurrences of -o.
568     The default is an empty string. Separating strings are never
569     coloured.
570    
571 nigel 87 -q, --quiet
572     Work quietly, that is, display nothing except error messages.
573 ph10 1194 The exit status indicates whether or not any matches were
574 nigel 73 found.
575 nigel 49
576 nigel 87 -r, --recursive
577 ph10 1194 If any given path is a directory, recursively scan the files
578     it contains, taking note of any --include and --exclude set-
579     tings. By default, a directory is read as a normal file; in
580     some operating systems this gives an immediate end-of-file.
581     This option is a shorthand for setting the -d option to
582 nigel 87 "recurse".
583 nigel 77
584 ph10 567 --recursion-limit=number
585     See --match-limit above.
586    
587 nigel 87 -s, --no-messages
588 ph10 1194 Suppress error messages about non-existent or unreadable
589     files. Such files are quietly skipped. However, the return
590 nigel 77 code is still 2, even if matches were found in other files.
591    
592 nigel 87 -u, --utf-8
593 ph10 1194 Operate in UTF-8 mode. This option is available only if PCRE
594     has been compiled with UTF-8 support. All patterns (including
595     those for any --exclude and --include options) and all sub-
596     ject lines that are scanned must be valid strings of UTF-8
597     characters.
598 nigel 63
599 nigel 87 -V, --version
600 ph10 1194 Write the version numbers of pcregrep and the PCRE library to
601     the standard output and then exit. Anything else on the com-
602     mand line is ignored.
603 nigel 49
604 nigel 87 -v, --invert-match
605 ph10 691 Invert the sense of the match, so that lines which do not
606 nigel 87 match any of the patterns are the ones that are found.
607 nigel 77
608 nigel 87 -w, --word-regex, --word-regexp
609     Force the patterns to match only whole words. This is equiva-
610 ph10 1194 lent to having \b at the start and end of the pattern. This
611     option applies only to the patterns that are matched against
612     the contents of files; it does not apply to patterns speci-
613     fied by any of the --include or --exclude options.
614 nigel 77
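The effect of -w can be imitated with Python's re module, which treats \b the same way as PCRE for plain ASCII text. This is a sketch under that assumption; the helper name word_regex is hypothetical.

```python
import re

def word_regex(pattern):
    # Place \b at the start and end of the pattern, as the text describes.
    return r"\b" + pattern + r"\b"

wrapped = re.compile(word_regex("cat"))
print(bool(wrapped.search("the cat sat")))   # prints True
print(bool(wrapped.search("concatenate")))   # prints False
```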
615 nigel 87 -x, --line-regex, --line-regexp
616 ph10 1194 Force the patterns to be anchored (each must start matching
617     at the beginning of a line) and in addition, require them to
618     match entire lines. This is equivalent to having ^ and $
619 nigel 73 characters at the start and end of each alternative branch in
620 ph10 1194 every pattern. This option applies only to the patterns that
621     are matched against the contents of files; it does not apply
622     to patterns specified by any of the --include or --exclude
623     options.
624 nigel 49
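A similar sketch for -x, again using Python's re module as a stand-in for PCRE. The helper name line_regex is hypothetical. Wrapping the whole pattern in a non-capturing group and anchoring the group is one way to get the same effect as adding ^ and $ to every alternative branch.

```python
import re

def line_regex(pattern):
    # "^(?:abc|def)$" behaves like "^abc$|^def$"; without the group,
    # "^abc|def$" would anchor only one end of each branch.
    return "^(?:" + pattern + ")$"

anchored = re.compile(line_regex("abc|def"))
for line in ("abc", "def", "abcx", "xdef"):
    # Lines are passed without their trailing newline.
    print(line, bool(anchored.search(line)))
```

Only the first two lines match; "abcx" and "xdef" contain a branch but are not entirely matched by it.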
625    
626 nigel 87 ENVIRONMENT VARIABLES
627 nigel 49
628 ph10 691 The environment variables LC_ALL and LC_CTYPE are examined, in that
629     order, for a locale. The first one that is set is used. This can be
630     overridden by the --locale option. If no locale is set, the PCRE
631 nigel 87 library's default (usually the "C" locale) is used.
632 nigel 49
633    
634 nigel 91 NEWLINES
635    
636 ph10 691 The -N (--newline) option allows pcregrep to scan files with different
637 ph10 1194 newline conventions from the default. Any parts of the input files that
638     are written to the standard output are copied identically, with what-
639     ever newline sequences they have in the input. However, the setting of
640     this option does not affect the interpretation of files specified by
641     the -f, --exclude-from, or --include-from options, which are assumed to
642     use the operating system's standard newline sequence, nor does it
643     affect the way in which pcregrep writes informational messages to the
644     standard error and output streams. For these it uses the string "\n" to
645     indicate newlines, relying on the C I/O library to convert this to an
646     appropriate sequence.
647 nigel 91
648    
649 nigel 87 OPTIONS COMPATIBILITY
650 nigel 49
651 ph10 691 Many of the short and long forms of pcregrep's options are the same as
652 ph10 954 in the GNU grep program. Any long option of the form --xxx-regexp (GNU
653     terminology) is also available as --xxx-regex (PCRE terminology). How-
654     ever, the --file-list, --file-offsets, --include-dir, --line-offsets,
655 ph10 1194 --locale, --match-limit, -M, --multiline, -N, --newline, --om-separa-
656     tor, --recursion-limit, -u, and --utf-8 options are specific to pcre-
657     grep, as is the use of the --only-matching option with a capturing
658     parentheses number.
659 nigel 87
660 ph10 1194 Although most of the common options work the same way, a few are dif-
661     ferent in pcregrep. For example, the --include option's argument is a
662     glob for GNU grep, but a regular expression for pcregrep. If both the
663     -c and -l options are given, GNU grep lists only file names, without
664 ph10 572 counts, but pcregrep gives the counts.
665 nigel 87
666 ph10 572
667 nigel 77 OPTIONS WITH DATA
668 nigel 49
669 nigel 77 There are four different ways in which an option with data can be spec-
670 ph10 1194 ified. If a short form option is used, the data may follow immedi-
671 ph10 572 ately, or (with one exception) in the next command line item. For exam-
672     ple:
673 nigel 77
674     -f/some/file
675     -f /some/file
676    
677 ph10 1194 The exception is the -o option, which may appear with or without data.
678     Because of this, if data is present, it must follow immediately in the
679 ph10 572 same item, for example -o3.
680    
681 ph10 1194 If a long form option is used, the data may appear in the same command
682     line item, separated by an equals character, or (with two exceptions)
683 ph10 572 it may appear in the next command line item. For example:
684 nigel 77
685     --file=/some/file
686     --file /some/file
687    
688 ph10 1194 Note, however, that if you want to supply a file name beginning with ~
689     as data in a shell command, and have the shell expand ~ to a home
690 nigel 87 directory, you must separate the file name from the option, because the
691 ph10 392 shell does not treat ~ specially unless it is at the start of an item.
692 nigel 77
693 ph10 1194 The exceptions to the above are the --colour (or --color) and --only-
694     matching options, for which the data is optional. If one of these
695     options does have data, it must be given in the first form, using an
696 ph10 579 equals character. Otherwise pcregrep will assume that it has no data.
697 nigel 87
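The rules in this section can be summarised as a small Python sketch. It is not pcregrep's own option parser: the option tables cover only the options used as examples above, the function name parse_items is hypothetical, and combined short flags are not handled.

```python
# Hypothetical option tables, covering only the options used as examples.
SHORT_REQUIRED = {"f"}                                 # data is mandatory
SHORT_OPTIONAL = {"o"}                                 # data is optional
LONG_REQUIRED = {"file"}
LONG_OPTIONAL = {"colour", "color", "only-matching"}

def parse_items(items):
    """Return (option, data) pairs; non-option items have option None."""
    parsed = []
    i = 0
    while i < len(items):
        item = items[i]
        if item.startswith("--") and len(item) > 2:
            name, equals, data = item[2:].partition("=")
            if equals:
                # --file=/some/file or --only-matching=2
                parsed.append((name, data))
            elif name in LONG_REQUIRED and i + 1 < len(items):
                # --file /some/file: data is taken from the next item.
                i += 1
                parsed.append((name, items[i]))
            else:
                # Options in LONG_OPTIONAL never consume the next item.
                parsed.append((name, None))
        elif item.startswith("-") and len(item) > 1 and item[1] != "-":
            name, data = item[1], item[2:]
            if data and name in SHORT_REQUIRED | SHORT_OPTIONAL:
                # -f/some/file or -o3: data follows in the same item.
                parsed.append((name, data))
            elif name in SHORT_REQUIRED and i + 1 < len(items):
                # -f /some/file: data is taken from the next item.
                i += 1
                parsed.append((name, items[i]))
            else:
                # -o alone: no data, the next item is left untouched.
                parsed.append((name, None))
        else:
            parsed.append((None, item))
        i += 1
    return parsed

print(parse_items(["-o", "3", "--colour", "always", "-f", "/some/file"]))
```

In this example neither "3" nor "always" is taken as option data, because -o and --colour accept data only within the same command line item.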
698    
699     MATCHING ERRORS
700    
701 ph10 1194 It is possible to supply a regular expression that takes a very long
702     time to fail to match certain lines. Such patterns normally involve
703     nested indefinite repeats, for example: (a+)*\d when matched against a
704     line of a's with no final digit. The PCRE matching function has a
705     resource limit that causes it to abort in these circumstances. If this
706 nigel 87 happens, pcregrep outputs an error message and the line that caused the
707 ph10 1194 problem to the standard error stream. If there are more than 20 such
708 nigel 87 errors, pcregrep gives up.
709    
710 ph10 1194 The --match-limit option of pcregrep can be used to set the overall
711     resource limit; there is a second option called --recursion-limit that
712     sets a limit on the amount of memory (usually stack) that is used (see
713 ph10 572 the discussion of these options above).
714 nigel 87
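The cost of nested indefinite repeats can be observed with any backtracking matcher. The sketch below uses Python's re module, which is not PCRE and has no equivalent of --match-limit, so the subject lines are kept short. Timings depend on the machine and are not shown.

```python
import re
import time

pattern = re.compile(r"(a+)*\d")

for length in (12, 14, 16, 18):
    subject = "a" * length            # no final digit, so the match must fail
    start = time.perf_counter()
    found = pattern.search(subject)
    elapsed = time.perf_counter() - start
    # Each extra pair of a's should make the failure noticeably slower.
    print(length, found is None, f"{elapsed:.4f}s")
```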
715 ph10 572
716 nigel 63 DIAGNOSTICS
717 nigel 49
718 nigel 73 Exit status is 0 if any matches were found, 1 if no matches were found,
719 ph10 1194 and 2 for syntax errors, overlong lines, non-existent or inaccessible
720     files (even if matches were found in other files) or too many matching
721 ph10 654 errors. Using the -s option to suppress error messages about inaccessi-
722     ble files does not affect the return code.
723 nigel 49
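The precedence between the three exit codes can be written as a short Python sketch. The function and parameter names are hypothetical; any_error stands for any of the error conditions listed above.

```python
def exit_status(any_match, any_error):
    # Errors take precedence: 2 is returned even if matches were found.
    if any_error:
        return 2
    return 0 if any_match else 1

print(exit_status(any_match=True, any_error=False))   # prints 0
print(exit_status(any_match=False, any_error=False))  # prints 1
print(exit_status(any_match=True, any_error=True))    # prints 2
```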
724    
725 nigel 93 SEE ALSO
726    
727 ph10 1194 pcrepattern(3), pcresyntax(3), pcretest(1).
728 nigel 93
729    
730 nigel 49 AUTHOR
731 nigel 63
732 nigel 77 Philip Hazel
733 nigel 73 University Computing Service
734 nigel 93 Cambridge CB2 3QH, England.
735 nigel 49
736 ph10 99
737     REVISION
738    
739 ph10 1194 Last updated: 13 September 2012
740 ph10 954 Copyright (c) 1997-2012 University of Cambridge.
