Revision 87, Sat Feb 24 21:41:21 2007 UTC, by nigel: Load pcre-6.5 into code/trunk.

PCREGREP(1)                                                        PCREGREP(1)


NAME
       pcregrep - a grep with Perl-compatible regular expressions.


SYNOPSIS
       pcregrep [options] [long options] [pattern] [path1 path2 ...]


DESCRIPTION

       pcregrep searches files for character patterns, in the same way as
       other grep commands do, but it uses the PCRE regular expression library
       to support patterns that are compatible with the regular expressions of
       Perl 5. See pcrepattern for a full description of syntax and semantics
       of the regular expressions that PCRE supports.

       Patterns, whether supplied on the command line or in a separate file,
       are given without delimiters. For example:

         pcregrep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a pattern
       with slashes, as is common in Perl scripts), they are interpreted as
       part of the pattern. Quotes can of course be used on the command line
       because they are interpreted by the shell, and indeed they are required
       if a pattern contains white space or shell metacharacters.

       The first argument that follows any option settings is treated as the
       single pattern to be matched when neither -e nor -f is present.
       Conversely, when one or both of these options are used to specify
       patterns, all arguments are treated as path names. At least one of -e,
       -f, or an argument pattern must be provided.

       If no files are specified, pcregrep reads the standard input. The
       standard input can also be referenced by a name consisting of a single
       hyphen. For example:

         pcregrep some-pattern /file1 - /file3

       By default, each line that matches the pattern is copied to the
       standard output, and if there is more than one file, the file name is
       output at the start of each line. However, there are options that can
       change how pcregrep behaves. In particular, the -M option makes it
       possible to search for patterns that span line boundaries.

       Patterns are limited to 8K or BUFSIZ characters, whichever is the
       greater. BUFSIZ is defined in <stdio.h>.

       If the LC_ALL or LC_CTYPE environment variable is set, pcregrep uses
       the value to set a locale when calling the PCRE library. The --locale
       option can be used to override this.


OPTIONS

       --        This terminates the list of options. It is useful if the next
                 item on the command line starts with a hyphen but is not an
                 option. This allows for the processing of patterns and
                 filenames that start with hyphens.

       -A number, --after-context=number
                 Output number lines of context after each matching line. If
                 filenames and/or line numbers are being output, a hyphen
                 separator is used instead of a colon for the context lines. A
                 line containing "--" is output between each group of lines,
                 unless they are in fact contiguous in the input file. The
                 value of number is expected to be relatively small. However,
                 pcregrep guarantees to have up to 8K of following text
                 available for context output.

       -B number, --before-context=number
                 Output number lines of context before each matching line. If
                 filenames and/or line numbers are being output, a hyphen
                 separator is used instead of a colon for the context lines. A
                 line containing "--" is output between each group of lines,
                 unless they are in fact contiguous in the input file. The
                 value of number is expected to be relatively small. However,
                 pcregrep guarantees to have up to 8K of preceding text
                 available for context output.

       -C number, --context=number
                 Output number lines of context both before and after each
                 matching line. This is equivalent to setting both -A and -B
                 to the same value.

       -c, --count
                 Do not output individual lines; instead just output a count
                 of the number of lines that would otherwise have been output.
                 If several files are given, a count is output for each of
                 them. In this mode, the -A, -B, and -C options are ignored.

       --colour, --color
                 If this option is given without any data, it is equivalent to
                 "--colour=auto". If data is required, it must be given in
                 the same shell item, separated by an equals sign.

       --colour=value, --color=value
                 This option specifies under what circumstances the part of a
                 line that matched a pattern should be coloured in the output.
                 The value may be "never" (the default), "always", or "auto".
                 In the last case, colouring happens only if the standard
                 output is connected to a terminal. The colour can be
                 specified by setting the environment variable PCREGREP_COLOUR
                 or PCREGREP_COLOR. The value of this variable should be a
                 string of two numbers, separated by a semicolon. They are
                 copied directly into the control string for setting colour on
                 a terminal, so it is your responsibility to ensure that they
                 make sense. If neither of the environment variables is set,
                 the default is "1;31", which gives red.
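
       As an illustration of how the two numbers end up in a terminal
       control string, here is a minimal Python sketch. It assumes the usual
       ANSI SGR sequence (ESC [ ... m to switch an attribute on, ESC [ 0 m to
       reset). The function names, and the order in which the two environment
       variables are consulted, are assumptions of this sketch and are not
       taken from pcregrep itself.

```python
import os

DEFAULT_COLOUR = "1;31"  # default value quoted in the text above


def colour_setting(environ=os.environ):
    """Pick the colour string from the environment, or fall back to the default.

    Assumption: PCREGREP_COLOUR is consulted before PCREGREP_COLOR.
    """
    return (environ.get("PCREGREP_COLOUR")
            or environ.get("PCREGREP_COLOR")
            or DEFAULT_COLOUR)


def highlight(line, start, end, colour=DEFAULT_COLOUR):
    """Wrap line[start:end] in an SGR sequence built from `colour`."""
    # The colour string is copied in unchecked, which is why a value that
    # makes no sense produces a control string that makes no sense.
    return f"{line[:start]}\x1b[{colour}m{line[start:end]}\x1b[0m{line[end:]}"


if __name__ == "__main__":
    print(highlight("see you on Thursday", 11, 19, colour_setting()))
```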

       -D action, --devices=action
                 If an input path is not a regular file or a directory,
                 "action" specifies how it is to be processed. Valid values
                 are "read" (the default) or "skip" (silently skip the path).

       -d action, --directories=action
                 If an input path is a directory, "action" specifies how it is
                 to be processed. Valid values are "read" (the default),
                 "recurse" (equivalent to the -r option), or "skip" (silently
                 skip the path). In the default case, directories are read as
                 if they were ordinary files. In some operating systems the
                 effect of reading a directory like this is an immediate
                 end-of-file.

       -e pattern, --regex=pattern, --regexp=pattern
                 Specify a pattern to be matched. This option can be used
                 multiple times in order to specify several patterns. It can
                 also be used as a way of specifying a single pattern that
                 starts with a hyphen. When -e is used, no argument pattern is
                 taken from the command line; all arguments are treated as
                 file names. There is an overall maximum of 100 patterns. They
                 are applied to each line in the order in which they are
                 defined until one matches (or fails to match if -v is used).
                 If -f is used with -e, the command line patterns are matched
                 first, followed by the patterns from the file, independent of
                 the order in which these options are specified. Note that
                 multiple use of -e is not the same as a single pattern with
                 alternatives. For example, X|Y finds the first character in a
                 line that is X or Y, whereas if the two patterns are given
                 separately, pcregrep finds X if it is present, even if it
                 follows Y in the line. It finds Y only if there is no X in
                 the line. This really matters only if you are using -o to
                 show the portion of the line that matched.
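
       The difference between one pattern with alternatives and several
       separate patterns can be sketched in a few lines of Python. Python's
       re module stands in for PCRE here (the syntax used is common to both),
       and first_match is a hypothetical helper that applies the patterns in
       order and stops at the first one that matches, as described above.

```python
import re


def first_match(patterns, line):
    """Apply patterns in order; return the text matched by the first hit."""
    for pattern in patterns:
        found = re.search(pattern, line)
        if found:
            return found.group(0)
    return None


line = "a Y then an X"

# One pattern with alternatives: the leftmost X or Y in the line wins.
print(first_match(["X|Y"], line))      # prints Y

# Two separate patterns, as with -e X -e Y: X is tried first and is
# found even though it follows Y in the line.
print(first_match(["X", "Y"], line))   # prints X
```

       The two calls select the same line; only the reported match text
       differs, which is why the distinction matters mainly with -o.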

       --exclude=pattern
                 When pcregrep is searching the files in a directory as a
                 consequence of the -r (recursive search) option, any files
                 whose names match the pattern are excluded. The pattern is a
                 PCRE regular expression. If a file name matches both
                 --include and --exclude, it is excluded. There is no short
                 form for this option.

       -F, --fixed-strings
                 Interpret each pattern as a list of fixed strings, separated
                 by newlines, instead of as a regular expression. The -w
                 (match as a word) and -x (match whole line) options can be
                 used with -F. They apply to each of the fixed strings. A line
                 is selected if any of the fixed strings are found in it
                 (subject to -w or -x, if present).

       -f filename, --file=filename
                 Read a number of patterns from the file, one per line, and
                 match them against each line of input. A data line is output
                 if any of the patterns match it. The filename can be given as
                 "-" to refer to the standard input. When -f is used, patterns
                 specified on the command line using -e may also be present;
                 they are tested before the file's patterns. However, no other
                 pattern is taken from the command line; all arguments are
                 treated as file names. There is an overall maximum of 100
                 patterns. Trailing white space is removed from each line, and
                 blank lines are ignored. An empty file contains no patterns
                 and therefore matches nothing.
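
       The rules for reading a pattern file can be summarised in a short
       Python sketch. It is not pcregrep's own code: load_patterns and its
       already_defined parameter (the number of patterns already given with
       -e) are hypothetical names, and the error raised at the limit is an
       assumption about how an overflow might be reported.

```python
MAX_PATTERNS = 100  # overall maximum stated in the text above


def load_patterns(lines, already_defined=0):
    """Turn the lines of a pattern file into a list of patterns."""
    patterns = []
    for raw in lines:
        text = raw.rstrip()          # trailing white space (and newline) removed
        if not text:
            continue                 # blank lines are ignored
        if already_defined + len(patterns) >= MAX_PATTERNS:
            raise ValueError("too many patterns (maximum is 100)")
        patterns.append(text)
    return patterns


if __name__ == "__main__":
    sample = ["Thursday  \n", "\n", "  Friday\n", "   \n"]
    print(load_patterns(sample))
```

       Leading white space is kept, because only trailing white space is
       said to be removed. An empty input yields an empty list, which
       corresponds to a file that matches nothing.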

       -H, --with-filename
                 Force the inclusion of the filename at the start of output
                 lines when searching a single file. By default, the filename
                 is not shown in this case. For matching lines, the filename
                 is followed by a colon and a space; for context lines, a
                 hyphen separator is used. If a line number is also being
                 output, it follows the file name without a space.

       -h, --no-filename
                 Suppress the output of filenames when searching multiple
                 files. By default, filenames are shown when multiple files
                 are searched. For matching lines, the filename is followed by
                 a colon and a space; for context lines, a hyphen separator is
                 used. If a line number is also being output, it follows the
                 file name without a space.

       --help    Output a brief help message and exit.

       -i, --ignore-case
                 Ignore upper/lower case distinctions during comparisons.

       --include=pattern
                 When pcregrep is searching the files in a directory as a
                 consequence of the -r (recursive search) option, only those
                 files whose names match the pattern are included. The pattern
                 is a PCRE regular expression. If a file name matches both
                 --include and --exclude, it is excluded. There is no short
                 form for this option.

       -L, --files-without-match
                 Instead of outputting lines from the files, just output the
                 names of the files that do not contain any lines that would
                 have been output. Each file name is output once, on a
                 separate line.

       -l, --files-with-matches
                 Instead of outputting lines from the files, just output the
                 names of the files containing lines that would have been
                 output. Each file name is output once, on a separate line.
                 Searching stops as soon as a matching line is found in a
                 file.

       --label=name
                 This option supplies a name to be used for the standard input
                 when file names are being output. If not supplied, "(standard
                 input)" is used. There is no short form for this option.

       --locale=locale-name
                 This option specifies a locale to be used for pattern
                 matching. It overrides the value in the LC_ALL or LC_CTYPE
                 environment variables. If no locale is specified, the PCRE
                 library's default (usually the "C" locale) is used. There is
                 no short form for this option.

       -M, --multiline
                 Allow patterns to match more than one line. When this option
                 is given, patterns may usefully contain literal newline
                 characters and internal occurrences of ^ and $ characters.
                 The output for any one match may consist of more than one
                 line. When this option is set, the PCRE library is called in
                 "multiline" mode. There is a limit to the number of lines
                 that can be matched, imposed by the way that pcregrep buffers
                 the input file as it scans it. However, pcregrep ensures that
                 at least 8K characters or the rest of the document (whichever
                 is the shorter) are available for forward matching, and
                 similarly the previous 8K characters (or all the previous
                 characters, if fewer than 8K) are guaranteed to be available
                 for lookbehind assertions.
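
       What "multiline" mode means for ^ can be seen in a small Python
       sketch. It assumes that Python's re.MULTILINE flag behaves like PCRE's
       multiline mode for this purpose; the sample text is invented, and
       pcregrep's input buffering is not modelled.

```python
import re

text = "start of record\nend of record\nunrelated line\n"

# The pattern contains a newline escape and an internal ^, so it can only
# match across a line boundary.
spanning = r"record\n^end"

# Without multiline mode, ^ matches only at the very start of the subject.
print(re.search(spanning, text))                           # prints None

# With multiline mode, ^ also matches immediately after each newline.
found = re.search(spanning, text, re.MULTILINE)
print(repr(found.group(0)))                                # prints 'record\nend'
```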

       -n, --line-number
                 Precede each output line by its line number in the file,
                 followed by a colon and a space for matching lines or a
                 hyphen and a space for context lines. If the filename is also
                 being output, it precedes the line number.

       -o, --only-matching
                 Show only the part of the line that matched a pattern. In
                 this mode, no context is shown. That is, the -A, -B, and -C
                 options are ignored.

       -q, --quiet
                 Work quietly, that is, display nothing except error messages.
                 The exit status indicates whether or not any matches were
                 found.

       -r, --recursive
                 If any given path is a directory, recursively scan the files
                 it contains, taking note of any --include and --exclude
                 settings. By default, a directory is read as a normal file;
                 in some operating systems this gives an immediate
                 end-of-file. This option is a shorthand for setting the -d
                 option to "recurse".

       -s, --no-messages
                 Suppress error messages about non-existent or unreadable
                 files. Such files are quietly skipped. However, the return
                 code is still 2, even if matches were found in other files.

       -u, --utf-8
                 Operate in UTF-8 mode. This option is available only if PCRE
                 has been compiled with UTF-8 support. Both patterns and
                 subject lines must be valid strings of UTF-8 characters.

       -V, --version
                 Write the version numbers of pcregrep and the PCRE library
                 that is being used to the standard error stream.

       -v, --invert-match
                 Invert the sense of the match, so that lines which do not
                 match any of the patterns are the ones that are found.

       -w, --word-regex, --word-regexp
                 Force the patterns to match only whole words. This is
                 equivalent to having \b at the start and end of the pattern.

       -x, --line-regex, --line-regexp
                 Force the patterns to be anchored (each must start matching
                 at the beginning of a line) and in addition, require them to
                 match entire lines. This is equivalent to having ^ and $
                 characters at the start and end of each alternative branch in
                 every pattern.
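
       The equivalences given for -w and -x can be written out as pattern
       rewrites. The following Python sketch uses hypothetical helpers
       as_word and as_whole_line; wrapping the pattern in a non-capturing
       group is one way (an assumption of this sketch, not necessarily what
       pcregrep does) of making the anchors apply to every alternative
       branch. Python's re module stands in for PCRE.

```python
import re


def as_word(pattern):
    """Rewrite a pattern the way -w is described: \\b at both ends."""
    return rf"\b(?:{pattern})\b"


def as_whole_line(pattern):
    """Rewrite a pattern the way -x is described: ^ and $ around every branch."""
    return rf"^(?:{pattern})$"


# -w: "cat" must be a whole word, so it is not found inside "concatenate".
print(bool(re.search(as_word("cat"), "concatenate")))        # prints False
print(bool(re.search(as_word("cat"), "the cat sat")))        # prints True

# -x: the whole line must match one of the branches.
print(bool(re.search(as_whole_line("yes|no"), "no")))        # prints True
print(bool(re.search(as_whole_line("yes|no"), "nothing")))   # prints False

# Without the group, ^ would bind only to the first branch and $ only to
# the last, so a line that merely starts with "yes" would match.
print(bool(re.search("^yes|no$", "yes please")))             # prints True
```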


ENVIRONMENT VARIABLES

       The environment variables LC_ALL and LC_CTYPE are examined, in that
       order, for a locale. The first one that is set is used. This can be
       overridden by the --locale option. If no locale is set, the PCRE
       library's default (usually the "C" locale) is used.


OPTIONS COMPATIBILITY

       The majority of short and long forms of pcregrep's options are the same
       as in the GNU grep program. Any long option of the form --xxx-regexp
       (GNU terminology) is also available as --xxx-regex (PCRE terminology).
       However, the --locale, -M, --multiline, -u, and --utf-8 options are
       specific to pcregrep.


OPTIONS WITH DATA

       There are four different ways in which an option with data can be
       specified. If a short form option is used, the data may follow
       immediately, or in the next command line item. For example:

         -f/some/file
         -f /some/file

       If a long form option is used, the data may appear in the same command
       line item, separated by an equals character, or (with one exception) it
       may appear in the next command line item. For example:

         --file=/some/file
         --file /some/file

       Note, however, that if you want to supply a file name beginning with ~
       as data in a shell command, and have the shell expand ~ to a home
       directory, you must separate the file name from the option, because the
       shell does not treat ~ specially unless it is at the start of an item.

       The exception to the above is the --colour (or --color) option, for
       which the data is optional. If this option does have data, it must be
       given in the first form, using an equals character. Otherwise it will
       be assumed that it has no data.
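
       The four forms can be made concrete with a toy parser. This Python
       sketch is not pcregrep's option handling: collect_option_data is a
       hypothetical function that recognises a single option (by default -f
       and --file) and ignores the --colour exception described above.

```python
def collect_option_data(argv, short="-f", long="--file"):
    """Return (data values found for the option, remaining items)."""
    values, rest = [], []
    items = iter(argv)
    for item in items:
        if item in (short, long):
            # Data is the next command line item (-f /x or --file /x).
            try:
                values.append(next(items))
            except StopIteration:
                raise ValueError(f"{item} requires data") from None
        elif item.startswith(long + "="):
            # Long form with an equals character (--file=/x).
            values.append(item[len(long) + 1:])
        elif item.startswith(short):
            # Short form with the data following immediately (-f/x).
            values.append(item[len(short):])
        else:
            rest.append(item)
    return values, rest


if __name__ == "__main__":
    for args in (["-f/some/file"], ["-f", "/some/file"],
                 ["--file=/some/file"], ["--file", "/some/file"]):
        print(args, "->", collect_option_data(args))
```

       In the forms where the data is a separate item, the shell sees the
       file name at the start of an item, which is what allows a leading ~
       to be expanded.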


MATCHING ERRORS

       It is possible to supply a regular expression that takes a very long
       time to fail to match certain lines. Such patterns normally involve
       nested indefinite repeats, for example: (a+)*\d when matched against a
       line of a's with no final digit. The PCRE matching function has a
       resource limit that causes it to abort in these circumstances. If this
       happens, pcregrep outputs an error message and the line that caused the
       problem to the standard error stream. If there are more than 20 such
       errors, pcregrep gives up.
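
       The slow failure is easy to observe. The following Python sketch times
       the example pattern against lines of a's of increasing length. It uses
       Python's backtracking re module as a stand-in for PCRE; unlike PCRE,
       it has no resource limit, so the subject lengths are kept small. The
       timings printed depend on the machine, but they should grow roughly
       exponentially with the length of the line.

```python
import re
import time

PATTERN = re.compile(r"(a+)*\d")   # nested indefinite repeats, then a digit

for length in (8, 12, 16):
    subject = "a" * length         # a line of a's with no final digit
    started = time.perf_counter()
    result = PATTERN.search(subject)
    elapsed = time.perf_counter() - started
    print(f"{length} a's: matched={result is not None}, {elapsed:.4f}s")

# With a final digit the same pattern matches at once.
print(PATTERN.search("aaaa7").group(0))   # prints aaaa7
```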


DIAGNOSTICS

       Exit status is 0 if any matches were found, 1 if no matches were found,
       and 2 for syntax errors and non-existent or inaccessible files (even if
       matches were found in other files) or too many matching errors. Using
       the -s option to suppress error messages about inaccessible files does
       not affect the return code.
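
       A calling program can act on these values. The Python sketch below
       maps the three documented codes to messages. The names
       describe_exit_status and run_pcregrep are hypothetical, and
       run_pcregrep assumes that a pcregrep executable is available on PATH.

```python
import subprocess

EXIT_MEANINGS = {
    0: "matches were found",
    1: "no matches were found",
    2: "syntax error, inaccessible file, or too many matching errors",
}


def describe_exit_status(code):
    """Translate a pcregrep exit status into a short description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit status {code}")


def run_pcregrep(pattern, *paths):
    """Run pcregrep (assumed to be on PATH); return (status, description)."""
    # -e is used so that a pattern starting with a hyphen is still accepted.
    completed = subprocess.run(["pcregrep", "-e", pattern, *paths],
                               capture_output=True, text=True)
    return completed.returncode, describe_exit_status(completed.returncode)
```

       Because a status of 2 is returned even when matches were found in
       other files, a caller that needs the matches should still read the
       standard output in that case.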


AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QG, England.

Last updated: 23 January 2006
Copyright (c) 1997-2006 University of Cambridge.
