Revision 93, Sat Feb 24 21:41:42 2007 UTC, by nigel: Load pcre-7.0 into code/trunk.

PCREGREP(1)                                                        PCREGREP(1)


NAME
       pcregrep - a grep with Perl-compatible regular expressions.


SYNOPSIS
       pcregrep [options] [long options] [pattern] [path1 path2 ...]


DESCRIPTION

       pcregrep searches files for character patterns, in the same way as
       other grep commands do, but it uses the PCRE regular expression
       library to support patterns that are compatible with the regular
       expressions of Perl 5. See pcrepattern(3) for a full description of
       the syntax and semantics of the regular expressions that PCRE
       supports.

       Patterns, whether supplied on the command line or in a separate file,
       are given without delimiters. For example:

         pcregrep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a
       pattern with slashes, as is common in Perl scripts), they are
       interpreted as part of the pattern. Quotes can of course be used on
       the command line because they are interpreted by the shell, and
       indeed they are required if a pattern contains white space or shell
       metacharacters.

       The first argument that follows any option settings is treated as the
       single pattern to be matched when neither -e nor -f is present.
       Conversely, when one or both of these options are used to specify
       patterns, all arguments are treated as path names. At least one of
       -e, -f, or an argument pattern must be provided.

       If no files are specified, pcregrep reads the standard input. The
       standard input can also be referenced by a name consisting of a
       single hyphen. For example:

         pcregrep some-pattern /file1 - /file3

       By default, each line that matches the pattern is copied to the
       standard output, and if there is more than one file, the file name is
       output at the start of each line. However, there are options that can
       change how pcregrep behaves. In particular, the -M option makes it
       possible to search for patterns that span line boundaries. What
       defines a line boundary is controlled by the -N (--newline) option.

       Patterns are limited to 8K or BUFSIZ characters, whichever is the
       greater. BUFSIZ is defined in <stdio.h>.

       If the LC_ALL or LC_CTYPE environment variable is set, pcregrep uses
       the value to set a locale when calling the PCRE library. The --locale
       option can be used to override this.


OPTIONS

       --        This terminates the list of options. It is useful if the
                 next item on the command line starts with a hyphen but is
                 not an option. This allows for the processing of patterns
                 and filenames that start with hyphens.

       -A number, --after-context=number
                 Output number lines of context after each matching line. If
                 filenames and/or line numbers are being output, a hyphen
                 separator is used instead of a colon for the context lines.
                 A line containing "--" is output between each group of
                 lines, unless they are in fact contiguous in the input file.
                 The value of number is expected to be relatively small.
                 However, pcregrep guarantees to have up to 8K of following
                 text available for context output.

       -B number, --before-context=number
                 Output number lines of context before each matching line. If
                 filenames and/or line numbers are being output, a hyphen
                 separator is used instead of a colon for the context lines.
                 A line containing "--" is output between each group of
                 lines, unless they are in fact contiguous in the input file.
                 The value of number is expected to be relatively small.
                 However, pcregrep guarantees to have up to 8K of preceding
                 text available for context output.

       -C number, --context=number
                 Output number lines of context both before and after each
                 matching line. This is equivalent to setting both -A and -B
                 to the same value.

       -c, --count
                 Do not output individual lines; instead just output a count
                 of the number of lines that would otherwise have been
                 output. If several files are given, a count is output for
                 each of them. In this mode, the -A, -B, and -C options are
                 ignored.

       --colour, --color
                 If this option is given without any data, it is equivalent
                 to "--colour=auto". If data is required, it must be given in
                 the same shell item, separated by an equals sign.

       --colour=value, --color=value
                 This option specifies under what circumstances the part of a
                 line that matched a pattern should be coloured in the
                 output. The value may be "never" (the default), "always", or
                 "auto". In the last case, colouring happens only if the
                 standard output is connected to a terminal. The colour can
                 be specified by setting the environment variable
                 PCREGREP_COLOUR or PCREGREP_COLOR. The value of this
                 variable should be a string of two numbers, separated by a
                 semicolon. They are copied directly into the control string
                 for setting colour on a terminal, so it is your
                 responsibility to ensure that they make sense. If neither of
                 the environment variables is set, the default is "1;31",
                 which gives red.

       -D action, --devices=action
                 If an input path is not a regular file or a directory,
                 "action" specifies how it is to be processed. Valid values
                 are "read" (the default) or "skip" (silently skip the path).

       -d action, --directories=action
                 If an input path is a directory, "action" specifies how it
                 is to be processed. Valid values are "read" (the default),
                 "recurse" (equivalent to the -r option), or "skip" (silently
                 skip the path). In the default case, directories are read as
                 if they were ordinary files. In some operating systems the
                 effect of reading a directory like this is an immediate
                 end-of-file.

       -e pattern, --regex=pattern, --regexp=pattern
                 Specify a pattern to be matched. This option can be used
                 multiple times in order to specify several patterns. It can
                 also be used as a way of specifying a single pattern that
                 starts with a hyphen. When -e is used, no argument pattern
                 is taken from the command line; all arguments are treated as
                 file names. There is an overall maximum of 100 patterns.
                 They are applied to each line in the order in which they are
                 defined until one matches (or fails to match if -v is used).
                 If -f is used with -e, the command line patterns are matched
                 first, followed by the patterns from the file, independent
                 of the order in which these options are specified. Note that
                 multiple use of -e is not the same as a single pattern with
                 alternatives. For example, X|Y finds the first character in
                 a line that is X or Y, whereas if the two patterns are given
                 separately, pcregrep finds X if it is present, even if it
                 follows Y in the line. It finds Y only if there is no X in
                 the line. This really matters only if you are using -o to
                 show the portion of the line that matched.

       --exclude=pattern
                 When pcregrep is searching the files in a directory as a
                 consequence of the -r (recursive search) option, any files
                 whose names match the pattern are excluded. The pattern is a
                 PCRE regular expression. If a file name matches both
                 --include and --exclude, it is excluded. There is no short
                 form for this option.

       -F, --fixed-strings
                 Interpret each pattern as a list of fixed strings, separated
                 by newlines, instead of as a regular expression. The -w
                 (match as a word) and -x (match whole line) options can be
                 used with -F. They apply to each of the fixed strings. A
                 line is selected if any of the fixed strings are found in it
                 (subject to -w or -x, if present).

       -f filename, --file=filename
                 Read a number of patterns from the file, one per line, and
                 match them against each line of input. A data line is output
                 if any of the patterns match it. The filename can be given
                 as "-" to refer to the standard input. When -f is used,
                 patterns specified on the command line using -e may also be
                 present; they are tested before the file's patterns.
                 However, no other pattern is taken from the command line;
                 all arguments are treated as file names. There is an overall
                 maximum of 100 patterns. Trailing white space is removed
                 from each line, and blank lines are ignored. An empty file
                 contains no patterns and therefore matches nothing.

       -H, --with-filename
                 Force the inclusion of the filename at the start of output
                 lines when searching a single file. By default, the filename
                 is not shown in this case. For matching lines, the filename
                 is followed by a colon and a space; for context lines, a
                 hyphen separator is used. If a line number is also being
                 output, it follows the file name without a space.

       -h, --no-filename
                 Suppress the output of filenames when searching multiple
                 files. By default, filenames are shown when multiple files
                 are searched. For matching lines, the filename is followed
                 by a colon and a space; for context lines, a hyphen
                 separator is used. If a line number is also being output, it
                 follows the file name without a space.

       --help    Output a brief help message and exit.

       -i, --ignore-case
                 Ignore upper/lower case distinctions during comparisons.

       --include=pattern
                 When pcregrep is searching the files in a directory as a
                 consequence of the -r (recursive search) option, only those
                 files whose names match the pattern are included. The
                 pattern is a PCRE regular expression. If a file name matches
                 both --include and --exclude, it is excluded. There is no
                 short form for this option.

       -L, --files-without-match
                 Instead of outputting lines from the files, just output the
                 names of the files that do not contain any lines that would
                 have been output. Each file name is output once, on a
                 separate line.

       -l, --files-with-matches
                 Instead of outputting lines from the files, just output the
                 names of the files containing lines that would have been
                 output. Each file name is output once, on a separate line.
                 Searching stops as soon as a matching line is found in a
                 file.

       --label=name
                 This option supplies a name to be used for the standard
                 input when file names are being output. If not supplied,
                 "(standard input)" is used. There is no short form for this
                 option.

       --locale=locale-name
                 This option specifies a locale to be used for pattern
                 matching. It overrides the value in the LC_ALL or LC_CTYPE
                 environment variables. If no locale is specified, the PCRE
                 library's default (usually the "C" locale) is used. There is
                 no short form for this option.

       -M, --multiline
                 Allow patterns to match more than one line. When this option
                 is given, patterns may usefully contain literal newline
                 characters and internal occurrences of ^ and $ characters.
                 The output for any one match may consist of more than one
                 line. When this option is set, the PCRE library is called in
                 "multiline" mode. There is a limit to the number of lines
                 that can be matched, imposed by the way that pcregrep
                 buffers the input file as it scans it. However, pcregrep
                 ensures that at least 8K characters or the rest of the
                 document (whichever is the shorter) are available for
                 forward matching, and similarly the previous 8K characters
                 (or all the previous characters, if fewer than 8K) are
                 guaranteed to be available for lookbehind assertions.

       -N newline-type, --newline=newline-type
                 The PCRE library supports four different conventions for
                 indicating the ends of lines. They are the single-character
                 sequences CR (carriage return) and LF (linefeed), the
                 two-character sequence CRLF, and an "any" convention, in
                 which any Unicode line ending sequence is assumed to end a
                 line. The Unicode sequences are the three just mentioned,
                 plus VT (vertical tab, U+000B), FF (formfeed, U+000C), NEL
                 (next line, U+0085), LS (line separator, U+2028), and PS
                 (paragraph separator, U+2029).

                 When the PCRE library is built, a default line-ending
                 sequence is specified. This is normally the standard
                 sequence for the operating system. Unless otherwise
                 specified by this option, pcregrep uses the library's
                 default. The possible values for this option are CR, LF,
                 CRLF, or ANY. This makes it possible to use pcregrep on
                 files that have come from other environments without having
                 to modify their line endings. If the data that is being
                 scanned does not agree with the convention set by this
                 option, pcregrep may behave in strange ways.

       -n, --line-number
                 Precede each output line by its line number in the file,
                 followed by a colon and a space for matching lines or a
                 hyphen and a space for context lines. If the filename is
                 also being output, it precedes the line number.

       -o, --only-matching
                 Show only the part of the line that matched a pattern. In
                 this mode, no context is shown. That is, the -A, -B, and -C
                 options are ignored.

       -q, --quiet
                 Work quietly, that is, display nothing except error
                 messages. The exit status indicates whether or not any
                 matches were found.

       -r, --recursive
                 If any given path is a directory, recursively scan the files
                 it contains, taking note of any --include and --exclude
                 settings. By default, a directory is read as a normal file;
                 in some operating systems this gives an immediate
                 end-of-file. This option is a shorthand for setting the -d
                 option to "recurse".

       -s, --no-messages
                 Suppress error messages about non-existent or unreadable
                 files. Such files are quietly skipped. However, the return
                 code is still 2, even if matches were found in other files.

       -u, --utf-8
                 Operate in UTF-8 mode. This option is available only if PCRE
                 has been compiled with UTF-8 support. Both patterns and
                 subject lines must be valid strings of UTF-8 characters.

       -V, --version
                 Write the version numbers of pcregrep and the PCRE library
                 that is being used to the standard error stream.

       -v, --invert-match
                 Invert the sense of the match, so that lines which do not
                 match any of the patterns are the ones that are found.

       -w, --word-regex, --word-regexp
                 Force the patterns to match only whole words. This is
                 equivalent to having \b at the start and end of the pattern.

       -x, --line-regex, --line-regexp
                 Force the patterns to be anchored (each must start matching
                 at the beginning of a line) and in addition, require them to
                 match entire lines. This is equivalent to having ^ and $
                 characters at the start and end of each alternative branch
                 in every pattern.
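
       The note under -e about several patterns versus one pattern with
       alternatives can be illustrated with a small sketch. It uses Python's
       re module rather than PCRE, and the function names are hypothetical;
       it only models which text would be reported, as with -o, and is not
       how pcregrep is implemented.

```python
import re

def first_match_alternation(line, alternatives):
    """One pattern such as X|Y: the leftmost occurrence of any alternative wins."""
    m = re.search("|".join(alternatives), line)
    return m.group(0) if m else None

def first_match_separate(line, patterns):
    """Several -e patterns: each is tried in turn; the first that matches wins."""
    for pattern in patterns:
        m = re.search(pattern, line)
        if m:
            return m.group(0)
    return None

line = "a Y then an X"
print(first_match_alternation(line, ["X", "Y"]))  # Y (Y occurs first in the line)
print(first_match_separate(line, ["X", "Y"]))     # X (the X pattern is tried first)
```

       Both approaches select the same lines; they differ only in which
       portion of the line counts as the match.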


ENVIRONMENT VARIABLES

       The environment variables LC_ALL and LC_CTYPE are examined, in that
       order, for a locale. The first one that is set is used. This can be
       overridden by the --locale option. If no locale is set, the PCRE
       library's default (usually the "C" locale) is used.


NEWLINES

       The -N (--newline) option allows pcregrep to scan files with different
       newline conventions from the default. However, the setting of this
       option does not affect the way in which pcregrep writes information to
       the standard error and output streams. It uses the string "\n" in C
       printf() calls to indicate newlines, relying on the C I/O library to
       convert this to an appropriate sequence if the output is sent to a
       file.
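
       On the input side, the four conventions selectable with -N can be
       pictured as different rules for cutting text into lines. The
       following is a rough sketch in Python, not pcregrep's own code; the
       table and function names are hypothetical.

```python
import re

# One regular expression per -N value. For ANY, CRLF is listed first so
# that it is consumed as a single terminator rather than as CR then LF.
NEWLINE_PATTERNS = {
    "CR": "\r",
    "LF": "\n",
    "CRLF": "\r\n",
    "ANY": "\r\n|[\n\r\x0b\x0c\x85\u2028\u2029]",
}

def split_lines(text, convention="LF"):
    """Split text into lines according to the named newline convention."""
    parts = re.split(NEWLINE_PATTERNS[convention], text)
    # A terminator at the very end of the text does not start a new line.
    if parts and parts[-1] == "":
        parts.pop()
    return parts

data = "one\r\ntwo\r\n"
print(split_lines(data, "CRLF"))  # ['one', 'two']
print(split_lines(data, "LF"))    # ['one\r', 'two\r']
```

       The second call shows one way a mismatched setting can lead to odd
       results: every line keeps a stray CR at its end, so a pattern that
       must reach the end of the line may fail to match.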


OPTIONS COMPATIBILITY

       The majority of short and long forms of pcregrep's options are the
       same as in the GNU grep program. Any long option of the form
       --xxx-regexp (GNU terminology) is also available as --xxx-regex (PCRE
       terminology). However, the --locale, -M, --multiline, -u, and --utf-8
       options are specific to pcregrep.


OPTIONS WITH DATA

       There are four different ways in which an option with data can be
       specified. If a short form option is used, the data may follow
       immediately, or in the next command line item. For example:

         -f/some/file
         -f /some/file

       If a long form option is used, the data may appear in the same command
       line item, separated by an equals character, or (with one exception)
       it may appear in the next command line item. For example:

         --file=/some/file
         --file /some/file

       Note, however, that if you want to supply a file name beginning with ~
       as data in a shell command, and have the shell expand ~ to a home
       directory, you must separate the file name from the option, because
       the shell does not treat ~ specially unless it is at the start of an
       item.

       The exception to the above is the --colour (or --color) option, for
       which the data is optional. If this option does have data, it must be
       given in the first form, using an equals character. Otherwise it will
       be assumed that it has no data.
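
       As an illustration only, the four spellings for -f and --file can be
       reduced to the same data by a parser along the following lines. This
       is a Python sketch with a hypothetical function name; it handles only
       this one option and does not reflect pcregrep's actual argument
       handling.

```python
def parse_file_option(argv):
    """Return (filename, other_args) for the four spellings of -f/--file."""
    filename = None
    others = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in ("-f", "--file"):
            # Data is in the next command line item.
            if i + 1 >= len(argv):
                raise ValueError("option %s needs data" % arg)
            filename = argv[i + 1]
            i += 2
        elif arg.startswith("--file="):
            # Long form, data after an equals character.
            filename = arg[len("--file="):]
            i += 1
        elif arg.startswith("-f") and not arg.startswith("--"):
            # Short form, data follows immediately.
            filename = arg[2:]
            i += 1
        else:
            others.append(arg)
            i += 1
    return filename, others

for args in (["-f/some/file"], ["-f", "/some/file"],
             ["--file=/some/file"], ["--file", "/some/file"]):
    print(parse_file_option(args)[0])  # /some/file each time
```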


MATCHING ERRORS

       It is possible to supply a regular expression that takes a very long
       time to fail to match certain lines. Such patterns normally involve
       nested indefinite repeats, for example: (a+)*\d when matched against a
       line of a's with no final digit. The PCRE matching function has a
       resource limit that causes it to abort in these circumstances. If this
       happens, pcregrep outputs an error message and the line that caused
       the problem to the standard error stream. If there are more than 20
       such errors, pcregrep gives up.
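
       To see why such a pattern is costly, count the ways in which (a+)*
       can divide a run of n a's into non-empty groups. A backtracking
       matcher may, in the worst case, have to consider each of them before
       concluding that no digit follows. The Python sketch below only does
       the counting (the function name is hypothetical); it says nothing
       about how PCRE searches in practice or where its resource limit lies.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def ways_to_split(n):
    """Count the ways to divide a run of n a's into non-empty groups."""
    if n == 0:
        return 1  # nothing left to divide: exactly one (empty) arrangement
    # The first group takes k characters; the rest is divided recursively.
    return sum(ways_to_split(n - k) for k in range(1, n + 1))

for n in (5, 10, 20, 30):
    print(n, ways_to_split(n))
```

       The count is 2^(n-1), so each additional "a" in the line doubles the
       number of arrangements.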


DIAGNOSTICS

       Exit status is 0 if any matches were found, 1 if no matches were
       found, and 2 for syntax errors and non-existent or inaccessible files
       (even if matches were found in other files) or too many matching
       errors. Using the -s option to suppress error messages about
       inaccessible files does not affect the return code.
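
       The rule above can be summarised as a small decision function. This
       is a Python sketch with hypothetical names, in which had_error stands
       for any of the error conditions just listed.

```python
def exit_status(any_match, had_error):
    """Model the documented exit status of a pcregrep run."""
    if had_error:
        # Errors take precedence, even if other files produced matches.
        return 2
    return 0 if any_match else 1

print(exit_status(any_match=True, had_error=False))   # 0
print(exit_status(any_match=False, had_error=False))  # 1
print(exit_status(any_match=True, had_error=True))    # 2
```

       A calling script therefore cannot treat every non-zero status as
       "no match"; status 2 has to be handled separately.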


SEE ALSO

       pcrepattern(3), pcretest(1).


AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

Last updated: 29 November 2006
Copyright (c) 1997-2006 University of Cambridge.
