Contents of /code/trunk/doc/pcregrep.txt

Revision 93
Sat Feb 24 21:41:42 2007 UTC by nigel
File MIME type: text/plain
File size: 21514 byte(s)
Load pcre-7.0 into code/trunk.

1 PCREGREP(1) PCREGREP(1)
2
3
4 NAME
5 pcregrep - a grep with Perl-compatible regular expressions.
6
7
8 SYNOPSIS
9 pcregrep [options] [long options] [pattern] [path1 path2 ...]
10
11
12 DESCRIPTION
13
14 pcregrep searches files for character patterns, in the same way as
15 other grep commands do, but it uses the PCRE regular expression library
16 to support patterns that are compatible with the regular expressions of
17 Perl 5. See pcrepattern(3) for a full description of syntax and seman-
18 tics of the regular expressions that PCRE supports.
19
20 Patterns, whether supplied on the command line or in a separate file,
21 are given without delimiters. For example:
22
23 pcregrep Thursday /etc/motd
24
25 If you attempt to use delimiters (for example, by surrounding a pattern
26 with slashes, as is common in Perl scripts), they are interpreted as
27 part of the pattern. Quotes can of course be used on the command line
28 because they are interpreted by the shell, and indeed they are required
29 if a pattern contains white space or shell metacharacters.
30
31 The first argument that follows any option settings is treated as the
32 single pattern to be matched when neither -e nor -f is present. Con-
33 versely, when one or both of these options are used to specify pat-
34 terns, all arguments are treated as path names. At least one of -e, -f,
35 or an argument pattern must be provided.
36
37 If no files are specified, pcregrep reads the standard input. The stan-
38 dard input can also be referenced by a name consisting of a single
39 hyphen. For example:
40
41 pcregrep some-pattern /file1 - /file3
42
43 By default, each line that matches the pattern is copied to the stan-
44 dard output, and if there is more than one file, the file name is out-
45 put at the start of each line. However, there are options that can
46 change how pcregrep behaves. In particular, the -M option makes it pos-
47 sible to search for patterns that span line boundaries. What defines a
48 line boundary is controlled by the -N (--newline) option.
49
50 Patterns are limited to 8K or BUFSIZ characters, whichever is the
51 greater. BUFSIZ is defined in <stdio.h>.
52
53 If the LC_ALL or LC_CTYPE environment variable is set, pcregrep uses
54 the value to set a locale when calling the PCRE library. The --locale
55 option can be used to override this.
56
57
58 OPTIONS
59
60 -- This terminates the list of options. It is useful if the next
61 item on the command line starts with a hyphen but is not an
62 option. This allows for the processing of patterns and file-
63 names that start with hyphens.
64
65 -A number, --after-context=number
66 Output number lines of context after each matching line. If
67 filenames and/or line numbers are being output, a hyphen sep-
68 arator is used instead of a colon for the context lines. A
69 line containing "--" is output between each group of lines,
70 unless they are in fact contiguous in the input file. The
71 value of number is expected to be relatively small. However,
72 pcregrep guarantees to have up to 8K of following text avail-
73 able for context output.
74
75 -B number, --before-context=number
76 Output number lines of context before each matching line. If
77 filenames and/or line numbers are being output, a hyphen sep-
78 arator is used instead of a colon for the context lines. A
79 line containing "--" is output between each group of lines,
80 unless they are in fact contiguous in the input file. The
81 value of number is expected to be relatively small. However,
82 pcregrep guarantees to have up to 8K of preceding text avail-
83 able for context output.
84
85 -C number, --context=number
86 Output number lines of context both before and after each
87 matching line. This is equivalent to setting both -A and -B
88 to the same value.
89
90 -c, --count
91 Do not output individual lines; instead just output a count
92 of the number of lines that would otherwise have been output.
93 If several files are given, a count is output for each of
94 them. In this mode, the -A, -B, and -C options are ignored.
95
96 --colour, --color
97 If this option is given without any data, it is equivalent to
98 "--colour=auto". If data is required, it must be given in
99 the same shell item, separated by an equals sign.
100
101 --colour=value, --color=value
102 This option specifies under what circumstances the part of a
103 line that matched a pattern should be coloured in the output.
104 The value may be "never" (the default), "always", or "auto".
105 In the last case, colouring happens only if the standard
106 output is connected to a terminal. The colour can be speci-
107 fied by setting the environment variable PCREGREP_COLOUR or
108 PCREGREP_COLOR. The value of this variable should be a string
109 of two numbers, separated by a semicolon. They are copied
110 directly into the control string for setting colour on a ter-
111 minal, so it is your responsibility to ensure that they make
112 sense. If neither of the environment variables is set, the
113 default is "1;31", which gives red.
114
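As an illustration of how the two numbers might reach the terminal, the following Python sketch wraps the matched part of a line in an ANSI colour sequence. The helper name colourise, the reset sequence ESC[0m, and the order in which the two environment variables are consulted are assumptions made for this example; the text above states only that the numbers are copied directly into the control string.

```python
import os

DEFAULT_COLOUR = "1;31"  # documented default, which gives red


def colourise(line, start, end, environ=None):
    """Wrap line[start:end] in an ANSI colour sequence (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # Assumption: PCREGREP_COLOUR is consulted before PCREGREP_COLOR.
    colour = env.get("PCREGREP_COLOUR") or env.get("PCREGREP_COLOR") or DEFAULT_COLOUR
    # The value is copied verbatim; nothing checks that it makes sense.
    # Assumption: ESC[0m is used here to switch the colour off again.
    return f"{line[:start]}\x1b[{colour}m{line[start:end]}\x1b[0m{line[end:]}"
```

Because the value is not validated, a malformed setting such as "red" would be emitted as it stands and would probably be ignored or misinterpreted by the terminal.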
115 -D action, --devices=action
116 If an input path is not a regular file or a directory,
117 "action" specifies how it is to be processed. Valid values
118 are "read" (the default) or "skip" (silently skip the path).
119
120 -d action, --directories=action
121 If an input path is a directory, "action" specifies how it is
122 to be processed. Valid values are "read" (the default),
123 "recurse" (equivalent to the -r option), or "skip" (silently
124 skip the path). In the default case, directories are read as
125 if they were ordinary files. In some operating systems the
126 effect of reading a directory like this is an immediate end-
127 of-file.
128
129 -e pattern, --regex=pattern, --regexp=pattern
130 Specify a pattern to be matched. This option
131 can be used multiple times in order to specify several pat-
132 terns. It can also be used as a way of specifying a single
133 pattern that starts with a hyphen. When -e is used, no argu-
134 ment pattern is taken from the command line; all arguments
135 are treated as file names. There is an overall maximum of 100
136 patterns. They are applied to each line in the order in which
137 they are defined until one matches (or fails to match if -v
138 is used). If -f is used with -e, the command line patterns
139 are matched first, followed by the patterns from the file,
140 independent of the order in which these options are speci-
141 fied. Note that multiple use of -e is not the same as a sin-
142 gle pattern with alternatives. For example, X|Y finds the
143 first character in a line that is X or Y, whereas if the two
144 patterns are given separately, pcregrep finds X if it is
145 present, even if it follows Y in the line. It finds Y only if
146 there is no X in the line. This really matters only if you
147 are using -o to show the portion of the line that matched.
148
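The difference between several -e patterns and a single pattern with alternatives can be modelled in a few lines of Python. This is only a sketch of the behaviour described above: Python's re module stands in for the PCRE library, and the function name only_matching is hypothetical.

```python
import re


def only_matching(line, patterns):
    """Return the text that -o would show for one line, or None if no match.

    Patterns are tried in the order given; the first pattern that matches
    anywhere in the line decides the result.
    """
    for pattern in patterns:
        match = re.search(pattern, line)
        if match:
            return match.group(0)
    return None


# One pattern with alternatives reports whichever of X or Y comes first.
single = only_matching("aYbX", ["X|Y"])
# Two separate patterns report X whenever X is present, even after a Y.
separate = only_matching("aYbX", ["X", "Y"])
```

Here single is "Y" and separate is "X", which is the distinction that matters when -o is in use.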
149 --exclude=pattern
150 When pcregrep is searching the files in a directory as a con-
151 sequence of the -r (recursive search) option, any files whose
152 names match the pattern are excluded. The pattern is a PCRE
153 regular expression. If a file name matches both --include and
154 --exclude, it is excluded. There is no short form for this
155 option.
156
157 -F, --fixed-strings
158 Interpret each pattern as a list of fixed strings, separated
159 by newlines, instead of as a regular expression. The -w
160 (match as a word) and -x (match whole line) options can be
161 used with -F. They apply to each of the fixed strings. A line
162 is selected if any of the fixed strings are found in it (sub-
163 ject to -w or -x, if present).
164
165 -f filename, --file=filename
166 Read a number of patterns from the file, one per line, and
167 match them against each line of input. A data line is output
168 if any of the patterns match it. The filename can be given as
169 "-" to refer to the standard input. When -f is used, patterns
170 specified on the command line using -e may also be present;
171 they are tested before the file's patterns. However, no other
172 pattern is taken from the command line; all arguments are
173 treated as file names. There is an overall maximum of 100
174 patterns. Trailing white space is removed from each line, and
175 blank lines are ignored. An empty file contains no patterns
176 and therefore matches nothing.
177
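The rules for reading a pattern file can be sketched as follows. This is a minimal model in Python, not pcregrep's own code: the function name load_patterns and the already_defined parameter (the number of patterns previously given with -e) are assumptions, as is the choice to treat a line containing only white space as blank.

```python
MAX_PATTERNS = 100  # documented overall maximum


def load_patterns(lines, already_defined=0):
    """Turn the lines of a pattern file into a list of patterns."""
    patterns = []
    for raw in lines:
        # Trailing white space (including the line terminator) is removed;
        # leading white space is kept as part of the pattern.
        pattern = raw.rstrip()
        if not pattern:
            continue  # blank lines are ignored
        if already_defined + len(patterns) >= MAX_PATTERNS:
            raise ValueError("more than 100 patterns")
        patterns.append(pattern)
    return patterns
```

An empty file yields an empty list, which corresponds to the statement above that such a file matches nothing.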
178 -H, --with-filename
179 Force the inclusion of the filename at the start of output
180 lines when searching a single file. By default, the filename
181 is not shown in this case. For matching lines, the filename
182 is followed by a colon and a space; for context lines, a
183 hyphen separator is used. If a line number is also being out-
184 put, it follows the file name without a space.
185
186 -h, --no-filename
187 Suppress the output of filenames when searching multiple files.
188 By default, filenames are shown when multiple files are
189 searched. For matching lines, the filename is followed by a
190 colon and a space; for context lines, a hyphen separator is
191 used. If a line number is also being output, it follows the
192 file name without a space.
193
194 --help Output a brief help message and exit.
195
196 -i, --ignore-case
197 Ignore upper/lower case distinctions during comparisons.
198
199 --include=pattern
200 When pcregrep is searching the files in a directory as a con-
201 sequence of the -r (recursive search) option, only those
202 files whose names match the pattern are included. The pattern
203 is a PCRE regular expression. If a file name matches both
204 --include and --exclude, it is excluded. There is no short
205 form for this option.
206
207 -L, --files-without-match
208 Instead of outputting lines from the files, just output the
209 names of the files that do not contain any lines that would
210 have been output. Each file name is output once, on a sepa-
211 rate line.
212
213 -l, --files-with-matches
214 Instead of outputting lines from the files, just output the
215 names of the files containing lines that would have been out-
216 put. Each file name is output once, on a separate line.
217 Searching stops as soon as a matching line is found in a
218 file.
219
220 --label=name
221 This option supplies a name to be used for the standard input
222 when file names are being output. If not supplied, "(standard
223 input)" is used. There is no short form for this option.
224
225 --locale=locale-name
226 This option specifies a locale to be used for pattern match-
227 ing. It overrides the value in the LC_ALL or LC_CTYPE envi-
228 ronment variables. If no locale is specified, the PCRE
229 library's default (usually the "C" locale) is used. There is
230 no short form for this option.
231
232 -M, --multiline
233 Allow patterns to match more than one line. When this option
234 is given, patterns may usefully contain literal newline char-
235 acters and internal occurrences of ^ and $ characters. The
236 output for any one match may consist of more than one line.
237 When this option is set, the PCRE library is called in "mul-
238 tiline" mode. There is a limit to the number of lines that
239 can be matched, imposed by the way that pcregrep buffers the
240 input file as it scans it. However, pcregrep ensures that at
241 least 8K characters or the rest of the document (whichever is
242 the shorter) are available for forward matching, and simi-
243 larly the previous 8K characters (or all the previous charac-
244 ters, if fewer than 8K) are guaranteed to be available for
245 lookbehind assertions.
246
247 -N newline-type, --newline=newline-type
248 The PCRE library supports four different conventions for
249 indicating the ends of lines. They are the single-character
250 sequences CR (carriage return) and LF (linefeed), the two-
251 character sequence CRLF, and an "any" convention, in which
252 any Unicode line ending sequence is assumed to end a line.
253 The Unicode sequences are the three just mentioned, plus VT
254 (vertical tab, U+000B), FF (formfeed, U+000C), NEL (next
255 line, U+0085), LS (line separator, U+2028), and PS (paragraph
256 separator, U+2029).
257
258 When the PCRE library is built, a default line-ending
259 sequence is specified. This is normally the standard
260 sequence for the operating system. Unless otherwise specified
261 by this option, pcregrep uses the library's default. The
262 possible values for this option are CR, LF, CRLF, or ANY.
263 This makes it possible to use pcregrep on files that have
264 come from other environments without having to modify their
265 line endings. If the data that is being scanned does not
266 agree with the convention set by this option, pcregrep may
267 behave in strange ways.
268
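The effect of the four conventions on where lines end can be illustrated with a short Python sketch. It models only the line splitting; the table NEWLINE_PATTERNS and the function split_lines are names invented for this example, and Python's re module is used in place of PCRE.

```python
import re

NEWLINE_PATTERNS = {
    "CR": "\r",
    "LF": "\n",
    "CRLF": "\r\n",
    # CRLF comes first so that it counts as one line ending, not two.
    # The class holds CR, LF, VT, FF, NEL, LS and PS.
    "ANY": "\r\n|[\r\n\x0b\x0c\x85\u2028\u2029]",
}


def split_lines(text, newline_type="LF"):
    """Split text into lines under the given newline convention."""
    parts = re.split(NEWLINE_PATTERNS[newline_type], text)
    # A terminator at the very end does not start another (empty) line.
    if parts and parts[-1] == "":
        parts.pop()
    return parts
```

Splitting CRLF data under the LF convention leaves a stray carriage return at the end of every line, which is one example of the strange behaviour mentioned above when the data and the option disagree.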
269 -n, --line-number
270 Precede each output line by its line number in the file, fol-
271 lowed by a colon and a space for matching lines or a hyphen
272 and a space for context lines. If the filename is also being
273 output, it precedes the line number.
274
275 -o, --only-matching
276 Show only the part of the line that matched a pattern. In
277 this mode, no context is shown. That is, the -A, -B, and -C
278 options are ignored.
279
280 -q, --quiet
281 Work quietly, that is, display nothing except error messages.
282 The exit status indicates whether or not any matches were
283 found.
284
285 -r, --recursive
286 If any given path is a directory, recursively scan the files
287 it contains, taking note of any --include and --exclude set-
288 tings. By default, a directory is read as a normal file; in
289 some operating systems this gives an immediate end-of-file.
290 This option is a shorthand for setting the -d option to
291 "recurse".
292
293 -s, --no-messages
294 Suppress error messages about non-existent or unreadable
295 files. Such files are quietly skipped. However, the return
296 code is still 2, even if matches were found in other files.
297
298 -u, --utf-8
299 Operate in UTF-8 mode. This option is available only if PCRE
300 has been compiled with UTF-8 support. Both patterns and sub-
301 ject lines must be valid strings of UTF-8 characters.
302
303 -V, --version
304 Write the version numbers of pcregrep and the PCRE library
305 that is being used to the standard error stream.
306
307 -v, --invert-match
308 Invert the sense of the match, so that lines which do not
309 match any of the patterns are the ones that are found.
310
311 -w, --word-regex, --word-regexp
312 Force the patterns to match only whole words. This is equiva-
313 lent to having \b at the start and end of the pattern.
314
315 -x, --line-regex, --line-regexp
316 Force the patterns to be anchored (each must start matching
317 at the beginning of a line) and in addition, require them to
318 match entire lines. This is equivalent to having ^ and $
319 characters at the start and end of each alternative branch in
320 every pattern.
321
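The equivalences given for -w and -x can be sketched in Python as pattern rewrites. The function names are hypothetical and Python's re module stands in for PCRE. The non-capturing group is a choice made for this example so that the added assertions apply to every alternative branch rather than only to the first and last.

```python
import re


def as_word(pattern):
    """Model of -w: require a word boundary on both sides of the match."""
    return r"\b(?:" + pattern + r")\b"


def as_whole_line(pattern):
    """Model of -x: the pattern must match the entire line."""
    return r"^(?:" + pattern + r")$"


# Without the group, \b would bind only to the outer branches:
# \bcat|dog\b matches the "dog" at the end of "hotdog".
ungrouped = re.search(r"\bcat|dog\b", "hotdog") is not None
grouped = re.search(as_word("cat|dog"), "hotdog") is not None
```

In this example ungrouped is true and grouped is false, which shows why simply adding \b to the two ends of a pattern that contains alternatives is not sufficient.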
322
323 ENVIRONMENT VARIABLES
324
325 The environment variables LC_ALL and LC_CTYPE are examined, in that
326 order, for a locale. The first one that is set is used. This can be
327 overridden by the --locale option. If no locale is set, the PCRE
328 library's default (usually the "C" locale) is used.
329
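The order of precedence can be summarised in a small Python sketch. The function name choose_locale is hypothetical, and treating an empty value as "not set" is an assumption made for this example.

```python
def choose_locale(cli_locale, environ):
    """Return the locale name to use, or None for the PCRE library default."""
    # The --locale option overrides the environment.
    if cli_locale:
        return cli_locale
    # LC_ALL is examined before LC_CTYPE; the first one that is set wins.
    for name in ("LC_ALL", "LC_CTYPE"):
        value = environ.get(name)
        if value:
            return value
    return None  # fall back to the library default, usually the "C" locale
```

A caller could pass os.environ as the second argument; a plain dictionary is accepted so that the function is easy to exercise in isolation.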
330
331 NEWLINES
332
333 The -N (--newline) option allows pcregrep to scan files with different
334 newline conventions from the default. However, the setting of this
335 option does not affect the way in which pcregrep writes information to
336 the standard error and output streams. It uses the string "\n" in C
337 printf() calls to indicate newlines, relying on the C I/O library to
338 convert this to an appropriate sequence if the output is sent to a
339 file.
340
341
342 OPTIONS COMPATIBILITY
343
344 The majority of short and long forms of pcregrep's options are the same
345 as in the GNU grep program. Any long option of the form --xxx-regexp
346 (GNU terminology) is also available as --xxx-regex (PCRE terminology).
347 However, the --locale, -M, --multiline, -u, and --utf-8 options are
348 specific to pcregrep.
349
350
351 OPTIONS WITH DATA
352
353 There are four different ways in which an option with data can be spec-
354 ified. If a short form option is used, the data may follow immedi-
355 ately, or in the next command line item. For example:
356
357 -f/some/file
358 -f /some/file
359
360 If a long form option is used, the data may appear in the same command
361 line item, separated by an equals character, or (with one exception) it
362 may appear in the next command line item. For example:
363
364 --file=/some/file
365 --file /some/file
366
367 Note, however, that if you want to supply a file name beginning with ~
368 as data in a shell command, and have the shell expand ~ to a home
369 directory, you must separate the file name from the option, because the
370 shell does not treat ~ specially unless it is at the start of an item.
371
372 The exception to the above is the --colour (or --color) option, for
373 which the data is optional. If this option does have data, it must be
374 given in the first form, using an equals character. Otherwise it will
375 be assumed that it has no data.
376
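The four forms, and the exception for --colour, can be modelled by a small parser. The following Python sketch is illustrative only: the option tables list just a subset of the options, the function name parse_option is hypothetical, and no attempt is made to report a missing data item.

```python
LONG_WITH_DATA = {"file", "regex", "regexp", "locale", "newline"}  # subset
SHORT_WITH_DATA = {"f", "e", "N", "A", "B", "C"}                   # subset
OPTIONAL_DATA = {"colour", "color"}


def parse_option(args, i):
    """Return (name, data, next_index) for the option at args[i]."""
    arg = args[i]
    if arg.startswith("--"):
        name, equals, data = arg[2:].partition("=")
        if equals:
            return name, data, i + 1          # --file=/some/file
        if name in OPTIONAL_DATA:
            return name, "auto", i + 1        # --colour never takes the next item
        if name in LONG_WITH_DATA:
            return name, args[i + 1], i + 2   # --file /some/file
        return name, None, i + 1
    name = arg[1]
    if name in SHORT_WITH_DATA:
        if len(arg) > 2:
            return name, arg[2:], i + 1       # -f/some/file
        return name, args[i + 1], i + 2       # -f /some/file
    return name, None, i + 1
```

With this model, the item following a bare --colour is left on the command line to be treated as a pattern or a path, as the text above requires.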
377
378 MATCHING ERRORS
379
380 It is possible to supply a regular expression that takes a very long
381 time to fail to match certain lines. Such patterns normally involve
382 nested indefinite repeats, for example: (a+)*\d when matched against a
383 line of a's with no final digit. The PCRE matching function has a
384 resource limit that causes it to abort in these circumstances. If this
385 happens, pcregrep outputs an error message and the line that caused the
386 problem to the standard error stream. If there are more than 20 such
387 errors, pcregrep gives up.
388
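To see why such a pattern is expensive, consider how many ways (a+)* can divide a run of n a's into groups. The Python sketch below counts them; it is a piece of arithmetic about the pattern, not a model of PCRE's matching function, whose actual behaviour may differ because of internal optimizations. The function name ways_to_split is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def ways_to_split(n):
    """Number of ways (a+)* can divide a run of n 'a' characters into groups."""
    if n == 0:
        return 1
    # The first group takes k characters; the remainder is split recursively.
    return sum(ways_to_split(n - k) for k in range(1, n + 1))
```

The count doubles with each additional character (it equals 2 to the power n-1), so a backtracking matcher that has to try every division before concluding that no digit follows does an amount of work that grows exponentially with the length of the line.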
389
390 DIAGNOSTICS
391
392 Exit status is 0 if any matches were found, 1 if no matches were found,
393 and 2 for syntax errors and non-existent or inaccessible files (even if
394 matches were found in other files) or too many matching errors. Using
395 the -s option to suppress error messages about inaccessible files does
396 not affect the return code.
397
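The rule can be written as a small Python function. The names are hypothetical; the point of the sketch is that an error takes precedence over the outcome of the search.

```python
def exit_status(matches_found, errors_occurred):
    """Model of the exit status described above."""
    if errors_occurred:
        # Syntax errors, missing or inaccessible files, or too many matching
        # errors give 2, even if matches were found in other files.
        return 2
    return 0 if matches_found else 1
```

A calling script therefore cannot conclude from a status of 2 that nothing matched; it can only conclude that something went wrong.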
398
399 SEE ALSO
400
401 pcrepattern(3), pcretest(1).
402
403
404 AUTHOR
405
406 Philip Hazel
407 University Computing Service
408 Cambridge CB2 3QH, England.
409
410 Last updated: 29 November 2006
411 Copyright (c) 1997-2006 University of Cambridge.
