Contents of /code/trunk/doc/pcregrep.txt

Revision 150
Tue Apr 17 08:22:40 2007 UTC by ph10
File MIME type: text/plain
File size: 21646 byte(s)
Update HTML documentation.

PCREGREP(1)                                                        PCREGREP(1)


NAME
       pcregrep - a grep with Perl-compatible regular expressions.


SYNOPSIS
       pcregrep [options] [long options] [pattern] [path1 path2 ...]


DESCRIPTION
       pcregrep searches files for character patterns, in the same way as
       other grep commands do, but it uses the PCRE regular expression
       library to support patterns that are compatible with the regular
       expressions of Perl 5. See pcrepattern(3) for a full description of
       the syntax and semantics of the regular expressions that PCRE
       supports.

       Patterns, whether supplied on the command line or in a separate file,
       are given without delimiters. For example:

         pcregrep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a
       pattern with slashes, as is common in Perl scripts), they are
       interpreted as part of the pattern. Quotes can of course be used on
       the command line because they are interpreted by the shell, and
       indeed they are required if a pattern contains white space or shell
       metacharacters.
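
       The effect of adding delimiters can be sketched in Python. This is
       only an illustration: Python's re module stands in for the PCRE
       library, and the sample line is invented for the example.

```python
import re

# Python's re module is used here only as a stand-in for PCRE.
line = "Meeting moved to Thursday"

# Without delimiters, the pattern is just the text to look for.
plain = re.search(r"Thursday", line)

# With Perl-style slashes, the slashes become literal characters
# that the line itself would have to contain.
slashed = re.search(r"/Thursday/", line)

print(plain is not None)
print(slashed is not None)
```

       The first search succeeds; the second fails because the line
       contains no slashes.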

       The first argument that follows any option settings is treated as the
       single pattern to be matched when neither -e nor -f is present.
       Conversely, when one or both of these options are used to specify
       patterns, all arguments are treated as path names. At least one of
       -e, -f, or an argument pattern must be provided.

       If no files are specified, pcregrep reads the standard input. The
       standard input can also be referenced by a name consisting of a
       single hyphen. For example:

         pcregrep some-pattern /file1 - /file3

       By default, each line that matches the pattern is copied to the
       standard output, and if there is more than one file, the file name is
       output at the start of each line. However, there are options that can
       change how pcregrep behaves. In particular, the -M option makes it
       possible to search for patterns that span line boundaries. What
       defines a line boundary is controlled by the -N (--newline) option.

       Patterns are limited to 8K or BUFSIZ characters, whichever is the
       greater. BUFSIZ is defined in <stdio.h>.

       If the LC_ALL or LC_CTYPE environment variable is set, pcregrep uses
       the value to set a locale when calling the PCRE library. The --locale
       option can be used to override this.


OPTIONS
       --        This terminates the list of options. It is useful if the
                 next item on the command line starts with a hyphen but is
                 not an option. This allows for the processing of patterns
                 and filenames that start with hyphens.

       -A number, --after-context=number
                 Output number lines of context after each matching line. If
                 filenames and/or line numbers are being output, a hyphen
                 separator is used instead of a colon for the context lines.
                 A line containing "--" is output between each group of
                 lines, unless they are in fact contiguous in the input
                 file. The value of number is expected to be relatively
                 small. However, pcregrep guarantees to have up to 8K of
                 following text available for context output.

       -B number, --before-context=number
                 Output number lines of context before each matching line.
                 If filenames and/or line numbers are being output, a hyphen
                 separator is used instead of a colon for the context lines.
                 A line containing "--" is output between each group of
                 lines, unless they are in fact contiguous in the input
                 file. The value of number is expected to be relatively
                 small. However, pcregrep guarantees to have up to 8K of
                 preceding text available for context output.

       -C number, --context=number
                 Output number lines of context both before and after each
                 matching line. This is equivalent to setting both -A and -B
                 to the same value.

       -c, --count
                 Do not output individual lines; instead just output a count
                 of the number of lines that would otherwise have been
                 output. If several files are given, a count is output for
                 each of them. In this mode, the -A, -B, and -C options are
                 ignored.

       --colour, --color
                 If this option is given without any data, it is equivalent
                 to "--colour=auto". If data is required, it must be given
                 in the same shell item, separated by an equals sign.

       --colour=value, --color=value
                 This option specifies under what circumstances the part of
                 a line that matched a pattern should be coloured in the
                 output. The value may be "never" (the default), "always",
                 or "auto". In the last case, colouring happens only if the
                 standard output is connected to a terminal. The colour can
                 be specified by setting the environment variable
                 PCREGREP_COLOUR or PCREGREP_COLOR. The value of this
                 variable should be a string of two numbers, separated by a
                 semicolon. They are copied directly into the control string
                 for setting colour on a terminal, so it is your
                 responsibility to ensure that they make sense. If neither
                 of the environment variables is set, the default is "1;31",
                 which gives red.
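
                 As a rough sketch of how such a value ends up in a terminal
                 control string, the Python function below wraps text in an
                 ANSI SGR escape sequence. The function name, the order in
                 which the two variables are consulted, and the reset
                 sequence are assumptions made for illustration; they are
                 not taken from the pcregrep source.

```python
import os

def colour_match(text, environ=None):
    """Wrap text in an ANSI SGR sequence built from the colour setting.

    Hypothetical helper: checking PCREGREP_COLOUR before PCREGREP_COLOR
    and resetting with ESC[00m are assumptions, not documented behaviour.
    An empty variable is treated here as if it were unset.
    """
    env = os.environ if environ is None else environ
    codes = env.get("PCREGREP_COLOUR") or env.get("PCREGREP_COLOR") or "1;31"
    # The two numbers are copied verbatim into the control string.
    return f"\x1b[{codes}m{text}\x1b[00m"

# With neither variable set, the default "1;31" (bold red) is used.
print(repr(colour_match("Thursday", environ={})))
```

                 Because the value is copied verbatim, a malformed setting
                 produces a malformed escape sequence, which is why the text
                 above leaves validation to the user.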

       -D action, --devices=action
                 If an input path is not a regular file or a directory,
                 "action" specifies how it is to be processed. Valid values
                 are "read" (the default) or "skip" (silently skip the
                 path).

       -d action, --directories=action
                 If an input path is a directory, "action" specifies how it
                 is to be processed. Valid values are "read" (the default),
                 "recurse" (equivalent to the -r option), or "skip"
                 (silently skip the path). In the default case, directories
                 are read as if they were ordinary files. In some operating
                 systems the effect of reading a directory like this is an
                 immediate end-of-file.

       -e pattern, --regex=pattern, --regexp=pattern
                 Specify a pattern to be matched. This option can be used
                 multiple times in order to specify several patterns. It can
                 also be used as a way of specifying a single pattern that
                 starts with a hyphen. When -e is used, no argument pattern
                 is taken from the command line; all arguments are treated
                 as file names. There is an overall maximum of 100 patterns.
                 They are applied to each line in the order in which they
                 are defined until one matches (or fails to match if -v is
                 used). If -f is used with -e, the command line patterns are
                 matched first, followed by the patterns from the file,
                 independent of the order in which these options are
                 specified. Note that multiple use of -e is not the same as
                 a single pattern with alternatives. For example, X|Y finds
                 the first character in a line that is X or Y, whereas if
                 the two patterns are given separately, pcregrep finds X if
                 it is present, even if it follows Y in the line. It finds Y
                 only if there is no X in the line. This really matters only
                 if you are using -o to show the portion of the line that
                 matched.
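
                 The difference between X|Y and two separate -e patterns can
                 be sketched in Python. Python's re module stands in for
                 PCRE, and both function names are hypothetical; the second
                 function only models the "try each pattern in order" rule
                 described above.

```python
import re

def first_match_alternation(line):
    # One pattern with alternatives: the leftmost X or Y in the line wins.
    m = re.search(r"X|Y", line)
    return m.group(0) if m else None

def first_match_separate(line, patterns=("X", "Y")):
    # Separate patterns are tried in the order given; the first pattern
    # that matches anywhere in the line wins, wherever its match lies.
    for pattern in patterns:
        m = re.search(pattern, line)
        if m:
            return m.group(0)
    return None

line = "aYbX"
print(first_match_alternation(line))
print(first_match_separate(line))
```

                 For the line "aYbX" the single pattern reports Y, while the
                 separate patterns report X, which is the difference that -o
                 makes visible.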

       --exclude=pattern
                 When pcregrep is searching the files in a directory as a
                 consequence of the -r (recursive search) option, any files
                 whose names match the pattern are excluded. The pattern is
                 a PCRE regular expression. If a file name matches both
                 --include and --exclude, it is excluded. There is no short
                 form for this option.

       -F, --fixed-strings
                 Interpret each pattern as a list of fixed strings,
                 separated by newlines, instead of as a regular expression.
                 The -w (match as a word) and -x (match whole line) options
                 can be used with -F. They apply to each of the fixed
                 strings. A line is selected if any of the fixed strings are
                 found in it (subject to -w or -x, if present).

       -f filename, --file=filename
                 Read a number of patterns from the file, one per line, and
                 match them against each line of input. A data line is
                 output if any of the patterns match it. The filename can be
                 given as "-" to refer to the standard input. When -f is
                 used, patterns specified on the command line using -e may
                 also be present; they are tested before the file's
                 patterns. However, no other pattern is taken from the
                 command line; all arguments are treated as file names.
                 There is an overall maximum of 100 patterns. Trailing white
                 space is removed from each line, and blank lines are
                 ignored. An empty file contains no patterns and therefore
                 matches nothing.
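
                 The rules for reading a pattern file can be sketched as
                 follows. The function name and the error message are
                 hypothetical, and the sketch counts only the patterns in
                 the file, whereas the limit of 100 described above also
                 includes patterns given with -e.

```python
MAX_PATTERNS = 100  # overall limit stated in the text above

def load_patterns(text):
    """Return the patterns contained in the text of a pattern file."""
    patterns = []
    for raw in text.splitlines():
        line = raw.rstrip()      # trailing white space is removed
        if not line:
            continue             # blank lines are ignored
        patterns.append(line)
    if len(patterns) > MAX_PATTERNS:
        raise ValueError("too many patterns (maximum is 100)")
    return patterns

print(load_patterns("foo  \n\nbar\t\n"))
print(load_patterns(""))
```

                 An empty file yields an empty list, which corresponds to
                 the "matches nothing" case.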

       -H, --with-filename
                 Force the inclusion of the filename at the start of output
                 lines when searching a single file. By default, the
                 filename is not shown in this case. For matching lines, the
                 filename is followed by a colon and a space; for context
                 lines, a hyphen separator is used. If a line number is also
                 being output, it follows the file name without a space.

       -h, --no-filename
                 Suppress the output filenames when searching multiple
                 files. By default, filenames are shown when multiple files
                 are searched. For matching lines, the filename is followed
                 by a colon and a space; for context lines, a hyphen
                 separator is used. If a line number is also being output,
                 it follows the file name without a space.

       --help    Output a brief help message and exit.

       -i, --ignore-case
                 Ignore upper/lower case distinctions during comparisons.

       --include=pattern
                 When pcregrep is searching the files in a directory as a
                 consequence of the -r (recursive search) option, only those
                 files whose names match the pattern are included. The
                 pattern is a PCRE regular expression. If a file name
                 matches both --include and --exclude, it is excluded. There
                 is no short form for this option.

       -L, --files-without-match
                 Instead of outputting lines from the files, just output the
                 names of the files that do not contain any lines that would
                 have been output. Each file name is output once, on a
                 separate line.

       -l, --files-with-matches
                 Instead of outputting lines from the files, just output the
                 names of the files containing lines that would have been
                 output. Each file name is output once, on a separate line.
                 Searching stops as soon as a matching line is found in a
                 file.

       --label=name
                 This option supplies a name to be used for the standard
                 input when file names are being output. If not supplied,
                 "(standard input)" is used. There is no short form for this
                 option.

       --locale=locale-name
                 This option specifies a locale to be used for pattern
                 matching. It overrides the value in the LC_ALL or LC_CTYPE
                 environment variables. If no locale is specified, the PCRE
                 library's default (usually the "C" locale) is used. There
                 is no short form for this option.

       -M, --multiline
                 Allow patterns to match more than one line. When this
                 option is given, patterns may usefully contain literal
                 newline characters and internal occurrences of ^ and $
                 characters. The output for any one match may consist of
                 more than one line. When this option is set, the PCRE
                 library is called in "multiline" mode. There is a limit to
                 the number of lines that can be matched, imposed by the way
                 that pcregrep buffers the input file as it scans it.
                 However, pcregrep ensures that at least 8K characters or
                 the rest of the document (whichever is the shorter) are
                 available for forward matching, and similarly the previous
                 8K characters (or all the previous characters, if fewer
                 than 8K) are guaranteed to be available for lookbehind
                 assertions.

       -N newline-type, --newline=newline-type
                 The PCRE library supports five different conventions for
                 indicating the ends of lines. They are the single-character
                 sequences CR (carriage return) and LF (linefeed), the
                 two-character sequence CRLF, an "anycrlf" convention, which
                 recognizes any of the preceding three types, and an "any"
                 convention, in which any Unicode line ending sequence is
                 assumed to end a line. The Unicode sequences are the three
                 just mentioned, plus VT (vertical tab, U+000B), FF
                 (formfeed, U+000C), NEL (next line, U+0085), LS (line
                 separator, U+2028), and PS (paragraph separator, U+2029).

                 When the PCRE library is built, a default line-ending
                 sequence is specified. This is normally the standard
                 sequence for the operating system. Unless otherwise
                 specified by this option, pcregrep uses the library's
                 default. The possible values for this option are CR, LF,
                 CRLF, ANYCRLF, or ANY. This makes it possible to use
                 pcregrep on files that have come from other environments
                 without having to modify their line endings. If the data
                 that is being scanned does not agree with the convention
                 set by this option, pcregrep may behave in strange ways.
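
                 The five conventions can be modelled with a small Python
                 sketch. The table and function names are hypothetical, and
                 Python's re module stands in for PCRE; the sketch shows
                 only where lines would end under each setting, not how
                 pcregrep buffers its input.

```python
import re

# One line-ending pattern per value accepted by -N (illustrative only).
NEWLINE_PATTERNS = {
    "CR": r"\r",
    "LF": r"\n",
    "CRLF": r"\r\n",
    "ANYCRLF": r"\r\n|\r|\n",
    # CRLF is tried first so that it counts as a single line ending.
    "ANY": r"\r\n|[\r\n\x0b\x0c\x85\u2028\u2029]",
}

def split_lines(text, convention="LF"):
    """Split text into lines according to the named newline convention."""
    return re.split(NEWLINE_PATTERNS[convention], text)

sample = "one\r\ntwo\nthree"
print(split_lines(sample, "ANYCRLF"))
print(split_lines(sample, "LF"))
```

                 Under LF the first line keeps a stray carriage return,
                 which is one example of the mismatch between data and
                 convention that the text above warns about.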

       -n, --line-number
                 Precede each output line by its line number in the file,
                 followed by a colon and a space for matching lines or a
                 hyphen and a space for context lines. If the filename is
                 also being output, it precedes the line number.

       -o, --only-matching
                 Show only the part of the line that matched a pattern. In
                 this mode, no context is shown. That is, the -A, -B, and -C
                 options are ignored.

       -q, --quiet
                 Work quietly, that is, display nothing except error
                 messages. The exit status indicates whether or not any
                 matches were found.

       -r, --recursive
                 If any given path is a directory, recursively scan the
                 files it contains, taking note of any --include and
                 --exclude settings. By default, a directory is read as a
                 normal file; in some operating systems this gives an
                 immediate end-of-file. This option is a shorthand for
                 setting the -d option to "recurse".

       -s, --no-messages
                 Suppress error messages about non-existent or unreadable
                 files. Such files are quietly skipped. However, the return
                 code is still 2, even if matches were found in other files.

       -u, --utf-8
                 Operate in UTF-8 mode. This option is available only if
                 PCRE has been compiled with UTF-8 support. Both patterns
                 and subject lines must be valid strings of UTF-8
                 characters.

       -V, --version
                 Write the version numbers of pcregrep and the PCRE library
                 that is being used to the standard error stream.

       -v, --invert-match
                 Invert the sense of the match, so that lines which do not
                 match any of the patterns are the ones that are found.

       -w, --word-regex, --word-regexp
                 Force the patterns to match only whole words. This is
                 equivalent to having \b at the start and end of the
                 pattern.

       -x, --line-regex, --line-regexp
                 Force the patterns to be anchored (each must start matching
                 at the beginning of a line) and in addition, require them
                 to match entire lines. This is equivalent to having ^ and $
                 characters at the start and end of each alternative branch
                 in every pattern.
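
       The equivalences given for -w and -x can be sketched as pattern
       rewrites in Python. The helper names are hypothetical and Python's re
       module stands in for PCRE. Wrapping the pattern in a non-capturing
       group is one way of applying the anchors to every alternative branch;
       it is not necessarily how pcregrep does it.

```python
import re

def as_word(pattern):
    # Sketch of -w: a word boundary at each end of the whole pattern.
    return rf"\b(?:{pattern})\b"

def as_whole_line(pattern):
    # Sketch of -x: the group makes ^ and $ apply to every branch.
    return rf"^(?:{pattern})$"

print(re.search(as_word("cat"), "concatenate") is None)
print(re.search(as_whole_line("a|b"), "ab") is None)
# Without the group, ^ binds only to the first branch and $ only to
# the last, so the line "ab" would still match.
print(re.search(r"^a|b$", "ab") is not None)
```

       The last line shows why the anchors must apply to each branch: the
       ungrouped form ^a|b$ accepts a line that merely starts with a or
       ends with b.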


ENVIRONMENT VARIABLES
       The environment variables LC_ALL and LC_CTYPE are examined, in that
       order, for a locale. The first one that is set is used. This can be
       overridden by the --locale option. If no locale is set, the PCRE
       library's default (usually the "C" locale) is used.


NEWLINES
       The -N (--newline) option allows pcregrep to scan files with
       different newline conventions from the default. However, the setting
       of this option does not affect the way in which pcregrep writes
       information to the standard error and output streams. It uses the
       string "\n" in C printf() calls to indicate newlines, relying on the
       C I/O library to convert this to an appropriate sequence if the
       output is sent to a file.


OPTIONS COMPATIBILITY
       The majority of short and long forms of pcregrep's options are the
       same as in the GNU grep program. Any long option of the form
       --xxx-regexp (GNU terminology) is also available as --xxx-regex (PCRE
       terminology). However, the --locale, -M, --multiline, -u, and --utf-8
       options are specific to pcregrep.


OPTIONS WITH DATA
       There are four different ways in which an option with data can be
       specified. If a short form option is used, the data may follow
       immediately, or in the next command line item. For example:

         -f/some/file
         -f /some/file

       If a long form option is used, the data may appear in the same
       command line item, separated by an equals character, or (with one
       exception) it may appear in the next command line item. For example:

         --file=/some/file
         --file /some/file

       Note, however, that if you want to supply a file name beginning with
       ~ as data in a shell command, and have the shell expand ~ to a home
       directory, you must separate the file name from the option, because
       the shell does not treat ~ specially unless it is at the start of an
       item.

       The exception to the above is the --colour (or --color) option, for
       which the data is optional. If this option does have data, it must be
       given in the first form, using an equals character. Otherwise it will
       be assumed that it has no data.
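
       The four forms, and the --colour exception, can be sketched as a
       small Python parser. The function name and return shape are
       hypothetical, and the sketch assumes that the option at the given
       position is one that takes data.

```python
def parse_option(argv, i):
    """Return (name, data, next_index) for the option at argv[i].

    Illustrative only: assumes argv[i] is an option that takes data.
    """
    arg = argv[i]
    if arg.startswith("--"):
        name, sep, data = arg[2:].partition("=")
        if sep:
            return name, data, i + 1          # --file=/some/file
        if name in ("colour", "color"):
            return name, None, i + 1          # data is optional here
        return name, argv[i + 1], i + 2       # --file /some/file
    name, data = arg[1], arg[2:]
    if data:
        return name, data, i + 1              # -f/some/file
    return name, argv[i + 1], i + 2           # -f /some/file

print(parse_option(["-f/some/file"], 0))
print(parse_option(["--file", "/some/file"], 0))
print(parse_option(["--colour", "always"], 0))
```

       In the last call the word "always" is not consumed as data, so it
       would be treated as the next command line item.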


MATCHING ERRORS
       It is possible to supply a regular expression that takes a very long
       time to fail to match certain lines. Such patterns normally involve
       nested indefinite repeats, for example: (a+)*\d when matched against
       a line of a's with no final digit. The PCRE matching function has a
       resource limit that causes it to abort in these circumstances. If
       this happens, pcregrep outputs an error message and the line that
       caused the problem to the standard error stream. If there are more
       than 20 such errors, pcregrep gives up.


DIAGNOSTICS
       Exit status is 0 if any matches were found, 1 if no matches were
       found, and 2 for syntax errors and non-existent or inaccessible files
       (even if matches were found in other files) or too many matching
       errors. Using the -s option to suppress error messages about
       inaccessible files does not affect the return code.
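
       The precedence between the three exit codes can be written as a short
       Python sketch. The function and its two parameters are hypothetical;
       they only restate the rule above that an error takes priority over
       any matches found.

```python
def exit_status(any_match, any_error):
    """Model the documented exit codes: 2 beats 0, and 0 beats 1."""
    if any_error:
        # Syntax error, missing or inaccessible file, or too many
        # matching errors, even if other files produced matches.
        return 2
    return 0 if any_match else 1

print(exit_status(any_match=True, any_error=False))
print(exit_status(any_match=False, any_error=False))
print(exit_status(any_match=True, any_error=True))
```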


SEE ALSO
       pcrepattern(3), pcretest(1).


AUTHOR
       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.


REVISION
       Last updated: 16 April 2007
       Copyright (c) 1997-2007 University of Cambridge.

