Contents of /code/trunk/doc/html/pcregrep.html

Revision 975
Sat Jun 2 11:03:06 2012 UTC by ph10
File MIME type: text/html
File size: 33939 byte(s)
Document update for 8.31-RC1 test release.

1 nigel 63 <html>
2     <head>
3     <title>pcregrep specification</title>
4     </head>
5     <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
6 nigel 75 <h1>pcregrep man page</h1>
7     <p>
8     Return to the <a href="index.html">PCRE index page</a>.
9     </p>
10 ph10 111 <p>
11 nigel 75 This page is part of the PCRE HTML documentation. It was generated automatically
12     from the original man page. If there is any nonsense in it, please consult the
13     man page, in case the conversion went wrong.
14 ph10 111 <br>
15 nigel 63 <ul>
16     <li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
17     <li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
18 ph10 286 <li><a name="TOC3" href="#SEC3">SUPPORT FOR COMPRESSED FILES</a>
19 ph10 954 <li><a name="TOC4" href="#SEC4">BINARY FILES</a>
20     <li><a name="TOC5" href="#SEC5">OPTIONS</a>
21     <li><a name="TOC6" href="#SEC6">ENVIRONMENT VARIABLES</a>
22     <li><a name="TOC7" href="#SEC7">NEWLINES</a>
23     <li><a name="TOC8" href="#SEC8">OPTIONS COMPATIBILITY</a>
24     <li><a name="TOC9" href="#SEC9">OPTIONS WITH DATA</a>
25     <li><a name="TOC10" href="#SEC10">MATCHING ERRORS</a>
26     <li><a name="TOC11" href="#SEC11">DIAGNOSTICS</a>
27     <li><a name="TOC12" href="#SEC12">SEE ALSO</a>
28     <li><a name="TOC13" href="#SEC13">AUTHOR</a>
29     <li><a name="TOC14" href="#SEC14">REVISION</a>
30 nigel 63 </ul>
31     <br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
32     <P>
33 nigel 87 <b>pcregrep [options] [long options] [pattern] [path1 path2 ...]</b>
34 nigel 63 </P>
35     <br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
36     <P>
37     <b>pcregrep</b> searches files for character patterns, in the same way as other
38     grep commands do, but it uses the PCRE regular expression library to support
39     patterns that are compatible with the regular expressions of Perl 5. See
40 nigel 93 <a href="pcrepattern.html"><b>pcrepattern</b>(3)</a>
41     for a full description of syntax and semantics of the regular expressions
42     that PCRE supports.
43 nigel 63 </P>
44     <P>
45 nigel 87 Patterns, whether supplied on the command line or in a separate file, are given
46     without delimiters. For example:
47     <pre>
48     pcregrep Thursday /etc/motd
49     </pre>
50     If you attempt to use delimiters (for example, by surrounding a pattern with
51     slashes, as is common in Perl scripts), they are interpreted as part of the
52 ph10 286 pattern. Quotes can of course be used to delimit patterns on the command line
53     because they are interpreted by the shell, and indeed they are required if a
54     pattern contains white space or shell metacharacters.
55 nigel 63 </P>
56     <P>
57 nigel 87 The first argument that follows any option settings is treated as the single
58     pattern to be matched when neither <b>-e</b> nor <b>-f</b> is present.
59     Conversely, when one or both of these options are used to specify patterns, all
60     arguments are treated as path names. At least one of <b>-e</b>, <b>-f</b>, or an
61     argument pattern must be provided.
62     </P>
63     <P>
64 nigel 77 If no files are specified, <b>pcregrep</b> reads the standard input. The
65     standard input can also be referenced by a name consisting of a single hyphen.
66     For example:
67     <pre>
68     pcregrep some-pattern /file1 - /file3
69     </pre>
70 ph10 286 By default, each line that matches a pattern is copied to the standard
71 nigel 87 output, and if there is more than one file, the file name is output at the
72 ph10 286 start of each line, followed by a colon. However, there are options that can
73     change how <b>pcregrep</b> behaves. In particular, the <b>-M</b> option makes it
74     possible to search for patterns that span line boundaries. What defines a line
75     boundary is controlled by the <b>-N</b> (<b>--newline</b>) option.
76 nigel 63 </P>
77     <P>
78 ph10 654 The amount of memory used for buffering files that are being scanned is
79     controlled by a parameter that can be set by the <b>--buffer-size</b> option.
80     The default value for this parameter is specified when <b>pcregrep</b> is built,
81     with the built-in default being 20K. A block of memory three times this size is
82     used (to allow for buffering "before" and "after" lines). An error occurs if a
83     line overflows the buffer.
84 nigel 63 </P>
85 nigel 87 <P>
86 ph10 654 Patterns are limited to 8K or BUFSIZ bytes, whichever is the greater. BUFSIZ is
87     defined in <b>&#60;stdio.h&#62;</b>. When there is more than one pattern (specified by
88     the use of <b>-e</b> and/or <b>-f</b>), each pattern is applied to each line in
89     the order in which they are defined, except that all the <b>-e</b> patterns are
90     tried before the <b>-f</b> patterns.
91     </P>
92     <P>
93 ph10 392 By default, as soon as one pattern matches (or fails to match when <b>-v</b> is
94     used), no further patterns are considered. However, if <b>--colour</b> (or
95     <b>--color</b>) is used to colour the matching substrings, or if
96     <b>--only-matching</b>, <b>--file-offsets</b>, or <b>--line-offsets</b> is used to
97     output only the part of the line that matched (either shown literally, or as an
98     offset), scanning resumes immediately following the match, so that further
99     matches on the same line can be found. If there are multiple patterns, they are
100     all tried on the remainder of the line, but patterns that follow the one that
101     matched are not tried on the earlier part of the line.
102 ph10 286 </P>
103     <P>
104 ph10 392 This is the same behaviour as GNU grep, but it does mean that the order in
105     which multiple patterns are specified can affect the output when one of the
106     above options is used.
107     </P>
108     <P>
109     Patterns that can match an empty string are accepted, but empty string
110 ph10 453 matches are never recognized. An example is the pattern "(super)?(man)?", in
111 ph10 392 which all components are optional. This pattern finds all occurrences of both
112     "super" and "man"; the output differs from matching with "super|man" when only
113     the matching substrings are being shown.
114     </P>
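<P>
The difference between "(super)?(man)?" and "super|man" can be illustrated
with a small sketch. It uses Python's <b>re</b> module rather than PCRE, and it
approximates "empty string matches are never recognized" by discarding empty
matches; the function name is hypothetical and this is not how <b>pcregrep</b>
is implemented.
</P>

```python
import re

def nonempty_matches(pattern, line):
    """Return every non-empty match of pattern in line, scanning left to right.

    Assumption: dropping empty matches approximates pcregrep's behaviour
    for simple patterns like the ones discussed above.
    """
    return [m.group(0) for m in re.finditer(pattern, line) if m.group(0)]

if __name__ == "__main__":
    line = "superman saw a man"
    # All components optional: "superman" is found as one match.
    print(nonempty_matches(r"(super)?(man)?", line))
    # Alternation: "super" and "man" are found as separate matches.
    print(nonempty_matches(r"super|man", line))
```

<P>
With only the matching substrings shown, the first pattern reports "superman"
as a single match, whereas the alternation reports "super" and "man"
separately.
</P>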
115     <P>
116 nigel 87 If the <b>LC_ALL</b> or <b>LC_CTYPE</b> environment variable is set,
117     <b>pcregrep</b> uses the value to set a locale when calling the PCRE library.
118     The <b>--locale</b> option can be used to override this.
119     </P>
120 ph10 286 <br><a name="SEC3" href="#TOC1">SUPPORT FOR COMPRESSED FILES</a><br>
121 nigel 63 <P>
122 ph10 286 It is possible to compile <b>pcregrep</b> so that it uses <b>libz</b> or
123     <b>libbz2</b> to read files whose names end in <b>.gz</b> or <b>.bz2</b>,
124     respectively. You can find out whether your binary has support for one or both
125     of these file types by running it with the <b>--help</b> option. If the
126     appropriate support is not present, files are treated as plain text. The
127     standard input is always so treated.
128     </P>
129 ph10 954 <br><a name="SEC4" href="#TOC1">BINARY FILES</a><br>
130 ph10 286 <P>
131 ph10 975 By default, a file that contains a binary zero byte within the first 1024 bytes
132 ph10 954 is identified as a binary file, and is processed specially. (GNU grep also
133     identifies binary files in this manner.) See the <b>--binary-files</b> option
134     for a means of changing the way binary files are handled.
135     </P>
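<P>
The default test described above can be sketched as follows. This is an
illustration of the stated rule only (a zero byte within the first 1024 bytes);
the function name is hypothetical.
</P>

```python
def looks_binary(data: bytes) -> bool:
    """Return True if a zero byte occurs within the first 1024 bytes."""
    return b"\x00" in data[:1024]

if __name__ == "__main__":
    print(looks_binary(b"plain text\n"))
    print(looks_binary(b"ELF\x00\x01\x02"))
```

<P>
A zero byte that first appears after the initial 1024 bytes does not cause the
file to be identified as binary under this rule.
</P>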
136     <br><a name="SEC5" href="#TOC1">OPTIONS</a><br>
137     <P>
138 ph10 461 The order in which some of the options appear can affect the output. For
139     example, both the <b>-h</b> and <b>-l</b> options affect the printing of file
140     names. Whichever comes later in the command line will be the one that takes
141 ph10 654 effect. Numerical values for options may be followed by K or M, to signify
142     multiplication by 1024 or 1024*1024 respectively.
143 ph10 429 </P>
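<P>
As a rough sketch of the K and M suffixes, the following shows how a value such
as "20K" expands, and how much memory the file buffer then occupies given that
a block three times the <b>--buffer-size</b> value is used. The function names
are hypothetical, and the handling of lowercase or otherwise invalid suffixes
by <b>pcregrep</b> itself is not covered here.
</P>

```python
def parse_number(text: str) -> int:
    """Parse a decimal number optionally followed by K (x1024) or M (x1024*1024).

    Assumption: only the uppercase suffixes named in the text are handled.
    """
    multipliers = {"K": 1024, "M": 1024 * 1024}
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)

def buffer_memory(buffer_size: int) -> int:
    """Memory block used for buffering: three times the buffer-size parameter."""
    return 3 * buffer_size

if __name__ == "__main__":
    size = parse_number("20K")
    print(size, buffer_memory(size))
```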
144     <P>
145 nigel 77 <b>--</b>
146 ph10 654 This terminates the list of options. It is useful if the next item on the
147 nigel 87 command line starts with a hyphen but is not an option. This allows for the
148     processing of patterns and filenames that start with hyphens.
149 nigel 63 </P>
150     <P>
151 nigel 87 <b>-A</b> <i>number</i>, <b>--after-context=</b><i>number</i>
152     Output <i>number</i> lines of context after each matching line. If filenames
153     and/or line numbers are being output, a hyphen separator is used instead of a
154     colon for the context lines. A line containing "--" is output between each
155 nigel 77 group of lines, unless they are in fact contiguous in the input file. The value
156     of <i>number</i> is expected to be relatively small. However, <b>pcregrep</b>
157 nigel 87 guarantees to have up to 8K of following text available for context output.
158 nigel 77 </P>
159     <P>
160 ph10 954 <b>-a</b>, <b>--text</b>
161     Treat binary files as text. This is equivalent to
162     <b>--binary-files</b>=<i>text</i>.
163     </P>
164     <P>
165 nigel 87 <b>-B</b> <i>number</i>, <b>--before-context=</b><i>number</i>
166     Output <i>number</i> lines of context before each matching line. If filenames
167     and/or line numbers are being output, a hyphen separator is used instead of a
168     colon for the context lines. A line containing "--" is output between each
169 nigel 77 group of lines, unless they are in fact contiguous in the input file. The value
170     of <i>number</i> is expected to be relatively small. However, <b>pcregrep</b>
171 nigel 87 guarantees to have up to 8K of preceding text available for context output.
172 nigel 77 </P>
173     <P>
174 ph10 954 <b>--binary-files=</b><i>word</i>
175 ph10 975 Specify how binary files are to be processed. If the word is "binary" (the
176 ph10 954 default), pattern matching is performed on binary files, but the only output is
177     "Binary file &#60;name&#62; matches" when a match succeeds. If the word is "text",
178     which is equivalent to the <b>-a</b> or <b>--text</b> option, binary files are
179     processed in the same way as any other file. In this case, when a match
180     succeeds, the output may be binary garbage, which can have nasty effects if
181     sent to a terminal. If the word is "without-match", which is equivalent to the
182     <b>-I</b> option, binary files are not processed at all; they are assumed not to
183     be of interest.
184     </P>
185     <P>
186 ph10 654 <b>--buffer-size=</b><i>number</i>
187     Set the parameter that controls how much memory is used for buffering files
188     that are being scanned.
189     </P>
190     <P>
191 nigel 87 <b>-C</b> <i>number</i>, <b>--context=</b><i>number</i>
192     Output <i>number</i> lines of context both before and after each matching line.
193 nigel 77 This is equivalent to setting both <b>-A</b> and <b>-B</b> to the same value.
194     </P>
195     <P>
196 nigel 87 <b>-c</b>, <b>--count</b>
197 ph10 429 Do not output individual lines from the files that are being scanned; instead
198     output the number of lines that would otherwise have been shown. If no lines
199     are selected, the number zero is output. If several files are being
200     scanned, a count is output for each of them. However, if the
201     <b>--files-with-matches</b> option is also used, only those files whose counts
202     are greater than zero are listed. When <b>-c</b> is used, the <b>-A</b>,
203     <b>-B</b>, and <b>-C</b> options are ignored.
204 nigel 63 </P>
205     <P>
206 nigel 87 <b>--colour</b>, <b>--color</b>
207     If this option is given without any data, it is equivalent to "--colour=auto".
208     If data is required, it must be given in the same shell item, separated by an
209     equals sign.
210     </P>
211     <P>
212     <b>--colour=</b><i>value</i>, <b>--color=</b><i>value</i>
213 ph10 392 This option specifies under what circumstances the parts of a line that matched
214     a pattern should be coloured in the output. By default, the output is not
215     coloured. The value (which is optional, see above) may be "never", "always", or
216     "auto". In the last case, colouring happens only if the standard output is
217     connected to a terminal. More resources are used when colouring is enabled,
218     because <b>pcregrep</b> has to search for all possible matches in a line, not
219     just one, in order to colour them all.
220 ph10 535 <br>
221     <br>
222 ph10 392 The colour that is used can be specified by setting the environment variable
223     PCREGREP_COLOUR or PCREGREP_COLOR. The value of this variable should be a
224     string of two numbers, separated by a semicolon. They are copied directly into
225     the control string for setting colour on a terminal, so it is your
226     responsibility to ensure that they make sense. If neither of the environment
227     variables is set, the default is "1;31", which gives red.
228     </P>
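<P>
The way the colour value ends up in a terminal control string can be sketched
as below. The text above says only that the value is copied into the control
string; the exact escape sequences used here (ESC [ value m to start, ESC [ 0 m
to reset) and the precedence of PCREGREP_COLOUR over PCREGREP_COLOR are
assumptions, and the function name is hypothetical.
</P>

```python
import os

def colourise(text, environ=None):
    """Wrap text in a terminal colour control string.

    Assumptions: SGR escape sequences are used, the reset sequence is
    ESC[0m, and PCREGREP_COLOUR is consulted before PCREGREP_COLOR.
    """
    environ = os.environ if environ is None else environ
    codes = environ.get("PCREGREP_COLOUR") or environ.get("PCREGREP_COLOR") or "1;31"
    return f"\x1b[{codes}m{text}\x1b[0m"

if __name__ == "__main__":
    # With neither variable set, the default "1;31" (red) is used.
    print(colourise("match", {}))
```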
229     <P>
230 nigel 87 <b>-D</b> <i>action</i>, <b>--devices=</b><i>action</i>
231     If an input path is not a regular file or a directory, "action" specifies how
232     it is to be processed. Valid values are "read" (the default) or "skip"
233     (silently skip the path).
234     </P>
235     <P>
236     <b>-d</b> <i>action</i>, <b>--directories=</b><i>action</i>
237     If an input path is a directory, "action" specifies how it is to be processed.
238     Valid values are "read" (the default), "recurse" (equivalent to the <b>-r</b>
239     option), or "skip" (silently skip the path). In the default case, directories
240     are read as if they were ordinary files. In some operating systems the effect
241     of reading a directory like this is an immediate end-of-file.
242     </P>
243     <P>
244 ph10 286 <b>-e</b> <i>pattern</i>, <b>--regex=</b><i>pattern</i>, <b>--regexp=</b><i>pattern</i>
245     Specify a pattern to be matched. This option can be used multiple times in
246     order to specify several patterns. It can also be used as a way of specifying a
247     single pattern that starts with a hyphen. When <b>-e</b> is used, no argument
248     pattern is taken from the command line; all arguments are treated as file
249     names. There is an overall maximum of 100 patterns. They are applied to each
250     line in the order in which they are defined until one matches (or fails to
251     match if <b>-v</b> is used). If <b>-f</b> is used with <b>-e</b>, the command line
252     patterns are matched first, followed by the patterns from the file, independent
253     of the order in which these options are specified. Note that multiple use of
254     <b>-e</b> is not the same as a single pattern with alternatives. For example,
255     X|Y finds the first character in a line that is X or Y, whereas if the two
256     patterns are given separately, <b>pcregrep</b> finds X if it is present, even if
257     it follows Y in the line. It finds Y only if there is no X in the line. This
258     really matters only if you are using <b>-o</b> to show the part(s) of the line
259     that matched.
260 nigel 87 </P>
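<P>
The X|Y example above can be made concrete with a short sketch. It uses
Python's <b>re</b> module in place of PCRE, and the function names are
hypothetical; it models only the rule that separate patterns are tried in order
until one matches.
</P>

```python
import re

def first_match_alternation(line):
    """Single pattern X|Y: the leftmost X or Y in the line wins."""
    m = re.search(r"X|Y", line)
    return m.group(0) if m else None

def first_match_separate(patterns, line):
    """Separate patterns: each is tried in order; the first that matches wins."""
    for pattern in patterns:
        m = re.search(pattern, line)
        if m:
            return m.group(0)
    return None

if __name__ == "__main__":
    line = "aYbX"
    print(first_match_alternation(line))          # Y comes first in the line
    print(first_match_separate(["X", "Y"], line)) # X is tried first and is present
```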
261     <P>
262 nigel 77 <b>--exclude</b>=<i>pattern</i>
263     When <b>pcregrep</b> is searching the files in a directory as a consequence of
264 ph10 345 the <b>-r</b> (recursive search) option, any regular files whose names match the
265     pattern are excluded. Subdirectories are not excluded by this option; they are
266 ph10 572 searched recursively, subject to the <b>--exclude-dir</b> and
267 ph10 345 <b>--include-dir</b> options. The pattern is a PCRE regular expression, and is
268     matched against the final component of the file name (not the entire path). If
269     a file name matches both <b>--include</b> and <b>--exclude</b>, it is excluded.
270     There is no short form for this option.
271 nigel 77 </P>
272     <P>
273 ph10 572 <b>--exclude-dir</b>=<i>pattern</i>
274 ph10 345 When <b>pcregrep</b> is searching the contents of a directory as a consequence
275     of the <b>-r</b> (recursive search) option, any subdirectories whose names match
276     the pattern are excluded. (Note that the <b>--exclude</b> option does not affect
277     subdirectories.) The pattern is a PCRE regular expression, and is matched
278     against the final component of the name (not the entire path). If a
279 ph10 572 subdirectory name matches both <b>--include-dir</b> and <b>--exclude-dir</b>, it
280 ph10 345 is excluded. There is no short form for this option.
281     </P>
282     <P>
283 nigel 87 <b>-F</b>, <b>--fixed-strings</b>
284     Interpret each pattern as a list of fixed strings, separated by newlines,
285     instead of as a regular expression. The <b>-w</b> (match as a word) and <b>-x</b>
286     (match whole line) options can be used with <b>-F</b>. They apply to each of the
287     fixed strings. A line is selected if any of the fixed strings are found in it
288     (subject to <b>-w</b> or <b>-x</b>, if present).
289 nigel 63 </P>
290     <P>
291 nigel 87 <b>-f</b> <i>filename</i>, <b>--file=</b><i>filename</i>
292     Read a number of patterns from the file, one per line, and match them against
293     each line of input. A data line is output if any of the patterns match it. The
294     filename can be given as "-" to refer to the standard input. When <b>-f</b> is
295     used, patterns specified on the command line using <b>-e</b> may also be
296     present; they are tested before the file's patterns. However, no other pattern
297 ph10 954 is taken from the command line; all arguments are treated as the names of paths
298     to be searched. There is an overall maximum of 100 patterns. Trailing white
299     space is removed from each line, and blank lines are ignored. An empty file
300     contains no patterns and therefore matches nothing. See also the comments about
301     multiple patterns versus a single pattern with alternatives in the description
302     of <b>-e</b> above.
303 nigel 63 </P>
304     <P>
305 ph10 954 <b>--file-list</b>=<i>filename</i>
306     Read a list of files to be searched from the given file, one per line. Trailing
307     white space is removed from each line, and blank lines are ignored. These files
308     are searched before any others that may be listed on the command line. The
309     filename can be given as "-" to refer to the standard input. If <b>--file</b>
310     and <b>--file-list</b> are both specified as "-", patterns are read first. This
311     is useful only when the standard input is a terminal, from which further lines
312     (the list of files) can be read after an end-of-file indication.
313     </P>
314     <P>
315 ph10 286 <b>--file-offsets</b>
316     Instead of showing lines or parts of lines that match, show each match as an
317     offset from the start of the file and a length, separated by a comma. In this
318     mode, no context is shown. That is, the <b>-A</b>, <b>-B</b>, and <b>-C</b>
319     options are ignored. If there is more than one match in a line, each of them is
320     shown separately. This option is mutually exclusive with <b>--line-offsets</b>
321     and <b>--only-matching</b>.
322     </P>
323     <P>
324 nigel 87 <b>-H</b>, <b>--with-filename</b>
325     Force the inclusion of the filename at the start of output lines when searching
326     a single file. By default, the filename is not shown in this case. For matching
327 ph10 392 lines, the filename is followed by a colon; for context lines, a hyphen
328     separator is used. If a line number is also being output, it follows the file
329     name.
330 nigel 87 </P>
331     <P>
332     <b>-h</b>, <b>--no-filename</b>
333     Suppress the output of filenames when searching multiple files. By default,
334     filenames are shown when multiple files are searched. For matching lines, the
335 ph10 392 filename is followed by a colon; for context lines, a hyphen separator is used.
336     If a line number is also being output, it follows the file name.
337 nigel 87 </P>
338     <P>
339     <b>--help</b>
340 ph10 286 Output a help message, giving brief details of the command options and file
341     type support, and then exit.
342 nigel 87 </P>
343     <P>
344 ph10 954 <b>-I</b>
345     Treat binary files as never matching. This is equivalent to
346     <b>--binary-files</b>=<i>without-match</i>.
347     </P>
348     <P>
349 nigel 87 <b>-i</b>, <b>--ignore-case</b>
350 nigel 63 Ignore upper/lower case distinctions during comparisons.
351     </P>
352     <P>
353 nigel 77 <b>--include</b>=<i>pattern</i>
354     When <b>pcregrep</b> is searching the files in a directory as a consequence of
355 ph10 345 the <b>-r</b> (recursive search) option, only those regular files whose names
356     match the pattern are included. Subdirectories are always included and searched
357 ph10 572 recursively, subject to the <b>--include-dir</b> and <b>--exclude-dir</b>
358 ph10 345 options. The pattern is a PCRE regular expression, and is matched against the
359     final component of the file name (not the entire path). If a file name matches
360     both <b>--include</b> and <b>--exclude</b>, it is excluded. There is no short
361     form for this option.
362 nigel 77 </P>
363     <P>
364 ph10 572 <b>--include-dir</b>=<i>pattern</i>
365 ph10 345 When <b>pcregrep</b> is searching the contents of a directory as a consequence
366     of the <b>-r</b> (recursive search) option, only those subdirectories whose
367     names match the pattern are included. (Note that the <b>--include</b> option
368     does not affect subdirectories.) The pattern is a PCRE regular expression, and
369     is matched against the final component of the name (not the entire path). If a
370 ph10 572 subdirectory name matches both <b>--include-dir</b> and <b>--exclude-dir</b>, it
371 ph10 345 is excluded. There is no short form for this option.
372     </P>
373     <P>
374 nigel 87 <b>-L</b>, <b>--files-without-match</b>
375     Instead of outputting lines from the files, just output the names of the files
376     that do not contain any lines that would have been output. Each file name is
377     output once, on a separate line.
378 nigel 77 </P>
379     <P>
380 nigel 87 <b>-l</b>, <b>--files-with-matches</b>
381     Instead of outputting lines from the files, just output the names of the files
382     containing lines that would have been output. Each file name is output
383 ph10 429 once, on a separate line. Searching normally stops as soon as a matching line
384 ph10 461 is found in a file. However, if the <b>-c</b> (count) option is also used,
385     matching continues in order to obtain the correct count, and those files that
386     have at least one match are listed along with their counts. Using this option
387 ph10 429 with <b>-c</b> is a way of suppressing the listing of files with no matches.
388 nigel 63 </P>
389     <P>
390 nigel 77 <b>--label</b>=<i>name</i>
391     This option supplies a name to be used for the standard input when file names
392 nigel 87 are being output. If not supplied, "(standard input)" is used. There is no
393 nigel 77 short form for this option.
394     </P>
395     <P>
396 ph10 535 <b>--line-buffered</b>
397     When this option is given, input is read and processed line by line, and the
398     output is flushed after each write. By default, input is read in large chunks,
399     unless <b>pcregrep</b> can determine that it is reading from a terminal (which
400     is currently possible only in Unix environments). Output to a terminal is
401     normally automatically flushed by the operating system. This option can be
402     useful when the input or output is attached to a pipe and you do not want
403     <b>pcregrep</b> to buffer up large amounts of data. However, its use will affect
404     performance, and the <b>-M</b> (multiline) option ceases to work.
405     </P>
406     <P>
407 ph10 286 <b>--line-offsets</b>
408     Instead of showing lines or parts of lines that match, show each match as a
409     line number, the offset from the start of the line, and a length. The line
410     number is terminated by a colon (as usual; see the <b>-n</b> option), and the
411     offset and length are separated by a comma. In this mode, no context is shown.
412     That is, the <b>-A</b>, <b>-B</b>, and <b>-C</b> options are ignored. If there is
413     more than one match in a line, each of them is shown separately. This option is
414     mutually exclusive with <b>--file-offsets</b> and <b>--only-matching</b>.
415     </P>
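<P>
The output shapes of <b>--file-offsets</b> ("offset,length") and
<b>--line-offsets</b> ("line:offset,length") can be sketched as follows. The
text does not say whether offsets count from zero; this sketch assumes
zero-based offsets, LF line endings, and single-byte characters, and the
function name is hypothetical.
</P>

```python
import re

def match_offsets(pattern, text):
    """Return (file_form, line_form) strings for each non-empty match.

    Assumptions: offsets are zero-based, line numbers start at 1,
    lines end with LF, and every character occupies one byte.
    """
    results = []
    file_pos = 0  # offset of the current line's first character in the file
    for lineno, line in enumerate(text.split("\n"), start=1):
        for m in re.finditer(pattern, line):
            if m.group(0):
                length = m.end() - m.start()
                results.append((f"{file_pos + m.start()},{length}",
                                f"{lineno}:{m.start()},{length}"))
        file_pos += len(line) + 1  # +1 for the LF that ended this line
    return results

if __name__ == "__main__":
    for file_form, line_form in match_offsets("cat", "one cat\ntwo cats cat\n"):
        print(file_form, line_form)
```

<P>
When a line contains more than one match, each one produces its own entry, as
described above.
</P>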
416     <P>
417 nigel 87 <b>--locale</b>=<i>locale-name</i>
418     This option specifies a locale to be used for pattern matching. It overrides
419     the value in the <b>LC_ALL</b> or <b>LC_CTYPE</b> environment variables. If no
420     locale is specified, the PCRE library's default (usually the "C" locale) is
421     used. There is no short form for this option.
422     </P>
423     <P>
424 ph10 579 <b>--match-limit</b>=<i>number</i>
425 ph10 567 Processing some regular expression patterns can require a very large amount of
426     memory, leading in some cases to a program crash if not enough is available.
427 ph10 579 Other patterns may take a very long time to search for all possible matching
428 ph10 567 strings. The <b>pcre_exec()</b> function that is called by <b>pcregrep</b> to do
429 ph10 579 the matching has two parameters that can limit the resources that it uses.
430 ph10 567 <br>
431     <br>
432     The <b>--match-limit</b> option provides a means of limiting resource usage
433     when processing patterns that are not going to match, but which have a very
434     large number of possibilities in their search trees. The classic example is a
435     pattern that uses nested unlimited repeats. Internally, PCRE uses a function
436     called <b>match()</b> which it calls repeatedly (sometimes recursively). The
437 ph10 583 limit set by <b>--match-limit</b> is imposed on the number of times this
438 ph10 567 function is called during a match, which has the effect of limiting the amount
439     of backtracking that can take place.
440     <br>
441     <br>
442     The <b>--recursion-limit</b> option is similar to <b>--match-limit</b>, but
443     instead of limiting the total number of times that <b>match()</b> is called, it
444     limits the depth of recursive calls, which in turn limits the amount of memory
445     that can be used. The recursion depth is a smaller number than the total number
446     of calls, because not all calls to <b>match()</b> are recursive. This limit is
447     of use only if it is set smaller than <b>--match-limit</b>.
448     <br>
449     <br>
450 ph10 579 There are no short forms for these options. The default settings are specified
451 ph10 567 when the PCRE library is compiled, with the built-in default being 10 million.
452     </P>
453     <P>
454 nigel 87 <b>-M</b>, <b>--multiline</b>
455 nigel 77 Allow patterns to match more than one line. When this option is given, patterns
456     may usefully contain literal newline characters and internal occurrences of ^
457 ph10 589 and $ characters. The output for a successful match may consist of more than
458     one line, the last of which is the one in which the match ended. If the matched
459     string ends with a newline sequence the output ends at the end of that line.
460     <br>
461     <br>
462     When this option is set, the PCRE library is called in "multiline" mode.
463 nigel 77 There is a limit to the number of lines that can be matched, imposed by the way
464     that <b>pcregrep</b> buffers the input file as it scans it. However,
465     <b>pcregrep</b> ensures that at least 8K characters or the rest of the document
466     (whichever is the shorter) are available for forward matching, and similarly
467     the previous 8K characters (or all the previous characters, if fewer than 8K)
468 ph10 535 are guaranteed to be available for lookbehind assertions. This option does not
469     work when input is read line by line (see <b>--line-buffered</b>).
470 nigel 77 </P>
471     <P>
472 ph10 567 <b>-N</b> <i>newline-type</i>, <b>--newline</b>=<i>newline-type</i>
473 ph10 150 The PCRE library supports five different conventions for indicating
474 nigel 91 the ends of lines. They are the single-character sequences CR (carriage return)
475 ph10 150 and LF (linefeed), the two-character sequence CRLF, an "anycrlf" convention,
476     which recognizes any of the preceding three types, and an "any" convention, in
477 nigel 93 which any Unicode line ending sequence is assumed to end a line. The Unicode
478     sequences are the three just mentioned, plus VT (vertical tab, U+000B), FF
479 ph10 654 (form feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and
480 ph10 150 PS (paragraph separator, U+2029).
481 nigel 93 <br>
482     <br>
483     When the PCRE library is built, a default line-ending sequence is specified.
484     This is normally the standard sequence for the operating system. Unless
485     otherwise specified by this option, <b>pcregrep</b> uses the library's default.
486 ph10 150 The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This
487     makes it possible to use <b>pcregrep</b> on files that have come from other
488     environments without having to modify their line endings. If the data that is
489     being scanned does not agree with the convention set by this option,
490     <b>pcregrep</b> may behave in strange ways.
491 nigel 91 </P>
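<P>
The five conventions can be illustrated by splitting the same data under each
of them. This sketch uses Python regular expressions to stand in for the
library's line-ending recognition; the table and function names are
hypothetical.
</P>

```python
import re

# One regular expression per newline convention named above.
NEWLINE_PATTERNS = {
    "CR": r"\r",
    "LF": r"\n",
    "CRLF": r"\r\n",
    "ANYCRLF": r"\r\n|\r|\n",
    # CRLF is listed first so that it counts as a single line ending.
    "ANY": r"\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]",
}

def split_lines(text, convention):
    """Split text into lines according to the given newline convention."""
    return re.split(NEWLINE_PATTERNS[convention], text)

if __name__ == "__main__":
    data = "a\r\nb\nc\rd"
    for name in ("LF", "CRLF", "ANYCRLF"):
        print(name, split_lines(data, name))
```

<P>
Splitting the same data under the wrong convention leaves stray CR or LF
characters inside lines, which is one way the "strange" behaviour mentioned
above can arise.
</P>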
492     <P>
493 nigel 87 <b>-n</b>, <b>--line-number</b>
494     Precede each output line by its line number in the file, followed by a colon
495 ph10 392 for matching lines or a hyphen for context lines. If the filename is also being
496     output, it precedes the line number. This option is forced if
497     <b>--line-offsets</b> is used.
498 nigel 63 </P>
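<P>
The separator rules for file names and line numbers (a colon for matching
lines, a hyphen for context lines, with the file name preceding the line
number) can be sketched as a small formatting function. The function and
parameter names are hypothetical.
</P>

```python
def format_output_line(text, is_match, filename=None, lineno=None):
    """Prefix a line with optional file name and line number.

    Matching lines use ':' as the separator; context lines use '-'.
    """
    sep = ":" if is_match else "-"
    prefix = ""
    if filename is not None:
        prefix += f"{filename}{sep}"
    if lineno is not None:
        prefix += f"{lineno}{sep}"
    return prefix + text

if __name__ == "__main__":
    print(format_output_line("before", False, "notes.txt", 2))
    print(format_output_line("the match", True, "notes.txt", 3))
```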
499     <P>
500 ph10 691 <b>--no-jit</b>
501     If the PCRE library is built with support for just-in-time compiling (which
502     speeds up matching), <b>pcregrep</b> automatically makes use of this, unless it
503     was explicitly disabled at build time. This option can be used to disable the
504     use of JIT at run time. It is provided for testing and working round problems.
505     It should never be needed in normal use.
506     </P>
507     <P>
508 nigel 87 <b>-o</b>, <b>--only-matching</b>
509 ph10 567 Show only the part of the line that matched a pattern instead of the whole
510     line. In this mode, no context is shown. That is, the <b>-A</b>, <b>-B</b>, and
511     <b>-C</b> options are ignored. If there is more than one match in a line, each
512     of them is shown separately. If <b>-o</b> is combined with <b>-v</b> (invert the
513     sense of the match to find non-matching lines), no output is generated, but the
514     return code is set appropriately. If the matched portion of the line is empty,
515     nothing is output unless the file name or line number is being printed, in
516     which case they are shown on an otherwise empty line. This option is mutually
517     exclusive with <b>--file-offsets</b> and <b>--line-offsets</b>.
518 nigel 77 </P>
519     <P>
520 ph10 567 <b>-o</b><i>number</i>, <b>--only-matching</b>=<i>number</i>
521 ph10 579 Show only the part of the line that matched the capturing parentheses of the
522 ph10 567 given number. Up to 32 capturing parentheses are supported. Because these
523     options can be given without an argument (see above), if an argument is
524     present, it must be given in the same shell item, for example, -o3 or
525 ph10 579 --only-matching=2. The comments given for the non-argument case above also
526     apply to this case. If the specified capturing parentheses do not exist in the
527     pattern, or were not set in the match, nothing is output unless the file name
528 ph10 567 or line number is being printed.
529     </P>
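<P>
As a rough illustration of selecting one set of capturing parentheses, the
following Python sketch mimics the behaviour described above. It uses Python's
<b>re</b> module rather than PCRE, and the function name <b>only_matching</b>
is hypothetical; it is not part of <b>pcregrep</b>.
</P>

```python
import re

def only_matching(pattern, line, group=0):
    """Return the pieces of `line` that a -o<group> style option would show.

    group 0 stands for the whole match (plain -o). This is an illustrative
    sketch built on Python's re module, not on PCRE.
    """
    regex = re.compile(pattern)
    if group > regex.groups:
        return []  # the requested parentheses do not exist in the pattern
    pieces = []
    for match in regex.finditer(line):
        text = match.group(group)
        if text:  # skip groups that were not set, and empty matches
            pieces.append(text)
    return pieces

line = "from alice@example to bob@test"
print(only_matching(r"(\w+)@(\w+)", line))     # every whole match
print(only_matching(r"(\w+)@(\w+)", line, 2))  # second parentheses only
```

<P>
Each match in the line is reported separately, and a parenthesis number that
the pattern does not contain yields nothing.
</P>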
530     <P>
531 nigel 87 <b>-q</b>, <b>--quiet</b>
532     Work quietly, that is, display nothing except error messages. The exit
533     status indicates whether or not any matches were found.
534     </P>
535     <P>
536     <b>-r</b>, <b>--recursive</b>
537 nigel 77 If any given path is a directory, recursively scan the files it contains,
538 nigel 87 taking note of any <b>--include</b> and <b>--exclude</b> settings. By default, a
539     directory is read as a normal file; in some operating systems this gives an
540     immediate end-of-file. This option is a shorthand for setting the <b>-d</b>
541     option to "recurse".
542 nigel 63 </P>
543     <P>
544 ph10 567 <b>--recursion-limit</b>=<i>number</i>
545     See <b>--match-limit</b> above.
546     </P>
547     <P>
548 nigel 87 <b>-s</b>, <b>--no-messages</b>
549 nigel 77 Suppress error messages about non-existent or unreadable files. Such files are
550     quietly skipped. However, the return code is still 2, even if matches were
551     found in other files.
552 nigel 63 </P>
553     <P>
554 nigel 87 <b>-u</b>, <b>--utf-8</b>
555 nigel 63 Operate in UTF-8 mode. This option is available only if PCRE has been compiled
556 nigel 87 with UTF-8 support. Both patterns and subject lines must be valid strings of
557     UTF-8 characters.
558 nigel 63 </P>
559     <P>
560 nigel 87 <b>-V</b>, <b>--version</b>
561 nigel 77 Write the version numbers of <b>pcregrep</b> and the PCRE library that is being
562     used to the standard error stream.
563     </P>
564     <P>
565 nigel 87 <b>-v</b>, <b>--invert-match</b>
566     Invert the sense of the match, so that lines which do <i>not</i> match any of
567     the patterns are the ones that are found.
568 nigel 63 </P>
569     <P>
570 nigel 87 <b>-w</b>, <b>--word-regex</b>, <b>--word-regexp</b>
571     Force the patterns to match only whole words. This is equivalent to having \b
572 nigel 77 at the start and end of the pattern.
573     </P>
574     <P>
575 ph10 148 <b>-x</b>, <b>--line-regex</b>, <b>--line-regexp</b>
576 nigel 87 Force the patterns to be anchored (each must start matching at the beginning of
577     a line) and in addition, require them to match entire lines. This is
578 nigel 63 equivalent to having ^ and $ characters at the start and end of each
579 nigel 87 alternative branch in every pattern.
580 nigel 63 </P>
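<P>
The effect of the <b>-w</b> and <b>-x</b> options can be sketched with
Python's <b>re</b> module, whose \b, ^ and $ assertions behave like PCRE's for
these simple cases. The helper names are hypothetical, and the non-capturing
group is an assumption made here so that the anchors apply to every
alternative branch.
</P>

```python
import re

def whole_word(pattern):
    # -w analogue: \b at the start and end of the pattern. The (?:...)
    # wrapper is an assumption of this sketch, not taken from pcregrep.
    return r"\b(?:" + pattern + r")\b"

def whole_line(pattern):
    # -x analogue: anchoring the group covers each alternative branch.
    return r"^(?:" + pattern + r")$"

print(bool(re.search(whole_word("cat"), "a cat sat")))    # word on its own
print(bool(re.search(whole_word("cat"), "concatenate")))  # inside a longer word

# Anchors written once around an alternation bind to single branches only:
print(bool(re.search(r"^cat|dog$", "catalog")))           # "^cat" matches
print(bool(re.search(whole_line("cat|dog"), "catalog")))  # whole line required
```

<P>
The last two calls show why the anchors must apply to each alternative branch:
without grouping, ^ belongs only to the first branch and $ only to the last.
</P>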
581 ph10 954 <br><a name="SEC6" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
582 nigel 63 <P>
583 nigel 87 The environment variables <b>LC_ALL</b> and <b>LC_CTYPE</b> are examined, in that
584     order, for a locale. The first one that is set is used. This can be overridden
585     by the <b>--locale</b> option. If no locale is set, the PCRE library's default
586     (usually the "C" locale) is used.
587 nigel 77 </P>
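<P>
The lookup order can be summarised by the following Python sketch. The
function <b>choose_locale</b> is hypothetical and only models the precedence
described above; it does not call any locale API.
</P>

```python
def choose_locale(environ, locale_option=None):
    """Pick a locale name following the precedence described above.

    environ is a mapping such as os.environ; locale_option models the
    value given with --locale, if any. Returns None when nothing is set,
    meaning the library default would apply.
    """
    if locale_option is not None:
        return locale_option          # --locale overrides the environment
    for name in ("LC_ALL", "LC_CTYPE"):
        if name in environ:           # the first variable that is set wins
            return environ[name]
    return None

print(choose_locale({"LC_CTYPE": "fr_FR", "LC_ALL": "en_GB"}))
print(choose_locale({"LC_CTYPE": "fr_FR"}))
print(choose_locale({"LC_ALL": "en_GB"}, locale_option="de_DE"))
```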
588 ph10 954 <br><a name="SEC7" href="#TOC1">NEWLINES</a><br>
589 nigel 77 <P>
590 nigel 91 The <b>-N</b> (<b>--newline</b>) option allows <b>pcregrep</b> to scan files with
591     different newline conventions from the default. However, the setting of this
592     option does not affect the way in which <b>pcregrep</b> writes information to
593     the standard error and output streams. It uses the string "\n" in C
594     <b>printf()</b> calls to indicate newlines, relying on the C I/O library to
595     convert this to an appropriate sequence if the output is sent to a file.
596     </P>
597 ph10 954 <br><a name="SEC8" href="#TOC1">OPTIONS COMPATIBILITY</a><br>
598 nigel 91 <P>
599 ph10 572 Many of the short and long forms of <b>pcregrep</b>'s options are the same
600 ph10 954 as in the GNU <b>grep</b> program. Any long option of the form
601 nigel 87 <b>--xxx-regexp</b> (GNU terminology) is also available as <b>--xxx-regex</b>
602 ph10 954 (PCRE terminology). However, the <b>--file-list</b>, <b>--file-offsets</b>,
603     <b>--include-dir</b>, <b>--line-offsets</b>, <b>--locale</b>, <b>--match-limit</b>,
604     <b>-M</b>, <b>--multiline</b>, <b>-N</b>, <b>--newline</b>,
605     <b>--recursion-limit</b>, <b>-u</b>, and <b>--utf-8</b> options are specific to
606     <b>pcregrep</b>, as is the use of the <b>--only-matching</b> option with a
607     capturing parentheses number.
608 ph10 572 </P>
609     <P>
610     Although most of the common options work the same way, a few are different in
611     <b>pcregrep</b>. For example, the <b>--include</b> option's argument is a glob
612     for GNU <b>grep</b>, but a regular expression for <b>pcregrep</b>. If both the
613 ph10 461 <b>-c</b> and <b>-l</b> options are given, GNU <b>grep</b> lists only file names,
614 ph10 429 without counts, but <b>pcregrep</b> gives the counts.
615 nigel 87 </P>
616 ph10 954 <br><a name="SEC9" href="#TOC1">OPTIONS WITH DATA</a><br>
617 nigel 87 <P>
618 nigel 77 There are four different ways in which an option with data can be specified.
619 ph10 572 If a short form option is used, the data may follow immediately, or (with one
620     exception) in the next command line item. For example:
621 nigel 77 <pre>
622     -f/some/file
623     -f /some/file
624 nigel 75 </pre>
625 ph10 579 The exception is the <b>-o</b> option, which may appear with or without data.
626     Because of this, if data is present, it must follow immediately in the same
627 ph10 572 item, for example -o3.
628     </P>
629     <P>
630 nigel 77 If a long form option is used, the data may appear in the same command line
631 ph10 572 item, separated by an equals character, or (with two exceptions) it may appear
632 nigel 87 in the next command line item. For example:
633 nigel 77 <pre>
634     --file=/some/file
635     --file /some/file
636 nigel 87 </pre>
637     Note, however, that if you want to supply a file name beginning with ~ as data
638     in a shell command, and have the shell expand ~ to a home directory, you must
639     separate the file name from the option, because the shell does not treat ~
640     specially unless it is at the start of an item.
641 nigel 63 </P>
642     <P>
643 ph10 572 The exceptions to the above are the <b>--colour</b> (or <b>--color</b>) and
644     <b>--only-matching</b> options, for which the data is optional. If one of these
645     options does have data, it must be given in the first form, using an equals
646 ph10 579 character. Otherwise <b>pcregrep</b> will assume that it has no data.
647 nigel 87 </P>
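<P>
The rules in this section can be modelled by a small Python sketch. The
function <b>split_option</b> is hypothetical and covers only options that
accept data; it is not how <b>pcregrep</b> itself is implemented.
</P>

```python
# Long options whose data is optional, as listed above.
OPTIONAL_DATA_LONG = {"--colour", "--color", "--only-matching"}

def split_option(item, next_item):
    """Return (name, data, used_next) for a command line item.

    used_next is True when the data was taken from the following item.
    Illustrative only: it assumes `item` is an option that accepts data.
    """
    if item.startswith("--"):
        if "=" in item:
            name, data = item.split("=", 1)
            return name, data, False
        if item in OPTIONAL_DATA_LONG:
            return item, None, False   # no "=", so assumed to have no data
        return item, next_item, True
    name, rest = item[:2], item[2:]
    if rest:
        return name, rest, False       # data follows immediately
    if name == "-o":
        return name, None, False       # -o never takes the next item
    return name, next_item, True

print(split_option("-f/some/file", None))
print(split_option("-f", "/some/file"))
print(split_option("-o", "3"))
print(split_option("--only-matching", "2"))
```

<P>
In the last two calls the following item is left alone, so it would be
treated as the next argument (for example, the pattern) rather than as data.
</P>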
648 ph10 954 <br><a name="SEC10" href="#TOC1">MATCHING ERRORS</a><br>
649 nigel 87 <P>
650     It is possible to supply a regular expression that takes a very long time to
651     fail to match certain lines. Such patterns normally involve nested indefinite
652     repeats, for example: (a+)*\d when matched against a line of a's with no final
653     digit. The PCRE matching function has a resource limit that causes it to abort
654     in these circumstances. If this happens, <b>pcregrep</b> outputs an error
655     message and the line that caused the problem to the standard error stream. If
656     there are more than 20 such errors, <b>pcregrep</b> gives up.
657     </P>
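<P>
The cost of such a pattern can be observed with the Python sketch below. It
uses Python's backtracking <b>re</b> engine, which has no resource limit, so it
only illustrates the growth in work that PCRE's limit guards against. The
function name and the chosen line lengths are arbitrary.
</P>

```python
import re
import time

NESTED = re.compile(r"(a+)*\d")  # nested indefinite repeats, as above

def time_failed_match(length):
    """Try the pattern on a line of a's with no final digit.

    Returns the match result (expected to be None) and the time taken.
    """
    line = "a" * length
    start = time.perf_counter()
    result = NESTED.search(line)
    return result, time.perf_counter() - start

for length in (8, 12, 16):
    result, seconds = time_failed_match(length)
    print(length, result, f"{seconds:.4f}s")
```

<P>
Each extra character roughly doubles the number of ways the engine must try
before it can report failure, so the lengths are kept small here.
</P>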
658 ph10 572 <P>
659     The <b>--match-limit</b> option of <b>pcregrep</b> can be used to set the overall
660     resource limit; there is a second option called <b>--recursion-limit</b> that
661 ph10 579 sets a limit on the amount of memory (usually stack) that is used (see the
662 ph10 572 discussion of these options above).
663     </P>
664 ph10 954 <br><a name="SEC11" href="#TOC1">DIAGNOSTICS</a><br>
665 nigel 87 <P>
666 nigel 63 Exit status is 0 if any matches were found, 1 if no matches were found, and 2
667 ph10 654 for syntax errors, overlong lines, non-existent or inaccessible files (even if
668     matches were found in other files) or too many matching errors. Using the
669     <b>-s</b> option to suppress error messages about inaccessible files does not
670     affect the return code.
671 nigel 63 </P>
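<P>
A calling script might interpret the exit status along the following lines.
This is a minimal Python sketch: the helper names are hypothetical, and
<b>run_pcregrep</b> assumes that a <b>pcregrep</b> executable is on the PATH.
</P>

```python
import subprocess

def describe_status(code):
    """Map a pcregrep exit status to a short description."""
    if code == 0:
        return "match"
    if code == 1:
        return "no match"
    if code == 2:
        # Matches may still have been found in other files.
        return "error"
    return "unexpected"

def run_pcregrep(args):
    """Run pcregrep with the given argument list and return
    (description, stdout, stderr)."""
    completed = subprocess.run(
        ["pcregrep", *args], capture_output=True, text=True
    )
    return describe_status(completed.returncode), completed.stdout, completed.stderr

for code in (0, 1, 2):
    print(code, describe_status(code))
```

<P>
Because status 2 is returned even when other files matched, a caller that
wants those matches should still read the standard output in the error case.
</P>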
672 ph10 954 <br><a name="SEC12" href="#TOC1">SEE ALSO</a><br>
673 nigel 63 <P>
674 nigel 93 <b>pcrepattern</b>(3), <b>pcretest</b>(1).
675     </P>
676 ph10 954 <br><a name="SEC13" href="#TOC1">AUTHOR</a><br>
677 nigel 93 <P>
678 nigel 77 Philip Hazel
679 nigel 63 <br>
680     University Computing Service
681     <br>
682 nigel 93 Cambridge CB2 3QH, England.
683 ph10 99 <br>
684 nigel 63 </P>
685 ph10 954 <br><a name="SEC14" href="#TOC1">REVISION</a><br>
686 nigel 63 <P>
687 ph10 954 Last updated: 04 March 2012
688 nigel 63 <br>
689 ph10 954 Copyright &copy; 1997-2012 University of Cambridge.
690 ph10 99 <br>
691 nigel 75 <p>
692     Return to the <a href="index.html">PCRE index page</a>.
693     </p>
