Diff of /code/trunk/doc/pcre.3

revision 51 by nigel, Sat Feb 24 21:39:37 2007 UTC revision 53 by nigel, Sat Feb 24 21:39:42 2007 UTC
# Line 92  contain the major and minor release numb Line 92  contain the major and minor release numb
92  use these to include support for different releases.  use these to include support for different releases.
93    
94  The functions \fBpcre_compile()\fR, \fBpcre_study()\fR, and \fBpcre_exec()\fR  The functions \fBpcre_compile()\fR, \fBpcre_study()\fR, and \fBpcre_exec()\fR
95  are used for compiling and matching regular expressions.  are used for compiling and matching regular expressions. A sample program that
96    demonstrates the simplest way of using them is given in the file
97    \fIpcredemo.c\fR. The last section of this man page describes how to run it.
98    
99  The functions \fBpcre_copy_substring()\fR, \fBpcre_get_substring()\fR, and  The functions \fBpcre_copy_substring()\fR, \fBpcre_get_substring()\fR, and
100  \fBpcre_get_substring_list()\fR are convenience functions for extracting  \fBpcre_get_substring_list()\fR are convenience functions for extracting
# Line 129  the same compiled pattern can safely be Line 131  the same compiled pattern can safely be
131  The function \fBpcre_compile()\fR is called to compile a pattern into an  The function \fBpcre_compile()\fR is called to compile a pattern into an
132  internal form. The pattern is a C string terminated by a binary zero, and  internal form. The pattern is a C string terminated by a binary zero, and
133  is passed in the argument \fIpattern\fR. A pointer to a single block of memory  is passed in the argument \fIpattern\fR. A pointer to a single block of memory
134  that is obtained via \fBpcre_malloc\fR is returned. This contains the  that is obtained via \fBpcre_malloc\fR is returned. This contains the compiled
135  compiled code and related data. The \fBpcre\fR type is defined for this for  code and related data. The \fBpcre\fR type is defined for the returned block;
136  convenience, but in fact \fBpcre\fR is just a typedef for \fBvoid\fR, since the  this is a typedef for a structure whose contents are not externally defined. It
137  contents of the block are not externally defined. It is up to the caller to  is up to the caller to free the memory when it is no longer required.
138  free the memory when it is no longer required.  
139  .PP  Although the compiled code of a PCRE regex is relocatable, that is, it does not
140    depend on memory location, the complete \fBpcre\fR data block is not
141    fully relocatable, because it contains a copy of the \fItableptr\fR argument,
142    which is an address (see below).
143    
144  The size of a compiled pattern is roughly proportional to the length of the  The size of a compiled pattern is roughly proportional to the length of the
145  pattern string, except that each character class (other than those containing  pattern string, except that each character class (other than those containing
146  just a single character, negated or not) requires 33 bytes, and repeat  just a single character, negated or not) requires 33 bytes, and repeat
147  quantifiers with a minimum greater than one or a bounded maximum cause the  quantifiers with a minimum greater than one or a bounded maximum cause the
148  relevant portions of the compiled pattern to be replicated.  relevant portions of the compiled pattern to be replicated.
149  .PP  
150  The \fIoptions\fR argument contains independent bits that affect the  The \fIoptions\fR argument contains independent bits that affect the
151  compilation. It should be zero if no options are required. Some of the options,  compilation. It should be zero if no options are required. Some of the options,
152  in particular, those that are compatible with Perl, can also be set and unset  in particular, those that are compatible with Perl, can also be set and unset
# Line 149  below). For these options, the contents Line 155  below). For these options, the contents
155  their initial settings at the start of compilation and execution. The  their initial settings at the start of compilation and execution. The
156  PCRE_ANCHORED option can be set at the time of matching as well as at compile  PCRE_ANCHORED option can be set at the time of matching as well as at compile
157  time.  time.
158  .PP  
159  If \fIerrptr\fR is NULL, \fBpcre_compile()\fR returns NULL immediately.  If \fIerrptr\fR is NULL, \fBpcre_compile()\fR returns NULL immediately.
160  Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fR returns  Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fR returns
161  NULL, and sets the variable pointed to by \fIerrptr\fR to point to a textual  NULL, and sets the variable pointed to by \fIerrptr\fR to point to a textual
162  error message. The offset from the start of the pattern to the character where  error message. The offset from the start of the pattern to the character where
163  the error was discovered is placed in the variable pointed to by  the error was discovered is placed in the variable pointed to by
164  \fIerroffset\fR, which must not be NULL. If it is, an immediate error is given.  \fIerroffset\fR, which must not be NULL. If it is, an immediate error is given.
165  .PP  
166  If the final argument, \fItableptr\fR, is NULL, PCRE uses a default set of  If the final argument, \fItableptr\fR, is NULL, PCRE uses a default set of
167  character tables which are built when it is compiled, using the default C  character tables which are built when it is compiled, using the default C
168  locale. Otherwise, \fItableptr\fR must be the result of a call to  locale. Otherwise, \fItableptr\fR must be the result of a call to
169  \fBpcre_maketables()\fR. See the section on locale support below.  \fBpcre_maketables()\fR. See the section on locale support below.
170  .PP  
171    This code fragment shows a typical straightforward call to \fBpcre_compile()\fR:
172    
173      pcre *re;
174      const char *error;
175      int erroffset;
176      re = pcre_compile(
177        "^A.*Z",          /* the pattern */
178        0,                /* default options */
179        &error,           /* for error message */
180        &erroffset,       /* for error offset */
181        NULL);            /* use default character tables */
182    
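When compilation fails, the error message and offset described above can be combined into a readable diagnostic. The following is only a sketch: format_compile_error() is a hypothetical helper, not part of PCRE, and it assumes the offset counts bytes from the start of the pattern string.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of PCRE): format a two-line diagnostic
   from the values that pcre_compile() reports on failure. Writes at most
   outsize bytes (including the terminating zero) to out, and returns
   what snprintf() returns. */
int format_compile_error(char *out, size_t outsize, const char *pattern,
                         int erroffset, const char *error)
{
  int len = (int)strlen(pattern);

  /* Clamp the offset so that the caret always lies within, or just
     past, the pattern, even if the caller passes a bad value. */
  if (erroffset < 0) erroffset = 0;
  if (erroffset > len) erroffset = len;

  /* First line: the pattern. Second line: erroffset spaces, a caret
     under the offending character, then the message. */
  return snprintf(out, outsize, "%s\n%*s^ %s\n",
                  pattern, erroffset, "", error);
}
```

After a call to pcre_compile() has returned NULL, such a helper would be given the pattern together with the erroffset and error values that the call set.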
183  The following option bits are defined in the header file:  The following option bits are defined in the header file:
184    
185    PCRE_ANCHORED    PCRE_ANCHORED
# Line 248  Details of exactly what it entails are g Line 266  Details of exactly what it entails are g
266  When a pattern is going to be used several times, it is worth spending more  When a pattern is going to be used several times, it is worth spending more
267  time analyzing it in order to speed up the time taken for matching. The  time analyzing it in order to speed up the time taken for matching. The
268  function \fBpcre_study()\fR takes a pointer to a compiled pattern as its first  function \fBpcre_study()\fR takes a pointer to a compiled pattern as its first
269  argument, and returns a pointer to a \fBpcre_extra\fR block (another \fBvoid\fR  argument, and returns a pointer to a \fBpcre_extra\fR block (another typedef
270  typedef) containing additional information about the pattern; this can be  for a structure with hidden contents) containing additional information about
271  passed to \fBpcre_exec()\fR. If no additional information is available, NULL  the pattern; this can be passed to \fBpcre_exec()\fR. If no additional
272  is returned.  information is available, NULL is returned.
273    
274  The second argument contains option bits. At present, no options are defined  The second argument contains option bits. At present, no options are defined
275  for \fBpcre_study()\fR, and this argument should always be zero.  for \fBpcre_study()\fR, and this argument should always be zero.
# Line 260  The third argument for \fBpcre_study()\f Line 278  The third argument for \fBpcre_study()\f
278  studying succeeds (even if no data is returned), the variable it points to is  studying succeeds (even if no data is returned), the variable it points to is
279  set to NULL. Otherwise it points to a textual error message.  set to NULL. Otherwise it points to a textual error message.
280    
281    This is a typical call to \fBpcre_study()\fR:
282    
283      pcre_extra *pe;
284      pe = pcre_study(
285        re,             /* result of pcre_compile() */
286        0,              /* no options exist */
287        &error);        /* set to NULL or points to a message */
288    
289  At present, studying a pattern is useful only for non-anchored patterns that do  At present, studying a pattern is useful only for non-anchored patterns that do
290  not have a single fixed starting character. A bitmap of possible starting  not have a single fixed starting character. A bitmap of possible starting
291  characters is created.  characters is created.
# Line 309  the following negative numbers: Line 335  the following negative numbers:
335    PCRE_ERROR_BADMAGIC   the "magic number" was not found    PCRE_ERROR_BADMAGIC   the "magic number" was not found
336    PCRE_ERROR_BADOPTION  the value of \fIwhat\fR was invalid    PCRE_ERROR_BADOPTION  the value of \fIwhat\fR was invalid
337    
338    Here is a typical call of \fBpcre_fullinfo()\fR, to obtain the length of the
339    compiled pattern:
340    
341      int rc;
342      unsigned long int length;
343      rc = pcre_fullinfo(
344        re,               /* result of pcre_compile() */
345        pe,               /* result of pcre_study(), or NULL */
346        PCRE_INFO_SIZE,   /* what is required */
347        &length);         /* where to put the data */
348    
349  The possible values for the third argument are defined in \fBpcre.h\fR, and are  The possible values for the third argument are defined in \fBpcre.h\fR, and are
350  as follows:  as follows:
351    
352    PCRE_INFO_OPTIONS    PCRE_INFO_OPTIONS
353    
354  Return a copy of the options with which the pattern was compiled. The fourth  Return a copy of the options with which the pattern was compiled. The fourth
355  argument should point to au \fBunsigned long int\fR variable. These option bits  argument should point to an \fBunsigned long int\fR variable. These option bits
356  are those specified in the call to \fBpcre_compile()\fR, modified by any  are those specified in the call to \fBpcre_compile()\fR, modified by any
357  top-level option settings within the pattern itself, and with the PCRE_ANCHORED  top-level option settings within the pattern itself, and with the PCRE_ANCHORED
358  bit forcibly set if the form of the pattern implies that it can match only at  bit forcibly set if the form of the pattern implies that it can match only at
# Line 396  pre-compiled pattern, which is passed in Line 433  pre-compiled pattern, which is passed in
433  pattern has been studied, the result of the study should be passed in the  pattern has been studied, the result of the study should be passed in the
434  \fIextra\fR argument. Otherwise this must be NULL.  \fIextra\fR argument. Otherwise this must be NULL.
435    
436    Here is an example of a simple call to \fBpcre_exec()\fR:
437    
438      int rc;
439      int ovector[30];
440      rc = pcre_exec(
441        re,             /* result of pcre_compile() */
442        NULL,           /* we didn't study the pattern */
443        "some string",  /* the subject string */
444        11,             /* the length of the subject string */
445        0,              /* start at offset 0 in the subject */
446        0,              /* default options */
447        ovector,        /* vector for substring information */
448        30);            /* number of elements in the vector */
449    
450  The PCRE_ANCHORED option can be passed in the \fIoptions\fR argument, whose  The PCRE_ANCHORED option can be passed in the \fIoptions\fR argument, whose
451  unused bits must be zero. However, if a pattern was compiled with  unused bits must be zero. However, if a pattern was compiled with
452  PCRE_ANCHORED, or turned out to be anchored by virtue of its contents, it  PCRE_ANCHORED, or turned out to be anchored by virtue of its contents, it
# Line 437  below) and trying an ordinary match agai Line 488  below) and trying an ordinary match agai
488    
489  The subject string is passed as a pointer in \fIsubject\fR, a length in  The subject string is passed as a pointer in \fIsubject\fR, a length in
490  \fIlength\fR, and a starting offset in \fIstartoffset\fR. Unlike the pattern  \fIlength\fR, and a starting offset in \fIstartoffset\fR. Unlike the pattern
491  string, it may contain binary zero characters. When the starting offset is  string, the subject may contain binary zero characters. When the starting
492  zero, the search for a match starts at the beginning of the subject, and this  offset is zero, the search for a match starts at the beginning of the subject,
493  is by far the most common case.  and this is by far the most common case.
494    
495  A non-zero starting offset is useful when searching for another match in the  A non-zero starting offset is useful when searching for another match in the
496  same subject by calling \fBpcre_exec()\fR again after a previous success.  same subject by calling \fBpcre_exec()\fR again after a previous success.
# Line 626  There are some size limitations in PCRE Line 677  There are some size limitations in PCRE
677  practice be relevant.  practice be relevant.
678  The maximum length of a compiled pattern is 65539 (sic) bytes.  The maximum length of a compiled pattern is 65539 (sic) bytes.
679  All values in repeating quantifiers must be less than 65536.  All values in repeating quantifiers must be less than 65536.
680  The maximum number of capturing subpatterns is 99.  The maximum number of capturing subpatterns is 65535.
681  The maximum number of all parenthesized subpatterns, including capturing  There is no limit to the number of non-capturing subpatterns, but the maximum
682    depth of nesting of all kinds of parenthesized subpattern, including capturing
683  subpatterns, assertions, and other types of subpattern, is 200.  subpatterns, assertions, and other types of subpattern, is 200.
684    
685  The maximum length of a subject string is the largest positive number that an  The maximum length of a subject string is the largest positive number that an
# Line 949  PCRE_MULTILINE is set. Line 1001  PCRE_MULTILINE is set.
1001    
1002  Note that the sequences \\A, \\Z, and \\z can be used to match the start and  Note that the sequences \\A, \\Z, and \\z can be used to match the start and
1003  end of the subject in both modes, and if all branches of a pattern start with  end of the subject in both modes, and if all branches of a pattern start with
1004  \\A is it always anchored, whether PCRE_MULTILINE is set or not.  \\A it is always anchored, whether PCRE_MULTILINE is set or not.
1005    
1006    
1007  .SH FULL STOP (PERIOD, DOT)  .SH FULL STOP (PERIOD, DOT)
# Line 1053  negation, which is indicated by a ^ char Line 1105  negation, which is indicated by a ^ char
1105    
1106    [12[:^digit:]]    [12[:^digit:]]
1107    
1108  matches "1", "2", or any non-digit. PCRE (and Perl) also recogize the POSIX  matches "1", "2", or any non-digit. PCRE (and Perl) also recognize the POSIX
1109  syntax [.ch.] and [=ch=] where "ch" is a "collating element", but these are not  syntax [.ch.] and [=ch=] where "ch" is a "collating element", but these are not
1110  supported, and an error is given if they are encountered.  supported, and an error is given if they are encountered.
1111    
# Line 1151  For example, if the string "the red king Line 1203  For example, if the string "the red king
1203    the ((red|white) (king|queen))    the ((red|white) (king|queen))
1204    
1205  the captured substrings are "red king", "red", and "king", and are numbered 1,  the captured substrings are "red king", "red", and "king", and are numbered 1,
1206  2, and 3.  2, and 3, respectively.
1207    
1208  The fact that plain parentheses fulfil two functions is not always helpful.  The fact that plain parentheses fulfil two functions is not always helpful.
1209  There are often times when a grouping subpattern is required without a  There are often times when a grouping subpattern is required without a
# Line 1792  The following UTF-8 features of Perl 5.6 Line 1844  The following UTF-8 features of Perl 5.6
1844    
1845  2. The use of Unicode tables and properties and escapes \\p, \\P, and \\X.  2. The use of Unicode tables and properties and escapes \\p, \\P, and \\X.
1846    
1847    
1848    .SH SAMPLE PROGRAM
1849    The code below is a simple, complete demonstration program, to get you started
1850    with using PCRE. This code is also supplied in the file \fIpcredemo.c\fR in the
1851    PCRE distribution.
1852    
1853    The program compiles the regular expression that is its first argument, and
1854    matches it against the subject string in its second argument. No options are
1855    set, and default character tables are used. If matching succeeds, the program
1856    outputs the portion of the subject that matched, together with the contents of
1857    any captured substrings.
1858    
1859    On a Unix system that has PCRE installed in \fI/usr/local\fR, you can compile
1860    the demonstration program using a command like this:
1861    
1862      gcc -o pcredemo pcredemo.c -I/usr/local/include -L/usr/local/lib -lpcre
1863    
1864    Then you can run simple tests like this:
1865    
1866      ./pcredemo 'cat|dog' 'the cat sat on the mat'
1867    
1868    Note that there is a much more comprehensive test program, called
1869    \fBpcretest\fR, which supports many more facilities for testing regular
1870    expressions. The \fBpcredemo\fR program is provided as a simple coding example.
1871    
1872    On some operating systems (e.g. Solaris) you may get an error like this when
1873    you try to run \fBpcredemo\fR:
1874    
1875      ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
1876    
1877    This is caused by the way shared library support works on those systems. You
1878    need to add
1879    
1880      -R/usr/local/lib
1881    
1882    to the compile command to get round this problem. Here's the code:
1883    
1884      #include <stdio.h>
1885      #include <string.h>
1886      #include <pcre.h>
1887    
1888      #define OVECCOUNT 30    /* should be a multiple of 3 */
1889    
1890      int main(int argc, char **argv)
1891      {
1892      pcre *re;
1893      const char *error;
1894      int erroffset;
1895      int ovector[OVECCOUNT];
1896      int rc, i;
1897    
1898      if (argc != 3)
1899        {
1900        printf("Two arguments required: a regex and a "
1901          "subject string\\n");
1902        return 1;
1903        }
1904    
1905      /* Compile the regular expression in the first argument */
1906    
1907      re = pcre_compile(
1908        argv[1],     /* the pattern */
1909        0,           /* default options */
1910        &error,      /* for error message */
1911        &erroffset,  /* for error offset */
1912        NULL);       /* use default character tables */
1913    
1914      /* Compilation failed: print the error message and exit */
1915    
1916      if (re == NULL)
1917        {
1918        printf("PCRE compilation failed at offset %d: %s\\n",
1919          erroffset, error);
1920        return 1;
1921        }
1922    
1923      /* Compilation succeeded: match the subject in the second
1924         argument */
1925    
1926      rc = pcre_exec(
1927        re,          /* the compiled pattern */
1928        NULL,        /* we didn't study the pattern */
1929        argv[2],     /* the subject string */
1930        (int)strlen(argv[2]), /* the length of the subject */
1931        0,           /* start at offset 0 in the subject */
1932        0,           /* default options */
1933        ovector,     /* vector for substring information */
1934        OVECCOUNT);  /* number of elements in the vector */
1935    
1936      /* Matching failed: handle error cases */
1937    
1938      if (rc < 0)
1939        {
1940        switch(rc)
1941          {
1942          case PCRE_ERROR_NOMATCH: printf("No match\\n"); break;
1943          /*
1944          Handle other special cases if you like
1945          */
1946          default: printf("Matching error %d\\n", rc); break;
1947          }
1948        return 1;
1949        }
1950    
1951      /* Match succeeded */
1952    
1953      printf("Match succeeded\\n");
1954    
1955      /* The output vector wasn't big enough */
1956    
1957      if (rc == 0)
1958        {
1959        rc = OVECCOUNT/3;
1960        printf("ovector only has room for %d captured "
1961          "substrings\\n", rc - 1);
1962        }
1963    
1964      /* Show substrings stored in the output vector */
1965    
1966      for (i = 0; i < rc; i++)
1967        {
1968        char *substring_start = argv[2] + ovector[2*i];
1969        int substring_length = ovector[2*i+1] - ovector[2*i];
1970        printf("%2d: %.*s\\n", i, substring_length,
1971          substring_start);
1972        }
1973    
1974      return 0;
1975      }
1976    
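The substring loop in the program above reads each match as a pair of offsets in the output vector. The same layout can be wrapped in a small helper, sketched here for illustration only: copy_match() is a hypothetical name, and the library's own \fBpcre_copy_substring()\fR is the supported way to do this. The sketch also assumes that a negative offset marks a subpattern that took no part in the match.

```c
#include <string.h>

/* Hypothetical helper (not part of PCRE): copy substring number n from
   the subject into buf, using the offset pairs that pcre_exec() stored
   in ovector. pairs is the number of valid pairs (the positive return
   value of pcre_exec()). Returns the substring length, or -1 if n is
   out of range, the pair is unset, or buf is too small. */
int copy_match(const char *subject, const int *ovector, int pairs,
               int n, char *buf, int bufsize)
{
  int start, length;

  if (n < 0 || n >= pairs) return -1;

  start = ovector[2*n];              /* offset of the first character */
  length = ovector[2*n+1] - start;   /* end offset minus start offset */

  /* Assumption: a negative offset means this group was not set. */
  if (start < 0 || length < 0) return -1;

  /* Leave room for the terminating zero. */
  if (length >= bufsize) return -1;

  memcpy(buf, subject + start, (size_t)length);
  buf[length] = '\0';
  return length;
}
```

With the subject "the cat sat on the mat" and a vector whose first pair is 4 and 7, substring 0 is "cat".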
1977    
1978  .SH AUTHOR  .SH AUTHOR
1979  Philip Hazel <ph10@cam.ac.uk>  Philip Hazel <ph10@cam.ac.uk>
1980  .br  .br
# Line 1803  Cambridge CB2 3QG, England. Line 1986  Cambridge CB2 3QG, England.
1986  .br  .br
1987  Phone: +44 1223 334714  Phone: +44 1223 334714
1988    
1989  Last updated: 28 August 2000,  Last updated: 15 August 2001
 .br  
   the 250th anniversary of the death of J.S. Bach.  
1990  .br  .br
1991  Copyright (c) 1997-2000 University of Cambridge.  Copyright (c) 1997-2001 University of Cambridge.

