Diff of /code/trunk/doc/html/pcreapi.html

revision 74 by nigel, Sat Feb 24 21:40:30 2007 UTC revision 75 by nigel, Sat Feb 24 21:40:37 2007 UTC
# Line 3  Line 3 
3  <title>pcreapi specification</title>  <title>pcreapi specification</title>
4  </head>  </head>
5  <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">  <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
6  This HTML document has been generated automatically from the original man page.  <h1>pcreapi man page</h1>
7  If there is any nonsense in it, please consult the man page, in case the  <p>
8  conversion went wrong.<br>  Return to the <a href="index.html">PCRE index page</a>.
9    </p>
10    <p>
11    This page is part of the PCRE HTML documentation. It was generated automatically
12    from the original man page. If there is any nonsense in it, please consult the
13    man page, in case the conversion went wrong.
14    <br>
15  <ul>  <ul>
16  <li><a name="TOC1" href="#SEC1">SYNOPSIS OF PCRE API</a>  <li><a name="TOC1" href="#SEC1">PCRE NATIVE API</a>
17  <li><a name="TOC2" href="#SEC2">PCRE API</a>  <li><a name="TOC2" href="#SEC2">PCRE API OVERVIEW</a>
18  <li><a name="TOC3" href="#SEC3">MULTITHREADING</a>  <li><a name="TOC3" href="#SEC3">MULTITHREADING</a>
19  <li><a name="TOC4" href="#SEC4">CHECKING BUILD-TIME OPTIONS</a>  <li><a name="TOC4" href="#SEC4">SAVING PRECOMPILED PATTERNS FOR LATER USE</a>
20  <li><a name="TOC5" href="#SEC5">COMPILING A PATTERN</a>  <li><a name="TOC5" href="#SEC5">CHECKING BUILD-TIME OPTIONS</a>
21  <li><a name="TOC6" href="#SEC6">STUDYING A PATTERN</a>  <li><a name="TOC6" href="#SEC6">COMPILING A PATTERN</a>
22  <li><a name="TOC7" href="#SEC7">LOCALE SUPPORT</a>  <li><a name="TOC7" href="#SEC7">STUDYING A PATTERN</a>
23  <li><a name="TOC8" href="#SEC8">INFORMATION ABOUT A PATTERN</a>  <li><a name="TOC8" href="#SEC8">LOCALE SUPPORT</a>
24  <li><a name="TOC9" href="#SEC9">OBSOLETE INFO FUNCTION</a>  <li><a name="TOC9" href="#SEC9">INFORMATION ABOUT A PATTERN</a>
25  <li><a name="TOC10" href="#SEC10">MATCHING A PATTERN</a>  <li><a name="TOC10" href="#SEC10">OBSOLETE INFO FUNCTION</a>
26  <li><a name="TOC11" href="#SEC11">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>  <li><a name="TOC11" href="#SEC11">MATCHING A PATTERN</a>
27  <li><a name="TOC12" href="#SEC12">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>  <li><a name="TOC12" href="#SEC12">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a>
28    <li><a name="TOC13" href="#SEC13">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
29  </ul>  </ul>
30  <br><a name="SEC1" href="#TOC1">SYNOPSIS OF PCRE API</a><br>  <br><a name="SEC1" href="#TOC1">PCRE NATIVE API</a><br>
31  <P>  <P>
32  <b>#include &#60;pcre.h&#62;</b>  <b>#include &#60;pcre.h&#62;</b>
33  </P>  </P>
# Line 106  conversion went wrong. Line 113  conversion went wrong.
113  <P>  <P>
114  <b>int (*pcre_callout)(pcre_callout_block *);</b>  <b>int (*pcre_callout)(pcre_callout_block *);</b>
115  </P>  </P>
116  <br><a name="SEC2" href="#TOC1">PCRE API</a><br>  <br><a name="SEC2" href="#TOC1">PCRE API OVERVIEW</a><br>
117  <P>  <P>
118  PCRE has its own native API, which is described in this document. There is also  PCRE has its own native API, which is described in this document. There is also
119  a set of wrapper functions that correspond to the POSIX regular expression API.  a set of wrapper functions that correspond to the POSIX regular expression API.
120  These are described in the <b>pcreposix</b> documentation.  These are described in the
121    <a href="pcreposix.html"><b>pcreposix</b></a>
122    documentation.
123  </P>  </P>
124  <P>  <P>
125  The native API function prototypes are defined in the header file <b>pcre.h</b>,  The native API function prototypes are defined in the header file <b>pcre.h</b>,
126  and on Unix systems the library itself is called <b>libpcre.a</b>, so can be  and on Unix systems the library itself is called <b>libpcre</b>. It can
127  accessed by adding <b>-lpcre</b> to the command for linking an application which  normally be accessed by adding <b>-lpcre</b> to the command for linking an
128  calls it. The header file defines the macros PCRE_MAJOR and PCRE_MINOR to  application that uses PCRE. The header file defines the macros PCRE_MAJOR and
129  contain the major and minor release numbers for the library. Applications can  PCRE_MINOR to contain the major and minor release numbers for the library.
130  use these to include support for different releases.  Applications can use these to include support for different releases of PCRE.
131  </P>  </P>
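The paragraph above says applications can use PCRE_MAJOR and PCRE_MINOR to support different releases. A minimal sketch of such a check follows. The helper name pcre_headers_at_least is hypothetical, and the two macros are given stand-in values only so that the fragment compiles without the PCRE headers; a real program would include pcre.h and drop the stand-ins.

```c
/* Stand-in values so this sketch compiles on its own. A real program
   would include <pcre.h> instead, which defines both macros. */
#ifndef PCRE_MAJOR
#define PCRE_MAJOR 5
#define PCRE_MINOR 0
#endif

/* Hypothetical helper: nonzero when the PCRE headers seen at compile
   time are at least release major.minor. */
int pcre_headers_at_least(int major, int minor)
{
    if (PCRE_MAJOR != major)
        return PCRE_MAJOR > major;   /* major number decides */
    return PCRE_MINOR >= minor;      /* same major: compare minor */
}
```

The same comparison can be written as a preprocessor condition when code for a newer release must be left out of the build entirely rather than skipped at run time.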
132  <P>  <P>
133  The functions <b>pcre_compile()</b>, <b>pcre_study()</b>, and <b>pcre_exec()</b>  The functions <b>pcre_compile()</b>, <b>pcre_study()</b>, and <b>pcre_exec()</b>
134  are used for compiling and matching regular expressions. A sample program that  are used for compiling and matching regular expressions. A sample program that
135  demonstrates the simplest way of using them is given in the file  demonstrates the simplest way of using them is provided in the file called
136  <i>pcredemo.c</i>. The <b>pcresample</b> documentation describes how to run it.  <i>pcredemo.c</i> in the source distribution. The
137  </P>  <a href="pcresample.html"><b>pcresample</b></a>
138  <P>  documentation describes how to run it.
 There are convenience functions for extracting captured substrings from a  
 matched subject string. They are:  
139  </P>  </P>
140  <P>  <P>
141    In addition to the main compiling and matching functions, there are convenience
142    functions for extracting captured substrings from a matched subject string.
143    They are:
144  <pre>  <pre>
145    <b>pcre_copy_substring()</b>    <b>pcre_copy_substring()</b>
146    <b>pcre_copy_named_substring()</b>    <b>pcre_copy_named_substring()</b>
147    <b>pcre_get_substring()</b>    <b>pcre_get_substring()</b>
148    <b>pcre_get_named_substring()</b>    <b>pcre_get_named_substring()</b>
149    <b>pcre_get_substring_list()</b>    <b>pcre_get_substring_list()</b>
150  </PRE>    <b>pcre_get_stringnumber()</b>
151  </P>  </pre>
 <P>  
152  <b>pcre_free_substring()</b> and <b>pcre_free_substring_list()</b> are also  <b>pcre_free_substring()</b> and <b>pcre_free_substring_list()</b> are also
153  provided, to free the memory used for extracted strings.  provided, to free the memory used for extracted strings.
154  </P>  </P>
155  <P>  <P>
156  The function <b>pcre_maketables()</b> is used (optionally) to build a set of  The function <b>pcre_maketables()</b> is used to build a set of character tables
157  character tables in the current locale for passing to <b>pcre_compile()</b>.  in the current locale for passing to <b>pcre_compile()</b> or <b>pcre_exec()</b>.
158    This is an optional facility that is provided for specialist use. Most
159    commonly, no special tables are passed, in which case internal tables that are
160    generated when PCRE is built are used.
161  </P>  </P>
162  <P>  <P>
163  The function <b>pcre_fullinfo()</b> is used to find out information about a  The function <b>pcre_fullinfo()</b> is used to find out information about a
164  compiled pattern; <b>pcre_info()</b> is an obsolete version which returns only  compiled pattern; <b>pcre_info()</b> is an obsolete version that returns only
165  some of the available information, but is retained for backwards compatibility.  some of the available information, but is retained for backwards compatibility.
166  The function <b>pcre_version()</b> returns a pointer to a string containing the  The function <b>pcre_version()</b> returns a pointer to a string containing the
167  version of PCRE and its date of release.  version of PCRE and its date of release.
168  </P>  </P>
169  <P>  <P>
170  The global variables <b>pcre_malloc</b> and <b>pcre_free</b> initially contain  The global variables <b>pcre_malloc</b> and <b>pcre_free</b> initially contain
171  the entry points of the standard <b>malloc()</b> and <b>free()</b> functions  the entry points of the standard <b>malloc()</b> and <b>free()</b> functions,
172  respectively. PCRE calls the memory management functions via these variables,  respectively. PCRE calls the memory management functions via these variables,
173  so a calling program can replace them if it wishes to intercept the calls. This  so a calling program can replace them if it wishes to intercept the calls. This
174  should be done before calling any PCRE functions.  should be done before calling any PCRE functions.
# Line 175  freed), and always for memory blocks of Line 187  freed), and always for memory blocks of
187  <P>  <P>
188  The global variable <b>pcre_callout</b> initially contains NULL. It can be set  The global variable <b>pcre_callout</b> initially contains NULL. It can be set
189  by the caller to a "callout" function, which PCRE will then call at specified  by the caller to a "callout" function, which PCRE will then call at specified
190  points during a matching operation. Details are given in the <b>pcrecallout</b>  points during a matching operation. Details are given in the
191    <a href="pcrecallout.html"><b>pcrecallout</b></a>
192  documentation.  documentation.
193  </P>  </P>
194  <br><a name="SEC3" href="#TOC1">MULTITHREADING</a><br>  <br><a name="SEC3" href="#TOC1">MULTITHREADING</a><br>
# Line 189  callout function pointed to by pcre_c Line 202  callout function pointed to by pcre_c
202  The compiled form of a regular expression is not altered during matching, so  The compiled form of a regular expression is not altered during matching, so
203  the same compiled pattern can safely be used by several threads at once.  the same compiled pattern can safely be used by several threads at once.
204  </P>  </P>
205  <br><a name="SEC4" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>  <br><a name="SEC4" href="#TOC1">SAVING PRECOMPILED PATTERNS FOR LATER USE</a><br>
206    <P>
207    The compiled form of a regular expression can be saved and re-used at a later
208    time, possibly by a different program, and even on a host other than the one on
209    which it was compiled. Details are given in the
210    <a href="pcreprecompile.html"><b>pcreprecompile</b></a>
211    documentation.
212    </P>
213    <br><a name="SEC5" href="#TOC1">CHECKING BUILD-TIME OPTIONS</a><br>
214  <P>  <P>
215  <b>int pcre_config(int <i>what</i>, void *<i>where</i>);</b>  <b>int pcre_config(int <i>what</i>, void *<i>where</i>);</b>
216  </P>  </P>
# Line 203  documentation has more details about the Line 224  documentation has more details about the
224  The first argument for <b>pcre_config()</b> is an integer, specifying which  The first argument for <b>pcre_config()</b> is an integer, specifying which
225  information is required; the second argument is a pointer to a variable into  information is required; the second argument is a pointer to a variable into
226  which the information is placed. The following information is available:  which the information is placed. The following information is available:
 </P>  
 <P>  
227  <pre>  <pre>
228    PCRE_CONFIG_UTF8    PCRE_CONFIG_UTF8
229  </PRE>  </pre>
 </P>  
 <P>  
230  The output is an integer that is set to one if UTF-8 support is available;  The output is an integer that is set to one if UTF-8 support is available;
231  otherwise it is set to zero.  otherwise it is set to zero.
232  </P>  <pre>
233  <P>    PCRE_CONFIG_UNICODE_PROPERTIES
234    </pre>
235    The output is an integer that is set to one if support for Unicode character
236    properties is available; otherwise it is set to zero.
237  <pre>  <pre>
238    PCRE_CONFIG_NEWLINE    PCRE_CONFIG_NEWLINE
239  </PRE>  </pre>
 </P>  
 <P>  
240  The output is an integer that is set to the value of the code that is used for  The output is an integer that is set to the value of the code that is used for
241  the newline character. It is either linefeed (10) or carriage return (13), and  the newline character. It is either linefeed (10) or carriage return (13), and
242  should normally be the standard character for your operating system.  should normally be the standard character for your operating system.
 </P>  
 <P>  
243  <pre>  <pre>
244    PCRE_CONFIG_LINK_SIZE    PCRE_CONFIG_LINK_SIZE
245  </PRE>  </pre>
 </P>  
 <P>  
246  The output is an integer that contains the number of bytes used for internal  The output is an integer that contains the number of bytes used for internal
247  linkage in compiled regular expressions. The value is 2, 3, or 4. Larger values  linkage in compiled regular expressions. The value is 2, 3, or 4. Larger values
248  allow larger regular expressions to be compiled, at the expense of slower  allow larger regular expressions to be compiled, at the expense of slower
249  matching. The default value of 2 is sufficient for all but the most massive  matching. The default value of 2 is sufficient for all but the most massive
250  patterns, since it allows the compiled pattern to be up to 64K in size.  patterns, since it allows the compiled pattern to be up to 64K in size.
 </P>  
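The 64K figure quoted for a link size of 2 is what a 2-byte offset can address. The sketch below works through that arithmetic. The helper name link_offset_range is hypothetical, and the figures for link sizes 3 and 4 are an assumption that the limit is set purely by the width of the offset; the text above states only the 64K case.

```c
#include <assert.h>

/* Hypothetical helper: number of distinct offsets that an internal link
   of link_size bytes can represent, that is, 2^(8 * link_size). */
unsigned long long link_offset_range(int link_size)
{
    assert(link_size >= 2 && link_size <= 4);  /* the values named above */
    return 1ULL << (8 * link_size);
}
```

With a link size of 2 this gives 65536, which is the 64K limit mentioned for the default build.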
 <P>  
251  <pre>  <pre>
252    PCRE_CONFIG_POSIX_MALLOC_THRESHOLD    PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
253  </PRE>  </pre>
 </P>  
 <P>  
254  The output is an integer that contains the threshold above which the POSIX  The output is an integer that contains the threshold above which the POSIX
255  interface uses <b>malloc()</b> for output vectors. Further details are given in  interface uses <b>malloc()</b> for output vectors. Further details are given in
256  the <b>pcreposix</b> documentation.  the
257  </P>  <a href="pcreposix.html"><b>pcreposix</b></a>
258  <P>  documentation.
259  <pre>  <pre>
260    PCRE_CONFIG_MATCH_LIMIT    PCRE_CONFIG_MATCH_LIMIT
261  </PRE>  </pre>
 </P>  
 <P>  
262  The output is an integer that gives the default limit for the number of  The output is an integer that gives the default limit for the number of
263  internal matching function calls in a <b>pcre_exec()</b> execution. Further  internal matching function calls in a <b>pcre_exec()</b> execution. Further
264  details are given with <b>pcre_exec()</b> below.  details are given with <b>pcre_exec()</b> below.
 </P>  
 <P>  
265  <pre>  <pre>
266    PCRE_CONFIG_STACKRECURSE    PCRE_CONFIG_STACKRECURSE
267  </PRE>  </pre>
 </P>  
 <P>  
268  The output is an integer that is set to one if internal recursion is  The output is an integer that is set to one if internal recursion is
269  implemented by recursive function calls that use the stack to remember their  implemented by recursive function calls that use the stack to remember their
270  state. This is the usual way that PCRE is compiled. The output is zero if PCRE  state. This is the usual way that PCRE is compiled. The output is zero if PCRE
# Line 268  was compiled to use blocks of data on th Line 272  was compiled to use blocks of data on th
272  calls. In this case, <b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> are  calls. In this case, <b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> are
273  called to manage memory blocks on the heap, thus avoiding the use of the stack.  called to manage memory blocks on the heap, thus avoiding the use of the stack.
274  </P>  </P>
275  <br><a name="SEC5" href="#TOC1">COMPILING A PATTERN</a><br>  <br><a name="SEC6" href="#TOC1">COMPILING A PATTERN</a><br>
276  <P>  <P>
277  <b>pcre *pcre_compile(const char *<i>pattern</i>, int <i>options</i>,</b>  <b>pcre *pcre_compile(const char *<i>pattern</i>, int <i>options</i>,</b>
278  <b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b>  <b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
# Line 277  called to manage memory blocks on the he Line 281  called to manage memory blocks on the he
281  <P>  <P>
282  The function <b>pcre_compile()</b> is called to compile a pattern into an  The function <b>pcre_compile()</b> is called to compile a pattern into an
283  internal form. The pattern is a C string terminated by a binary zero, and  internal form. The pattern is a C string terminated by a binary zero, and
284  is passed in the argument <i>pattern</i>. A pointer to a single block of memory  is passed in the <i>pattern</i> argument. A pointer to a single block of memory
285  that is obtained via <b>pcre_malloc</b> is returned. This contains the compiled  that is obtained via <b>pcre_malloc</b> is returned. This contains the compiled
286  code and related data. The <b>pcre</b> type is defined for the returned block;  code and related data. The <b>pcre</b> type is defined for the returned block;
287  this is a typedef for a structure whose contents are not externally defined. It  this is a typedef for a structure whose contents are not externally defined. It
# Line 286  is up to the caller to free the memory w Line 290  is up to the caller to free the memory w
290  <P>  <P>
291  Although the compiled code of a PCRE regex is relocatable, that is, it does not  Although the compiled code of a PCRE regex is relocatable, that is, it does not
292  depend on memory location, the complete <b>pcre</b> data block is not  depend on memory location, the complete <b>pcre</b> data block is not
293  fully relocatable, because it contains a copy of the <i>tableptr</i> argument,  fully relocatable, because it may contain a copy of the <i>tableptr</i>
294  which is an address (see below).  argument, which is an address (see below).
295  </P>  </P>
296  <P>  <P>
297  The <i>options</i> argument contains independent bits that affect the  The <i>options</i> argument contains independent bits that affect the
298  compilation. It should be zero if no options are required. Some of the options,  compilation. It should be zero if no options are required. The available
299  in particular, those that are compatible with Perl, can also be set and unset  options are described below. Some of them, in particular, those that are
300  from within the pattern (see the detailed description of regular expressions  compatible with Perl, can also be set and unset from within the pattern (see
301  in the <b>pcrepattern</b> documentation). For these options, the contents of the  the detailed description in the
302  <i>options</i> argument specify their initial settings at the start of  <a href="pcrepattern.html"><b>pcrepattern</b></a>
303  compilation and execution. The PCRE_ANCHORED option can be set at the time of  documentation). For these options, the contents of the <i>options</i> argument
304  matching as well as at compile time.  specify their initial settings at the start of compilation and execution. The
305    PCRE_ANCHORED option can be set at the time of matching as well as at compile
306    time.
307  </P>  </P>
308  <P>  <P>
309  If <i>errptr</i> is NULL, <b>pcre_compile()</b> returns NULL immediately.  If <i>errptr</i> is NULL, <b>pcre_compile()</b> returns NULL immediately.
# Line 309  the error was discovered is placed in th Line 315  the error was discovered is placed in th
315  </P>  </P>
316  <P>  <P>
317  If the final argument, <i>tableptr</i>, is NULL, PCRE uses a default set of  If the final argument, <i>tableptr</i>, is NULL, PCRE uses a default set of
318  character tables which are built when it is compiled, using the default C  character tables that are built when PCRE is compiled, using the default C
319  locale. Otherwise, <i>tableptr</i> must be the result of a call to  locale. Otherwise, <i>tableptr</i> must be an address that is the result of a
320  <b>pcre_maketables()</b>. See the section on locale support below.  call to <b>pcre_maketables()</b>. This value is stored with the compiled
321    pattern, and used again by <b>pcre_exec()</b>, unless another table pointer is
322    passed to it. For more discussion, see the section on locale support below.
323  </P>  </P>
324  <P>  <P>
325  This code fragment shows a typical straightforward call to <b>pcre_compile()</b>:  This code fragment shows a typical straightforward call to <b>pcre_compile()</b>:
 </P>  
 <P>  
326  <pre>  <pre>
327    pcre *re;    pcre *re;
328    const char *error;    const char *error;
# Line 327  This code fragment shows a typical strai Line 333  This code fragment shows a typical strai
333      &error,           /* for error message */      &error,           /* for error message */
334      &erroffset,       /* for error offset */      &erroffset,       /* for error offset */
335      NULL);            /* use default character tables */      NULL);            /* use default character tables */
336  </PRE>  </pre>
337  </P>  The following names for option bits are defined in the <b>pcre.h</b> header
338  <P>  file:
 The following option bits are defined:  
 </P>  
 <P>  
339  <pre>  <pre>
340    PCRE_ANCHORED    PCRE_ANCHORED
341  </PRE>  </pre>
 </P>  
 <P>  
342  If this bit is set, the pattern is forced to be "anchored", that is, it is  If this bit is set, the pattern is forced to be "anchored", that is, it is
343  constrained to match only at the first matching point in the string which is  constrained to match only at the first matching point in the string that is
344  being searched (the "subject string"). This effect can also be achieved by  being searched (the "subject string"). This effect can also be achieved by
345  appropriate constructs in the pattern itself, which is the only way to do it in  appropriate constructs in the pattern itself, which is the only way to do it in
346  Perl.  Perl.
347  </P>  <pre>
348  <P>    PCRE_AUTO_CALLOUT
349    </pre>
350    If this bit is set, <b>pcre_compile()</b> automatically inserts callout items,
351    all with number 255, before each pattern item. For discussion of the callout
352    facility, see the
353    <a href="pcrecallout.html"><b>pcrecallout</b></a>
354    documentation.
355  <pre>  <pre>
356    PCRE_CASELESS    PCRE_CASELESS
357  </PRE>  </pre>
 </P>  
 <P>  
358  If this bit is set, letters in the pattern match both upper and lower case  If this bit is set, letters in the pattern match both upper and lower case
359  letters. It is equivalent to Perl's /i option, and it can be changed within a  letters. It is equivalent to Perl's /i option, and it can be changed within a
360  pattern by a (?i) option setting.  pattern by a (?i) option setting. When running in UTF-8 mode, case support for
361  </P>  high-valued characters is available only when PCRE is built with Unicode
362  <P>  character property support.
363  <pre>  <pre>
364    PCRE_DOLLAR_ENDONLY    PCRE_DOLLAR_ENDONLY
365  </PRE>  </pre>
 </P>  
 <P>  
366  If this bit is set, a dollar metacharacter in the pattern matches only at the  If this bit is set, a dollar metacharacter in the pattern matches only at the
367  end of the subject string. Without this option, a dollar also matches  end of the subject string. Without this option, a dollar also matches
368  immediately before the final character if it is a newline (but not before any  immediately before the final character if it is a newline (but not before any
369  other newlines). The PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is  other newlines). The PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is
370  set. There is no equivalent to this option in Perl, and no way to set it within  set. There is no equivalent to this option in Perl, and no way to set it within
371  a pattern.  a pattern.
 </P>  
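The rule for where a dollar may match when PCRE_MULTILINE is unset can be modelled in a few lines. This is a sketch of the behaviour described above, not PCRE code: the function name dollar_can_match is hypothetical, and it assumes the newline character is linefeed.

```c
#include <stddef.h>

/* Model, not PCRE code: returns 1 if a dollar could match at offset pos
   in a subject of length len, with PCRE_MULTILINE unset. */
int dollar_can_match(const char *subject, size_t len, size_t pos,
                     int dollar_endonly)
{
    if (pos == len)
        return 1;                      /* very end of the subject */
    if (dollar_endonly)
        return 0;                      /* option set: nowhere else */
    /* default: also just before a newline that is the final character */
    return len > 0 && pos == len - 1 && subject[pos] == '\n';
}
```

For the subject "ab\ncd\n" the model accepts offsets 5 and 6 by default, only offset 6 when the option is set, and never offset 2, because that newline is not the final character.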
 <P>  
372  <pre>  <pre>
373    PCRE_DOTALL    PCRE_DOTALL
374  </PRE>  </pre>
 </P>  
 <P>  
375  If this bit is set, a dot metacharacter in the pattern matches all characters,  If this bit is set, a dot metacharacter in the pattern matches all characters,
376  including newlines. Without it, newlines are excluded. This option is  including newlines. Without it, newlines are excluded. This option is
377  equivalent to Perl's /s option, and it can be changed within a pattern by a  equivalent to Perl's /s option, and it can be changed within a pattern by a
378  (?s) option setting. A negative class such as [^a] always matches a newline  (?s) option setting. A negative class such as [^a] always matches a newline
379  character, independent of the setting of this option.  character, independent of the setting of this option.
 </P>  
 <P>  
380  <pre>  <pre>
381    PCRE_EXTENDED    PCRE_EXTENDED
382  </PRE>  </pre>
 </P>  
 <P>  
383  If this bit is set, whitespace data characters in the pattern are totally  If this bit is set, whitespace data characters in the pattern are totally
384  ignored except when escaped or inside a character class. Whitespace does not  ignored except when escaped or inside a character class. Whitespace does not
385  include the VT character (code 11). In addition, characters between an  include the VT character (code 11). In addition, characters between an
# Line 397  This option makes it possible to include Line 392  This option makes it possible to include
392  Note, however, that this applies only to data characters. Whitespace characters  Note, however, that this applies only to data characters. Whitespace characters
393  may never appear within special character sequences in a pattern, for example  may never appear within special character sequences in a pattern, for example
394  within the sequence (?( which introduces a conditional subpattern.  within the sequence (?( which introduces a conditional subpattern.
 </P>  
 <P>  
395  <pre>  <pre>
396    PCRE_EXTRA    PCRE_EXTRA
397  </PRE>  </pre>
 </P>  
 <P>  
398  This option was invented in order to turn on additional functionality of PCRE  This option was invented in order to turn on additional functionality of PCRE
399  that is incompatible with Perl, but it is currently of very little use. When  that is incompatible with Perl, but it is currently of very little use. When
400  set, any backslash in a pattern that is followed by a letter that has no  set, any backslash in a pattern that is followed by a letter that has no
# Line 412  expansion. By default, as in Perl, a bac Line 403  expansion. By default, as in Perl, a bac
403  special meaning is treated as a literal. There are at present no other features  special meaning is treated as a literal. There are at present no other features
404  controlled by this option. It can also be set by a (?X) option setting within a  controlled by this option. It can also be set by a (?X) option setting within a
405  pattern.  pattern.
 </P>  
 <P>  
406  <pre>  <pre>
407    PCRE_MULTILINE    PCRE_MULTILINE
408  </PRE>  </pre>
409  </P>  By default, PCRE treats the subject string as consisting of a single line of
410  <P>  characters (even if it actually contains newlines). The "start of line"
 By default, PCRE treats the subject string as consisting of a single "line" of  
 characters (even if it actually contains several newlines). The "start of line"  
411  metacharacter (^) matches only at the start of the string, while the "end of  metacharacter (^) matches only at the start of the string, while the "end of
412  line" metacharacter ($) matches only at the end of the string, or before a  line" metacharacter ($) matches only at the end of the string, or before a
413  terminating newline (unless PCRE_DOLLAR_ENDONLY is set). This is the same as  terminating newline (unless PCRE_DOLLAR_ENDONLY is set). This is the same as
# Line 433  string, respectively, as well as at the Line 420  string, respectively, as well as at the
420  to Perl's /m option, and it can be changed within a pattern by a (?m) option  to Perl's /m option, and it can be changed within a pattern by a (?m) option
421  setting. If there are no "\n" characters in a subject string, or no  setting. If there are no "\n" characters in a subject string, or no
422  occurrences of ^ or $ in a pattern, setting PCRE_MULTILINE has no effect.  occurrences of ^ or $ in a pattern, setting PCRE_MULTILINE has no effect.
 </P>  
 <P>  
423  <pre>  <pre>
424    PCRE_NO_AUTO_CAPTURE    PCRE_NO_AUTO_CAPTURE
425  </PRE>  </pre>
 </P>  
 <P>  
426  If this option is set, it disables the use of numbered capturing parentheses in  If this option is set, it disables the use of numbered capturing parentheses in
427  the pattern. Any opening parenthesis that is not followed by ? behaves as if it  the pattern. Any opening parenthesis that is not followed by ? behaves as if it
428  were followed by ?: but named parentheses can still be used for capturing (and  were followed by ?: but named parentheses can still be used for capturing (and
429  they acquire numbers in the usual way). There is no equivalent of this option  they acquire numbers in the usual way). There is no equivalent of this option
430  in Perl.  in Perl.
 </P>  
 <P>  
431  <pre>  <pre>
432    PCRE_UNGREEDY    PCRE_UNGREEDY
433  </PRE>  </pre>
 </P>  
 <P>  
434  This option inverts the "greediness" of the quantifiers so that they are not  This option inverts the "greediness" of the quantifiers so that they are not
435  greedy by default, but become greedy if followed by "?". It is not compatible  greedy by default, but become greedy if followed by "?". It is not compatible
436  with Perl. It can also be set by a (?U) option setting within the pattern.  with Perl. It can also be set by a (?U) option setting within the pattern.
 </P>  
 <P>  
437  <pre>  <pre>
438    PCRE_UTF8    PCRE_UTF8
439  </PRE>  </pre>
 </P>  
 <P>  
440  This option causes PCRE to regard both the pattern and the subject as strings  This option causes PCRE to regard both the pattern and the subject as strings
441  of UTF-8 characters instead of single-byte character strings. However, it is  of UTF-8 characters instead of single-byte character strings. However, it is
442  available only if PCRE has been built to include UTF-8 support. If not, the use  available only when PCRE is built to include UTF-8 support. If not, the use
443  of this option provokes an error. Details of how this option changes the  of this option provokes an error. Details of how this option changes the
444  behaviour of PCRE are given in the  behaviour of PCRE are given in the
445  <a href="pcre.html#utf8support">section on UTF-8 support</a>  <a href="pcre.html#utf8support">section on UTF-8 support</a>
446  in the main  in the main
447  <a href="pcre.html"><b>pcre</b></a>  <a href="pcre.html"><b>pcre</b></a>
448  page.  page.
 </P>  
 <P>  
449  <pre>  <pre>
450    PCRE_NO_UTF8_CHECK    PCRE_NO_UTF8_CHECK
451  </PRE>  </pre>
 </P>  
 <P>  
452  When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is  When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
453  automatically checked. If an invalid UTF-8 sequence of bytes is found,  automatically checked. If an invalid UTF-8 sequence of bytes is found,
454  <b>pcre_compile()</b> returns an error. If you already know that your pattern is  <b>pcre_compile()</b> returns an error. If you already know that your pattern is
455  valid, and you want to skip this check for performance reasons, you can set the  valid, and you want to skip this check for performance reasons, you can set the
456  PCRE_NO_UTF8_CHECK option. When it is set, the effect of passing an invalid  PCRE_NO_UTF8_CHECK option. When it is set, the effect of passing an invalid
457  UTF-8 string as a pattern is undefined. It may cause your program to crash.  UTF-8 string as a pattern is undefined. It may cause your program to crash.
458  Note that there is a similar option for suppressing the checking of subject  Note that this option can also be passed to <b>pcre_exec()</b>, to suppress the
459  strings passed to <b>pcre_exec()</b>.  UTF-8 validity checking of subject strings.
460  </P>  </P>
461  <br><a name="SEC6" href="#TOC1">STUDYING A PATTERN</a><br>  <br><a name="SEC7" href="#TOC1">STUDYING A PATTERN</a><br>
462  <P>  <P>
463  <b>pcre_extra *pcre_study(const pcre *<i>code</i>, int <i>options</i>,</b>  <b>pcre_extra *pcre_study(const pcre *<i>code</i>, int <i>options</i>,</b>
464  <b>const char **<i>errptr</i>);</b>  <b>const char **<i>errptr</i>);</b>
465  </P>  </P>
466  <P>  <P>
467  When a pattern is going to be used several times, it is worth spending more  If a compiled pattern is going to be used several times, it is worth spending
468  time analyzing it in order to speed up the time taken for matching. The  more time analyzing it in order to speed up the time taken for matching. The
469  function <b>pcre_study()</b> takes a pointer to a compiled pattern as its first  function <b>pcre_study()</b> takes a pointer to a compiled pattern as its first
470  argument. If studing the pattern produces additional information that will help  argument. If studying the pattern produces additional information that will
471  speed up matching, <b>pcre_study()</b> returns a pointer to a <b>pcre_extra</b>  help speed up matching, <b>pcre_study()</b> returns a pointer to a
472  block, in which the <i>study_data</i> field points to the results of the study.  <b>pcre_extra</b> block, in which the <i>study_data</i> field points to the
473    results of the study.
474  </P>  </P>
475  <P>  <P>
476  The returned value from a <b>pcre_study()</b> can be passed directly to  The returned value from <b>pcre_study()</b> can be passed directly to
477  <b>pcre_exec()</b>. However, the <b>pcre_extra</b> block also contains other  <b>pcre_exec()</b>. However, a <b>pcre_extra</b> block also contains other
478  fields that can be set by the caller before the block is passed; these are  fields that can be set by the caller before the block is passed; these are
479  described below. If studying the pattern does not produce any additional  described
480  information, <b>pcre_study()</b> returns NULL. In that circumstance, if the  <a href="#extradata">below</a>
481  calling program wants to pass some of the other fields to <b>pcre_exec()</b>, it  in the section on matching a pattern.
482  must set up its own <b>pcre_extra</b> block.  </P>
483    <P>
484    If studying the pattern does not produce any additional information,
485    <b>pcre_study()</b> returns NULL. In that circumstance, if the calling program
486    wants to pass any of the other fields to <b>pcre_exec()</b>, it must set up its
487    own <b>pcre_extra</b> block.
488  </P>  </P>
489  <P>  <P>
490  The second argument contains option bits. At present, no options are defined  The second argument of <b>pcre_study()</b> contains option bits. At present, no
491  for <b>pcre_study()</b>, and this argument should always be zero.  options are defined, and this argument should always be zero.
492  </P>  </P>
493  <P>  <P>
494  The third argument for <b>pcre_study()</b> is a pointer for an error message. If  The third argument for <b>pcre_study()</b> is a pointer for an error message. If
# Line 522  be sure that it has run successfully. Line 499  be sure that it has run successfully.
499  </P>  </P>
500  <P>  <P>
501  This is a typical call to <b>pcre_study</b>():  This is a typical call to <b>pcre_study</b>():
 </P>  
 <P>  
502  <pre>  <pre>
503    pcre_extra *pe;    pcre_extra *pe;
504    pe = pcre_study(    pe = pcre_study(
505      re,             /* result of pcre_compile() */      re,             /* result of pcre_compile() */
506      0,              /* no options exist */      0,              /* no options exist */
507      &error);        /* set to NULL or points to a message */      &error);        /* set to NULL or points to a message */
508  </PRE>  </pre>
 </P>  
 <P>  
509  At present, studying a pattern is useful only for non-anchored patterns that do  At present, studying a pattern is useful only for non-anchored patterns that do
510  not have a single fixed starting character. A bitmap of possible starting  not have a single fixed starting character. A bitmap of possible starting
511  characters is created.  bytes is created.
512  </P>  <a name="localesupport"></a></P>
513  <a name="localesupport"></a><br><a name="SEC7" href="#TOC1">LOCALE SUPPORT</a><br>  <br><a name="SEC8" href="#TOC1">LOCALE SUPPORT</a><br>
514  <P>  <P>
515  PCRE handles caseless matching, and determines whether characters are letters,  PCRE handles caseless matching, and determines whether characters are letters,
516  digits, or whatever, by reference to a set of tables. When running in UTF-8  digits, or whatever, by reference to a set of tables, indexed by character
517  mode, this applies only to characters with codes less than 256. The library  value. (When running in UTF-8 mode, this applies only to characters with codes
518  contains a default set of tables that is created in the default C locale when  less than 128. Higher-valued codes never match escapes such as \w or \d, but
519  PCRE is compiled. This is used when the final argument of <b>pcre_compile()</b>  can be tested with \p if PCRE is built with Unicode character property
520  is NULL, and is sufficient for many applications.  support.)
521  </P>  </P>
522  <P>  <P>
523  An alternative set of tables can, however, be supplied. Such tables are built  An internal set of tables is created in the default C locale when PCRE is
524  by calling the <b>pcre_maketables()</b> function, which has no arguments, in the  built. This is used when the final argument of <b>pcre_compile()</b> is NULL,
525  relevant locale. The result can then be passed to <b>pcre_compile()</b> as often  and is sufficient for many applications. An alternative set of tables can,
526  as necessary. For example, to build and use tables that are appropriate for the  however, be supplied. These may be created in a different locale from the
527  French locale (where accented characters with codes greater than 128 are  default. As more and more applications change to using Unicode, the need for
528  treated as letters), the following code could be used:  this locale support is expected to die away.
529  </P>  </P>
530  <P>  <P>
531    External tables are built by calling the <b>pcre_maketables()</b> function,
532    which has no arguments, in the relevant locale. The result can then be passed
533    to <b>pcre_compile()</b> or <b>pcre_exec()</b> as often as necessary. For
534    example, to build and use tables that are appropriate for the French locale
535    (where accented characters with values greater than 128 are treated as letters),
536    the following code could be used:
537  <pre>  <pre>
538    setlocale(LC_CTYPE, "fr");    setlocale(LC_CTYPE, "fr_FR");
539    tables = pcre_maketables();    tables = pcre_maketables();
540    re = pcre_compile(..., tables);    re = pcre_compile(..., tables);
541  </PRE>  </pre>
542    When <b>pcre_maketables()</b> runs, the tables are built in memory that is
543    obtained via <b>pcre_malloc</b>. It is the caller's responsibility to ensure
544    that the memory containing the tables remains available for as long as it is
545    needed.
546  </P>  </P>
547  <P>  <P>
548  The tables are built in memory that is obtained via <b>pcre_malloc</b>. The  The pointer that is passed to <b>pcre_compile()</b> is saved with the compiled
 pointer that is passed to <b>pcre_compile</b> is saved with the compiled  
549  pattern, and the same tables are used via this pointer by <b>pcre_study()</b>  pattern, and the same tables are used via this pointer by <b>pcre_study()</b>
550  and <b>pcre_exec()</b>. Thus, for any single pattern, compilation, studying and  and normally also by <b>pcre_exec()</b>. Thus, by default, for any single
551  matching all happen in the same locale, but different patterns can be compiled  pattern, compilation, studying and matching all happen in the same locale, but
552  in different locales. It is the caller's responsibility to ensure that the  different patterns can be compiled in different locales.
 memory containing the tables remains available for as long as it is needed.  
553  </P>  </P>
554  <br><a name="SEC8" href="#TOC1">INFORMATION ABOUT A PATTERN</a><br>  <P>
555    It is possible to pass a table pointer or NULL (indicating the use of the
556    internal tables) to <b>pcre_exec()</b>. Although not intended for this purpose,
557    this facility could be used to match a pattern in a different locale from the
558    one in which it was compiled. Passing table pointers at run time is discussed
559    below in the section on matching a pattern.
560    </P>
561    <br><a name="SEC9" href="#TOC1">INFORMATION ABOUT A PATTERN</a><br>
562  <P>  <P>
563  <b>int pcre_fullinfo(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>  <b>int pcre_fullinfo(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
564  <b>int <i>what</i>, void *<i>where</i>);</b>  <b>int <i>what</i>, void *<i>where</i>);</b>
# Line 587  the pattern was not studied. The third a Line 575  the pattern was not studied. The third a
575  information is required, and the fourth argument is a pointer to a variable  information is required, and the fourth argument is a pointer to a variable
576  to receive the data. The yield of the function is zero for success, or one of  to receive the data. The yield of the function is zero for success, or one of
577  the following negative numbers:  the following negative numbers:
 </P>  
 <P>  
578  <pre>  <pre>
579    PCRE_ERROR_NULL       the argument <i>code</i> was NULL    PCRE_ERROR_NULL       the argument <i>code</i> was NULL
580                          the argument <i>where</i> was NULL                          the argument <i>where</i> was NULL
581    PCRE_ERROR_BADMAGIC   the "magic number" was not found    PCRE_ERROR_BADMAGIC   the "magic number" was not found
582    PCRE_ERROR_BADOPTION  the value of <i>what</i> was invalid    PCRE_ERROR_BADOPTION  the value of <i>what</i> was invalid
583  </PRE>  </pre>
584  </P>  The "magic number" is placed at the start of each compiled pattern as a simple
585  <P>  check against passing an arbitrary memory pointer. Here is a typical call of
586  Here is a typical call of <b>pcre_fullinfo()</b>, to obtain the length of the  <b>pcre_fullinfo()</b>, to obtain the length of the compiled pattern:
 compiled pattern:  
 </P>  
 <P>  
587  <pre>  <pre>
588    int rc;    int rc;
589    size_t length;    size_t length;
# Line 609  compiled pattern: Line 592  compiled pattern:
592      pe,               /* result of pcre_study(), or NULL */      pe,               /* result of pcre_study(), or NULL */
593      PCRE_INFO_SIZE,   /* what is required */      PCRE_INFO_SIZE,   /* what is required */
594      &length);         /* where to put the data */      &length);         /* where to put the data */
595  </PRE>  </pre>
 </P>  
 <P>  
596  The possible values for the third argument are defined in <b>pcre.h</b>, and are  The possible values for the third argument are defined in <b>pcre.h</b>, and are
597  as follows:  as follows:
 </P>  
 <P>  
598  <pre>  <pre>
599    PCRE_INFO_BACKREFMAX    PCRE_INFO_BACKREFMAX
600  </PRE>  </pre>
 </P>  
 <P>  
601  Return the number of the highest back reference in the pattern. The fourth  Return the number of the highest back reference in the pattern. The fourth
602  argument should point to an <b>int</b> variable. Zero is returned if there are  argument should point to an <b>int</b> variable. Zero is returned if there are
603  no back references.  no back references.
 </P>  
 <P>  
604  <pre>  <pre>
605    PCRE_INFO_CAPTURECOUNT    PCRE_INFO_CAPTURECOUNT
606  </PRE>  </pre>
 </P>  
 <P>  
607  Return the number of capturing subpatterns in the pattern. The fourth argument  Return the number of capturing subpatterns in the pattern. The fourth argument
608  should point to an \fbint\fR variable.  should point to an <b>int</b> variable.
609  </P>  <pre>
610  <P>    PCRE_INFO_DEFAULTTABLES
611    </pre>
612    Return a pointer to the internal default character tables within PCRE. The
613    fourth argument should point to an <b>unsigned char *</b> variable. This
614    information call is provided for internal use by the <b>pcre_study()</b>
615    function. External callers can cause PCRE to use its internal tables by passing
616    a NULL table pointer.
617  <pre>  <pre>
618    PCRE_INFO_FIRSTBYTE    PCRE_INFO_FIRSTBYTE
619  </PRE>  </pre>
 </P>  
 <P>  
620  Return information about the first byte of any matched string, for a  Return information about the first byte of any matched string, for a
621  non-anchored pattern. (This option used to be called PCRE_INFO_FIRSTCHAR; the  non-anchored pattern. (This option used to be called PCRE_INFO_FIRSTCHAR; the
622  old name is still recognized for backwards compatibility.)  old name is still recognized for backwards compatibility.)
623  </P>  </P>
624  <P>  <P>
625  If there is a fixed first byte, e.g. from a pattern such as (cat|cow|coyote),  If there is a fixed first byte, for example, from a pattern such as
626  it is returned in the integer pointed to by <i>where</i>. Otherwise, if either  (cat|cow|coyote), it is returned in the integer pointed to by <i>where</i>.
627  </P>  Otherwise, if either
628  <P>  <br>
629    <br>
630  (a) the pattern was compiled with the PCRE_MULTILINE option, and every branch  (a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
631  starts with "^", or  starts with "^", or
632  </P>  <br>
633  <P>  <br>
634  (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set  (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set
635  (if it were set, the pattern would be anchored),  (if it were set, the pattern would be anchored),
636  </P>  <br>
637  <P>  <br>
638  -1 is returned, indicating that the pattern matches only at the start of a  -1 is returned, indicating that the pattern matches only at the start of a
639  subject string or after any newline within the string. Otherwise -2 is  subject string or after any newline within the string. Otherwise -2 is
640  returned. For anchored patterns, -2 is returned.  returned. For anchored patterns, -2 is returned.
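The three kinds of result described above can be told apart with a small helper. This is only a sketch: the enum and the function name `classify_first_byte` are made up for illustration and are not part of PCRE; the integer is assumed to be the value that <b>pcre_fullinfo()</b> stored for PCRE_INFO_FIRSTBYTE.

```c
/* Hypothetical names, not part of the PCRE API. */
typedef enum {
  FIRST_BYTE_FIXED,       /* value >= 0: every match starts with this byte */
  FIRST_BYTE_LINE_START,  /* value == -1: matches only at the start of the
                             subject or after a newline */
  FIRST_BYTE_NONE         /* value == -2: nothing useful is known, or the
                             pattern is anchored */
} first_byte_kind;

/* Classify the integer obtained via PCRE_INFO_FIRSTBYTE. */
first_byte_kind classify_first_byte(int value)
{
  if (value >= 0)
    return FIRST_BYTE_FIXED;
  if (value == -1)
    return FIRST_BYTE_LINE_START;
  return FIRST_BYTE_NONE;
}
```

A caller might use the FIRST_BYTE_FIXED case to skip ahead in the subject (for example with memchr()) before calling <b>pcre_exec()</b>.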
 </P>  
 <P>  
641  <pre>  <pre>
642    PCRE_INFO_FIRSTTABLE    PCRE_INFO_FIRSTTABLE
643  </PRE>  </pre>
 </P>  
 <P>  
644  If the pattern was studied, and this resulted in the construction of a 256-bit  If the pattern was studied, and this resulted in the construction of a 256-bit
645  table indicating a fixed set of bytes for the first byte in any matching  table indicating a fixed set of bytes for the first byte in any matching
646  string, a pointer to the table is returned. Otherwise NULL is returned. The  string, a pointer to the table is returned. Otherwise NULL is returned. The
647  fourth argument should point to an <b>unsigned char *</b> variable.  fourth argument should point to an <b>unsigned char *</b> variable.
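As a rough illustration of how such a 256-bit table might be consulted, the sketch below tests one byte value against a 32-byte bitmap. The function name `start_byte_possible` is hypothetical, and the bit layout (bit c % 8 of byte c / 8) is an assumption that should be checked against the PCRE source before relying on it.

```c
#include <stddef.h>

/* Return 1 if byte value c is marked in a 256-bit (32-byte) table of
   possible starting bytes, 0 otherwise.
   Assumed layout: bit (c % 8) of byte (c / 8).
   A NULL table means no study data, so every byte remains a candidate. */
int start_byte_possible(const unsigned char *table, unsigned int c)
{
  if (table == NULL)
    return 1;
  c &= 0xffu;  /* only byte values 0..255 are representable */
  return (table[c / 8] & (1u << (c % 8))) != 0;
}
```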
 </P>  
 <P>  
648  <pre>  <pre>
649    PCRE_INFO_LASTLITERAL    PCRE_INFO_LASTLITERAL
650  </PRE>  </pre>
 </P>  
 <P>  
651  Return the value of the rightmost literal byte that must exist in any matched  Return the value of the rightmost literal byte that must exist in any matched
652  string, other than at its start, if such a byte has been recorded. The fourth  string, other than at its start, if such a byte has been recorded. The fourth
653  argument should point to an <b>int</b> variable. If there is no such byte, -1 is  argument should point to an <b>int</b> variable. If there is no such byte, -1 is
# Line 685  returned. For anchored patterns, a last Line 655  returned. For anchored patterns, a last
655  follows something of variable length. For example, for the pattern  follows something of variable length. For example, for the pattern
656  /^a\d+z\d+/ the returned value is "z", but for /^a\dz\d/ the returned value  /^a\d+z\d+/ the returned value is "z", but for /^a\dz\d/ the returned value
657  is -1.  is -1.
 </P>  
 <P>  
658  <pre>  <pre>
659    PCRE_INFO_NAMECOUNT    PCRE_INFO_NAMECOUNT
660    PCRE_INFO_NAMEENTRYSIZE    PCRE_INFO_NAMEENTRYSIZE
661    PCRE_INFO_NAMETABLE    PCRE_INFO_NAMETABLE
662  </PRE>  </pre>
 </P>  
 <P>  
663  PCRE supports the use of named as well as numbered capturing parentheses. The  PCRE supports the use of named as well as numbered capturing parentheses. The
664  names are just an additional way of identifying the parentheses, which still  names are just an additional way of identifying the parentheses, which still
665  acquire a number. A caller that wants to extract data from a named subpattern  acquire numbers. A convenience function called <b>pcre_get_named_substring()</b>
666  must convert the name to a number in order to access the correct pointers in  is provided for extracting an individual captured substring by name. It is also
667  the output vector (described with <b>pcre_exec()</b> below). In order to do  possible to extract the data directly, by first converting the name to a number
668  this, it must first use these three values to obtain the name-to-number mapping  in order to access the correct pointers in the output vector (described with
669  table for the pattern.  <b>pcre_exec()</b> below). To do the conversion, you need to use the
670    name-to-number map, which is described by these three values.
671  </P>  </P>
672  <P>  <P>
673  The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT gives  The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT gives
# Line 712  are the number of the capturing parenthe Line 679  are the number of the capturing parenthe
679  rest of the entry is the corresponding name, zero terminated. The names are in  rest of the entry is the corresponding name, zero terminated. The names are in
680  alphabetical order. For example, consider the following pattern (assume  alphabetical order. For example, consider the following pattern (assume
681  PCRE_EXTENDED is set, so white space - including newlines - is ignored):  PCRE_EXTENDED is set, so white space - including newlines - is ignored):
 </P>  
 <P>  
682  <pre>  <pre>
683    (?P&#60;date&#62; (?P&#60;year&#62;(\d\d)?\d\d) -    (?P&#60;date&#62; (?P&#60;year&#62;(\d\d)?\d\d) - (?P&#60;month&#62;\d\d) - (?P&#60;day&#62;\d\d) )
684    (?P&#60;month&#62;\d\d) - (?P&#60;day&#62;\d\d) )  </pre>
 </PRE>  
 </P>  
 <P>  
685  There are four named subpatterns, so the table has four entries, and each entry  There are four named subpatterns, so the table has four entries, and each entry
686  in the table is eight bytes long. The table is as follows, with non-printing  in the table is eight bytes long. The table is as follows, with non-printing
687  bytes shown in hex, and undefined bytes shown as ??:  bytes shown in hexadecimal, and undefined bytes shown as ??:
 </P>  
 <P>  
688  <pre>  <pre>
689    00 01 d  a  t  e  00 ??    00 01 d  a  t  e  00 ??
690    00 05 d  a  y  00 ?? ??    00 05 d  a  y  00 ?? ??
691    00 04 m  o  n  t  h  00    00 04 m  o  n  t  h  00
692    00 02 y  e  a  r  00 ??    00 02 y  e  a  r  00 ??
693  </PRE>  </pre>
694  </P>  When writing code to extract data from named subpatterns using the
695  <P>  name-to-number map, remember that the length of each entry is likely to be
696  When writing code to extract data from named subpatterns, remember that the  different for each compiled pattern.
 length of each entry may be different for each compiled pattern.  
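A minimal sketch of walking the map by hand is shown here. The function `lookup_group_number` and the array `example_table` are illustrative names, not part of PCRE. The table is a hand-written copy of the four entries listed above, with the undefined bytes set to zero, and the group number is assumed to be stored most significant byte first, as the 00 01 ... 00 05 columns suggest. In real code the three arguments would come from PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT and PCRE_INFO_NAMEENTRYSIZE.

```c
#include <string.h>

/* Search a name-to-number map for a subpattern name.
   table      : start of the map
   name_count : number of entries
   entry_size : size of each entry in bytes (varies between patterns)
   Returns the capturing group number, or -1 if the name is absent. */
int lookup_group_number(const unsigned char *table, int name_count,
                        int entry_size, const char *name)
{
  int i;
  for (i = 0; i < name_count; i++) {
    const unsigned char *entry = table + (size_t)i * (size_t)entry_size;
    /* Bytes 0-1 hold the number; the zero-terminated name follows. */
    if (strcmp((const char *)(entry + 2), name) == 0)
      return (entry[0] << 8) | entry[1];
  }
  return -1;
}

/* Hand-written copy of the example table (4 entries of 8 bytes). */
const unsigned char example_table[4 * 8] = {
  0, 1, 'd', 'a', 't', 'e', 0,   0,
  0, 5, 'd', 'a', 'y', 0,   0,   0,
  0, 4, 'm', 'o', 'n', 't', 'h', 0,
  0, 2, 'y', 'e', 'a', 'r', 0,   0
};
```

Because the names are stored in alphabetical order, a binary search would also work; a linear scan is used here only to keep the sketch short.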
 </P>  
 <P>  
697  <pre>  <pre>
698    PCRE_INFO_OPTIONS    PCRE_INFO_OPTIONS
699  </PRE>  </pre>
 </P>  
 <P>  
700  Return a copy of the options with which the pattern was compiled. The fourth  Return a copy of the options with which the pattern was compiled. The fourth
701  argument should point to an <b>unsigned long int</b> variable. These option bits  argument should point to an <b>unsigned long int</b> variable. These option bits
702  are those specified in the call to <b>pcre_compile()</b>, modified by any  are those specified in the call to <b>pcre_compile()</b>, modified by any
# Line 750  top-level option settings within the pat Line 705  top-level option settings within the pat
705  <P>  <P>
706  A pattern is automatically anchored by PCRE if all of its top-level  A pattern is automatically anchored by PCRE if all of its top-level
707  alternatives begin with one of the following:  alternatives begin with one of the following:
 </P>  
 <P>  
708  <pre>  <pre>
709    ^     unless PCRE_MULTILINE is set    ^     unless PCRE_MULTILINE is set
710    \A    always    \A    always
711    \G    always    \G    always
712    .*    if PCRE_DOTALL is set and there are no back    .*    if PCRE_DOTALL is set and there are no back references to the subpattern in which .* appears
713            references to the subpattern in which .* appears  </pre>
 </PRE>  
 </P>  
 <P>  
714  For such patterns, the PCRE_ANCHORED bit is set in the options returned by  For such patterns, the PCRE_ANCHORED bit is set in the options returned by
715  <b>pcre_fullinfo()</b>.  <b>pcre_fullinfo()</b>.
 </P>  
 <P>  
716  <pre>  <pre>
717    PCRE_INFO_SIZE    PCRE_INFO_SIZE
718  </PRE>  </pre>
 </P>  
 <P>  
719  Return the size of the compiled pattern, that is, the value that was passed as  Return the size of the compiled pattern, that is, the value that was passed as
720  the argument to <b>pcre_malloc()</b> when PCRE was getting memory in which to  the argument to <b>pcre_malloc()</b> when PCRE was getting memory in which to
721  place the compiled data. The fourth argument should point to a <b>size_t</b>  place the compiled data. The fourth argument should point to a <b>size_t</b>
722  variable.  variable.
 </P>  
 <P>  
723  <pre>  <pre>
724    PCRE_INFO_STUDYSIZE    PCRE_INFO_STUDYSIZE
725  </PRE>  </pre>
726  </P>  Return the size of the data block pointed to by the <i>study_data</i> field in
 <P>  
 Returns the size of the data block pointed to by the <i>study_data</i> field in  
727  a <b>pcre_extra</b> block. That is, it is the value that was passed to  a <b>pcre_extra</b> block. That is, it is the value that was passed to
728  <b>pcre_malloc()</b> when PCRE was getting memory into which to place the data  <b>pcre_malloc()</b> when PCRE was getting memory into which to place the data
729  created by <b>pcre_study()</b>. The fourth argument should point to a  created by <b>pcre_study()</b>. The fourth argument should point to a
730  <b>size_t</b> variable.  <b>size_t</b> variable.
731  </P>  </P>
732  <br><a name="SEC9" href="#TOC1">OBSOLETE INFO FUNCTION</a><br>  <br><a name="SEC10" href="#TOC1">OBSOLETE INFO FUNCTION</a><br>
733  <P>  <P>
734  <b>int pcre_info(const pcre *<i>code</i>, int *<i>optptr</i>, int</b>  <b>int pcre_info(const pcre *<i>code</i>, int *<i>optptr</i>, int</b>
735  <b>*<i>firstcharptr</i>);</b>  <b>*<i>firstcharptr</i>);</b>
# Line 798  restrictive to return all the available Line 740  restrictive to return all the available
740  programs should use <b>pcre_fullinfo()</b> instead. The yield of  programs should use <b>pcre_fullinfo()</b> instead. The yield of
741  <b>pcre_info()</b> is the number of capturing subpatterns, or one of the  <b>pcre_info()</b> is the number of capturing subpatterns, or one of the
742  following negative numbers:  following negative numbers:
 </P>  
 <P>  
743  <pre>  <pre>
744    PCRE_ERROR_NULL       the argument <i>code</i> was NULL    PCRE_ERROR_NULL       the argument <i>code</i> was NULL
745    PCRE_ERROR_BADMAGIC   the "magic number" was not found    PCRE_ERROR_BADMAGIC   the "magic number" was not found
746  </PRE>  </pre>
 </P>  
 <P>  
747  If the <i>optptr</i> argument is not NULL, a copy of the options with which the  If the <i>optptr</i> argument is not NULL, a copy of the options with which the
748  pattern was compiled is placed in the integer it points to (see  pattern was compiled is placed in the integer it points to (see
749  PCRE_INFO_OPTIONS above).  PCRE_INFO_OPTIONS above).
# Line 815  If the pattern is not anchored and the < Line 753  If the pattern is not anchored and the <
753  it is used to pass back information about the first character of any matched  it is used to pass back information about the first character of any matched
754  string (see PCRE_INFO_FIRSTBYTE above).  string (see PCRE_INFO_FIRSTBYTE above).
755  </P>  </P>
756  <br><a name="SEC10" href="#TOC1">MATCHING A PATTERN</a><br>  <br><a name="SEC11" href="#TOC1">MATCHING A PATTERN</a><br>
757  <P>  <P>
758  <b>int pcre_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>  <b>int pcre_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
759  <b>const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>  <b>const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
# Line 823  string (see PCRE_INFO_FIRSTBYTE above). Line 761  string (see PCRE_INFO_FIRSTBYTE above).
761  </P>  </P>
762  <P>  <P>
763  The function <b>pcre_exec()</b> is called to match a subject string against a  The function <b>pcre_exec()</b> is called to match a subject string against a
764  pre-compiled pattern, which is passed in the <i>code</i> argument. If the  compiled pattern, which is passed in the <i>code</i> argument. If the
765  pattern has been studied, the result of the study should be passed in the  pattern has been studied, the result of the study should be passed in the
766  <i>extra</i> argument.  <i>extra</i> argument.
767  </P>  </P>
768  <P>  <P>
769  Here is an example of a simple call to <b>pcre_exec()</b>:  In most applications, the pattern will have been compiled (and optionally
770    studied) in the same process that calls <b>pcre_exec()</b>. However, it is
771    possible to save compiled patterns and study data, and then use them later
772    in different processes, possibly even on different hosts. For a discussion
773    about this, see the
774    <a href="pcreprecompile.html"><b>pcreprecompile</b></a>
775    documentation.
776  </P>  </P>
777  <P>  <P>
778    Here is an example of a simple call to <b>pcre_exec()</b>:
779  <pre>  <pre>
780    int rc;    int rc;
781    int ovector[30];    int ovector[30];
# Line 841  Here is an example of a simple call to < Line 786  Here is an example of a simple call to <
786      11,             /* the length of the subject string */      11,             /* the length of the subject string */
787      0,              /* start at offset 0 in the subject */      0,              /* start at offset 0 in the subject */
788      0,              /* default options */      0,              /* default options */
789      ovector,        /* vector for substring information */      ovector,        /* vector of integers for substring information */
790      30);            /* number of elements in the vector */      30);            /* number of elements in the vector (NOT size in bytes) */
791  </PRE>  <a name="extradata"></a></PRE>
792  </P>  </P>
793    <br><b>
794    Extra data for <b>pcre_exec()</b>
795    </b><br>
796  <P>  <P>
797  If the <i>extra</i> argument is not NULL, it must point to a <b>pcre_extra</b>  If the <i>extra</i> argument is not NULL, it must point to a <b>pcre_extra</b>
798  data block. The <b>pcre_study()</b> function returns such a block (when it  data block. The <b>pcre_study()</b> function returns such a block (when it
799  doesn't return NULL), but you can also create one for yourself, and pass  doesn't return NULL), but you can also create one for yourself, and pass
800  additional information in it. The fields in the block are as follows:  additional information in it. The fields in a <b>pcre_extra</b> block are as
801  </P>  follows:
 <P>  
802  <pre>  <pre>
803    unsigned long int <i>flags</i>;    unsigned long int <i>flags</i>;
804    void *<i>study_data</i>;    void *<i>study_data</i>;
805    unsigned long int <i>match_limit</i>;    unsigned long int <i>match_limit</i>;
806    void *<i>callout_data</i>;    void *<i>callout_data</i>;
807  </PRE>    const unsigned char *<i>tables</i>;
808  </P>  </pre>
 <P>  
809  The <i>flags</i> field is a bitmap that specifies which of the other fields  The <i>flags</i> field is a bitmap that specifies which of the other fields
810  are set. The flag bits are:  are set. The flag bits are:
 </P>  
 <P>  
811  <pre>  <pre>
812    PCRE_EXTRA_STUDY_DATA    PCRE_EXTRA_STUDY_DATA
813    PCRE_EXTRA_MATCH_LIMIT    PCRE_EXTRA_MATCH_LIMIT
814    PCRE_EXTRA_CALLOUT_DATA    PCRE_EXTRA_CALLOUT_DATA
815  </PRE>    PCRE_EXTRA_TABLES
816  </P>  </pre>
 <P>  
817  Other flag bits should be set to zero. The <i>study_data</i> field is set in the  Other flag bits should be set to zero. The <i>study_data</i> field is set in the
818  <b>pcre_extra</b> block that is returned by <b>pcre_study()</b>, together with  <b>pcre_extra</b> block that is returned by <b>pcre_study()</b>, together with
819  the appropriate flag bit. You should not set this yourself, but you can add to  the appropriate flag bit. You should not set this yourself, but you may add to
820  the block by setting the other fields.  the block by setting the other fields and their corresponding flag bits.
821  </P>  </P>
822  <P>  <P>
823  The <i>match_limit</i> field provides a means of preventing PCRE from using up a  The <i>match_limit</i> field provides a means of preventing PCRE from using up a
824  vast amount of resources when running patterns that are not going to match,  vast amount of resources when running patterns that are not going to match,
825  but which have a very large number of possibilities in their search trees. The  but which have a very large number of possibilities in their search trees. The
826  classic example is the use of nested unlimited repeats. Internally, PCRE uses a  classic example is the use of nested unlimited repeats.
827  function called <b>match()</b> which it calls repeatedly (sometimes  </P>
828  recursively). The limit is imposed on the number of times this function is  <P>
829  called during a match, which has the effect of limiting the amount of recursion  Internally, PCRE uses a function called <b>match()</b> which it calls repeatedly
830  and backtracking that can take place. For patterns that are not anchored, the  (sometimes recursively). The limit is imposed on the number of times this
831  count starts from zero for each position in the subject string.  function is called during a match, which has the effect of limiting the amount
832    of recursion and backtracking that can take place. For patterns that are not
833    anchored, the count starts from zero for each position in the subject string.
834  </P>  </P>
835  <P>  <P>
836  The default limit for the library can be set when PCRE is built; the default  The default limit for the library can be set when PCRE is built; the default
837  default is 10 million, which handles all but the most extreme cases. You can  default is 10 million, which handles all but the most extreme cases. You can
838  reduce the default by supplying <b>pcre_exec()</b> with a \fRpcre_extra\fR block  reduce the default by supplying <b>pcre_exec()</b> with a <b>pcre_extra</b> block
839  in which <i>match_limit</i> is set to a smaller value, and  in which <i>match_limit</i> is set to a smaller value, and
840  PCRE_EXTRA_MATCH_LIMIT is set in the <i>flags</i> field. If the limit is  PCRE_EXTRA_MATCH_LIMIT is set in the <i>flags</i> field. If the limit is
841  exceeded, <b>pcre_exec()</b> returns PCRE_ERROR_MATCHLIMIT.  exceeded, <b>pcre_exec()</b> returns PCRE_ERROR_MATCHLIMIT.
842  </P>  </P>
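The rule that a field and its flag bit must be set together can be sketched as follows. So that the sketch compiles without <b>pcre.h</b>, it uses a stand-in structure `demo_extra` and placeholder `DEMO_EXTRA_*` bits; these names and values are assumptions for illustration only. Real code would use <b>pcre_extra</b> and the PCRE_EXTRA_* constants from <b>pcre.h</b>.

```c
/* Stand-in that mirrors the pcre_extra fields listed above.
   Hypothetical: real code uses pcre_extra from pcre.h. */
typedef struct demo_extra {
  unsigned long int flags;
  void *study_data;
  unsigned long int match_limit;
  void *callout_data;
  const unsigned char *tables;
} demo_extra;

/* Placeholder flag bits; real code uses the PCRE_EXTRA_* constants. */
#define DEMO_EXTRA_STUDY_DATA    0x0001ul
#define DEMO_EXTRA_MATCH_LIMIT   0x0002ul
#define DEMO_EXTRA_CALLOUT_DATA  0x0004ul
#define DEMO_EXTRA_TABLES        0x0008ul

/* Start from an empty block with no flag bits set, as a caller must do
   when pcre_study() returned NULL and it builds its own block. */
void demo_extra_init(demo_extra *extra)
{
  demo_extra empty = {0};
  *extra = empty;
}

/* Set the limit and the flag bit that tells the matcher to honour it. */
void demo_extra_set_match_limit(demo_extra *extra, unsigned long int limit)
{
  extra->match_limit = limit;
  extra->flags |= DEMO_EXTRA_MATCH_LIMIT;
}
```

Setting match_limit without the corresponding flag bit would leave the library default in force, which is why the two assignments are kept in one helper.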
843  <P>  <P>
844  The <i>callout_data</i> field is used in conjunction with the "callout" feature,  The <i>callout_data</i> field is used in conjunction with the "callout" feature,
845  which is described in the <b>pcrecallout</b> documentation.  which is described in the
846  </P>  <a href="pcrecallout.html"><b>pcrecallout</b></a>
847  <P>  documentation.
 The PCRE_ANCHORED option can be passed in the <i>options</i> argument, whose  
 unused bits must be zero. This limits <b>pcre_exec()</b> to matching at the  
 first matching position. However, if a pattern was compiled with PCRE_ANCHORED,  
 or turned out to be anchored by virtue of its contents, it cannot be made  
 unanchored at matching time.  
 </P>  
 <P>  
 When PCRE_UTF8 was set at compile time, the validity of the subject as a UTF-8  
 string is automatically checked, and the value of <i>startoffset</i> is also  
 checked to ensure that it points to the start of a UTF-8 character. If an  
 invalid UTF-8 sequence of bytes is found, <b>pcre_exec()</b> returns the error  
 PCRE_ERROR_BADUTF8. If <i>startoffset</i> contains an invalid value,  
 PCRE_ERROR_BADUTF8_OFFSET is returned.  
 </P>  
 <P>  
 If you already know that your subject is valid, and you want to skip these  
 checks for performance reasons, you can set the PCRE_NO_UTF8_CHECK option when  
 calling <b>pcre_exec()</b>. You might want to do this for the second and  
 subsequent calls to <b>pcre_exec()</b> if you are making repeated calls to find  
 all the matches in a single subject string. However, you should be sure that  
 the value of <i>startoffset</i> points to the start of a UTF-8 character. When  
 PCRE_NO_UTF8_CHECK is set, the effect of passing an invalid UTF-8 string as a  
 subject, or a value of <i>startoffset</i> that does not point to the start of a  
 UTF-8 character, is undefined. Your program may crash.  
 </P>  
 <P>  
 There are also three further options that can be set only at matching time:  
848  </P>  </P>
849  <P>  <P>
850    The <i>tables</i> field is used to pass a character tables pointer to
851    <b>pcre_exec()</b>; this overrides the value that is stored with the compiled
852    pattern. A non-NULL value is stored with the compiled pattern only if custom
853    tables were supplied to <b>pcre_compile()</b> via its <i>tableptr</i> argument.
854    If NULL is passed to <b>pcre_exec()</b> using this mechanism, it forces PCRE's
855    internal tables to be used. This facility is helpful when re-using patterns
856    that have been saved after compiling with an external set of tables, because
857    the external tables might be at a different address when <b>pcre_exec()</b> is
858    called. See the
859    <a href="pcreprecompile.html"><b>pcreprecompile</b></a>
860    documentation for a discussion of saving compiled patterns for later use.
861    </P>
862    <br><b>
863    Option bits for <b>pcre_exec()</b>
864    </b><br>
865    <P>
866    The unused bits of the <i>options</i> argument for <b>pcre_exec()</b> must be
867    zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NOTBOL,
868    PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK and PCRE_PARTIAL.
869    <pre>
870      PCRE_ANCHORED
871    </pre>
872    The PCRE_ANCHORED option limits <b>pcre_exec()</b> to matching at the first
873    matching position. If a pattern was compiled with PCRE_ANCHORED, or turned out
874    to be anchored by virtue of its contents, it cannot be made unanchored at
875    matching time.
876  <pre>  <pre>
877    PCRE_NOTBOL    PCRE_NOTBOL
878  </PRE>  </pre>
879  </P>  This option specifies that the first character of the subject string is not the
880  <P>  beginning of a line, so the circumflex metacharacter should not match before
881  The first character of the string is not the beginning of a line, so the  it. Setting this without PCRE_MULTILINE (at compile time) causes circumflex
882  circumflex metacharacter should not match before it. Setting this without  never to match. This option affects only the behaviour of the circumflex
883  PCRE_MULTILINE (at compile time) causes circumflex never to match.  metacharacter. It does not affect \A.
 </P>  
 <P>  
884  <pre>  <pre>
885    PCRE_NOTEOL    PCRE_NOTEOL
886  </PRE>  </pre>
887  </P>  This option specifies that the end of the subject string is not the end of a
888  <P>  line, so the dollar metacharacter should not match it nor (except in multiline
889  The end of the string is not the end of a line, so the dollar metacharacter  mode) a newline immediately before it. Setting this without PCRE_MULTILINE (at
890  should not match it nor (except in multiline mode) a newline immediately before  compile time) causes dollar never to match. This option affects only the
891  it. Setting this without PCRE_MULTILINE (at compile time) causes dollar never  behaviour of the dollar metacharacter. It does not affect \Z or \z.
 to match.  
 </P>  
 <P>  
892  <pre>  <pre>
893    PCRE_NOTEMPTY    PCRE_NOTEMPTY
894  </PRE>  </pre>
 </P>  
 <P>  
895  An empty string is not considered to be a valid match if this option is set. If  An empty string is not considered to be a valid match if this option is set. If
896  there are alternatives in the pattern, they are tried. If all the alternatives  there are alternatives in the pattern, they are tried. If all the alternatives
897  match the empty string, the entire match fails. For example, if the pattern  match the empty string, the entire match fails. For example, if the pattern
 </P>  
 <P>  
898  <pre>  <pre>
899    a?b?    a?b?
900  </PRE>  </pre>
 </P>  
 <P>  
901  is applied to a string not beginning with "a" or "b", it matches the empty  is applied to a string not beginning with "a" or "b", it matches the empty
902  string at the start of the subject. With PCRE_NOTEMPTY set, this match is not  string at the start of the subject. With PCRE_NOTEMPTY set, this match is not
903  valid, so PCRE searches further into the string for occurrences of "a" or "b".  valid, so PCRE searches further into the string for occurrences of "a" or "b".
# Line 974  Perl has no direct equivalent of PCRE_NO Line 907  Perl has no direct equivalent of PCRE_NO
907  of a pattern match of the empty string within its <b>split()</b> function, and  of a pattern match of the empty string within its <b>split()</b> function, and
908  when using the /g modifier. It is possible to emulate Perl's behaviour after  when using the /g modifier. It is possible to emulate Perl's behaviour after
909  matching a null string by first trying the match again at the same offset with  matching a null string by first trying the match again at the same offset with
910  PCRE_NOTEMPTY set, and then if that fails by advancing the starting offset (see  PCRE_NOTEMPTY and PCRE_ANCHORED, and then if that fails by advancing the
911  below) and trying an ordinary match again.  starting offset (see below) and trying an ordinary match again. There is some
912    code that demonstrates how to do this in the <i>pcredemo.c</i> sample program.
913    <pre>
914      PCRE_NO_UTF8_CHECK
915    </pre>
916    When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8
917    string is automatically checked when <b>pcre_exec()</b> is subsequently called.
918    The value of <i>startoffset</i> is also checked to ensure that it points to the
919    start of a UTF-8 character. If an invalid UTF-8 sequence of bytes is found,
920    <b>pcre_exec()</b> returns the error PCRE_ERROR_BADUTF8. If <i>startoffset</i>
921    contains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned.
922  </P>  </P>
923  <P>  <P>
924  The subject string is passed to <b>pcre_exec()</b> as a pointer in  If you already know that your subject is valid, and you want to skip these
925  <i>subject</i>, a length in <i>length</i>, and a starting byte offset in  checks for performance reasons, you can set the PCRE_NO_UTF8_CHECK option when
926  <i>startoffset</i>. Unlike the pattern string, the subject may contain binary  calling <b>pcre_exec()</b>. You might want to do this for the second and
927  zero bytes. When the starting offset is zero, the search for a match starts at  subsequent calls to <b>pcre_exec()</b> if you are making repeated calls to find
928  the beginning of the subject, and this is by far the most common case.  all the matches in a single subject string. However, you should be sure that
929    the value of <i>startoffset</i> points to the start of a UTF-8 character. When
930    PCRE_NO_UTF8_CHECK is set, the effect of passing an invalid UTF-8 string as a
931    subject, or a value of <i>startoffset</i> that does not point to the start of a
932    UTF-8 character, is undefined. Your program may crash.
933    <pre>
934      PCRE_PARTIAL
935    </pre>
936    This option turns on the partial matching feature. If the subject string fails
937    to match the pattern, but at some point during the matching process the end of
938    the subject was reached (that is, the subject partially matches the pattern and
939    the failure to match occurred only because there were not enough subject
940    characters), <b>pcre_exec()</b> returns PCRE_ERROR_PARTIAL instead of
941    PCRE_ERROR_NOMATCH. When PCRE_PARTIAL is used, there are restrictions on what
942    may appear in the pattern. These are discussed in the
943    <a href="pcrepartial.html"><b>pcrepartial</b></a>
944    documentation.
945  </P>  </P>
946    <br><b>
947    The string to be matched by <b>pcre_exec()</b>
948    </b><br>
949  <P>  <P>
950  If the pattern was compiled with the PCRE_UTF8 option, the subject must be a  The subject string is passed to <b>pcre_exec()</b> as a pointer in
951  sequence of bytes that is a valid UTF-8 string, and the starting offset must  <i>subject</i>, a length in <i>length</i>, and a starting byte offset in
952  point to the beginning of a UTF-8 character. If an invalid UTF-8 string or  <i>startoffset</i>. In UTF-8 mode, the byte offset must point to the start of a
953  offset is passed, an error (either PCRE_ERROR_BADUTF8 or  UTF-8 character. Unlike the pattern string, the subject may contain binary zero
954  PCRE_ERROR_BADUTF8_OFFSET) is returned, unless the option PCRE_NO_UTF8_CHECK is  bytes. When the starting offset is zero, the search for a match starts at the
955  set, in which case PCRE's behaviour is not defined.  beginning of the subject, and this is by far the most common case.
956  </P>  </P>
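<P>
The requirement that <i>startoffset</i> point to the start of a UTF-8 character
can be checked by the caller before PCRE_NO_UTF8_CHECK is used. The following
is a minimal sketch, not part of PCRE: the helper name
<b>is_utf8_char_start()</b> is hypothetical, and accepting an offset equal to
the subject length is an assumption of this sketch. It tests only the byte at
the offset; it does not validate the whole subject.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a PCRE function.
   Returns 1 if the byte at `offset` is not a UTF-8 continuation byte
   (10xxxxxx), so the offset can be the start of a character.
   An offset equal to `length` (end of subject) is accepted by assumption.
   Returns 0 for continuation bytes and for offsets past the end. */
int is_utf8_char_start(const char *subject, size_t length, size_t offset)
{
    assert(subject != NULL);
    if (offset > length)
        return 0;
    if (offset == length)
        return 1;
    return ((unsigned char)subject[offset] & 0xC0) != 0x80;
}
```

<P>
For the four-byte subject "a", 0xC3, 0xA9, "z", offsets 0, 1 and 3 are
accepted and offset 2 (the second byte of the two-byte character) is rejected.
</P>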
957  <P>  <P>
958  A non-zero starting offset is useful when searching for another match in the  A non-zero starting offset is useful when searching for another match in the
# Line 998  same subject by calling pcre_exec() Line 960  same subject by calling pcre_exec()
960  Setting <i>startoffset</i> differs from just passing over a shortened string and  Setting <i>startoffset</i> differs from just passing over a shortened string and
961  setting PCRE_NOTBOL in the case of a pattern that begins with any kind of  setting PCRE_NOTBOL in the case of a pattern that begins with any kind of
962  lookbehind. For example, consider the pattern  lookbehind. For example, consider the pattern
 </P>  
 <P>  
963  <pre>  <pre>
964    \Biss\B    \Biss\B
965  </PRE>  </pre>
 </P>  
 <P>  
966  which finds occurrences of "iss" in the middle of words. (\B matches only if  which finds occurrences of "iss" in the middle of words. (\B matches only if
967  the current position in the subject is not a word boundary.) When applied to  the current position in the subject is not a word boundary.) When applied to
968  the string "Mississippi" the first call to <b>pcre_exec()</b> finds the first  the string "Mississippi" the first call to <b>pcre_exec()</b> finds the first
# Line 1017  behind the starting point to discover th Line 975  behind the starting point to discover th
975  </P>  </P>
976  <P>  <P>
977  If a non-zero starting offset is passed when the pattern is anchored, one  If a non-zero starting offset is passed when the pattern is anchored, one
978  attempt to match at the given offset is tried. This can only succeed if the  attempt to match at the given offset is made. This can only succeed if the
979  pattern does not require the match to be at the start of the subject.  pattern does not require the match to be at the start of the subject.
980  </P>  </P>
981    <br><b>
982    How <b>pcre_exec()</b> returns captured substrings
983    </b><br>
984  <P>  <P>
985  In general, a pattern matches a certain portion of the subject, and in  In general, a pattern matches a certain portion of the subject, and in
986  addition, further substrings from the subject may be picked out by parts of the  addition, further substrings from the subject may be picked out by parts of the
# Line 1031  kinds of parenthesized subpattern that d Line 992  kinds of parenthesized subpattern that d
992  <P>  <P>
993  Captured substrings are returned to the caller via a vector of integer offsets  Captured substrings are returned to the caller via a vector of integer offsets
994  whose address is passed in <i>ovector</i>. The number of elements in the vector  whose address is passed in <i>ovector</i>. The number of elements in the vector
995  is passed in <i>ovecsize</i>. The first two-thirds of the vector is used to pass  is passed in <i>ovecsize</i>, which must be a non-negative number. <b>Note</b>:
996  back captured substrings, each substring using a pair of integers. The  this argument is NOT the size of <i>ovector</i> in bytes.
 remaining third of the vector is used as workspace by <b>pcre_exec()</b> while  
 matching capturing subpatterns, and is not available for passing back  
 information. The length passed in <i>ovecsize</i> should always be a multiple of  
 three. If it is not, it is rounded down.  
997  </P>  </P>
998  <P>  <P>
999  When a match has been successful, information about captured substrings is  The first two-thirds of the vector is used to pass back captured substrings,
1000  returned in pairs of integers, starting at the beginning of <i>ovector</i>, and  each substring using a pair of integers. The remaining third of the vector is
1001    used as workspace by <b>pcre_exec()</b> while matching capturing subpatterns,
1002    and is not available for passing back information. The length passed in
1003    <i>ovecsize</i> should always be a multiple of three. If it is not, it is
1004    rounded down.
1005    </P>
1006    <P>
1007    When a match is successful, information about captured substrings is returned
1008    in pairs of integers, starting at the beginning of <i>ovector</i>, and
1009  continuing up to two-thirds of its length at the most. The first element of a  continuing up to two-thirds of its length at the most. The first element of a
1010  pair is set to the offset of the first character in a substring, and the second  pair is set to the offset of the first character in a substring, and the second
1011  is set to the offset of the first character after the end of a substring. The  is set to the offset of the first character after the end of a substring. The
# Line 1064  values corresponding to the unused subpa Line 1029  values corresponding to the unused subpa
1029  </P>  </P>
1030  <P>  <P>
1031  If a capturing subpattern is matched repeatedly, it is the last portion of the  If a capturing subpattern is matched repeatedly, it is the last portion of the
1032  string that it matched that gets returned.  string that it matched that is returned.
1033  </P>  </P>
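<P>
The offset pairs described above can be walked with a short loop. This is a
sketch only: <b>print_captures()</b> is a hypothetical helper, not a PCRE
function, and it assumes that a pair of negative offsets marks a subpattern
that did not take part in the match. It writes with <b>fwrite()</b> because
the subject may contain binary zero bytes.
</P>

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not a PCRE function.
   Prints the substring described by each offset pair in `ovector`.
   `count` is the number of pairs to read (the positive value returned by
   the match call). Pair i occupies ovector[2*i] and ovector[2*i + 1].
   Negative offsets are assumed to mean "subpattern not set".
   Returns the number of pairs that were set. */
int print_captures(const char *subject, const int *ovector, int count)
{
    int i, set = 0;
    assert(subject != NULL && ovector != NULL && count >= 0);
    for (i = 0; i < count; i++) {
        int start = ovector[2 * i];      /* offset of first character */
        int end = ovector[2 * i + 1];    /* offset just past the end */
        printf("%d: ", i);
        if (start < 0 || end < start) {
            printf("<unset>\n");
            continue;
        }
        fwrite(subject + start, 1, (size_t)(end - start), stdout);
        printf("\n");
        set++;
    }
    return set;
}
```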
1034  <P>  <P>
1035  If the vector is too small to hold all the captured substrings, it is used as  If the vector is too small to hold all the captured substring offsets, it is
1036  far as possible (up to two-thirds of its length), and the function returns a  used as far as possible (up to two-thirds of its length), and the function
1037  value of zero. In particular, if the substring offsets are not of interest,  returns a value of zero. In particular, if the substring offsets are not of
1038  <b>pcre_exec()</b> may be called with <i>ovector</i> passed as NULL and  interest, <b>pcre_exec()</b> may be called with <i>ovector</i> passed as NULL and
1039  <i>ovecsize</i> as zero. However, if the pattern contains back references and  <i>ovecsize</i> as zero. However, if the pattern contains back references and
1040  the <i>ovector</i> isn't big enough to remember the related substrings, PCRE has  the <i>ovector</i> is not big enough to remember the related substrings, PCRE
1041  to get additional memory for use during matching. Thus it is usually advisable  has to get additional memory for use during matching. Thus it is usually
1042  to supply an <i>ovector</i>.  advisable to supply an <i>ovector</i>.
1043  </P>  </P>
1044  <P>  <P>
1045  Note that <b>pcre_info()</b> can be used to find out how many capturing  Note that <b>pcre_info()</b> can be used to find out how many capturing
# Line 1082  subpatterns there are in a compiled patt Line 1047  subpatterns there are in a compiled patt
1047  <i>ovector</i> that will allow for <i>n</i> captured substrings, in addition to  <i>ovector</i> that will allow for <i>n</i> captured substrings, in addition to
1048  the offsets of the substring matched by the whole pattern, is (<i>n</i>+1)*3.  the offsets of the substring matched by the whole pattern, is (<i>n</i>+1)*3.
1049  </P>  </P>
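<P>
The sizing rules above reduce to two small calculations. The function names
below are hypothetical and exist only for illustration: one gives the smallest
<i>ovecsize</i> for <i>n</i> captured substrings, the other gives the number of
offset pairs that a vector of a given size can return once the size has been
rounded down to a multiple of three.
</P>

```c
#include <assert.h>

/* Hypothetical helper: smallest ovecsize (in elements, not bytes) that can
   return `n` captured substrings plus the whole-pattern match. */
int ovector_size_for(int n)
{
    assert(n >= 0);
    return (n + 1) * 3;
}

/* Hypothetical helper: number of offset pairs a vector of `ovecsize`
   elements can return. The size is rounded down to a multiple of three and
   only the first two-thirds carries results, which gives ovecsize / 3. */
int usable_pairs(int ovecsize)
{
    assert(ovecsize >= 0);
    return ovecsize / 3;
}
```

<P>
For a pattern with two capturing subpatterns, <b>ovector_size_for(2)</b> gives
9, so a declaration such as <b>int ovector[9];</b> with <i>ovecsize</i> passed
as 9 is sufficient.
</P>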
1050    <br><b>
1051    Return values from <b>pcre_exec()</b>
1052    </b><br>
1053  <P>  <P>
1054  If <b>pcre_exec()</b> fails, it returns a negative number. The following are  If <b>pcre_exec()</b> fails, it returns a negative number. The following are
1055  defined in the header file:  defined in the header file:
 </P>  
 <P>  
1056  <pre>  <pre>
1057    PCRE_ERROR_NOMATCH        (-1)    PCRE_ERROR_NOMATCH        (-1)
1058  </PRE>  </pre>
 </P>  
 <P>  
1059  The subject string did not match the pattern.  The subject string did not match the pattern.
 </P>  
 <P>  
1060  <pre>  <pre>
1061    PCRE_ERROR_NULL           (-2)    PCRE_ERROR_NULL           (-2)
1062  </PRE>  </pre>
 </P>  
 <P>  
1063  Either <i>code</i> or <i>subject</i> was passed as NULL, or <i>ovector</i> was  Either <i>code</i> or <i>subject</i> was passed as NULL, or <i>ovector</i> was
1064  NULL and <i>ovecsize</i> was not zero.  NULL and <i>ovecsize</i> was not zero.
 </P>  
 <P>  
1065  <pre>  <pre>
1066    PCRE_ERROR_BADOPTION      (-3)    PCRE_ERROR_BADOPTION      (-3)
1067  </PRE>  </pre>
 </P>  
 <P>  
1068  An unrecognized bit was set in the <i>options</i> argument.  An unrecognized bit was set in the <i>options</i> argument.
 </P>  
 <P>  
1069  <pre>  <pre>
1070    PCRE_ERROR_BADMAGIC       (-4)    PCRE_ERROR_BADMAGIC       (-4)
1071  </PRE>  </pre>
 </P>  
 <P>  
1072  PCRE stores a 4-byte "magic number" at the start of the compiled code, to catch  PCRE stores a 4-byte "magic number" at the start of the compiled code, to catch
1073  the case when it is passed a junk pointer. This is the error it gives when the  the case when it is passed a junk pointer and to detect when a pattern that was
1074  magic number isn't present.  compiled in an environment of one endianness is run in an environment with the
1075  </P>  other endianness. This is the error that PCRE gives when the magic number is
1076  <P>  not present.
1077  <pre>  <pre>
1078    PCRE_ERROR_UNKNOWN_NODE   (-5)    PCRE_ERROR_UNKNOWN_NODE   (-5)
1079  </PRE>  </pre>
 </P>  
 <P>  
1080  While running the pattern match, an unknown item was encountered in the  While running the pattern match, an unknown item was encountered in the
1081  compiled pattern. This error could be caused by a bug in PCRE or by overwriting  compiled pattern. This error could be caused by a bug in PCRE or by overwriting
1082  of the compiled pattern.  of the compiled pattern.
 </P>  
 <P>  
1083  <pre>  <pre>
1084    PCRE_ERROR_NOMEMORY       (-6)    PCRE_ERROR_NOMEMORY       (-6)
1085  </PRE>  </pre>
 </P>  
 <P>  
1086  If a pattern contains back references, but the <i>ovector</i> that is passed to  If a pattern contains back references, but the <i>ovector</i> that is passed to
1087  <b>pcre_exec()</b> is not big enough to remember the referenced substrings, PCRE  <b>pcre_exec()</b> is not big enough to remember the referenced substrings, PCRE
1088  gets a block of memory at the start of matching to use for this purpose. If the  gets a block of memory at the start of matching to use for this purpose. If the
1089  call via <b>pcre_malloc()</b> fails, this error is given. The memory is freed at  call via <b>pcre_malloc()</b> fails, this error is given. The memory is
1090  the end of matching.  automatically freed at the end of matching.
 </P>  
 <P>  
1091  <pre>  <pre>
1092    PCRE_ERROR_NOSUBSTRING    (-7)    PCRE_ERROR_NOSUBSTRING    (-7)
1093  </PRE>  </pre>
 </P>  
 <P>  
1094  This error is used by the <b>pcre_copy_substring()</b>,  This error is used by the <b>pcre_copy_substring()</b>,
1095  <b>pcre_get_substring()</b>, and <b>pcre_get_substring_list()</b> functions (see  <b>pcre_get_substring()</b>, and <b>pcre_get_substring_list()</b> functions (see
1096  below). It is never returned by <b>pcre_exec()</b>.  below). It is never returned by <b>pcre_exec()</b>.
 </P>  
 <P>  
1097  <pre>  <pre>
1098    PCRE_ERROR_MATCHLIMIT     (-8)    PCRE_ERROR_MATCHLIMIT     (-8)
1099  </PRE>  </pre>
 </P>  
 <P>  
1100  The recursion and backtracking limit, as specified by the <i>match_limit</i>  The recursion and backtracking limit, as specified by the <i>match_limit</i>
1101  field in a <b>pcre_extra</b> structure (or defaulted) was reached. See the  field in a <b>pcre_extra</b> structure (or defaulted) was reached. See the
1102  description above.  description above.
 </P>  
 <P>  
1103  <pre>  <pre>
1104    PCRE_ERROR_CALLOUT        (-9)    PCRE_ERROR_CALLOUT        (-9)
1105  </PRE>  </pre>
 </P>  
 <P>  
1106  This error is never generated by <b>pcre_exec()</b> itself. It is provided for  This error is never generated by <b>pcre_exec()</b> itself. It is provided for
1107  use by callout functions that want to yield a distinctive error code. See the  use by callout functions that want to yield a distinctive error code. See the
1108  <b>pcrecallout</b> documentation for details.  <a href="pcrecallout.html"><b>pcrecallout</b></a>
1109  </P>  documentation for details.
 <P>  
1110  <pre>  <pre>
1111    PCRE_ERROR_BADUTF8        (-10)    PCRE_ERROR_BADUTF8        (-10)
1112  </PRE>  </pre>
 </P>  
 <P>  
1113  A string that contains an invalid UTF-8 byte sequence was passed as a subject.  A string that contains an invalid UTF-8 byte sequence was passed as a subject.
 </P>  
 <P>  
1114  <pre>  <pre>
1115    PCRE_ERROR_BADUTF8_OFFSET (-11)    PCRE_ERROR_BADUTF8_OFFSET (-11)
1116  </PRE>  </pre>
 </P>  
 <P>  
1117  The UTF-8 byte sequence that was passed as a subject was valid, but the value  The UTF-8 byte sequence that was passed as a subject was valid, but the value
1118  of <i>startoffset</i> did not point to the beginning of a UTF-8 character.  of <i>startoffset</i> did not point to the beginning of a UTF-8 character.
1119    <pre>
1120      PCRE_ERROR_PARTIAL (-12)
1121    </pre>
1122    The subject string did not match, but it did match partially. See the
1123    <a href="pcrepartial.html"><b>pcrepartial</b></a>
1124    documentation for details of partial matching.
1125    <pre>
1126      PCRE_ERROR_BAD_PARTIAL (-13)
1127    </pre>
1128    The PCRE_PARTIAL option was used with a compiled pattern containing items that
1129    are not supported for partial matching. See the
1130    <a href="pcrepartial.html"><b>pcrepartial</b></a>
1131    documentation for details of partial matching.
1132    <pre>
1133      PCRE_ERROR_INTERNAL (-14)
1134    </pre>
1135    An unexpected internal error has occurred. This error could be caused by a bug
1136    in PCRE or by overwriting of the compiled pattern.
1137    <pre>
1138      PCRE_ERROR_BADCOUNT (-15)
1139    </pre>
1140    This error is given if the value of the <i>ovecsize</i> argument is negative.
1141  </P>  </P>
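<P>
When logging a failed match it can help to turn the negative return value back
into its name. The sketch below uses a local table that copies the names and
values listed above, so that it compiles without the PCRE header; real code
would normally compare against the PCRE_ERROR_* macros themselves.
<b>exec_error_name()</b> is a hypothetical helper, not a PCRE function.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a PCRE function.
   Maps a negative return value to the name listed in this section.
   The table is a local copy of the documented values -1 .. -15.
   Returns NULL for non-negative values, which are not errors. */
const char *exec_error_name(int rc)
{
    static const char *const names[] = {
        "PCRE_ERROR_NOMATCH",        /*  -1 */
        "PCRE_ERROR_NULL",           /*  -2 */
        "PCRE_ERROR_BADOPTION",      /*  -3 */
        "PCRE_ERROR_BADMAGIC",       /*  -4 */
        "PCRE_ERROR_UNKNOWN_NODE",   /*  -5 */
        "PCRE_ERROR_NOMEMORY",       /*  -6 */
        "PCRE_ERROR_NOSUBSTRING",    /*  -7 */
        "PCRE_ERROR_MATCHLIMIT",     /*  -8 */
        "PCRE_ERROR_CALLOUT",        /*  -9 */
        "PCRE_ERROR_BADUTF8",        /* -10 */
        "PCRE_ERROR_BADUTF8_OFFSET", /* -11 */
        "PCRE_ERROR_PARTIAL",        /* -12 */
        "PCRE_ERROR_BAD_PARTIAL",    /* -13 */
        "PCRE_ERROR_INTERNAL",       /* -14 */
        "PCRE_ERROR_BADCOUNT"        /* -15 */
    };
    assert(sizeof names / sizeof names[0] == 15);
    if (rc >= 0)
        return NULL;
    if (rc < -15)
        return "unknown error";
    return names[-rc - 1];
}
```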
1142  <br><a name="SEC11" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>  <br><a name="SEC12" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br>
1143  <P>  <P>
1144  <b>int pcre_copy_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>  <b>int pcre_copy_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
1145  <b>int <i>stringcount</i>, int <i>stringnumber</i>, char *<i>buffer</i>,</b>  <b>int <i>stringcount</i>, int <i>stringnumber</i>, char *<i>buffer</i>,</b>
# Line 1218  a C string. Line 1167  a C string.
1167  </P>  </P>
1168  <P>  <P>
1169  The first three arguments are the same for all three of these functions:  The first three arguments are the same for all three of these functions:
1170  <i>subject</i> is the subject string which has just been successfully matched,  <i>subject</i> is the subject string that has just been successfully matched,
1171  <i>ovector</i> is a pointer to the vector of integer offsets that was passed to  <i>ovector</i> is a pointer to the vector of integer offsets that was passed to
1172  <b>pcre_exec()</b>, and <i>stringcount</i> is the number of substrings that were  <b>pcre_exec()</b>, and <i>stringcount</i> is the number of substrings that were
1173  captured by the match, including the substring that matched the entire regular  captured by the match, including the substring that matched the entire regular
1174  expression. This is the value returned by <b>pcre_exec</b> if it is greater than  expression. This is the value returned by <b>pcre_exec()</b> if it is greater
1175  zero. If <b>pcre_exec()</b> returned zero, indicating that it ran out of space  than zero. If <b>pcre_exec()</b> returned zero, indicating that it ran out of
1176  in <i>ovector</i>, the value passed as <i>stringcount</i> should be the size of  space in <i>ovector</i>, the value passed as <i>stringcount</i> should be the
1177  the vector divided by three.  number of elements in the vector divided by three.
1178  </P>  </P>
1179  <P>  <P>
1180  The functions <b>pcre_copy_substring()</b> and <b>pcre_get_substring()</b>  The functions <b>pcre_copy_substring()</b> and <b>pcre_get_substring()</b>
1181  extract a single substring, whose number is given as <i>stringnumber</i>. A  extract a single substring, whose number is given as <i>stringnumber</i>. A
1182  value of zero extracts the substring that matched the entire pattern, while  value of zero extracts the substring that matched the entire pattern, whereas
1183  higher values extract the captured substrings. For <b>pcre_copy_substring()</b>,  higher values extract the captured substrings. For <b>pcre_copy_substring()</b>,
1184  the string is placed in <i>buffer</i>, whose length is given by  the string is placed in <i>buffer</i>, whose length is given by
1185  <i>buffersize</i>, while for <b>pcre_get_substring()</b> a new block of memory is  <i>buffersize</i>, while for <b>pcre_get_substring()</b> a new block of memory is
1186  obtained via <b>pcre_malloc</b>, and its address is returned via  obtained via <b>pcre_malloc</b>, and its address is returned via
1187  <i>stringptr</i>. The yield of the function is the length of the string, not  <i>stringptr</i>. The yield of the function is the length of the string, not
1188  including the terminating zero, or one of  including the terminating zero, or one of
 </P>  
 <P>  
1189  <pre>  <pre>
1190    PCRE_ERROR_NOMEMORY       (-6)    PCRE_ERROR_NOMEMORY       (-6)
1191  </PRE>  </pre>
 </P>  
 <P>  
1192  The buffer was too small for <b>pcre_copy_substring()</b>, or the attempt to get  The buffer was too small for <b>pcre_copy_substring()</b>, or the attempt to get
1193  memory failed for <b>pcre_get_substring()</b>.  memory failed for <b>pcre_get_substring()</b>.
 </P>  
 <P>  
1194  <pre>  <pre>
1195    PCRE_ERROR_NOSUBSTRING    (-7)    PCRE_ERROR_NOSUBSTRING    (-7)
1196  </PRE>  </pre>
 </P>  
 <P>  
1197  There is no substring whose number is <i>stringnumber</i>.  There is no substring whose number is <i>stringnumber</i>.
1198  </P>  </P>
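<P>
The contract described above for <b>pcre_copy_substring()</b> can be
illustrated with a stand-in written in plain C. This is a sketch under stated
assumptions, not the library's implementation: the name
<b>copy_substring_sketch()</b> and the SKETCH_* constants are hypothetical, the
constants copy the documented values -6 and -7, and treating an unset
subpattern as an empty string is an assumption of this sketch.
</P>

```c
#include <assert.h>
#include <string.h>

/* Local copies of the documented values; real code uses the PCRE macros. */
#define SKETCH_ERROR_NOMEMORY    (-6)
#define SKETCH_ERROR_NOSUBSTRING (-7)

/* Hypothetical stand-in that mimics the documented contract:
   copies substring `stringnumber` into `buffer` as a zero-terminated
   string and returns its length, or a negative error value. */
int copy_substring_sketch(const char *subject, const int *ovector,
                          int stringcount, int stringnumber,
                          char *buffer, int buffersize)
{
    int start, length;
    assert(subject != NULL && ovector != NULL && buffer != NULL);
    if (stringnumber < 0 || stringnumber >= stringcount)
        return SKETCH_ERROR_NOSUBSTRING;   /* no such substring */
    start = ovector[2 * stringnumber];
    length = ovector[2 * stringnumber + 1] - start;
    if (start < 0) {                       /* unset: treat as empty */
        start = 0;
        length = 0;
    }
    if (buffersize < length + 1)
        return SKETCH_ERROR_NOMEMORY;      /* buffer too small */
    memcpy(buffer, subject + start, (size_t)length);
    buffer[length] = '\0';
    return length;
}
```

<P>
The buffer must have room for the substring and its terminating zero, so an
eight-byte buffer can hold a substring of at most seven bytes.
</P>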
1199  <P>  <P>
1200  The <b>pcre_get_substring_list()</b> function extracts all available substrings  The <b>pcre_get_substring_list()</b> function extracts all available substrings
1201  and builds a list of pointers to them. All this is done in a single block of  and builds a list of pointers to them. All this is done in a single block of
1202  memory which is obtained via <b>pcre_malloc</b>. The address of the memory block  memory that is obtained via <b>pcre_malloc</b>. The address of the memory block
1203  is returned via <i>listptr</i>, which is also the start of the list of string  is returned via <i>listptr</i>, which is also the start of the list of string
1204  pointers. The end of the list is marked by a NULL pointer. The yield of the  pointers. The end of the list is marked by a NULL pointer. The yield of the
1205  function is zero if all went well, or  function is zero if all went well, or
 </P>  
 <P>  
1206  <pre>  <pre>
1207    PCRE_ERROR_NOMEMORY       (-6)    PCRE_ERROR_NOMEMORY       (-6)
1208  </PRE>  </pre>
 </P>  
 <P>  
1209  if the attempt to get the memory block failed.  if the attempt to get the memory block failed.
1210  </P>  </P>
1211  <P>  <P>
# Line 1290  linked via a special interface to anothe Line 1227  linked via a special interface to anothe
1227  <b>pcre_free</b> directly; it is for these cases that the functions are  <b>pcre_free</b> directly; it is for these cases that the functions are
1228  provided.  provided.
1229  </P>  </P>
1230  <br><a name="SEC12" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>  <br><a name="SEC13" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a><br>
1231    <P>
1232    <b>int pcre_get_stringnumber(const pcre *<i>code</i>,</b>
1233    <b>const char *<i>name</i>);</b>
1234    </P>
1235  <P>  <P>
1236  <b>int pcre_copy_named_substring(const pcre *<i>code</i>,</b>  <b>int pcre_copy_named_substring(const pcre *<i>code</i>,</b>
1237  <b>const char *<i>subject</i>, int *<i>ovector</i>,</b>  <b>const char *<i>subject</i>, int *<i>ovector</i>,</b>
# Line 1298  provided. Line 1239  provided.
1239  <b>char *<i>buffer</i>, int <i>buffersize</i>);</b>  <b>char *<i>buffer</i>, int <i>buffersize</i>);</b>
1240  </P>  </P>
1241  <P>  <P>
 <b>int pcre_get_stringnumber(const pcre *<i>code</i>,</b>  
 <b>const char *<i>name</i>);</b>  
 </P>  
 <P>  
1242  <b>int pcre_get_named_substring(const pcre *<i>code</i>,</b>  <b>int pcre_get_named_substring(const pcre *<i>code</i>,</b>
1243  <b>const char *<i>subject</i>, int *<i>ovector</i>,</b>  <b>const char *<i>subject</i>, int *<i>ovector</i>,</b>
1244  <b>int <i>stringcount</i>, const char *<i>stringname</i>,</b>  <b>int <i>stringcount</i>, const char *<i>stringname</i>,</b>
1245  <b>const char **<i>stringptr</i>);</b>  <b>const char **<i>stringptr</i>);</b>
1246  </P>  </P>
1247  <P>  <P>
1248  To extract a substring by name, you first have to find the associated number. This  To extract a substring by name, you first have to find the associated number.
1249  can be done by calling <b>pcre_get_stringnumber()</b>. The first argument is the  For example, for this pattern
 compiled pattern, and the second is the name. For example, for this pattern  
 </P>  
 <P>  
1250  <pre>  <pre>
1251    ab(?&#60;xxx&#62;\d+)...    (a+)b(?&#60;xxx&#62;\d+)...
1252  </PRE>  </pre>
1253    the number of the subpattern called "xxx" is 2. You can find the number from
1254    the name by calling <b>pcre_get_stringnumber()</b>. The first argument is the
1255    compiled pattern, and the second is the name. The yield of the function is the
1256    subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if there is no subpattern of
1257    that name.
1258  </P>  </P>
1259  <P>  <P>
1260  the number of the subpattern called "xxx" is 1. Given the number, you can then  Given the number, you can extract the substring directly, or use one of the
1261  extract the substring directly, or use one of the functions described in the  functions described in the previous section. For convenience, there are also
1262  previous section. For convenience, there are also two functions that do the  two functions that do the whole job.
 whole job.  
1263  </P>  </P>
1264  <P>  <P>
1265  Most of the arguments of <i>pcre_copy_named_substring()</i> and  Most of the arguments of <i>pcre_copy_named_substring()</i> and
1266  <i>pcre_get_named_substring()</i> are the same as those for the functions that  <i>pcre_get_named_substring()</i> are the same as those for the similarly named
1267  extract by number, and so are not re-described here. There are just two  functions that extract by number. As these are described in the previous
1268  differences.  section, they are not re-described here. There are just two differences:
1269  </P>  </P>
1270  <P>  <P>
1271  First, instead of a substring number, a substring name is given. Second, there  First, instead of a substring number, a substring name is given. Second, there
# Line 1341  then call pcre_copy_substring() o Line 1279  then call pcre_copy_substring() o
1279  appropriate.  appropriate.
1280  </P>  </P>
1281  <P>  <P>
1282  Last updated: 09 December 2003  Last updated: 09 September 2004
1283  <br>  <br>
1284  Copyright &copy; 1997-2003 University of Cambridge.  Copyright &copy; 1997-2004 University of Cambridge.
1285    <p>
1286    Return to the <a href="index.html">PCRE index page</a>.
1287    </p>
