--- code/trunk/doc/pcreapi.3 2010/05/26 10:54:18 526
+++ code/trunk/doc/pcreapi.3 2010/06/25 14:42:00 548
@@ -765,7 +765,8 @@
   50  [this code is not in use]
   51  octal value is greater than \e377 (not in UTF-8 mode)
   52  internal error: overran compiling workspace
-  53  internal error: previously-checked referenced subpattern not found
+  53  internal error: previously-checked referenced subpattern
+        not found
   54  DEFINE group contains more than one branch
   55  repeating a DEFINE group is not allowed
   56  inconsistent NEWLINE options
@@ -778,7 +779,8 @@
   62  subpattern name expected
   63  digit expected after (?+
   64  ] is an invalid data character in JavaScript compatibility mode
-  65  different names for subpatterns of the same number are not allowed
+  65  different names for subpatterns of the same number are
+        not allowed
   66  (*MARK) must have an argument
   67  this version of PCRE is not compiled with PCRE_UCP support
 .sp
@@ -846,6 +848,16 @@
 single fixed starting character. A bitmap of possible starting bytes is
 created. This speeds up finding a position in the subject at which to start
 matching.
+.P
+The two optimizations just described can be disabled by setting the
+PCRE_NO_START_OPTIMIZE option when calling \fBpcre_exec()\fP or
+\fBpcre_dfa_exec()\fP. You might want to do this if your pattern contains
+callouts, or make use of (*MARK), and you make use of these in cases where
+matching fails. See the discussion of PCRE_NO_START_OPTIMIZE
+.\" HTML
+.\"
+below.
+.\"
 .
 .
 .\" HTML
@@ -1443,12 +1455,46 @@
   PCRE_NO_START_OPTIMIZE
 .sp
 There are a number of optimizations that \fBpcre_exec()\fP uses at the start of
-a match, in order to speed up the process. For example, if it is known that a
-match must start with a specific character, it searches the subject for that
-character, and fails immediately if it cannot find it, without actually running
-the main matching function. When callouts are in use, these optimizations can
-cause them to be skipped. This option disables the "start-up" optimizations,
-causing performance to suffer, but ensuring that the callouts do occur.
+a match, in order to speed up the process. For example, if it is known that an
+unanchored match must start with a specific character, it searches the subject
+for that character, and fails immediately if it cannot find it, without
+actually running the main matching function. This means that a special item
+such as (*COMMIT) at the start of a pattern is not considered until after a
+suitable starting point for the match has been found. When callouts or (*MARK)
+items are in use, these "start-up" optimizations can cause them to be skipped
+if the pattern is never actually used. The start-up optimizations are in effect
+a pre-scan of the subject that takes place before the pattern is run.
+.P
+The PCRE_NO_START_OPTIMIZE option disables the start-up optimizations, possibly
+causing performance to suffer, but ensuring that in cases where the result is
+"no match", the callouts do occur, and that items such as (*COMMIT) and (*MARK)
+are considered at every possible starting position in the subject string.
+Setting PCRE_NO_START_OPTIMIZE can change the outcome of a matching operation.
+Consider the pattern
+.sp
+  (*COMMIT)ABC
+.sp
+When this is compiled, PCRE records the fact that a match must start with the
+character "A". Suppose the subject string is "DEFABC". The start-up
+optimization scans along the subject, finds "A" and runs the first match
+attempt from there. The (*COMMIT) item means that the pattern must match the
+current starting position, which in this case, it does. However, if the same
+match is run with PCRE_NO_START_OPTIMIZE set, the initial scan along the
+subject string does not happen. The first match attempt is run starting from
+"D" and when this fails, (*COMMIT) prevents any further matches being tried, so
+the overall result is "no match". If the pattern is studied, more start-up
+optimizations may be used. For example, a minimum length for the subject may be
+recorded. Consider the pattern
+.sp
+  (*MARK:A)(X|Y)
+.sp
+The minimum length for a match is one character. If the subject is "ABC", there
+will be attempts to match "ABC", "BC", "C", and then finally an empty string.
+If the pattern is studied, the final attempt does not take place, because PCRE
+knows that the subject is too short, and so the (*MARK) is never encountered.
+In this case, studying the pattern does not affect the overall match result,
+which is still "no match", but it does affect the auxiliary information that is
+returned.
 .sp
   PCRE_NO_UTF8_CHECK
 .sp
@@ -1643,6 +1689,10 @@
 gets a block of memory at the start of matching to use for this purpose. If the
 call via \fBpcre_malloc()\fP fails, this error is given. The memory is
 automatically freed at the end of matching.
+.P
+This error is also given if \fBpcre_stack_malloc()\fP fails in
+\fBpcre_exec()\fP. This can happen only when PCRE has been compiled with
+\fB--disable-stack-for-recursion\fP.
 .sp
   PCRE_ERROR_NOSUBSTRING (-7)
 .sp
@@ -1991,9 +2041,10 @@
 The unused bits of the \fIoptions\fP argument for \fBpcre_dfa_exec()\fP must be
 zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_\fIxxx\fP,
 PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART,
-PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD, PCRE_PARTIAL_SOFT, PCRE_DFA_SHORTEST,
-and PCRE_DFA_RESTART. All but the last four of these are exactly the same as
-for \fBpcre_exec()\fP, so their description is not repeated here.
+PCRE_NO_UTF8_CHECK, PCRE_BSR_ANYCRLF, PCRE_BSR_UNICODE, PCRE_NO_START_OPTIMIZE,
+PCRE_PARTIAL_HARD, PCRE_PARTIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART.
+All but the last four of these are exactly the same as for \fBpcre_exec()\fP,
+so their description is not repeated here.
 .sp
   PCRE_PARTIAL_HARD
   PCRE_PARTIAL_SOFT
@@ -2127,6 +2178,6 @@
 .rs
 .sp
 .nf
-Last updated: 26 May 2010
+Last updated: 21 June 2010
 Copyright (c) 1997-2010 University of Cambridge.
 .fi
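
Reviewer note: the two worked examples added under PCRE_NO_START_OPTIMIZE can be checked against a toy model. The sketch below is not PCRE code and does not call the PCRE library. It hard-codes the two patterns from the text, and the function names toy_commit_abc() and toy_mark_attempts() are hypothetical, invented only for this illustration. It assumes the behaviour is exactly as the new paragraphs describe it: a pre-scan for the first "A" in the (*COMMIT)ABC case, and a minimum-length check of one character in the studied (*MARK:A)(X|Y) case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of the pattern (*COMMIT)ABC. NOT PCRE code.
 * Returns the offset of the match, or -1 for "no match".
 * use_start_optimization != 0 models the default behaviour;
 * 0 models PCRE_NO_START_OPTIMIZE as described in the text. */
int toy_commit_abc(const char *subject, int use_start_optimization)
{
    assert(subject != NULL);
    size_t len = strlen(subject);
    size_t start = 0;

    if (use_start_optimization) {
        /* Pre-scan: skip to the first 'A', or fail at once if there is none. */
        const char *first = strchr(subject, 'A');
        if (first == NULL)
            return -1;
        start = (size_t)(first - subject);
    }

    /* Only one real match attempt is ever made: (*COMMIT) sits at the
     * start of the pattern, so a failure here forbids trying later
     * starting positions. */
    if (len - start >= 3 && memcmp(subject + start, "ABC", 3) == 0)
        return (int)start;
    return -1;
}

/* Toy model of the pattern (*MARK:A)(X|Y). NOT PCRE code.
 * Returns how many starting positions get a match attempt, that is,
 * how many times the (*MARK) item would be reached.
 * studied != 0 models the minimum-length check (one character). */
int toy_mark_attempts(const char *subject, int studied)
{
    assert(subject != NULL);
    size_t len = strlen(subject);
    int attempts = 0;

    for (size_t start = 0; start <= len; start++) {
        /* Studied: skip positions where fewer than 1 character remains. */
        if (studied && len - start < 1)
            break;
        attempts++;
        if (start < len && (subject[start] == 'X' || subject[start] == 'Y'))
            break; /* matched, so no further attempts are needed */
    }
    return attempts;
}
```

Under those assumptions, toy_commit_abc("DEFABC", 1) reports a match at offset 3 while toy_commit_abc("DEFABC", 0) reports no match, and toy_mark_attempts("ABC", 0) makes four attempts against three for toy_mark_attempts("ABC", 1). Both agree with the prose in the hunk at -1443.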