Diff of /code/trunk/doc/html/pcre.html


revision 535 by ph10, Thu Jun 3 19:18:24 2010 UTC revision 654 by ph10, Tue Aug 2 11:00:40 2011 UTC
# Line 30  support for one or two .NET and Onigurum Line 30  support for one or two .NET and Onigurum
30  for requesting some minor changes that give better JavaScript compatibility.  for requesting some minor changes that give better JavaScript compatibility.
31  </P>  </P>
32  <P>  <P>
33  The current implementation of PCRE corresponds approximately with Perl  The current implementation of PCRE corresponds approximately with Perl 5.12,
34  5.10/5.11, including support for UTF-8 encoded strings and Unicode general  including support for UTF-8 encoded strings and Unicode general category
35  category properties. However, UTF-8 and Unicode support has to be explicitly  properties. However, UTF-8 and Unicode support has to be explicitly enabled; it
36  enabled; it is not the default. The Unicode tables correspond to Unicode  is not the default. The Unicode tables correspond to Unicode release 6.0.0.
 release 5.2.0.  
37  </P>  </P>
38  <P>  <P>
39  In addition to the Perl-compatible matching function, PCRE contains an  In addition to the Perl-compatible matching function, PCRE contains an
# Line 208  the whole surrogate thing is a fudge for Line 207  the whole surrogate thing is a fudge for
207  UTF-8.)  UTF-8.)
208  </P>  </P>
209  <P>  <P>
210  If an invalid UTF-8 string is passed to PCRE, an error return  If an invalid UTF-8 string is passed to PCRE, an error return is given. At
211  (PCRE_ERROR_BADUTF8) is given. In some situations, you may already know that  compile time, the only additional information is the offset to the first byte
212  your strings are valid, and therefore want to skip these checks in order to  of the failing character. The runtime functions (<b>pcre_exec()</b> and
213  improve performance. If you set the PCRE_NO_UTF8_CHECK flag at compile time or  <b>pcre_dfa_exec()</b>), pass back this information as well as a more detailed
214  at run time, PCRE assumes that the pattern or subject it is given  reason code if the caller has provided memory in which to do this.
215  (respectively) contains only valid UTF-8 codes. In this case, it does not  </P>
216  diagnose an invalid UTF-8 string.  <P>
217    In some situations, you may already know that your strings are valid, and
218    therefore want to skip these checks in order to improve performance. If you set
219    the PCRE_NO_UTF8_CHECK flag at compile time or at run time, PCRE assumes that
220    the pattern or subject it is given (respectively) contains only valid UTF-8
221    codes. In this case, it does not diagnose an invalid UTF-8 string.
222  </P>  </P>
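The revised paragraph above says that a failed UTF-8 check reports the offset to the first byte of the failing character. The following is a minimal sketch of that idea in plain C, not PCRE's own checker: the function name `first_bad_utf8` is hypothetical, and the validity rules used (RFC 3629: no overlong forms, no surrogates, nothing above U+10FFFF) are an assumption that may differ in detail from what PCRE diagnoses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE API. Returns -1 if s[0..len)
   is well-formed UTF-8 under RFC 3629 rules; otherwise returns the offset
   of the FIRST byte of the first malformed character. */
static long first_bad_utf8(const unsigned char *s, size_t len)
{
    /* Smallest code point that legitimately needs 0..3 continuation bytes. */
    static const unsigned long min_cp[] = { 0, 0x80, 0x800, 0x10000 };
    size_t i = 0;

    assert(s != NULL || len == 0);

    while (i < len) {
        unsigned char c = s[i];
        size_t extra;      /* continuation bytes expected after the lead byte */
        unsigned long cp;  /* code point being decoded */
        size_t k;

        if (c < 0x80) { i++; continue; }            /* ASCII */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return (long)i;   /* stray continuation byte or invalid lead */

        if (len - i <= extra) return (long)i;       /* truncated character */

        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return (long)i;
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }

        /* Reject overlong encodings, surrogates, and out-of-range values. */
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return (long)i;

        i += extra + 1;
    }
    return -1;
}
```

For the six bytes `61 62 E2 82 63 64` the sketch returns 2: the three-byte character that starts at offset 2 is missing its final continuation byte, and the offset reported is that of its lead byte, not of the byte where the problem was noticed.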
223  <P>  <P>
224  If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, what  If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, what
# Line 260  test characters of any code value, but, Line 264  test characters of any code value, but,
264  recognizes as digits, spaces, or word characters remain the same set as before,  recognizes as digits, spaces, or word characters remain the same set as before,
265  all with values less than 256. This remains true even when PCRE is built to  all with values less than 256. This remains true even when PCRE is built to
266  include Unicode property support, because to do otherwise would slow down PCRE  include Unicode property support, because to do otherwise would slow down PCRE
267  in many common cases. Note that this also applies to \b, because it is defined  in many common cases. Note in particular that this applies to \b and \B,
268  in terms of \w and \W. If you really want to test for a wider sense of, say,  because they are defined in terms of \w and \W. If you really want to test
269  "digit", you can use explicit Unicode property tests such as \p{Nd}.  for a wider sense of, say, "digit", you can use explicit Unicode property tests
270  Alternatively, if you set the PCRE_UCP option, the way that the character  such as \p{Nd}. Alternatively, if you set the PCRE_UCP option, the way that
271  escapes work is changed so that Unicode properties are used to determine which  the character escapes work is changed so that Unicode properties are used to
272  characters match. There are more details in the section on  determine which characters match. There are more details in the section on
273  <a href="pcrepattern.html#genericchartypes">generic character types</a>  <a href="pcrepattern.html#genericchartypes">generic character types</a>
274  in the  in the
275  <a href="pcrepattern.html"><b>pcrepattern</b></a>  <a href="pcrepattern.html"><b>pcrepattern</b></a>
# Line 276  documentation. Line 280  documentation.
280  low-valued characters, unless the PCRE_UCP option is set.  low-valued characters, unless the PCRE_UCP option is set.
281  </P>  </P>
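The changed text in the preceding hunk notes that \b and \B are defined in terms of \w and \W, so they inherit the same low-valued character set unless PCRE_UCP is set. A rough sketch of that dependency follows. It assumes the default \w set is ASCII letters, digits, and underscore (the actual set depends on the character tables PCRE is using), and both function names are hypothetical rather than taken from PCRE.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: without PCRE_UCP and with default tables, \w is modelled
   here as ASCII letters, digits, and underscore only. */
static int is_word_byte(unsigned char c)
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z') || c == '_';
}

/* \b holds at position pos when exactly one of the neighbouring characters
   is a word character; the start and end of the subject count as non-word.
   \B is simply the negation of this test. */
static int at_word_boundary(const unsigned char *s, size_t len, size_t pos)
{
    assert(pos <= len);
    int before = pos > 0 && is_word_byte(s[pos - 1]);
    int after = pos < len && is_word_byte(s[pos]);
    return before != after;
}
```

Under these assumptions, the UTF-8 subject "café" (bytes `63 61 66 C3 A9`) has a boundary at offset 3, between "f" and "é", and none at the end of the string, because "é" is not in the modelled \w set. With a Unicode-aware \w the boundary would instead fall after "é".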
282  <P>  <P>
283  8. However, the Perl 5.10 horizontal and vertical whitespace matching escapes  8. However, the horizontal and vertical whitespace matching escapes (\h, \H,
284  (\h, \H, \v, and \V) do match all the appropriate Unicode characters,  \v, and \V) do match all the appropriate Unicode characters, whether or not
285  whether or not PCRE_UCP is set.  PCRE_UCP is set.
286  </P>  </P>
287  <P>  <P>
288  9. Case-insensitive matching applies only to characters whose values are less  9. Case-insensitive matching applies only to characters whose values are less
# Line 286  than 128, unless PCRE is built with Unic Line 290  than 128, unless PCRE is built with Unic
290  property support is available, PCRE still uses its own character tables when  property support is available, PCRE still uses its own character tables when
291  checking the case of low-valued characters, so as not to degrade performance.  checking the case of low-valued characters, so as not to degrade performance.
292  The Unicode property information is used only for characters with higher  The Unicode property information is used only for characters with higher
293  values. Even when Unicode property support is available, PCRE supports  values. Furthermore, PCRE supports case-insensitive matching only when there is
294  case-insensitive matching only when there is a one-to-one mapping between a  a one-to-one mapping between a letter's cases. There are a small number of
295  letter's cases. There are a small number of many-to-one mappings in Unicode;  many-to-one mappings in Unicode; these are not supported by PCRE.
 these are not supported by PCRE.  
296  </P>  </P>
297  <br><a name="SEC5" href="#TOC1">AUTHOR</a><br>  <br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
298  <P>  <P>
# Line 307  two digits 10, at the domain cam.ac.uk. Line 310  two digits 10, at the domain cam.ac.uk.
310  </P>  </P>
311  <br><a name="SEC6" href="#TOC1">REVISION</a><br>  <br><a name="SEC6" href="#TOC1">REVISION</a><br>
312  <P>  <P>
313  Last updated: 12 May 2010  Last updated: 07 May 2011
314  <br>  <br>
315  Copyright &copy; 1997-2010 University of Cambridge.  Copyright &copy; 1997-2011 University of Cambridge.
316  <br>  <br>
317  <p>  <p>
318  Return to the <a href="index.html">PCRE index page</a>.  Return to the <a href="index.html">PCRE index page</a>.

