Diff of /code/trunk/doc/pcreapi.3

revision 653 by ph10, Thu Jul 28 18:59:40 2011 UTC revision 654 by ph10, Tue Aug 2 11:00:40 2011 UTC
# Line 1548  in the main Line 1548  in the main
1548  .\"  .\"
1549  page. If an invalid UTF-8 sequence of bytes is found, \fBpcre_exec()\fP returns  page. If an invalid UTF-8 sequence of bytes is found, \fBpcre_exec()\fP returns
1550  the error PCRE_ERROR_BADUTF8 or, if PCRE_PARTIAL_HARD is set and the problem is  the error PCRE_ERROR_BADUTF8 or, if PCRE_PARTIAL_HARD is set and the problem is
1551  a truncated UTF-8 character at the end of the subject, PCRE_ERROR_SHORTUTF8. In  a truncated UTF-8 character at the end of the subject, PCRE_ERROR_SHORTUTF8. In
1552  both cases, information about the precise nature of the error may also be  both cases, information about the precise nature of the error may also be
1553  returned (see the descriptions of these errors in the section entitled \fIError  returned (see the descriptions of these errors in the section entitled \fIError
1554  return values from\fP \fBpcre_exec()\fP  return values from\fP \fBpcre_exec()\fP
# Line 1810  PCRE_ERROR_SHORTUTF8 is returned instead Line 1810  PCRE_ERROR_SHORTUTF8 is returned instead
1810  .sp  .sp
1811    PCRE_ERROR_BADUTF8_OFFSET (-11)    PCRE_ERROR_BADUTF8_OFFSET (-11)
1812  .sp  .sp
1813  The UTF-8 byte sequence that was passed as a subject was checked and found to  The UTF-8 byte sequence that was passed as a subject was checked and found to
1814  be valid (the PCRE_NO_UTF8_CHECK option was not set), but the value of  be valid (the PCRE_NO_UTF8_CHECK option was not set), but the value of
1815  \fIstartoffset\fP did not point to the beginning of a UTF-8 character or the  \fIstartoffset\fP did not point to the beginning of a UTF-8 character or the
1816  end of the subject.  end of the subject.
# Line 1865  retained for backwards compatibility. Line 1865  retained for backwards compatibility.
1865  .sp  .sp
1866    PCRE_ERROR_RECURSELOOP    (-26)    PCRE_ERROR_RECURSELOOP    (-26)
1867  .sp  .sp
1868  This error is returned when \fBpcre_exec()\fP detects a recursion loop within  This error is returned when \fBpcre_exec()\fP detects a recursion loop within
1869  the pattern. Specifically, it means that either the whole pattern or a  the pattern. Specifically, it means that either the whole pattern or a
1870  subpattern has been called recursively for the second time at the same position  subpattern has been called recursively for the second time at the same position
1871  in the subject string. Some simple patterns that might do this are detected and  in the subject string. Some simple patterns that might do this are detected and
1872  faulted at compile time, but more complicated cases, in particular mutual  faulted at compile time, but more complicated cases, in particular mutual
1873  recursions between two different subpatterns, cannot be detected until run  recursions between two different subpatterns, cannot be detected until run
# Line 1880  Error numbers -16 to -20 and -22 are not Line 1880  Error numbers -16 to -20 and -22 are not
1880  .SS "Reason codes for invalid UTF-8 strings"  .SS "Reason codes for invalid UTF-8 strings"
1881  .rs  .rs
1882  .sp  .sp
1883  When \fBpcre_exec()\fP returns either PCRE_ERROR_BADUTF8 or  When \fBpcre_exec()\fP returns either PCRE_ERROR_BADUTF8 or
1884  PCRE_ERROR_SHORTUTF8, and the size of the output vector (\fIovecsize\fP) is at  PCRE_ERROR_SHORTUTF8, and the size of the output vector (\fIovecsize\fP) is at
1885  least 2, the offset of the start of the invalid UTF-8 character is placed in  least 2, the offset of the start of the invalid UTF-8 character is placed in
1886  the first output vector element (\fIovector[0]\fP) and a reason code is placed  the first output vector element (\fIovector[0]\fP) and a reason code is placed
1887  in the second element (\fIovector[1]\fP). The reason codes are given names in  in the second element (\fIovector[1]\fP). The reason codes are given names in
1888  the \fBpcre.h\fP header file:  the \fBpcre.h\fP header file:
1889  .sp  .sp
# Line 1893  the \fBpcre.h\fP header file: Line 1893  the \fBpcre.h\fP header file:
1893    PCRE_UTF8_ERR4    PCRE_UTF8_ERR4
1894    PCRE_UTF8_ERR5    PCRE_UTF8_ERR5
1895  .sp  .sp
1896  The string ends with a truncated UTF-8 character; the code specifies how many  The string ends with a truncated UTF-8 character; the code specifies how many
1897  bytes are missing (1 to 5). Although RFC 3629 restricts UTF-8 characters to be  bytes are missing (1 to 5). Although RFC 3629 restricts UTF-8 characters to be
1898  no longer than 4 bytes, the encoding scheme (originally defined by RFC 2279)  no longer than 4 bytes, the encoding scheme (originally defined by RFC 2279)
1899  allows for up to 6 bytes, and this is checked first; hence the possibility of  allows for up to 6 bytes, and this is checked first; hence the possibility of
1900  4 or 5 missing bytes.  4 or 5 missing bytes.
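
The "missing bytes" count described above can be illustrated with a minimal sketch. It assumes the RFC 2279 lead-byte layout (up to 6 bytes); the function names are hypothetical and this is not PCRE's own implementation.

```c
#include <stddef.h>

/* Length (1 to 6) that the RFC 2279 scheme expects for a sequence that
   starts with this lead byte, or 0 if the byte cannot start a sequence. */
int utf8_expected_length(unsigned char lead)
{
    if (lead < 0x80) return 1;   /* 0xxxxxxx: single byte */
    if (lead < 0xc0) return 0;   /* 10xxxxxx: continuation byte */
    if (lead < 0xe0) return 2;   /* 110xxxxx */
    if (lead < 0xf0) return 3;   /* 1110xxxx */
    if (lead < 0xf8) return 4;   /* 11110xxx */
    if (lead < 0xfc) return 5;   /* 111110xx */
    if (lead < 0xfe) return 6;   /* 1111110x */
    return 0;                    /* 0xfe and 0xff are never valid */
}

/* Number of bytes missing when a sequence starts at s[pos] but the
   subject is only len bytes long. Returns 0 if the sequence is complete
   or the lead byte is not a valid start. Requires pos < len. */
int utf8_missing_bytes(const unsigned char *s, size_t len, size_t pos)
{
    int need = utf8_expected_length(s[pos]);
    size_t have = len - pos;
    if (need == 0 || (size_t)need <= have) return 0;
    return need - (int)have;
}
```

A lone 0xfc byte at the end of the subject announces a 6-byte sequence, so 5 bytes are missing, which is why counts of 4 and 5 are possible even though RFC 3629 stops at 4-byte characters.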
1901  .sp  .sp
1902    PCRE_UTF8_ERR6    PCRE_UTF8_ERR6
# Line 1905  allows for up to 6 bytes, and this is ch Line 1905  allows for up to 6 bytes, and this is ch
1905    PCRE_UTF8_ERR9    PCRE_UTF8_ERR9
1906    PCRE_UTF8_ERR10    PCRE_UTF8_ERR10
1907  .sp  .sp
1908  The two most significant bits of the 2nd, 3rd, 4th, 5th, or 6th byte of the  The two most significant bits of the 2nd, 3rd, 4th, 5th, or 6th byte of the
1909  character do not have the binary value 0b10 (that is, either the most  character do not have the binary value 0b10 (that is, either the most
1910  significant bit is 0, or the next bit is 1).  significant bit is 0, or the next bit is 1).
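
The continuation-byte test described above amounts to a single mask-and-compare. The following is an illustrative sketch only; the function name is hypothetical and does not come from PCRE.

```c
/* Returns nonzero if b has the bit pattern 10xxxxxx, which is the only
   pattern permitted for the 2nd and later bytes of a UTF-8 character. */
int utf8_is_continuation(unsigned char b)
{
    return (b & 0xc0) == 0x80;
}
```

A byte such as 0x2e fails because its most significant bit is 0, and a byte such as 0xc3 fails because its second bit is 1.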
1911  .sp  .sp
1912    PCRE_UTF8_ERR11    PCRE_UTF8_ERR11
1913    PCRE_UTF8_ERR12    PCRE_UTF8_ERR12
1914  .sp  .sp
1915  A character that is valid by the RFC 2279 rules is either 5 or 6 bytes long;  A character that is valid by the RFC 2279 rules is either 5 or 6 bytes long;
1916  these code points are excluded by RFC 3629.  these code points are excluded by RFC 3629.
1917  .sp  .sp
1918    PCRE_UTF8_ERR13    PCRE_UTF8_ERR13
1919  .sp  .sp
1920  A 4-byte character has a value greater than 0x10ffff; these code points are  A 4-byte character has a value greater than 0x10ffff; these code points are
1921  excluded by RFC 3629.  excluded by RFC 3629.
1922  .sp  .sp
1923    PCRE_UTF8_ERR14    PCRE_UTF8_ERR14
1924  .sp  .sp
1925  A 3-byte character has a value in the range 0xd800 to 0xdfff; this range of  A 3-byte character has a value in the range 0xd800 to 0xdfff; this range of
1926  code points is reserved by RFC 3629 for use with UTF-16, and so is excluded  code points is reserved by RFC 3629 for use with UTF-16, and so is excluded
1927  from UTF-8.  from UTF-8.
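
As a sketch of the surrogate check described above, the code below decodes a 3-byte sequence and tests the resulting value against the reserved range. The function names are hypothetical, and the decoder assumes the three bytes already have valid bit patterns.

```c
/* Decode a 3-byte sequence, assuming the bit patterns
   1110xxxx 10xxxxxx 10xxxxxx have already been verified. */
unsigned int utf8_decode3(unsigned char b0, unsigned char b1, unsigned char b2)
{
    return ((unsigned int)(b0 & 0x0f) << 12) |
           ((unsigned int)(b1 & 0x3f) << 6) |
            (unsigned int)(b2 & 0x3f);
}

/* Nonzero if cp lies in the range reserved for UTF-16 surrogates. */
int utf8_is_surrogate(unsigned int cp)
{
    return cp >= 0xd800 && cp <= 0xdfff;
}
```

For instance, the bytes 0xed, 0xa0, 0x80 decode to 0xd800, the first value in the reserved range, whereas 0xe2, 0x82, 0xac decode to 0x20ac, which is acceptable.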
1928  .sp  .sp
1929    PCRE_UTF8_ERR15    PCRE_UTF8_ERR15
1930    PCRE_UTF8_ERR16    PCRE_UTF8_ERR16
1931    PCRE_UTF8_ERR17    PCRE_UTF8_ERR17
1932    PCRE_UTF8_ERR18    PCRE_UTF8_ERR18
1933    PCRE_UTF8_ERR19    PCRE_UTF8_ERR19
1934  .sp  .sp
1935  A 2-, 3-, 4-, 5-, or 6-byte character is "overlong", that is, it codes for a  A 2-, 3-, 4-, 5-, or 6-byte character is "overlong", that is, it codes for a
1936  value that can be represented by fewer bytes, which is invalid. For example,  value that can be represented by fewer bytes, which is invalid. For example,
1937  the two bytes 0xc0, 0xae give the value 0x2e, whose correct coding uses just  the two bytes 0xc0, 0xae give the value 0x2e, whose correct coding uses just
1938  one byte.  one byte.
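
The 0xc0, 0xae example above can be worked through in code. This sketch covers only the 2-byte case and uses hypothetical function names; it is not taken from PCRE.

```c
/* Decode a 2-byte sequence, assuming the bit patterns
   110xxxxx 10xxxxxx have already been verified. */
unsigned int utf8_decode2(unsigned char b0, unsigned char b1)
{
    return ((unsigned int)(b0 & 0x1f) << 6) | (unsigned int)(b1 & 0x3f);
}

/* A 2-byte sequence is overlong when its value fits in 7 bits,
   because such a value has a 1-byte encoding. */
int utf8_is_overlong2(unsigned char b0, unsigned char b1)
{
    return utf8_decode2(b0, b1) < 0x80;
}
```

With 0xc0 and 0xae the payload bits are 00000 and 101110, giving 0x2e, which is below 0x80 and therefore overlong. The pair 0xc3, 0xa9 gives 0xe9, which genuinely needs two bytes.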
1939  .sp  .sp
1940    PCRE_UTF8_ERR20    PCRE_UTF8_ERR20
1941  .sp  .sp
1942  The two most significant bits of the first byte of a character have the binary  The two most significant bits of the first byte of a character have the binary
1943  value 0b10 (that is, the most significant bit is 1 and the second is 0). Such a  value 0b10 (that is, the most significant bit is 1 and the second is 0). Such a
1944  byte can only validly occur as the second or subsequent byte of a multi-byte  byte can only validly occur as the second or subsequent byte of a multi-byte
1945  character.  character.
1946  .sp  .sp
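
Reading the offset and reason code from the output vector, as described at the start of this section, might look like the sketch below. The helper name is hypothetical. The two error values are defined locally as stand-ins so that the fragment compiles without the library; real code would include pcre.h and pass in the return value of pcre_exec().

```c
#include <stddef.h>

/* Stand-ins for the values in pcre.h (an assumption made so that this
   sketch is self-contained); real code should include <pcre.h>. */
#ifndef PCRE_ERROR_BADUTF8
#define PCRE_ERROR_BADUTF8   (-10)
#endif
#ifndef PCRE_ERROR_SHORTUTF8
#define PCRE_ERROR_SHORTUTF8 (-25)
#endif

/* Hypothetical helper: if rc is one of the two UTF-8 errors and the
   output vector had room for the details (ovecsize >= 2), store the
   offset and reason code and return 1; otherwise return 0. */
int utf8_error_details(int rc, const int *ovector, int ovecsize,
                       int *offset, int *reason)
{
    if (rc != PCRE_ERROR_BADUTF8 && rc != PCRE_ERROR_SHORTUTF8) return 0;
    if (ovector == NULL || ovecsize < 2) return 0;
    *offset = ovector[0];   /* start of the invalid UTF-8 character */
    *reason = ovector[1];   /* one of the PCRE_UTF8_ERRn codes */
    return 1;
}
```

The caller can then compare the reason against the PCRE_UTF8_ERRn names from pcre.h rather than against literal numbers.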
