
Returns:       < 0    if the string is a valid UTF-8 string
               >= 0   otherwise; the value is the offset of the bad byte

Bad bytes can be:

  . An isolated byte whose two most significant bits are 10 (that is,
    (byte & 0xc0) == 0x80), because this can only correctly appear within
    a UTF-8 character;

  . A byte whose two most significant bits are both set (that is,
    (byte & 0xc0) == 0xc0), but whose other bits indicate that there are
    more than 3 additional bytes (i.e. an RFC 2279 starting byte, which is
    no longer valid under RFC 3629);

  .

The returned offset may also be equal to the length of the string; this means
that one or more bytes are missing from the final UTF-8 character.
*/

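The return convention described in the comment above can be illustrated with a minimal sketch. This is not the original function: the name `sketch_valid_utf8`, the treatment of a negative length as "zero-terminated", and the choice to report the leading byte's offset for overlong, surrogate, and out-of-range encodings are all assumptions made for illustration.

```c
#include <string.h>

/* Hypothetical sketch, not the original implementation.
   Returns -1 if the buffer is valid UTF-8 under RFC 3629, otherwise the
   offset of the bad byte. A negative length is assumed to mean that the
   string is zero-terminated. */
static int sketch_valid_utf8(const unsigned char *s, int length)
{
    int i = 0;

    if (length < 0)
        length = (int)strlen((const char *)s);

    while (i < length) {
        unsigned int c = s[i];
        unsigned int cp, min;
        int extra, k;

        if (c < 0x80) { i++; continue; }     /* plain ASCII byte */
        if ((c & 0xc0) == 0x80) return i;    /* isolated continuation byte */

        /* Work out how many additional bytes the leading byte announces. */
        if ((c & 0xe0) == 0xc0)      { extra = 1; cp = c & 0x1f; min = 0x80; }
        else if ((c & 0xf0) == 0xe0) { extra = 2; cp = c & 0x0f; min = 0x800; }
        else if ((c & 0xf8) == 0xf0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return i;                       /* more than 3 additional bytes */

        for (k = 1; k <= extra; k++) {
            if (i + k >= length) return length;       /* final character cut short */
            if ((s[i + k] & 0xc0) != 0x80) return i + k;  /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3f);
        }

        /* Assumption: overlong forms, surrogates and values above 0x10ffff
           are reported at the offset of the leading byte. */
        if (cp < min || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
            return i;

        i += extra + 1;
    }
    return -1;                               /* valid: any negative value */
}
```

A caller would typically test only the sign of the result, and compare a non-negative result against the length to distinguish a truncated final character from a bad byte inside the string.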