public final class Utf8 extends Object
The variant of UTF-8 implemented by this class is the restricted definition of UTF-8 introduced in Unicode 3.1. One implication of this is that it rejects "non-shortest form" byte sequences, even though the JDK decoder may accept them.
Constructor and Description |
---|
Utf8() |
Modifier and Type | Method and Description |
---|---|
static int | encodedLength(CharSequence sequence) Returns the number of bytes in the UTF-8-encoded form of sequence. |
private static int | encodedLengthGeneral(CharSequence sequence, int start) |
private static String | unpairedSurrogateMsg(int i) |
public static int encodedLength(CharSequence sequence)
Returns the number of bytes in the UTF-8-encoded form of sequence. For a string, this method is equivalent to string.getBytes(UTF_8).length, but is more efficient in both time and space.
Throws:
IllegalArgumentException - if sequence contains ill-formed UTF-16 (unpaired surrogates)

private static int encodedLengthGeneral(CharSequence sequence, int start)

private static String unpairedSurrogateMsg(int i)
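
The contract of encodedLength described above can be illustrated with a small JDK-only sketch. This is not this class's implementation. Utf8LengthSketch and its message text are hypothetical names chosen for illustration, and the sketch only assumes the documented behavior: count the UTF-8 bytes without allocating the encoded array, and throw IllegalArgumentException on an unpaired surrogate.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the documented API.
final class Utf8LengthSketch {
    private Utf8LengthSketch() {}

    /** Counts the UTF-8 bytes needed for the sequence without encoding it. */
    static int encodedLength(CharSequence sequence) {
        long bytes = 0; // long, so an overflow of int can be detected below
        int length = sequence.length();
        for (int i = 0; i < length; i++) {
            char c = sequence.charAt(i);
            if (c < 0x80) {
                bytes += 1; // ASCII
            } else if (c < 0x800) {
                bytes += 2; // U+0080..U+07FF
            } else if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed by a low surrogate.
                if (i + 1 >= length || !Character.isLowSurrogate(sequence.charAt(i + 1))) {
                    throw new IllegalArgumentException("Unpaired surrogate at index " + i);
                }
                bytes += 4; // supplementary code point, two chars consumed
                i++;
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate with no preceding high surrogate.
                throw new IllegalArgumentException("Unpaired surrogate at index " + i);
            } else {
                bytes += 3; // rest of the Basic Multilingual Plane
            }
        }
        if (bytes > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("UTF-8 length does not fit in int: " + bytes);
        }
        return (int) bytes;
    }

    public static void main(String[] args) {
        // "h", U+00E9, "llo", space, U+20AC: 1 + 2 + 3 + 1 + 3 = 10 bytes
        String text = "h\u00e9llo \u20ac";
        System.out.println(encodedLength(text));
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Both lines printed by main show the same count for well-formed input. The two approaches differ on ill-formed input: String.getBytes(UTF_8) substitutes a replacement byte for an unpaired surrogate, whereas the documented method (and this sketch) throws IllegalArgumentException.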
Copyright © 2022 ScalAgent D.T.. All rights reserved.