What Charset does ByteBuffer.asCharBuffer() use?
What Charset does ByteBuffer.asCharBuffer() use? It seems to convert 3 bytes to one character on my system.
On a related note, how does CharsetDecoder relate to ByteBuffer.asCharBuffer()?
UPDATE: With respect to which implementation of ByteBuffer I am using, I am invoking ByteBuffer.allocate(1024).asCharBuffer(). I can't comment on which implementation gets used under the hood.
For the first question - I believe it uses native character encoding of Java (UTF-16).
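Here is a minimal sketch to check this, assuming Java 7+ for StandardCharsets. It encodes a string as big-endian UTF-16, which is the default byte order of a new ByteBuffer, and then views those bytes through asCharBuffer(). It also shows that the 1024-byte buffer from the question yields a 512-char view, so each char is built from two bytes. The class name Utf16ViewDemo is only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

class Utf16ViewDemo {
    // View raw bytes as chars: no charset decoding, just two bytes per char
    // in the ByteBuffer's byte order (big-endian unless changed with order()).
    static String viewAsChars(byte[] bytes) {
        return ByteBuffer.wrap(bytes).asCharBuffer().toString();
    }

    public static void main(String[] args) {
        // The buffer from the question: 1024 bytes become a 512-char view.
        CharBuffer view = ByteBuffer.allocate(1024).asCharBuffer();
        System.out.println(view.capacity()); // 512

        // Bytes that already hold big-endian UTF-16 come back unchanged.
        String original = "Hello, \u00e9";
        byte[] utf16be = original.getBytes(StandardCharsets.UTF_16BE);
        System.out.println(viewAsChars(utf16be).equals(original)); // true
    }
}
```

The view's capacity is the number of remaining bytes divided by two, so with an odd byte count the last byte is simply not part of the view.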
As I understand it, it doesn't use any charset. It just assumes the bytes are already encoded the way Java represents strings internally, which means UTF-16. This can be shown by looking at the source for HeapByteBuffer, where the returned CharBuffer finally calls (little-endian version):
So the only thing handled here is the byte order (endianness); for everything else, you're responsible. This also means it's usually much more useful to use a CharsetDecoder, where you can specify the encoding.
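To make the contrast with CharsetDecoder concrete, here is a small sketch. The class name DecodeVsView is illustrative. The same UTF-8 bytes are decoded properly by a UTF-8 CharsetDecoder, while asCharBuffer() just pairs the bytes up. For "Hi", the UTF-8 bytes 0x48 0x69 become the single char U+4869.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

class DecodeVsView {
    // Real decoding: the decoder knows how the bytes are encoded (here UTF-8).
    static String decodeUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // A fresh decoder reports malformed input instead of replacing it.
            throw new UncheckedIOException(e);
        }
    }

    // No decoding: every two bytes are glued into one char (big-endian).
    static String rawView(byte[] bytes) {
        return ByteBuffer.wrap(bytes).asCharBuffer().toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = "Hi".getBytes(StandardCharsets.UTF_8); // 0x48 0x69
        System.out.println(decodeUtf8(utf8));                             // Hi
        System.out.println(Integer.toHexString(rawView(utf8).charAt(0))); // 4869
    }
}
```

Charset.decode(ByteBuffer) is a shorter alternative, but it replaces malformed input with a replacement character instead of reporting it.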
Looking at jdk7, under jdk/src/share/classes/java/nio:
X-Buffer.java.template maps ByteBuffer.allocate() to Heap-X-Buffer.java.template.
Heap-X-Buffer.java.template maps ByteBuffer.asCharBuffer() to ByteBufferAs-X-Buffer.java.template.
ByteBuffer.asCharBuffer().toString() invokes CharBuffer.put(CharBuffer), but I can't figure out where this leads.
Eventually this probably leads to Bits.makeChar(), but I can't figure out how. Bits.makeChar() is defined as:
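The idea can be reproduced without the JDK internals. The helper below is a hypothetical stand-in, not the actual Bits source. It builds a char from a high byte and a low byte. The main method then checks that asCharBuffer() follows the buffer's byte order, which the view captures when it is created.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class MakeCharSketch {
    // Hypothetical equivalent of combining two bytes into one UTF-16 code unit.
    // The low byte is masked so a negative byte does not sign-extend into bits 8-15.
    static char makeChar(byte hi, byte lo) {
        return (char) ((hi << 8) | (lo & 0xff));
    }

    // Read char i of an asCharBuffer() view created with the given byte order.
    static char viewChar(byte[] bytes, ByteOrder order, int i) {
        return ByteBuffer.wrap(bytes).order(order).asCharBuffer().get(i);
    }

    public static void main(String[] args) {
        byte[] bytes = {0x00, 0x41}; // 'A' in big-endian UTF-16
        System.out.println(makeChar(bytes[0], bytes[1]));             // A
        System.out.println(viewChar(bytes, ByteOrder.BIG_ENDIAN, 0)); // A
        // Little-endian order treats the second byte as the high byte: 0x4100.
        System.out.println(Integer.toHexString(viewChar(bytes, ByteOrder.LITTLE_ENDIAN, 0))); // 4100
    }
}
```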
I wanted to expand on the answer by @Petteri H. It is true that asCharBuffer() expects the ByteBuffer to be already UTF-16 encoded. No further encoding conversion is performed. You can run an experiment using the code below.
First, create a plain text file called test.txt with a few lines. On most systems this file will be UTF-8 encoded by default. We expect this to be a problem, since CharBuffer will read two consecutive bytes to construct a character and give you garbage values. Later, we will fix the issue.
The following code will simply dump each character from the file. Note: it will treat each two-byte sequence as a character.
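The answer's original program was not preserved. The following is a minimal sketch with the same behavior, not the author's code. It assumes Java 7+ (java.nio.file.Files) and reads test.txt by default, with an optional command-line argument to pick another file. The class name DumpChars and the helper toChars are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

class DumpChars {
    // Views the raw bytes as chars: every two bytes become one char,
    // with no charset decoding (big-endian, the ByteBuffer default).
    static String toChars(byte[] bytes) {
        CharBuffer chars = ByteBuffer.wrap(bytes).asCharBuffer();
        StringBuilder out = new StringBuilder();
        while (chars.hasRemaining()) {
            out.append(chars.get());
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "test.txt";
        byte[] bytes = Files.readAllBytes(Paths.get(file));
        System.out.print(toChars(bytes));
    }
}
```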
When you run the code you will see unexpected characters:
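The exact output was not preserved, but the effect is easy to reproduce in isolation. The sketch below (class name GarbageDemo is illustrative) pairs UTF-8 bytes into chars as asCharBuffer() does. The two bytes of "Hi" turn into the single char U+4869, a CJK ideograph. Such a char takes 3 bytes when written back out as UTF-8. One possible explanation for the "3 bytes per character" observation in the question is that the output was re-encoded this way, but that is an assumption, since the question's code is not shown.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class GarbageDemo {
    // Pair UTF-8 bytes into chars exactly as asCharBuffer() does (big-endian).
    static String pairUp(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.wrap(utf8).asCharBuffer().toString();
    }

    public static void main(String[] args) {
        String garbage = pairUp("Hi");
        // Print the code point rather than the glyph, which depends on the font.
        System.out.printf("U+%04X%n", (int) garbage.charAt(0)); // U+4869
        // Re-encoding that char as UTF-8 takes three bytes.
        System.out.println(garbage.getBytes(StandardCharsets.UTF_8).length); // 3
    }
}
```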
Now, let's encode the same file using UTF-16.
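The tool used for the conversion was not preserved. One way to do it from Java is sketched below with the hypothetical class name ToUtf16. It decodes the file as UTF-8 and re-encodes it with the "UTF-16" charset. According to the Charset documentation, that charset writes big-endian output with a big-endian byte-order mark, which matches asCharBuffer()'s default byte order. The file names test.txt and test-fixed.txt come from the text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class ToUtf16 {
    // Decode UTF-8 bytes and re-encode them as UTF-16 (big-endian, with BOM).
    static byte[] utf8ToUtf16(byte[] utf8) {
        String text = new String(utf8, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_16);
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = Files.readAllBytes(Paths.get("test.txt"));
        Files.write(Paths.get("test-fixed.txt"), utf8ToUtf16(utf8));
    }
}
```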
Change the Java code to read test-fixed.txt, then run it again. Now you will see the right output.
It is interesting to note that CharBuffer appears to skip the BOM marker which the test-fixed.txt file will have.
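Here is a small check of the BOM behavior. It assumes the file was written as big-endian UTF-16 with a BOM, as Java's "UTF-16" encoder does. asCharBuffer() does not interpret the BOM. It returns it as the first char, U+FEFF, a zero-width character that is invisible when printed. So the BOM is hidden rather than skipped. A "UTF-16" decoder, by contrast, reads the BOM and drops it. The class name BomCheck is illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BomCheck {
    // View the bytes as chars without decoding: the BOM stays as a char.
    static String viewed(byte[] bytes) {
        return ByteBuffer.wrap(bytes).asCharBuffer().toString();
    }

    // Decode with the UTF-16 charset, which uses the BOM for byte order and drops it.
    static String decoded(byte[] bytes) {
        return StandardCharsets.UTF_16.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] withBom = "Hi".getBytes(StandardCharsets.UTF_16); // FE FF 00 48 00 69
        System.out.println(viewed(withBom).length());  // 3: U+FEFF, 'H', 'i'
        System.out.println(decoded(withBom).length()); // 2: 'H', 'i'
    }
}
```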