Converting binary data to a string
If I have some binary data D and I convert it to a string S, I expect that converting S back to binary will give me D again. But it doesn't.

import java.io.IOException;

public class A {
    public static void main(String[] args) throws IOException {
        final byte[] bytes = new byte[]{-114, 104, -35}; // in hex: 8E 68 DD
        System.out.println(bytes.length); // prints 3
        // Decode as UTF-8, then encode back to UTF-8
        System.out.println(new String(bytes, "UTF-8").getBytes("UTF-8").length); // prints 7
    }
}

Why does this happen?
Comments (3)
Converting a byte array to a String and back again is not a one-to-one mapping. According to the docs, the String constructor uses a CharsetDecoder to convert the incoming bytes into Unicode characters. The first and last bytes of your input evidently do not form valid UTF-8 sequences, so the decoder substitutes the charset's replacement string for each of them.
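To see the decoder reject the input instead of silently replacing it, one option is to configure a CharsetDecoder to report errors. The following is a minimal sketch; the class name StrictDecodeDemo and the helper isValidUtf8 are made up for illustration and are not part of the question.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecodeDemo {
    // Hypothetical helper: true if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                // REPORT makes the decoder throw instead of substituting U+FFFD
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] fromQuestion = {-114, 104, -35}; // 8E 68 DD
        System.out.println(isValidUtf8(fromQuestion)); // prints false
        System.out.println(isValidUtf8("h".getBytes(StandardCharsets.UTF_8))); // prints true
    }
}
```

new String(bytes, charset) behaves as if the error action were REPLACE, which is why the question's code does not throw.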
It's likely that the bytes you're converting to a string don't actually form valid text in that encoding. If Java can't work out what a byte sequence means, it substitutes a replacement character for it. This means that when you convert back to a byte array, the result won't be the same as what you started with. If you try with a valid set of bytes, you should have more success.
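As a rough check of that claim, the sketch below compares a byte array with the result of decoding and re-encoding it as UTF-8. The class name RoundTripDemo, the helper survivesUtf8RoundTrip and the sample bytes for "h€" are assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripDemo {
    // Hypothetical helper: decode as UTF-8, re-encode, and compare with the input.
    static boolean survivesUtf8RoundTrip(byte[] original) {
        String decoded = new String(original, StandardCharsets.UTF_8);
        byte[] encoded = decoded.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(original, encoded);
    }

    public static void main(String[] args) {
        // Well-formed UTF-8 for "h€": 68, then E2 82 AC (U+20AC)
        byte[] valid = {0x68, (byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        // The bytes from the question: 8E 68 DD
        byte[] invalid = {-114, 104, -35};

        System.out.println(survivesUtf8RoundTrip(valid));   // prints true
        System.out.println(survivesUtf8RoundTrip(invalid)); // prints false
    }
}
```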
Your data can't be decoded into valid Unicode characters using the UTF-8 encoding. Look at the decoded string: it consists of 3 characters, 0xFFFD, 0x0068 and 0xFFFD. The first and last are "�", the Unicode replacement character. I think you need to choose another encoding; for example, "CP866" produces a valid string and converts back into the same array.
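The decoded code points can be inspected directly, and the same bytes can be carried through a String without loss if every byte value maps to a character. The sketch below uses ISO-8859-1 rather than CP866 because it is guaranteed to be available through StandardCharsets, and shows Base64 as another common option; both are substitutions for illustration, not what the answer proposed. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.stream.Collectors;

class LosslessDemo {
    // Hypothetical helper: format each code point of s as four hex digits.
    static String codePointsHex(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("%04X", cp))
                .collect(Collectors.joining(" "));
    }

    // ISO-8859-1 maps each of the 256 byte values to exactly one character.
    static boolean latin1RoundTrips(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.ISO_8859_1);
        return Arrays.equals(bytes, s.getBytes(StandardCharsets.ISO_8859_1));
    }

    // Base64 turns arbitrary bytes into ASCII text and back again.
    static boolean base64RoundTrips(byte[] bytes) {
        String text = Base64.getEncoder().encodeToString(bytes);
        return Arrays.equals(bytes, Base64.getDecoder().decode(text));
    }

    public static void main(String[] args) {
        byte[] bytes = {-114, 104, -35}; // 8E 68 DD
        String utf8 = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(codePointsHex(utf8));     // prints FFFD 0068 FFFD
        System.out.println(latin1RoundTrips(bytes)); // prints true
        System.out.println(base64RoundTrips(bytes)); // prints true
    }
}
```

If the data is not text at all, Base64 is usually the safer choice, since the resulting string contains only ASCII characters.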