Byte array to String and back: problem with -127

Posted on 2024-10-21 08:40:37

In the following:

 scala> (new String(Array[Byte](1, 2, 3, -1, -2, -127))).getBytes
 res12: Array[Byte] = Array(1, 2, 3, -1, -2, 63)

Why is -127 converted to 63, and how do I get it back as -127?

[EDIT:] Java version below (to show that it's not just a "Scala problem")

c:\tmp>type Main.java
public class Main {
    public static void main(String [] args) {
        byte [] b = {1, 2, 3, -1, -2, -127};
        byte [] c = new String(b).getBytes();
        for (int i = 0; i < 6; i++){
            System.out.println("b:"+b[i]+"; c:"+c[i]);
        }
    }
}
c:\tmp>javac Main.java
c:\tmp>java Main
b:1; c:1
b:2; c:2
b:3; c:3
b:-1; c:-1
b:-2; c:-2
b:-127; c:63

Comments (4)

梦与时光遇 2024-10-28 08:40:37

The constructor you're calling, String(byte[] bytes), hides the fact that byte-to-string conversion applies a decoding: it behaves like String(byte[] bytes, Charset charset) with the default charset. What you want is to use no decoding at all.

Fortunately, there's a constructor for that: String(char[] value).

Now you have the data in a string, but you want it back exactly as is. But guess what: getBytes(), just like getBytes(Charset charset), applies an encoding automatically as well. Fortunately, there is a toCharArray() method.

If you must start with bytes and end with bytes, you then have to map the char arrays to bytes:

(new String(Array[Byte](1,2,3,-1,-2,-127).map(_.toChar))).toCharArray.map(_.toByte)

So, to summarize: converting between String and Array[Byte] involves encoding and decoding. If you want to put binary data in a string, you have to do it at the level of characters. Note, however, that this will give you a garbage string (the chars are arbitrary code units, not meaningful text, and can include noncharacters such as U+FFFF), so you'd better read it out as characters and convert it back to bytes.
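For readers following the Java version of the question, here is a minimal sketch of the same char-level round trip. The class and method names are made up for illustration. One deliberate difference from the Scala one-liner: masking with 0xFF keeps every char in U+0000..U+00FF, whereas `_.toChar` sign-extends negative bytes (so -127 becomes U+FF81). Both variants round-trip.

```java
import java.util.Arrays;

final class CharLevelRoundTrip {
    // Widen each byte to one char. The 0xFF mask maps negative bytes to
    // U+0080..U+00FF instead of sign-extending them to U+FF80..U+FFFF.
    static String bytesToString(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            chars[i] = (char) (bytes[i] & 0xFF);
        }
        return new String(chars);
    }

    // Narrow each char back to a byte; no charset is involved.
    static byte[] stringToBytes(String s) {
        char[] chars = s.toCharArray();
        byte[] bytes = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            bytes[i] = (byte) chars[i];
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] original = {1, 2, 3, -1, -2, -127};
        byte[] restored = stringToBytes(bytesToString(original));
        System.out.println(Arrays.toString(restored)); // [1, 2, 3, -1, -2, -127]
    }
}
```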

You could shift the bytes up by, say, adding 512; then you'd get a bunch of valid single Char code points. But this is using 16 bits to represent every 8, a 50% encoding efficiency. Base64 is a better option for serializing binary data (8 bits to represent 6, 75% efficient).
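The Base64 route can be sketched with the JDK's java.util.Base64 (available since Java 8). The wrapper class and its method names are hypothetical; only the Base64 calls come from the standard library.

```java
import java.util.Arrays;
import java.util.Base64;

final class Base64RoundTrip {
    // Encode arbitrary bytes as ASCII-safe text.
    static String encode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Decode the text back to the original bytes.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] original = {1, 2, 3, -1, -2, -127};
        String text = encode(original);
        System.out.println(text);                           // AQID//6B
        System.out.println(Arrays.toString(decode(text)));  // [1, 2, 3, -1, -2, -127]
    }
}
```

The resulting string contains only ASCII letters, digits, '+', '/' and '=', so it survives whatever charset later handles the text.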

雪落纷纷 2024-10-28 08:40:37

String is for storing text, not binary data.

In your default character encoding there is no character for the byte -127 (0x81), so decoding substitutes a replacement character, and getBytes then encodes that as '?', which is 63.
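The replacement can be reproduced with an explicit charset. The sketch below uses US-ASCII as a stand-in for the asker's default encoding (an assumption, since the actual default is not stated), so here every byte outside 0..127 is replaced, not only -127. The class name ReplacementDemo is made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ReplacementDemo {
    // Decode and re-encode with the same charset. Bytes that have no
    // mapping decode to U+FFFD, which then encodes as '?' (63).
    static byte[] roundTrip(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.US_ASCII);
        return decoded.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] original = {1, 2, 3, -1, -2, -127};
        String decoded = new String(original, StandardCharsets.US_ASCII);
        // Show what the unmappable byte -127 became after decoding.
        System.out.println(Integer.toHexString(decoded.charAt(5))); // fffd
        System.out.println(Arrays.toString(roundTrip(original)));   // [1, 2, 3, 63, 63, 63]
    }
}
```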

EDIT: Base64 is the best option; even better would be to not use text to store binary data at all. It can be done, but not with any standard character encoding, i.e. you have to do the encoding yourself.

To answer your question literally, you can use your own character encoding. This is a very bad idea as any text is likely to get encoded and mangled in the same way as you have seen. Using Base64 avoids this by using characters which are safe in any encoding.

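// Fill an array with every possible byte value.
byte[] bytes = new byte[256];
for (int i = 0; i < bytes.length; i++)
    bytes[i] = (byte) i;
// Deprecated String(byte[] ascii, int hibyte) constructor: no charset is
// applied; each char is built from hibyte (0 here) and the byte itself.
String text = new String(bytes, 0);
// Narrow each char back to a byte.
byte[] bytes2 = new byte[text.length()];
for (int i = 0; i < bytes2.length; i++)
    bytes2[i] = (byte) text.charAt(i);
// Print every index that failed to round-trip and count the rest.
int count = 0;
for (int i = 0; i < bytes2.length; i++)
    if (bytes2[i] != (byte) i)
        System.out.println(i);
    else
        count++;
System.out.println(count + " bytes matched.");

睫毛上残留的泪 2024-10-28 08:40:37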

StringOps has a getBytes method; I think that is probably what one actually wants for converting a String to Array[Byte].

http://www.scala-lang.org/api/2.10.2/index.html#scala.collection.immutable.StringOps

無心 2024-10-28 08:40:37

Use the correct charset:

scala> (new String(Array[Byte](1, 2, 3, -1, -2, -127), "utf-16")).getBytes("utf-16")
res13: Array[Byte] = Array(-2, -1, 1, 2, 3, -1, -2, -127)
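Note that the UTF-16 output above starts with two extra bytes (-2, -1), the byte-order mark. As an alternative that is not part of this answer, ISO-8859-1 maps each byte value to exactly one char in U+0000..U+00FF, so the round trip returns the input unchanged. A minimal sketch, with hypothetical class and method names:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1RoundTrip {
    // Every byte value decodes to exactly one char in U+0000..U+00FF.
    static String toText(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Lossless only for strings whose chars are all <= U+00FF.
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] original = {1, 2, 3, -1, -2, -127};
        System.out.println(Arrays.toString(toBytes(toText(original))));
        // [1, 2, 3, -1, -2, -127]
    }
}
```

The string is still not meaningful text, so the earlier advice stands: prefer Base64 if the value has to travel through anything that treats it as text.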