Is there an explanation for this Java ByteBuffer behavior?

Posted 2024-12-15 06:54:32

I need to convert numerical values into byte arrays. For example, to convert a long to a byte array, I have this method:

public static byte[] longToBytes(long l) {
  ByteBuffer buff = ByteBuffer.allocate(8);

  buff.order(ByteOrder.BIG_ENDIAN);

  buff.putLong(l);

  return buff.array();
}

It's pretty straightforward - take a long, allocate an array that can hold it, and throw it in there. Regardless of what the value of l is, I will get an 8 byte array back that I can then process and use as intended. In my case, I'm creating a custom binary format and then transmitting it over a network.

When I invoke this method with a value of 773450364, I get an array [0 0 0 0 46 25 -22 124] back. I have code that also converts byte arrays back into their numerical values:

public static Long bytesToLong(byte[] aBytes, int start) {
  byte[] b = new byte[8];

  b[0] = aBytes[start + 0];
  b[1] = aBytes[start + 1];
  b[2] = aBytes[start + 2];
  b[3] = aBytes[start + 3];
  b[4] = aBytes[start + 4];
  b[5] = aBytes[start + 5];
  b[6] = aBytes[start + 6];
  b[7] = aBytes[start + 7];

  ByteBuffer buf = ByteBuffer.wrap(b);
  return buf.getLong();
}

When I pass the array from the other method back into this method, I get 773450364, which is correct.

Now, I transmit this array over TCP to another Java client. The documentation for the java.io.InputStream.read() method says that it returns an int value between 0 and 255, unless the end of the stream is reached, in which case -1 is returned. However, when I use it to populate a byte array, I continue to get the negative values on the receiving side. I suspect this has to do with overflow (a value of 255 cannot fit into a Java byte, so when I put it into the byte array, it overflows and becomes negative).

This brings me to my problem. The existence of the negative numbers concerns me. Right now, I'm developing the Java side of an application, where a byte is between -128 and 127 inclusive. The other endpoint might be in C, C++, Python, Java, C#...who knows. I'm not sure how the existence of negative values in some byte arrays is going to affect processing. Other than documenting this behavior, what can/should I do to make it easier for myself and future developers working on this system, especially in endpoints that are not written in Java?

二手情话 2024-12-22 06:54:32

A byte in Java is represented in 8-bit two's complement format. If you have an int that is in the range 128 - 255 and you cast it to a byte, then it will become a byte with a negative value (between -1 and -128).
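As a rough sketch of that narrowing (the class and helper names here are made up for illustration), the snippet below takes 0xEA, the byte that shows up as -22 in the question's array, casts it to byte, and then recovers the unsigned view with a mask. Nothing is lost in the cast; only the printed interpretation changes.

```java
public class SignedByteDemo {

    // Reinterpret a Java byte (-128..127) as its unsigned value (0..255).
    public static int toUnsigned(byte b) {
        return b & 0xFF; // the mask keeps only the low 8 bits after widening to int
    }

    public static void main(String[] args) {
        int fromStream = 0xEA;           // 234, as InputStream.read() would return it
        byte stored = (byte) fromStream; // narrowing keeps the bit pattern 1110 1010

        System.out.println(stored);                     // -22
        System.out.println(toUnsigned(stored));         // 234
        System.out.println(Byte.toUnsignedInt(stored)); // 234 (available since Java 8)
    }
}
```

A non-Java endpoint that treats bytes as unsigned would simply see 234 for the same byte on the wire.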

After reading a byte, you must check if it is -1 before you cast it to byte. The reason why the method returns an int rather than a byte is to allow you to check for end-of-stream before you convert it to a byte.

Another thing: Why are you copying the aBytes array in your bytesToLong method? You can simplify that method considerably and save the unnecessary copy:

public static Long bytesToLong(byte[] aBytes, int start) {
    return ByteBuffer.wrap(aBytes, start, 8).order(ByteOrder.BIG_ENDIAN).getLong();
}
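For comparison, here is a sketch of what the same decoding looks like when done by hand, which is roughly what an endpoint without ByteBuffer would have to do. The class and method names are hypothetical. The point of interest is the `& 0xFF` mask: in Java a negative byte would otherwise sign-extend and corrupt the higher bits.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ManualLongDecode {

    // Hypothetical helper: decode 8 big-endian bytes starting at 'start'.
    public static long decodeBigEndian(byte[] bytes, int start) {
        long result = 0;
        for (int i = 0; i < 8; i++) {
            // Mask first: without & 0xFF a negative byte would sign-extend
            // and set all higher bits of the result to one.
            result = (result << 8) | (bytes[start + i] & 0xFF);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] wire = {0, 0, 0, 0, 46, 25, -22, 124}; // the array from the question

        long manual = decodeBigEndian(wire, 0);
        long viaBuffer = ByteBuffer.wrap(wire, 0, 8).order(ByteOrder.BIG_ENDIAN).getLong();

        System.out.println(manual);    // 773450364
        System.out.println(viaBuffer); // 773450364
    }
}
```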

垂暮老矣 2024-12-22 06:54:32

Both your sending and receiving endpoints are currently implemented in Java. Conceivably, you're using an OutputStream on the sending side and an InputStream on the receiving side. Assuming we can trust the underlying socket implementation details for a moment, we'll consider any byte sent over the socket to arrive at its destination exactly the same.

So what actually happens at the Java level when dumping something into the OutputStream? When checking the JavaDoc for a method writing a byte array, we see that all this tells us is that bytes are being sent over the stream. Nothing major there. But when you check the doc for the method taking an int as argument, you'll see it details how this int is actually written out: the lower-order 8 bits are sent over the stream as a byte, while the higher-order 24 bits (int having a 32-bit representation in Java) are simply ignored.
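A small sketch of that write behaviour, using a ByteArrayOutputStream as a stand-in for the socket's stream (the class and method names are invented for this example):

```java
import java.io.ByteArrayOutputStream;

public class WriteIntDemo {

    // Writes one int with OutputStream.write(int) and returns the byte that was stored.
    public static byte writeAndCapture(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(value); // only the low-order 8 bits are written
        return out.toByteArray()[0];
    }

    public static void main(String[] args) {
        System.out.println(writeAndCapture(0x123456FE)); // -2, i.e. bit pattern 0xFE
        System.out.println(writeAndCapture(0xFE));       // -2 as well: the upper 24 bits never mattered
    }
}
```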

Over to the receiving side. You've got an InputStream. Unless you use one of the methods reading directly into a byte array, you'll be given an int. Like the doc says, the int will either be a value between 0 and 255 inclusive, or -1 if the end of the stream has been reached. This is the important bit. On the one hand, we want every possible bit pattern of a single byte to be readable from an InputStream. But we must also have some way of detecting when a read can no longer return meaningful values. That's why that method returns an int instead of a byte: the -1 value is the flag saying the end of the stream was reached. If you get anything other than -1, the only thing of interest is those lower 8 bits. Since these can be any bit pattern, their decimal value once cast to a Java byte will range from -128 to 127 inclusive. When you read directly into a byte array instead of one int at a time, that "trimming" is done for you. So it makes sense that you see those negative values. That said, they're only negative because Java interprets a byte as a signed value. The only thing of interest is the actual bit pattern. For all you care, it could represent values 0 to 255 or 1000 to 1255.

A typical InputStream read loop that uses one byte at a time is gonna look like this:

InputStream ips = ...;
int read = 0;
while((read = ips.read()) != -1) {
    byte b = (byte)read;
    // b now holds a bit pattern from 0x00 to 0xff in hex, or -128 to 127 in two's-complement signed representation
}
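To see why the int return type matters, here is a self-contained variant of that loop, with a ByteArrayInputStream standing in for the socket stream (the class and method names are assumptions for this sketch). The data deliberately contains a 0xFF byte, which becomes -1 once narrowed to a byte but must not be mistaken for end of stream.

```java
import java.io.ByteArrayInputStream;

public class ReadDemo {

    // Counts bytes by looping until read() reports end of stream with -1.
    public static int countUntilEof(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int count = 0;
        while (in.read() != -1) { // compare the int, not a narrowed byte
            count++;
        }
        return count;
    }

    // Returns what a single read() call reports for the first byte of 'data'.
    public static int firstRead(byte[] data) {
        return new ByteArrayInputStream(data).read();
    }

    public static void main(String[] args) {
        byte[] wire = {(byte) 0xFF, (byte) 0xEA, 0x7C};

        int first = firstRead(wire);
        System.out.println(first);        // 255: a data byte, not end of stream
        System.out.println((byte) first); // -1 once narrowed to a byte

        System.out.println(countUntilEof(wire)); // 3: the 0xFF byte did not end the loop

        // Bulk read: the narrowing happens inside the stream.
        byte[] bulk = new byte[3];
        int n = new ByteArrayInputStream(wire).read(bulk, 0, bulk.length);
        System.out.println(n);       // 3
        System.out.println(bulk[0]); // -1
    }
}
```

Had the loop cast to byte before comparing with -1, it would have stopped at the first 0xFF byte.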

When run, the following (uses Java 7 int literals) will be illuminating:

public class Main {

    public static void main(String[] args) {

        final int i1 = 0x00_00_00_fe;
        final int i2 = 0x80_00_00_fe;

        final byte b1 = (byte)i1;
        final byte b2 = (byte)i2;

        System.out.println(i1); // 254
        System.out.println(i2); // -2147483394

        System.out.println(b1); // -2
        System.out.println(b2); // -2

        final int what = 0x12_34_56_fe;
        final byte the_f = (byte)what;

        System.out.println(what);  // 305420030
        System.out.println(the_f); // -2

    }

}

As will be clear from this, casting from int to byte simply ditches anything but the least significant 8 bits. So the int could be a positive or negative number; it won't have any bearing on the byte value. Only the lowest 8 bits matter.

Long story short: you're getting correct byte values from your InputStream. The real worry here is that if the client side could be written in any programming language and run on any platform, you'll need to make it abundantly clear in your documentation what the received bytes mean and if they're a long, how this is encoded. Make it clear that the encoding is done in Java, using ByteBuffer's putLong method in a specific endianness. Only then will they have the info (combined with Java specs) to be absolutely certain how to interpret those bytes.
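One way to make such documentation unambiguous is to show the encoded bytes in hex rather than as Java's signed decimals. The sketch below assumes the question's longToBytes encoder; the toHex helper and the class name are invented for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class WireFormatDoc {

    // The question's encoder: 8 bytes, most significant byte first.
    public static byte[] longToBytes(long l) {
        return ByteBuffer.allocate(8).order(ByteOrder.BIG_ENDIAN).putLong(l).array();
    }

    // Hypothetical helper: hex rendering that sidesteps Java's signed view of a byte.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(toHex(longToBytes(773450364L))); // 00 00 00 00 2e 19 ea 7c
        System.out.println(toHex(longToBytes(-2L)));        // ff ff ff ff ff ff ff fe
    }
}
```

A line such as "64-bit two's-complement integer, big-endian, e.g. 773450364 is sent as 00 00 00 00 2e 19 ea 7c" gives implementers in any language something concrete to test against.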
