Replicating C struct padding in Java

Posted 2024-07-19 08:25:16

According to here, the C compiler will pad out values when writing a structure to a binary file. As the example in the link says, when writing a struct like this:

struct {
 char c;
 int i;
} a;

to a binary file, the compiler will usually leave an unnamed, unused hole between the char and int fields, to ensure that the int field is properly aligned.

How could I create an exact replica of the binary output file (generated in C) using a different language (in my case, Java)?

Is there an automatic way to apply C padding in Java output? Or do I have to go through compiler documentation to see how it works (the compiler is g++ by the way).

驱逐舰岛风号 2024-07-26 08:25:16

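Don't do this. It's fragile and leads to alignment and endianness bugs.

For external data, it's better to define the format explicitly in terms of bytes and to write explicit functions, using shifts and masks (not unions!), that convert between the internal and external formats.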
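A minimal Java sketch of this approach, assuming a made-up external format (one unsigned byte for `c`, then `i` as a 32-bit integer with the most significant byte first, no padding). The `RecordCodec` class and its methods are hypothetical names; the point is that every byte position is fixed by the format, not by a compiler:

```java
/**
 * Hypothetical explicit external format for one record:
 *   byte 0    : c, one unsigned byte
 *   bytes 1-4 : i, 32-bit two's-complement integer, most significant byte first
 * No padding: the layout is defined byte by byte, independent of any compiler.
 */
final class RecordCodec {
    static final int RECORD_SIZE = 5;

    // Internal -> external: split the values into bytes with shifts and masks.
    static byte[] encode(int c, int i) {
        byte[] out = new byte[RECORD_SIZE];
        out[0] = (byte) (c & 0xFF);
        out[1] = (byte) ((i >>> 24) & 0xFF);
        out[2] = (byte) ((i >>> 16) & 0xFF);
        out[3] = (byte) ((i >>> 8) & 0xFF);
        out[4] = (byte) (i & 0xFF);
        return out;
    }

    // External -> internal: mask each byte to 0..255 before shifting to avoid sign extension.
    static int decodeChar(byte[] in) {
        return in[0] & 0xFF;
    }

    static int decodeInt(byte[] in) {
        return ((in[1] & 0xFF) << 24)
             | ((in[2] & 0xFF) << 16)
             | ((in[3] & 0xFF) << 8)
             |  (in[4] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] rec = encode('A', -2);
        System.out.println(decodeChar(rec) + " " + decodeInt(rec)); // 65 -2
    }
}
```

The C side would implement the same two conversions with the same shifts, so both programs agree on the bytes regardless of struct layout or host byte order.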

停顿的约定 2024-07-26 08:25:16

This is true not only when writing to files but also in memory. The struct is padded in memory, and it is that padding which shows up in the file when the struct is written out byte by byte.

In general it is very hard to replicate the exact padding scheme with certainty, although I guess some heuristics would get you quite far. Having the struct declaration to analyze helps.

Typically, fields larger than one char will be aligned so that their starting offset inside the structure is a multiple of their size. This means shorts will generally be on even offsets (divisible by 2, assuming sizeof (short) == 2), while doubles will be on offsets divisible by 8, and so on.
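The heuristic can be sketched in Java as a small layout calculator. This is only an approximation under stated assumptions: scalar fields only (no arrays or nested structs), each field aligned to its own size, an optional `pack` cap that roughly mimics `#pragma pack(n)`, and tail padding that rounds the total size up to the strictest alignment. Real ABIs can deviate; for example, 32-bit x86 Linux aligns `double` to only 4 bytes inside structs. `StructLayout` and `layout` are hypothetical names:

```java
import java.util.Arrays;

/**
 * Heuristic C struct layout. Assumes each scalar field is aligned to its own size,
 * capped at `pack` (roughly like #pragma pack(n)); real ABIs can differ.
 */
final class StructLayout {
    // Returns the offset of each field, followed by the total struct size as the last element.
    static int[] layout(int[] fieldSizes, int pack) {
        int[] result = new int[fieldSizes.length + 1];
        int offset = 0;
        int maxAlign = 1;
        for (int k = 0; k < fieldSizes.length; k++) {
            int align = Math.min(fieldSizes[k], pack);
            offset = roundUp(offset, align);   // padding inserted before the field
            result[k] = offset;
            offset += fieldSizes[k];
            maxAlign = Math.max(maxAlign, align);
        }
        // Tail padding: the total size is a multiple of the strictest alignment,
        // so that elements in an array of this struct stay aligned.
        result[fieldSizes.length] = roundUp(offset, maxAlign);
        return result;
    }

    private static int roundUp(int value, int align) {
        return (value + align - 1) / align * align;
    }

    public static void main(String[] args) {
        // struct { char c; int i; } with natural alignment
        System.out.println(Arrays.toString(layout(new int[] {1, 4}, Integer.MAX_VALUE))); // [0, 4, 8]
        // the same struct with a pack cap of 1
        System.out.println(Arrays.toString(layout(new int[] {1, 4}, 1)));                 // [0, 1, 5]
    }
}
```

For the question's struct this predicts `c` at offset 0, `i` at offset 4 and a total size of 8, i.e. a three-byte hole; any result should still be checked against `sizeof`/`offsetof` from the actual g++ build.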

UPDATE: It is for reasons like this (and also reasons having to do with endianness) that it is generally a bad idea to dump whole structs out to files. It's better to do it field-by-field, like so:

put_char(out, a.c);
put_int(out, a.i);

Assuming the put-functions only write the bytes needed for the value, this will emit a padding-less version of the struct to the file, solving the problem. It is also possible to ensure a proper, known, byte-ordering by writing these functions accordingly.
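In Java, a rough counterpart of `put_char`/`put_int` is `java.io.DataOutputStream`, which writes exactly the bytes each value needs and always uses big-endian order. A sketch (the `FieldWriter` class is hypothetical, and the C side would need put-functions that emit the same byte order):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Field-by-field serialization of struct { char c; int i; } with no padding. */
final class FieldWriter {
    static byte[] write(byte c, int i) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(c);   // 1 byte for the char
            out.writeInt(i);    // 4 bytes, high byte first (DataOutputStream is always big-endian)
        }
        return bytes.toByteArray(); // 5 bytes total: no hole between the fields
    }

    public static void main(String[] args) throws IOException {
        System.out.println(write((byte) 'x', 1).length); // 5
    }
}
```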

心的憧憬 2024-07-26 08:25:16

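> Is there an automatic way to apply C padding in Java output? Or do I have to go through compiler documentation to see how it works (the compiler is g++ by the way)?

Neither. Instead, explicitly specify a data/communication format and implement that specification, rather than relying on implementation details of the C compiler. You may not even get the same output from different C compilers.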

守护在此方 2024-07-26 08:25:16

For interoperability, look at the ByteBuffer class.

Essentially, you create a buffer of a certain size, put() variables of different types at different positions, and then call array() at the end to retrieve the "raw" data representation:

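import java.nio.ByteBuffer;
import java.nio.ByteOrder;

ByteBuffer bb = ByteBuffer.allocate(8);   // sizeof(struct { char c; int i; }) on a typical g++ target
bb.order(ByteOrder.LITTLE_ENDIAN);        // match the byte order of the C platform (x86 is little-endian)
bb.put(0, (byte) someChar);               // C char is 1 byte (Java char is 2); offsets 1-3 stay 0 as padding
bb.putInt(4, someInteger);                // the int at offset 4; put(int, byte) cannot take an int
byte[] rawBytes = bb.array();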

But it's up to you to work out where the padding goes, i.e. how many bytes to skip between positions.

To read data written from C, you generally wrap() a ByteBuffer around a byte array that you've read from a file.
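A sketch of the reading side, assuming the file was produced by dumping `struct { char c; int i; }` on a typical little-endian x86 target with g++ (`c` at offset 0, three padding bytes, `i` at offset 4, 8 bytes per record). The `StructReader` name and that layout are assumptions to check against your own compiler:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Reads one assumed 8-byte record of struct { char c; int i; } written by a C program. */
final class StructReader {
    static final int SIZE = 8; // assumed sizeof the struct, including padding

    // Returns {c, i}; c is reported as an unsigned value 0..255.
    static int[] read(byte[] raw) {
        ByteBuffer bb = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        int c = bb.get(0) & 0xFF;   // absolute get of the char at offset 0
        int i = bb.getInt(4);       // absolute get at offset 4, skipping padding bytes 1..3
        return new int[] {c, i};
    }

    public static void main(String[] args) {
        byte[] raw = {'A', 0, 0, 0, 0x2A, 0, 0, 0}; // example record; padding shown as 0
        int[] v = read(raw);
        System.out.println((char) v[0] + " " + v[1]); // A 42
    }
}
```

Note that C leaves the values of padding bytes unspecified, so a file produced by dumping whole structs may contain arbitrary bytes in the holes; the reader ignores them, but a byte-exact replica of such a file may not be achievable.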

In case it's helpful, I've written more on ByteBuffer.

装纯掩盖桑 2024-07-26 08:25:16

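A convenient way to read/write C structs in Java is the javolution Struct class (see http://www.javolution.org). It won't pad/align the data for you automatically, but it does make working with raw data held in a ByteBuffer more convenient. If you're not familiar with javolution, it's well worth a look; there's a lot of other cool stuff in there.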

但可醉心 2024-07-26 08:25:16

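The hole is configurable: compilers have switches to align struct members on 1-, 2-, 4- or 8-byte boundaries (with g++, for example, #pragma pack(n) or __attribute__((packed))).

So the first question is: exactly which alignment are you trying to emulate?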

九局 2024-07-26 08:25:16

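In Java, the sizes of the data types are defined by the language specification: a byte is 1 byte, a short is 2 bytes, and so on. This differs from C, where the size of each type is implementation-defined and depends on the compiler and target architecture.

So, to read the file into Java, it is important to know the exact format of the binary file.

It may be necessary to make sure the C fields have specific sizes (for example, by using the fixed-width types from <stdint.h> such as int32_t) to account for compiler or architecture differences. The mention of alignment suggests that the output file will depend on the architecture.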
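To illustrate the size point: Java's integer sizes are fixed and exposed as constants, but Java has no unsigned integer types, so unsigned C fields have to be widened on the Java side. A sketch, assuming the C side uses the `<stdint.h>` fixed-width types (`CTypes` and its methods are hypothetical helpers built on standard JDK methods):

```java
/**
 * Hypothetical helpers for mapping C fixed-width types to Java.
 * Signed types map directly: int8_t -> byte, int16_t -> short, int32_t -> int, int64_t -> long.
 * Unsigned types need a wider Java type to hold their full range.
 */
final class CTypes {
    static int uint8(byte b)   { return Byte.toUnsignedInt(b); }      // uint8_t  -> int
    static int uint16(short s) { return Short.toUnsignedInt(s); }     // uint16_t -> int
    static long uint32(int i)  { return Integer.toUnsignedLong(i); }  // uint32_t -> long

    public static void main(String[] args) {
        // These sizes (in bytes) are the same on every JVM.
        System.out.println(Byte.BYTES + " " + Short.BYTES + " " + Integer.BYTES + " " + Long.BYTES); // 1 2 4 8
        System.out.println(uint32(-1)); // 4294967295
    }
}
```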

壹場煙雨 2024-07-26 08:25:16

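You could try Preon:

> Preon is a Java library for building codecs for bitstream-compressed data in a declarative (annotation-based) way. Think JAXB or Hibernate, but for binary encoded data.

It can handle big- and little-endian binary data, alignment (padding), and various numeric types, among other features. It's a really nice library and I like it a lot.

My $0.02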

蹲墙角沉默 2024-07-26 08:25:16

I strongly recommend Protocol Buffers for this problem.

笑红尘 2024-07-26 08:25:16

As I understand it, you're saying that you don't control the output of the C program. You have to take it as given.

So do you have to read this file for some specific set of structures, or do you have to solve this in the general case? I mean, is the problem that someone said, "Here's the file created by program X, you have to read it in Java"? Or do they expect your Java program to read the C source code, find the structure definition, and then read it in Java?

If you've got a specific file to read, the problem isn't really very difficult. Either by reviewing the C compiler specifications or by studying example files, figure out where the padding is. Then on the Java side, read the file as a stream of bytes, and build the values you know are coming. Basically I'd write a set of functions to read the required number of bytes from an InputStream and turn them into the appropriate data type. Like:

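import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Reads a len-byte integer (len <= 4) stored most significant byte first (big-endian).
// Structs dumped by a C program on x86 are little-endian; reverse the byte order for those.
static int readInt(InputStream is, int len)
  throws IOException
{
  int n = 0;
  while (len-- > 0)
  {
    int i = is.read();            // next byte as 0..255, or -1 at end of stream
    if (i == -1)
      throw new EOFException();   // premature end of data
    n = (n << 8) | i;             // use the unsigned value; casting to byte would sign-extend
  }
  return n;
}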
