Replicating C struct padding in Java
According to this source, the C compiler will pad out values when writing a structure to a binary file. As the example in the link says, when writing a struct like this:
struct {
    char c;
    int i;
} a;
to a binary file, the compiler will usually leave an unnamed, unused hole between the char and int fields, to ensure that the int field is properly aligned.
How can I create an exact replica of the binary output file (generated in C) using a different language (in my case, Java)?
Is there an automatic way to apply C padding in the Java output? Or do I have to go through the compiler documentation to see how it works? (The compiler is g++, by the way.)
Comments (11)
Don't do this. It is brittle and will lead to alignment and endianness bugs.
For external data, it is much better to define the format explicitly in terms of bytes. Then write explicit functions that convert between the internal and external formats using shifts and masks (not unions!).
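Here is a minimal Java sketch of that advice. It assumes that we, not the C compiler, define the external record for { char c; int i; } as 5 bytes: byte 0 holds c, and bytes 1 to 4 hold i in little-endian order. The class and method names are hypothetical. The point is that every byte position is written and read explicitly with shifts and masks, so neither side depends on padding or host byte order.

```java
/** Hypothetical explicit external format for { char c; int i; }: 5 bytes, no padding, little-endian. */
final class ExplicitFormat {
    static final int RECORD_SIZE = 5;

    // Write a 32-bit value as four little-endian bytes using shifts and masks.
    static void putInt32LE(byte[] buf, int off, int v) {
        buf[off]     = (byte) (v & 0xFF);
        buf[off + 1] = (byte) ((v >>> 8) & 0xFF);
        buf[off + 2] = (byte) ((v >>> 16) & 0xFF);
        buf[off + 3] = (byte) ((v >>> 24) & 0xFF);
    }

    // Read four little-endian bytes back into an int; "& 0xFF" undoes byte sign extension.
    static int getInt32LE(byte[] buf, int off) {
        return (buf[off] & 0xFF)
             | (buf[off + 1] & 0xFF) << 8
             | (buf[off + 2] & 0xFF) << 16
             | (buf[off + 3] & 0xFF) << 24;
    }

    // Encode one record: c at byte 0, i at bytes 1..4.
    static byte[] encode(byte c, int i) {
        byte[] rec = new byte[RECORD_SIZE];
        rec[0] = c;
        putInt32LE(rec, 1, i);
        return rec;
    }

    public static void main(String[] args) {
        byte[] rec = encode((byte) 'A', 0x12345678);
        StringBuilder sb = new StringBuilder();
        for (byte b : rec) sb.append(String.format("%02x ", b));
        System.out.println(sb.toString().trim());                 // 41 78 56 34 12
        System.out.println(Integer.toHexString(getInt32LE(rec, 1))); // 12345678
    }
}
```

The C side would need matching hand-written functions. The format then stays the same across compilers, packing settings and CPUs.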
This is true not only when writing to files, but also in memory. In fact, the struct is padded in memory, and that padding shows up in the file when the struct is written out byte-by-byte.
In general, it is very hard to replicate the exact padding scheme with certainty, although I guess some heuristics would get you quite far. It helps to have the struct declaration to analyze.
Typically, fields larger than one char will be aligned so that their starting offset inside the structure is a multiple of their size. This means short fields will generally be on even offsets (divisible by 2, assuming sizeof (short) == 2). Double fields will often be on offsets divisible by 8, though 32-bit x86 Linux, for one, aligns them to only 4 bytes inside structs. The same pattern holds for other types.
UPDATE: It is for reasons like this (and also reasons having to do with endianness) that it is generally a bad idea to dump whole structs out to files. It's better to do it field-by-field, like so:
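Here is a sketch in Java rather than C, since Java is the question's target language. The field-by-field idea maps onto java.io.DataOutputStream, whose writeByte and writeInt play the role of the put functions: each emits exactly the bytes of one value (1 and 4), always in big-endian order. The helper name writeRecord is hypothetical. Big-endian is simply the fixed order DataOutputStream uses, so a C reader would have to agree on it.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class FieldByField {
    // Hypothetical helper: writes { char c; int i; } one field at a time, with no padding.
    static byte[] writeRecord(byte c, int i) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(c); // exactly 1 byte
            out.writeInt(i);  // exactly 4 bytes, big-endian
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] rec = writeRecord((byte) 'A', 1);
        // 5 bytes, versus the 8 a padded in-memory struct would typically occupy
        System.out.println(rec.length);
    }
}
```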
Assuming the put functions only write the bytes needed for each value, this will emit a padding-less version of the struct to the file, solving the problem. You can also guarantee a proper, known byte ordering by writing these functions accordingly.
Neither. Instead, you explicitly specify a data/communication format and implement that specification, rather than relying on implementation details of the C compiler. You won't even get the same output from different C compilers.
For interoperability, look at the ByteBuffer class.
Essentially, you create a buffer of a certain size, put() variables of different types at different positions, and then call array() at the end to retrieve the "raw" data representation:
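Here is a minimal sketch for the struct in the question. It assumes the layout g++ typically uses on x86/x86-64: c at offset 0, three padding bytes, a 4-byte int at offset 4, total size 8. It also assumes little-endian byte order. Both assumptions should be checked against real output. ByteBuffer's default order is big-endian, so the order is set explicitly; the class and constant names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PaddedWriter {
    // Assumed C layout of struct { char c; int i; }: offsets 0 and 4, size 8.
    static final int OFFSET_C = 0;
    static final int OFFSET_I = 4;
    static final int STRUCT_SIZE = 8;

    static byte[] toCBytes(byte c, int i) {
        ByteBuffer buf = ByteBuffer.allocate(STRUCT_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(OFFSET_C, c);    // absolute put; bytes 1..3 stay 0 and act as padding
        buf.putInt(OFFSET_I, i); // absolute put at the assumed int offset
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] raw = toCBytes((byte) 'A', 258);
        for (byte b : raw) System.out.printf("%02x ", b);
        System.out.println(); // 41 00 00 00 02 01 00 00
    }
}
```

C does not guarantee that padding bytes are zero, so a byte-for-byte comparison with a C-written file may still differ in those three positions.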
But it's up to you to work out where to put padding, i.e. how many bytes to skip between positions.
To read data written from C, you generally wrap() a ByteBuffer around a byte array that you've read from a file.
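The reading side can be sketched under the same assumptions: 8-byte records, char at offset 0, int at offset 4, little-endian. The class names are hypothetical. ByteBuffer.wrap shares the array rather than copying it, and the absolute get methods leave the buffer's position untouched.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PaddedReader {
    // Hypothetical Java-side mirror of struct { char c; int i; }.
    static final class CStruct {
        final byte c;
        final int i;
        CStruct(byte c, int i) { this.c = c; this.i = i; }
    }

    // Reads the record starting at 'offset' in bytes that were read from a C-written file.
    static CStruct readRecord(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        byte c = buf.get(offset);       // char at offset 0
        int i = buf.getInt(offset + 4); // skip 3 padding bytes; int at offset 4
        return new CStruct(c, i);
    }

    public static void main(String[] args) {
        // Padding bytes (0x7f here) may hold anything, so they are ignored.
        byte[] fromFile = {0x41, 0x7f, 0x7f, 0x7f, 0x02, 0x01, 0, 0};
        CStruct r = readRecord(fromFile, 0);
        System.out.println((char) r.c + " " + r.i); // A 258
    }
}
```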
In case it's helpful, I've written more on ByteBuffer.
A handy way of reading/writing C structs in Java is to use the javolution Struct class (see http://www.javolution.org). This won't help you with automatically padding/aligning your data, but it does make working with raw data held in a ByteBuffer much more convenient. If you're not familiar with javolution, it's well worth a look as there's lots of other cool stuff in there too.
This hole is configurable: the compiler has switches to align structs to 1, 2, 4 or 8 bytes.
So the first question is: which alignment exactly do you want to simulate?
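To make the question concrete, here is a Java sketch of the usual heuristic. Each field is aligned to its own size, capped at a maximum packing value, and the total size is rounded up to the largest alignment used. The cap is meant to model GCC's #pragma pack(n) or -fpack-struct=n. Whether a given compiler and ABI actually follows this rule (for example, for double on 32-bit x86) is an assumption to verify. The sketch only handles scalar fields, not nested structs or arrays, and the class and method names are hypothetical.

```java
import java.util.Arrays;

final class LayoutCalc {
    // Returns the offset of each field, followed by the total struct size.
    // fieldSizes: sizeof each scalar field (must be positive); pack: max alignment (1, 2, 4 or 8).
    static int[] layout(int[] fieldSizes, int pack) {
        int[] result = new int[fieldSizes.length + 1];
        int offset = 0;
        int maxAlign = 1;
        for (int k = 0; k < fieldSizes.length; k++) {
            int align = Math.min(fieldSizes[k], pack);
            maxAlign = Math.max(maxAlign, align);
            offset = (offset + align - 1) / align * align; // round up to a multiple of align
            result[k] = offset;
            offset += fieldSizes[k];
        }
        // Trailing padding so that every element of an array of the struct stays aligned.
        result[fieldSizes.length] = (offset + maxAlign - 1) / maxAlign * maxAlign;
        return result;
    }

    public static void main(String[] args) {
        int[] charThenInt = {1, 4};
        System.out.println(Arrays.toString(layout(charThenInt, 8))); // [0, 4, 8]
        System.out.println(Arrays.toString(layout(charThenInt, 1))); // [0, 1, 5]
    }
}
```

With pack 8 this reproduces the 3-byte hole from the question. With pack 1 the hole disappears.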
With Java, the sizes of data types are defined by the language specification. For example, a byte is 1 byte, a short is 2 bytes, and so on. This is unlike C, where the size of each type is architecture-dependent. Therefore, it is important to know how the binary file is formatted in order to read it into Java.
It may be necessary to take steps to make certain that fields have a specific size, to account for differences in the compiler or architecture. The mention of alignment seems to suggest that the output file will depend on the architecture.
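On the Java side the sizes are fixed (Integer.BYTES is always 4), but Java has no unsigned integer types. Suppose the C side uses fixed-width types such as uint32_t and uint16_t; that is an assumption, since the question's struct uses plain int. The following sketch widens such fields so values at or above 2^31 (or 2^15) are not misread as negative. Little-endian order and the helper names are also assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class FixedWidth {
    // Reads a C uint32_t (4 bytes, little-endian assumed) into a long without sign problems.
    static long readUint32LE(byte[] data, int offset) {
        int raw = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
        return Integer.toUnsignedLong(raw);
    }

    // Reads a C uint16_t (2 bytes, little-endian assumed) into an int.
    static int readUint16LE(byte[] data, int offset) {
        short raw = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getShort(offset);
        return Short.toUnsignedInt(raw);
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        System.out.println(Integer.BYTES + " " + Short.BYTES); // 4 2
        System.out.println(readUint32LE(data, 0));             // 4294967295
        System.out.println(readUint16LE(data, 0));             // 65535
    }
}
```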
You could try Preon.
It can handle big- and little-endian binary data, alignment (padding) and various numeric types, among other features. It is a very nice library; I like it very much.
My $0.02.
I highly recommend protocol buffers for exactly this problem.
As I understand it, you're saying that you don't control the output of the C program. You have to take it as given.
So do you have to read this file for some specific set of structures, or do you have to solve this in a general case? I mean, is the problem that someone said, "Here's the file created by program X, you have to read it in Java"? Or do they expect your Java program to read the C source code, find the structure definition, and then read it in Java?
If you've got a specific file to read, the problem isn't really very difficult. Either by reviewing the C compiler specifications or by studying example files, figure out where the padding is. Then on the Java side, read the file as a stream of bytes, and build the values you know are coming. Basically I'd write a set of functions to read the required number of bytes from an InputStream and turn them into the appropriate data type. Like:
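Here is a minimal sketch of such helpers. It assumes the padding positions have already been worked out: a 1-byte char, 3 padding bytes, then a little-endian 4-byte int, which is how g++ typically lays out the question's struct on x86. The helper names are hypothetical. DataInputStream.readFully is used because a plain read() may return fewer bytes than requested.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

final class StreamReaders {
    // Reads exactly n bytes, or throws EOFException if the stream ends first.
    static byte[] readBytes(DataInputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        in.readFully(buf);
        return buf;
    }

    // Assembles a little-endian 32-bit int from the next four bytes.
    static int readInt32LE(DataInputStream in) throws IOException {
        byte[] b = readBytes(in, 4);
        return (b[0] & 0xFF) | (b[1] & 0xFF) << 8 | (b[2] & 0xFF) << 16 | (b[3] & 0xFF) << 24;
    }

    // Consumes padding; skipBytes() may skip fewer than n, so the bytes are read instead.
    static void skipPadding(DataInputStream in, int n) throws IOException {
        readBytes(in, n);
    }

    // Reads one struct { char c; int i; } and returns {c, i}.
    static int[] readStruct(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int c = in.readUnsignedByte(); // char at offset 0
        skipPadding(in, 3);            // bytes 1..3 are padding
        int i = readInt32LE(in);       // int at offset 4
        return new int[] {c, i};
    }

    public static void main(String[] args) throws IOException {
        byte[] file = {0x41, 0, 0, 0, 0x2A, 0, 0, 0};
        int[] s = readStruct(new ByteArrayInputStream(file));
        System.out.println((char) s[0] + " " + s[1]); // A 42
    }
}
```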
You can alter the packing on the C side to ensure that no padding is used. Alternatively, you can look at the resulting file format in a hex editor, so that you can write a parser in Java that ignores the padding bytes.