Getting C binary data from OCaml
(Ignoring endianness for the sake of argument - this is just a test case/proof of concept - and I would never use strcpy in real code either!)
Consider the following trivial C code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* variables of type message_t will be stored contiguously in memory */
typedef struct {
    int message_id;
    char message_text[80];
} message_t;
int main(int argc, char**argv) {
    message_t* m = (message_t*)malloc(sizeof(message_t));
    m->message_id = 1;
    strcpy(m->message_text,"the rain in spain falls mainly on the plain");
    /* write the memory to disk */
    FILE* fp = fopen("data.dat", "wb");
    fwrite((void*)m, sizeof(int) + strlen(m->message_text) + 1, 1, fp);
    fclose(fp);
    exit(EXIT_SUCCESS);
}
The file it writes can easily be read back in from disk:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct {
    int message_id;
    char message_text[80];
} message_t;
int main(int argc, char**argv) {
    message_t* m = (message_t*)malloc(sizeof(message_t));
    FILE* fp = fopen("data.dat", "rb");
    fread((void*)m, sizeof(message_t), 1, fp);
    fclose(fp);
    /* block of memory has structure "overlaid" onto it */
    printf("message_id=%d, message_text='%s'\n", m->message_id, m->message_text);
    exit(EXIT_SUCCESS);
}
E.g.
$ ./write
$ ./read
message_id=1, message_text='the rain in spain falls mainly on the plain'
My question is, in OCaml, if all I have is:
type message_t = {message_id:int; message_text:string}
How would I get at that data? Marshal can't do it, nor can input_binary_int. I can call out to helper functions in C like "what is sizeof(int)", then get n bytes and call a C function to "convert these bytes into an int", for example, but in this case I can't add any new C code: the "unpacking" has to be done in OCaml, based on what I know it "should" be. Is it just a matter of iterating over the string, either in sizeof-sized blocks or looking for '\0', or is there a clever way? Thanks!
Comments (4)
For doing this kind of low-level struct handling, I find OCaml Bitstring very convenient. If you wrote all 80 characters to disk, the equivalent reader for your message_t would simply match a 32-bit message_id followed by an 80-byte message_text.
With that layout you'll still have to trim message_text, but maybe Bitstring is what you want for this kind of task in general.
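The fixed-width read described in this answer can also be sketched without Bitstring, using only the standard library. This is a stand-in, not the Bitstring reader itself. It assumes a 4-byte little-endian int with no padding before the text, that all 80 text bytes were written to disk (the writer in the question writes fewer), and OCaml 4.08 or later for Bytes.get_int32_le. The names text_len, trim_at_nul and message_of_string are hypothetical.

```ocaml
(* Sketch only: decode one fixed-width record from an in-memory buffer.
   Assumed layout: 4-byte little-endian int, then 80 bytes of text. *)
type message_t = { message_id : int; message_text : string }

(* Size of the char array in the C struct. *)
let text_len = 80

(* Cut a C string at its first NUL byte, if there is one. *)
let trim_at_nul s =
  match String.index_opt s '\000' with
  | Some i -> String.sub s 0 i
  | None -> s

let message_of_string (buf : string) : message_t =
  if String.length buf < 4 + text_len then
    invalid_arg "message_of_string: buffer too short";
  (* Bytes.get_int32_le requires OCaml >= 4.08. *)
  let message_id = Int32.to_int (Bytes.get_int32_le (Bytes.of_string buf) 0) in
  let message_text = trim_at_nul (String.sub buf 4 text_len) in
  { message_id; message_text }
```

To read from a file, the 84 bytes could be fetched with really_input_string on a channel opened with open_in_bin; that call raises End_of_file if the file is shorter than a full record.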
Before you can figure out how to code this in OCaml, you need to figure out what your data representation is. Your C code isn't consistent between the reader and the writer: the writer only writes strlen(m->message_text)+1 bytes for the string, whereas the reader expects the full maximum 80 bytes.
My advice is to do all your marshalling in the same language, either C or OCaml. I recommend OCaml's marshalling library, which is already working, cross-platform and easy to use.
If you need interoperability between C and OCaml marshalling code, then you need to settle on a marshalling format and implement that same specification in both languages. Before you do that, consider whether you can use a text representation, which will be less error-prone and easier to inspect and manipulate with third-party tools, but bulkier. JSON is a lightweight data representation format, or you can turn to the heavyweight XML. If all your data is truly as simple as an integer and a string, and the strings don't contain newlines, you can write the integer in decimal followed by a space (or a : or a ,) followed by the string followed by a newline.
If the C marshalling format is predefined and you can't change it, note that it's platform-dependent (it depends on the architecture and the C compiler), and OCaml doesn't give you access to such platform details. So your best bet is to link your OCaml program with a C helper, making sure that your helper uses the same C type representation (sizeof(int), endianness, structure padding) as the original application.
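The line-oriented text format suggested in this answer (decimal integer, a space, the string, a newline) could look roughly like this on the OCaml side. This is a minimal sketch: the function names line_of_message and message_of_line are hypothetical, and it assumes the C side writes the same thing, for example with a "%d %s\n" format string.

```ocaml
type message_t = { message_id : int; message_text : string }

(* Encode as "<decimal id> <text>\n". The format only works if the
   text itself contains no newline, so reject it otherwise. *)
let line_of_message m =
  if String.contains m.message_text '\n' then
    invalid_arg "line_of_message: text contains a newline";
  Printf.sprintf "%d %s\n" m.message_id m.message_text

(* Decode one line whose trailing newline has already been removed
   (as input_line does). Splits at the first space only, so the text
   may itself contain spaces. Returns None on malformed input. *)
let message_of_line line =
  match String.index_opt line ' ' with
  | None -> None
  | Some i ->
    (match int_of_string_opt (String.sub line 0 i) with
     | None -> None
     | Some message_id ->
       let message_text =
         String.sub line (i + 1) (String.length line - i - 1) in
       Some { message_id; message_text })
```

Because the format is plain text, neither sizeof(int) nor endianness matters, which is the main point of the suggestion.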
You are relying on using the same C compiler on the same platform to avoid having to think about what the format of the written and read-back data is. Unfortunately you don't have that luxury if you are trying to interoperate between C and OCaml. You have to count the bytes in the structure, figure out whether the integer is little- or big-endian, and code accordingly on the OCaml side.
You'll have to manually unmarshal each type separately, in effect parsing the binary file. For instance, you need one function to read a little-endian 32-bit integer and another to read a NUL-terminated string; if everything is right, you can then read back your structure by calling them in field order.
Note: it is imperative (!) to sequence the reads to avoid reading fields out of order. Do not use parallel let assignments (let ... and ...), because their evaluation order is not specified.
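The three readers this answer describes can be sketched as follows. This is one possible version, not necessarily the answerer's code. It assumes a 4-byte little-endian int with no padding, followed directly by a NUL-terminated string, which matches what the writer in the question emits on a typical little-endian machine. The names input_le_int32, input_c_string, input_message and read_message are illustrative.

```ocaml
type message_t = { message_id : int; message_text : string }

(* Read a little-endian 32-bit integer, one byte at a time:
   byte i contributes bits 8*i .. 8*i+7. *)
let input_le_int32 ic =
  let res = ref 0l in
  for i = 0 to 3 do
    let byte = input_byte ic in
    res := Int32.logor !res (Int32.shift_left (Int32.of_int byte) (8 * i))
  done;
  !res

(* Read characters up to, but not including, the terminating NUL.
   Raises End_of_file if the channel ends before a NUL is seen. *)
let input_c_string ic =
  let buf = Buffer.create 16 in
  let rec loop () =
    match input_char ic with
    | '\000' -> Buffer.contents buf
    | c -> Buffer.add_char buf c; loop ()
  in
  loop ()

(* Nested (sequenced) lets guarantee the id is read before the text. *)
let input_message ic =
  let message_id = Int32.to_int (input_le_int32 ic) in
  let message_text = input_c_string ic in
  { message_id; message_text }

(* Open the file in binary mode and make sure it is closed on error. *)
let read_message path =
  let ic = open_in_bin path in
  let m = try input_message ic with e -> close_in_noerr ic; raise e in
  close_in ic;
  m
```

Calling read_message "data.dat" would then return the record. Reading the string up to the NUL, rather than a fixed 80 bytes, is what lets this cope with the shorter record the question's writer produces.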
Thanks for the advice, all; I have written up the approach I decided to take on my blog.