Elegant binary I/O in C?
I've been loading a lot of binary files recently using C/C++, and I'm bothered by how inelegant it can be. Either I get a lot of code that looks like this (I've since moved on):
uint32_t type, k;
uint32_t *variable;
FILE *f;

if (!fread(&type, 4, 1, f))
    goto boundsError;
if (!fread(&k, 4, 1, f))
    goto boundsError;
variable = malloc(4 * k);
if (!fread(variable, 4 * k, 1, f))
    goto boundsError;
Or, I define a local, packed struct so that I can read in constant-sized blocks more easily. It seems to me, however, that such a simple problem (that is, reading a specified file into memory) could be solved more efficiently and more readably. Does anyone have any tips, tricks, etc.? I'd like to clarify that I'm not looking for a library or something to handle this; I might be tempted if I were designing my own file format and had to change the file spec a lot, but for now I'm just looking for stylistic answers.
Also, some of you might suggest mmap. I love mmap! I use it a lot, but the problem with it is that it leads to nasty code for handling unaligned data types, a problem that doesn't really exist when using stdio. In the end, I'd be writing stdio-like wrapper functions for reading from memory.
Thanks!
EDIT: I should also clarify that I can't change the file format. There's a binary file that I have to read; I can't request the data in another format.
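To make the mmap concern concrete, here is a minimal sketch of the kind of stdio-like wrapper the question alludes to. The names mem_reader, mem_read, and mem_read_u32 are made up for illustration. It assumes the mapped file is exposed as a plain byte range and that values are stored in host byte order, as the fread calls above already assume. Going through memcpy sidesteps the alignment problem, because the value is never read through a misaligned pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cursor over a mapped (or otherwise in-memory) file image. */
typedef struct {
    const unsigned char *p;   /* current read position */
    const unsigned char *end; /* one past the last valid byte */
} mem_reader;

/* Copy n bytes out of the buffer. Returns 1 on success, 0 if the read
   would run past the end. memcpy works for any alignment of r->p. */
static int mem_read(mem_reader *r, void *out, size_t n)
{
    if ((size_t)(r->end - r->p) < n)
        return 0;
    memcpy(out, r->p, n);
    r->p += n;
    return 1;
}

/* Read a 32-bit value in host byte order, like fread(&v, 4, 1, f). */
static int mem_read_u32(mem_reader *r, uint32_t *out)
{
    return mem_read(r, out, sizeof *out);
}
```

With this, the stdio version's `if (!fread(&type, 4, 1, f))` becomes `if (!mem_read_u32(&r, &type))`, and the bounds check lives in one place.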
Comments (6)
The most elegant solution I've seen for this problem yet is Sean Barrett's writefv, used in his tiny image-writing library stb_image_write, available here. He only implements a few primitives (and no error handling), but the same approach can be extended to what is basically a binary printf (and for reading, you can do the same to get a binary scanf). Very elegant and tidy! In fact, the whole thing is so simple, I might as well include it here:
and here is how he writes truecolor .BMP files using it:
(definition of write_pixels elided since it's pretty tangential here)
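A rough sketch of the format-string idea behind Sean Barrett's writefv, written from scratch rather than copied from stb_image_write. The names write_fields and write_fields_v are hypothetical, and unlike the routine described it reports failures. Each digit in the format string is a field width in bytes, spaces are ignored, and fields are written little-endian. It assumes unsigned int is at least 32 bits wide and that every argument is passed as a non-negative int or an unsigned int.

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

/* Write integer fields to f as described by fmt:
   '1' = 1 byte, '2' = 2 bytes, '4' = 4 bytes, ' ' = ignored separator.
   Returns 1 on success, 0 on a write error or an unknown format character. */
static int write_fields_v(FILE *f, const char *fmt, va_list v)
{
    for (; *fmt; ++fmt) {
        unsigned char b[4];
        size_t n;
        uint32_t x;

        switch (*fmt) {
        case ' ': continue;
        case '1': n = 1; break;
        case '2': n = 2; break;
        case '4': n = 4; break;
        default:  return 0;
        }

        x = (uint32_t)va_arg(v, unsigned int);
        for (size_t i = 0; i < n; ++i)      /* least significant byte first */
            b[i] = (unsigned char)(x >> (8 * i));
        if (fwrite(b, 1, n, f) != n)
            return 0;
    }
    return 1;
}

/* Variadic front end, in the same relationship as printf to vprintf. */
static int write_fields(FILE *f, const char *fmt, ...)
{
    va_list v;
    va_start(v, fmt);
    int ok = write_fields_v(f, fmt, v);
    va_end(v);
    return ok;
}
```

A call such as `write_fields(f, "11 4 2", 'B', 'M', 54u, 1)` then writes two single bytes, a 4-byte field, and a 2-byte field in one line. A reading counterpart would take pointers instead of values, the way scanf does.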
If you want to deserialize binary data, one option is to define serialization macros for the structs that you want to use. This is a lot easier in C++ with template functions and streams. (boost::serialization is a non-intrusive serialization library, but if you want to go intrusive, you can make it more elegant.)
Simple C macros:
Usage:
And, yes, serialization code is some of the most boring and brain-dead code to write. If you can, describe your data structures using metadata, and generate the code mechanically instead. There are tools and libs to help with this, or you can roll your own in Perl or Python or PowerShell or whatever.
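One possible shape for such macros, as an illustration rather than the macros this answer originally showed: a single field list (an "X-macro") generates both the struct and its reader, so the two cannot drift apart. HEADER_FIELDS, struct header, and read_header are hypothetical names, and the sketch assumes the file stores the fields back to back in host byte order.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record description: one X(type, name) entry per field. */
#define HEADER_FIELDS(X) \
    X(uint32_t, type)    \
    X(uint16_t, flags)   \
    X(uint16_t, version)

/* Expands each entry to a struct member declaration. */
#define DECLARE_FIELD(T, name) T name;

/* Expands each entry to a checked read of exactly that member's size. */
#define READ_FIELD(T, name) \
    if (fread(&out->name, sizeof out->name, 1, f) != 1) return 0;

struct header { HEADER_FIELDS(DECLARE_FIELD) };

/* Reads the fields one at a time, so struct padding never matters.
   Returns 1 on success, 0 on a short read. */
static int read_header(FILE *f, struct header *out)
{
    HEADER_FIELDS(READ_FIELD)
    return 1;
}
```

Adding a field means adding one line to HEADER_FIELDS. A matching writer can be generated from the same list with a WRITE_FIELD macro.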
You might be interested in protocol buffers and other IDL schemes.
I would make your code less inelegant by refactoring it a bit, so that your complex data structures are read with a series of calls for their underlying types.
I assume your code is pure C and not C++, because in the latter you would probably throw exceptions rather than use goto statements.
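A minimal sketch of that refactoring, assuming the layout from the question (a type, a count k, then k 32-bit values, all in host byte order). The names read_u32, struct record, and read_record are invented for the example.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Primitive reader: one 32-bit value, 1 on success, 0 on a short read. */
static int read_u32(FILE *f, uint32_t *out)
{
    return fread(out, sizeof *out, 1, f) == 1;
}

/* Hypothetical record matching the question's layout. */
struct record {
    uint32_t type;
    uint32_t count;
    uint32_t *values; /* malloc'd here; the caller frees it */
};

/* Composite reader built from the primitive. On failure nothing is
   left allocated and r->values is NULL. */
static int read_record(FILE *f, struct record *r)
{
    r->values = NULL;
    if (!read_u32(f, &r->type) || !read_u32(f, &r->count))
        return 0;
    if (r->count == 0)
        return 1;
    if (r->count > SIZE_MAX / sizeof *r->values) /* size overflow guard */
        return 0;

    r->values = malloc(r->count * sizeof *r->values);
    if (!r->values)
        return 0;
    if (fread(r->values, sizeof *r->values, r->count, f) != r->count) {
        free(r->values);
        r->values = NULL;
        return 0;
    }
    return 1;
}
```

The call site shrinks to a single `if (!read_record(f, &rec)) goto boundsError;`, and larger structures can be read by composing readers like this one.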
The array-reading part looks like it deserves its own reusable function. Beyond that, if you do actually have C++ available (it isn't completely clear from the question), then hard-coding the size of variables is unnecessary, as the size can be deduced from the pointer.
and then
Of course, feel free to continue using malloc and free instead of new[] and delete[] if the data is being handed off to code that assumes that malloc was used.
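For plain C, a rough analogue of the same idea is sketched below. It is not this answer's C++ code: read_array and READ_ARRAY are hypothetical names, and the element size comes from `sizeof *ptr` instead of template deduction. It sticks with malloc, so the result can be handed to code that calls free.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate room for `count` elements of `elem_size` bytes and fill them
   from f. Returns NULL on a zero count, size overflow, allocation
   failure, or short read. */
static void *read_array(FILE *f, size_t count, size_t elem_size)
{
    if (count == 0 || elem_size == 0 || count > SIZE_MAX / elem_size)
        return NULL;

    void *buf = malloc(count * elem_size);
    if (!buf)
        return NULL;
    if (fread(buf, elem_size, count, f) != count) {
        free(buf);
        return NULL;
    }
    return buf;
}

/* The element size is taken from the pointer's own type, so no literal
   4 appears at the call site. Evaluates to nonzero on success. */
#define READ_ARRAY(f, ptr, count) \
    (((ptr) = read_array((f), (count), sizeof *(ptr))) != NULL)
```

The question's last three lines then become `if (!READ_ARRAY(f, variable, k)) goto boundsError;`, and changing variable to another element type needs no other edits.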
Here's some C99 code I came up with:
Your example would read: