Marshalling multiple protobufs to a file

Published 2024-09-14 17:44:24

Background:

I'm using Google's protobuf, and I would like to read/write several gigabytes of protobuf marshalled data to a file using C++. As it's recommended to keep the size of each protobuf object under 1MB, I figured a binary stream (illustrated below) written to a file would work. Each offset contains the number of bytes to the next offset until the end of the file is reached. This way, each protobuf can stay under 1MB, and I can glob them together to my heart's content.

[int32 offset]
[protobuf blob 1]
[int32 offset]
[protobuf blob 2]
...
[eof]
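The framing above can be sketched as a pair of helpers. This is a minimal sketch with names of my own choosing; in real use the blob would come from `msg.SerializeToString()` and the stream would be a binary `std::fstream` rather than the `std::stringstream` used here for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Write one length-prefixed record: a 4-byte little-endian size, then the blob.
void write_record(std::ostream& out, const std::string& blob) {
    uint32_t n = static_cast<uint32_t>(blob.size());
    char len[4] = {
        static_cast<char>(n & 0xff),
        static_cast<char>((n >> 8) & 0xff),
        static_cast<char>((n >> 16) & 0xff),
        static_cast<char>((n >> 24) & 0xff),
    };
    out.write(len, 4);
    out.write(blob.data(), static_cast<std::streamsize>(n));
}

// Read the next record into blob; returns false at a clean end-of-file.
bool read_record(std::istream& in, std::string& blob) {
    unsigned char len[4];
    if (!in.read(reinterpret_cast<char*>(len), 4)) return false;
    uint32_t n = len[0] | (len[1] << 8) | (len[2] << 16)
               | (static_cast<uint32_t>(len[3]) << 24);
    blob.resize(n);
    return static_cast<bool>(in.read(&blob[0], static_cast<std::streamsize>(n)));
}
```

The reader would then call `msg.ParseFromArray(blob.data(), blob.size())` on each record, looping until `read_record` returns false.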

I have a working implementation on GitHub:

src/glob.hpp
src/glob.cpp
test/readglob.cpp
test/writeglob.cpp

But I feel I have written some poor code and would appreciate advice on how to improve it. Thus,

Questions:

  • I'm using reinterpret_cast<char*> to read/write the 32 bit integers to and from the binary fstream. Since I'm using protobuf, I'm making the assumption that all machines are little-endian. I also assert that an int is indeed 4 bytes. Is there a better way to read/write a 32 bit integer to a binary fstream given these two limiting assumptions?
  • In reading from fstream, I create a temporary fixed-length char buffer, so that I can then pass this fixed-length buffer to the protobuf library to decode using ParseFromArray, as ParseFromIstream will consume the entire stream. I'd really prefer just to tell the library to read at most the next N bytes from fstream, but there doesn't seem to be that functionality in protobuf. What would be the most idiomatic way to pass a function at most N bytes of an fstream? Or is my design sufficiently upside down that I should consider a different approach entirely?
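On the first bullet, one option is to drop both assumptions rather than work around them: serializing the integer byte by byte through `uint32_t` needs neither `reinterpret_cast` nor a `sizeof(int)` assertion, and the shifts pin down little-endian order on any host. A sketch, with hypothetical function names:

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Write a 32-bit value as 4 explicit little-endian bytes. uint32_t guarantees
// the width, and the shifts fix the byte order regardless of the host CPU.
void put_le32(std::ostream& out, uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.put(static_cast<char>((v >> (8 * i)) & 0xff));
}

// Read 4 little-endian bytes back into a 32-bit value; false on EOF/error.
bool get_le32(std::istream& in, uint32_t& v) {
    v = 0;
    for (int i = 0; i < 4; ++i) {
        int c = in.get();
        if (!in) return false;  // hit EOF or a stream error mid-integer
        v |= static_cast<uint32_t>(c & 0xff) << (8 * i);
    }
    return true;
}
```

On the second bullet, it is worth looking at protobuf's own `google::protobuf::io::CodedInputStream`: `PushLimit(n)` (paired with `PopLimit`) restricts parsing to the next n bytes, and it also provides `ReadLittleEndian32`, so a single coded stream over the file can handle both the length prefixes and the blobs via `ParseFromCodedStream`.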

Edit:

  • @codymanix: I'm casting to char since istream::read requires a char array if I'm not mistaken. I'm also not using the extraction operator >> since I read it was poor form to use with binary streams. Or is this last piece of advice bogus?
  • @Martin York: Removed new/delete in favor of std::vector<char>. glob.cpp is now updated. Thanks!

Comments (1)

-柠檬树下少年和吉他 2024-09-21 17:44:24

Don't use new []/delete[].

Instead, use a std::vector, as deallocation is guaranteed in the event of exceptions.

Don't assume that reading will return all the bytes you requested.
Check with gcount() to make sure that you got what you asked for.

Rather than having Glob implement the code for both input and output depending on a switch in the constructor, implement two specialized classes, like ifstream/ofstream. This will simplify both the interface and the usage.
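The short-read point can be made concrete with a helper along these lines (a sketch, names my own) that loops until the requested byte count arrives, checking `gcount()` after every `read()`:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read exactly n bytes into buf, retrying on short reads. gcount() reports
// how many bytes the last read() actually delivered, which may be fewer
// than requested even on a healthy stream.
bool read_exact(std::istream& in, std::vector<char>& buf, std::size_t n) {
    buf.resize(n);
    std::size_t got = 0;
    while (got < n) {
        in.read(buf.data() + got, static_cast<std::streamsize>(n - got));
        std::streamsize r = in.gcount();
        if (r <= 0) return false;  // EOF or error before n bytes arrived
        got += static_cast<std::size_t>(r);
    }
    return true;
}
```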
