Should I use a binary file or a text file for storing protobuf messages?

Posted 2024-08-13 06:59:18


Using Google protobuf, I am saving my serialized message data to a file - in each file there are several messages. We have both C++ and Python versions of the code, so I need to use protobuf functions that are available in both languages. I have experimented with SerializeToArray and SerializeAsString, and there seem to be the following unfortunate conditions:

  1. SerializeToArray: As suggested in one answer, the best way to use this is to prefix each message with its data size. This would work great for C++, but in Python it doesn't look like this is possible - am I wrong?

  2. SerializeAsString: This generates a serialized string equivalent to its binary counterpart - which I can save to a file, but what happens if one of the characters in the serialization result is \n - how do we find line endings, or the endings of messages for that matter?

Update:

Please allow me to rephrase slightly. As I understand it, I cannot write binary data in C++ because then our Python application cannot read the data, since it can only parse string serialized messages. Should I then instead use SerializeAsString in both C++ and Python? If yes, then is it best practice to store such data in a text file rather than a binary file? My gut feeling is binary, but as you can see this doesn't look like an option.
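To make the \n concern in point 2 concrete, here is a minimal sketch (no protobuf dependency; the byte string is a hand-built stand-in for serialized output) showing that serialized bytes can legitimately contain 0x0A, so splitting the raw bytes on newlines is unreliable:

```python
# A varint-encoded protobuf field whose value happens to be 10 serializes
# to exactly the byte 0x0A, which is '\n'. For example, field 1 (wire type
# varint) with value 10 encodes as the two bytes below.
payload = b"\x08\n"  # tag byte for field 1, then the value 10

assert b"\n" in payload

# Joining two such messages with '\n' and splitting again does NOT
# recover them: the embedded 0x0A bytes also act as separators.
framed = payload + b"\n" + payload
parts = framed.split(b"\n")
print(len(parts))  # 4 pieces instead of the expected 2
```

This is why raw serialized output needs explicit framing (a length prefix or an encoding such as base64) before it can be stored line by line.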


坐在坟头思考人生 2024-08-20 06:59:18


We have had great success base64-encoding the messages and using a simple \n to separate them. This will of course depend a lot on your use - we need to store the messages in "log" files. It naturally has encoding/decoding overhead - but this has not remotely been an issue for us.

The advantage of keeping these messages as line-separated text has so far been invaluable for maintenance and debugging. Figure out how many messages are in a file? wc -l. Find the Nth message? head ... | tail. Figure out what's wrong with a record on a remote system you need to access through two VPNs and a Citrix solution? Copy-paste the message and mail it to the programmer.
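A minimal sketch of this scheme in Python, under the assumption that the byte strings stand in for real SerializeToString() / SerializeAsString() output and that the log filename is made up:

```python
import base64

# Arbitrary binary payloads standing in for serialized protobuf messages.
messages = [b"\x08\n\x12\x03foo", b"\x08\x02"]

# Write: one base64 line per message. base64 output never contains '\n',
# so the newline separator is unambiguous.
with open("messages.log", "wb") as f:
    for msg in messages:
        f.write(base64.b64encode(msg) + b"\n")

# Read: each line decodes back to the original bytes, which would then
# be handed to message.ParseFromString(...).
with open("messages.log", "rb") as f:
    decoded = [base64.b64decode(line) for line in f if line.strip()]

assert decoded == messages
```

The roughly 33% size overhead of base64 is the price paid for being able to inspect the file with ordinary line-oriented tools.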

空城仅有旧梦在 2024-08-20 06:59:18


The best practice for concatenating messages in this way is to prepend each message with its size. That way you read in the size (try a 32-bit int or something), then read that number of bytes into a buffer and deserialize it. Then read the next size, and so on.

The same goes for writing, you first write out the size of the message, then the message itself.

See Streaming Multiple Messages in the protobuf documentation for more information.
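Contrary to the doubt in the question, this framing is straightforward in Python too: struct.pack/unpack provides the fixed-width length prefix that the C++ side would write as a 32-bit int. A hedged sketch, with byte strings standing in for real SerializeToString() output:

```python
import struct
from io import BytesIO

def write_message(stream, payload: bytes) -> None:
    # Little-endian unsigned 32-bit length, then the message bytes.
    stream.write(struct.pack("<I", len(payload)))
    stream.write(payload)

def read_messages(stream):
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        (size,) = struct.unpack("<I", header)
        yield stream.read(size)  # feed this to ParseFromString(...)

# Round-trip demonstration using an in-memory stream.
buf = BytesIO()
for msg in (b"\x08\n", b"\x12\x03foo"):
    write_message(buf, msg)
buf.seek(0)
assert list(read_messages(buf)) == [b"\x08\n", b"\x12\x03foo"]
```

The C++ reader and writer just need to agree on the same prefix width and byte order ("<I" here means little-endian uint32).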

锦欢 2024-08-20 06:59:18


Protobuf is a binary format, so reading and writing should be done as binary, not text.
If you don't want binary format, you should consider using something other than protobuf (there are lots of textual data formats, such as XML, JSON, CSV); just using text abstractions is not enough.
