How do I read binary C++ protobuf data using Python protobuf?
The Python version of Google protobuf gives us only `SerializeToString()`, whereas the C++ version gives us both `SerializeToArray(...)` and `SerializeAsString()`.
Our C++ code writes the file in binary format, and we'd like to keep it that way. Is there a way to read the binary data into Python and parse it as if it were a string?
Is this the correct way of doing it?
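```python
binary = get_binary_data()        # raw bytes produced by the C++ side
binary_size = get_binary_size()   # number of bytes in `binary`
string = b''
for i in range(binary_size):
    string += binary[i:i + 1]     # copy one byte at a time
message = MyMessage()
message.ParseFromString(string)
```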
Update:
Here's a new example, and a problem:
```python
message_length = 512
file = open('foobars.bin', 'rb')
eof = False
while not eof:
    data = file.read(message_length)
    eof = not data
    if not eof:
        foo_bar = FooBar()
        foo_bar.ParseFromString(data)
```
When we get to the `foo_bar.ParseFromString(data)` line, I get this error:
```
Exception Type: DecodeError
Exception Value: Too many bytes when decoding varint.
```
Update 2:
It turns out that the padding on the binary data was throwing protobuf off: too many bytes were being passed in, as the error message suggests (in this case, the extra bytes were the padding).
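Why padding produces this particular error: protobuf encodes tags and integers as base-128 varints, where a byte with its high bit set means "another byte follows". A padding byte such as 0xCC has that bit set, so once the parser has consumed the real message and tries to read the next tag, a run of padding looks like one endless varint. The sketch below is a simplified, stdlib-only decoder written for illustration (it is not protobuf's own code); `decode_varint` and the sample 512-byte record are hypothetical.

```python
def decode_varint(buf, pos=0):
    """Decode one base-128 varint from buf starting at pos.

    Returns (value, next_pos). Each byte contributes its low 7 bits,
    least-significant group first; a set high bit (0x80) means another
    byte follows. Like protobuf, refuse varints longer than 64 bits.
    """
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError('Truncated varint.')
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift >= 64:
            raise ValueError('Too many bytes when decoding varint.')


# Hypothetical 512-byte record: a 3-byte message (field 1 = 150)
# followed by 509 bytes of 0xCC padding.
record = b'\x08\x96\x01' + b'\xcc' * 509

tag, pos = decode_varint(record)         # tag 8: field 1, wire type 0 (varint)
value, pos = decode_varint(record, pos)  # 150; the real message ends at pos 3

padding_error = None
try:
    decode_varint(record, pos)           # the "next tag" is nothing but padding
except ValueError as err:
    padding_error = str(err)
print(padding_error)                     # Too many bytes when decoding varint.
```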
This padding comes from using the C++ protobuf function `SerializeToArray` on a fixed-length buffer. To eliminate it, I have used this temporary code:
```python
message_length = 512
with open('foobars.bin', 'rb') as file:
    while True:
        data = file.read(message_length)
        if not data:
            break  # end of file
        # Temporary workaround: delete every 0xCC byte, meant to strip the
        # padding left in the unused part of each fixed-length record. Yuck!
        string = data.replace(b'\xcc', b'')
        foo_bar = FooBar()
        foo_bar.ParseFromString(string)
```
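The "yuck" is deserved for more than style: 0xCC can also occur inside a valid serialized message, so deleting every 0xCC byte may corrupt the data instead of just trimming the padding. In this stdlib-only sketch (the field number and the value 204 are arbitrary, chosen for illustration), the varint for 204 happens to be `b'\xcc\x01'`, and stripping 0xCC silently turns field 1 = 204 into field 1 = 1 without raising any error.

```python
def encode_varint(value):
    """Encode a non-negative int as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low)         # last byte: high bit clear
            return bytes(out)


# Field 1, wire type 0 (varint): the tag byte is (1 << 3) | 0 == 0x08.
message = b'\x08' + encode_varint(204)           # 204 encodes as b'\xcc\x01'
padded = message + b'\xcc' * (512 - len(message))

# Same effect as filtering out 0xCC byte by byte.
cleaned = padded.replace(b'\xcc', b'')
# b'\x08\x01' is a valid message too, but it says field 1 = 1, not 204.
print(message.hex(), cleaned.hex())              # 08cc01 0801
```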
I think there is a design flaw here. I will re-implement my C++ code so that it writes variable-length arrays to the binary file. As advised by the protobuf documentation, I will prefix each message with its binary size so that I know how much to read when I open the file with Python.
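A minimal sketch of the Python side of that length-prefixed layout. It assumes the C++ writer emits each message's byte size as a varint followed by the message bytes; the protobuf documentation leaves the size encoding up to you, and a varint is simply what protobuf itself uses for length-delimited fields. `write_delimited`, `read_varint` and `iter_delimited` are hypothetical helper names, and only the standard library is used, so each yielded chunk would still be handed to something like `FooBar().ParseFromString(...)`.

```python
import io


def encode_varint(value):
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)  # more bytes follow
        else:
            out.append(low)
            return bytes(out)


def read_varint(stream):
    """Read one varint from a binary stream; return None at a clean EOF."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            if shift == 0:
                return None  # no more messages
            raise EOFError('Truncated varint length prefix.')
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
        if shift >= 64:
            raise ValueError('Too many bytes when decoding varint.')


def write_delimited(stream, payload):
    """Write the payload's size as a varint, then the payload itself."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def iter_delimited(stream):
    """Yield each length-prefixed payload in the stream, in order."""
    while True:
        size = read_varint(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError('Truncated message body.')
        yield payload


# Round trip through an in-memory file; an empty message is valid protobuf.
buf = io.BytesIO()
for payload in (b'\x08\x96\x01', b'', b'\x08\xcc\x01'):
    write_delimited(buf, payload)
buf.seek(0)
records = list(iter_delimited(buf))
```

With a real file this would become `with open('foobars.bin', 'rb') as f:` and `for data in iter_delimited(f):`, with no fixed record size and no bytes to strip, so a legitimate 0xCC inside a message is left alone.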
Comments (2)
I'm not an expert with Python, but you can pass the result of a `file.read()` operation into `message.ParseFromString(...)` without having to build a new string type or anything.

Python byte strings (`str` in Python 2, `bytes` in Python 3) can contain any byte value, i.e. they are capable of holding "binary" data directly. There should be no need to convert from string to "binary".
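A minimal sketch of that suggestion, assuming the file holds exactly one serialized message with no padding. `load_message` is a hypothetical helper, and `message_cls` stands for a protoc-generated class such as the question's `FooBar`.

```python
def load_message(path, message_cls):
    """Parse a file that holds exactly one serialized message.

    The bytes returned by file.read() are handed to ParseFromString()
    unchanged; no byte-by-byte copying or conversion is needed.
    """
    with open(path, 'rb') as f:
        data = f.read()  # bytes in Python 3 (str in Python 2)
    message = message_cls()
    message.ParseFromString(data)
    return message
```

For example, `foo_bar = load_message('foobars.bin', FooBar)`. This alone does not handle a file containing several messages; their boundaries still have to be known, which is what the size prefix in Update 2 provides.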