How can I read binary C++ protobuf data with Python protobuf?

Posted on 2024-08-13 23:33:19

The Python version of Google protobuf gives us only:

SerializeToString()

The C++ version, on the other hand, gives us both:

SerializeToArray(...)
SerializeAsString()

Our C++ code writes the file in binary format, and we'd like to keep it that way. Is there a way to read that binary data into Python and parse it as if it were a string?
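For reference, here is a minimal sketch of the simplest case, assuming the file holds exactly one serialized message and nothing else (no padding, no other records). Opening the file in `'rb'` mode yields `bytes`, which is what `ParseFromString()` accepts in Python 3 (in Python 2, `str` already was a byte string). `SerializeToArray` and `SerializeAsString` produce the same wire-format bytes, so no conversion step is needed. `load_message` is a hypothetical helper name, and `message_cls` stands for any generated message class such as `MyMessage`.

```python
def load_message(path, message_cls):
    """Parse a file that contains exactly one serialized protobuf message.

    message_cls is a generated protobuf message class (hypothetical here,
    e.g. MyMessage). Bytes after the message, such as padding, are not
    skipped: the parser tries to decode them as fields too.
    """
    with open(path, 'rb') as f:
        data = f.read()  # raw bytes: 'rb' mode applies no text decoding
    message = message_cls()
    message.ParseFromString(data)
    return message
```

Usage would be along the lines of `message = load_message('mymessage.bin', MyMessage)`, where the file name is only a placeholder.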

~~Is this the correct way of doing it?~~

binary = get_binary_data()       # bytes produced by the C++ side
binary_size = get_binary_size()

string = b''
for i in range(binary_size):
    string += binary[i:i + 1]    # copy one byte at a time

message = MyMessage()
message.ParseFromString(string)

Update:

Here's a new example, and a problem:

message_length = 512  # fixed record size used when writing from C++

with open('foobars.bin', 'rb') as f:
    while True:
        data = f.read(message_length)
        if not data:  # end of file
            break

        foo_bar = FooBar()
        foo_bar.ParseFromString(data)

When we reach the foo_bar.ParseFromString(data) line, we get this error:

Exception Type: DecodeError
Exception Value: Too many bytes when decoding varint.

Update 2:

It turns out that the padding on the binary data was throwing protobuf off: as the error message suggests, too many bytes were being passed in (in this case, the padding bytes).
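To see why padding produces this particular error, recall that protobuf encodes tags and integers as varints. Each byte carries 7 bits of payload, and a set high bit (0x80) means "another byte follows". A varint is at most 10 bytes long. The padding byte here is 0xCC (0b11001100), which has its high bit set, so a run of padding looks like a varint that never ends. The sketch below is a simplified decoder written only for illustration (`decode_varint` is not part of the protobuf API). It applies the same 10-byte limit and rejects such input. The library's exact code path and error text can differ between protobuf versions.

```python
def decode_varint(data, pos=0):
    """Decode one base-128 varint starting at data[pos].

    Returns (value, next_pos). Simplified illustration, not the protobuf
    library's own decoder.
    """
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError('truncated varint')
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7
        if shift >= 70:  # 10 continuation bytes: longer than any valid varint
            raise ValueError('Too many bytes when decoding varint.')


print(decode_varint(b'\x96\x01'))  # (150, 2)
try:
    decode_varint(b'\xcc' * 20)    # padding: every byte says "more follows"
except ValueError as e:
    print(e)                       # Too many bytes when decoding varint.
```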

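This padding comes from calling the C++ protobuf function SerializeToArray on a fixed-length buffer: it only fills the first ByteSize() bytes, and since the whole buffer is written to the file, the remaining bytes end up as padding. To strip them, I have used this temporary code: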

message_length = 512  # fixed record size used when writing from C++

with open('foobars.bin', 'rb') as f:
    while True:
        data = f.read(message_length)
        if not data:  # end of file
            break

        # Remove the 0xCC padding bytes (yuck!)
        string = data.replace(b'\xcc', b'')

        foo_bar = FooBar()
        foo_bar.ParseFromString(string)
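Stripping every 0xCC byte is fragile, because 0xCC can also occur inside a valid message. As a hypothetical example, take a message whose field 1 is a varint holding 204. The tag byte is 0x08 (field number 1, wire type 0), and 204 encodes as `b'\xcc\x01'`. The sketch below builds those bytes by hand (`encode_varint` is an illustrative helper, not a protobuf API) and shows what the filter does to them.

```python
def encode_varint(value):
    """Encode a non-negative int as a protobuf base-128 varint (illustration only)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte: high bit clear
            return bytes(out)


# Tag for field number 1 with wire type 0 (varint): (1 << 3) | 0 == 0x08
record = b'\x08' + encode_varint(204)
print(record)    # b'\x08\xcc\x01'

# The workaround removes every 0xCC byte, including the one that belongs to the value
filtered = record.replace(b'\xcc', b'')
print(filtered)  # b'\x08\x01': field 1 now decodes as 1 instead of 204
```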

I think there is a design flaw here. I will re-implement my C++ code so that it writes variable-length records to the binary file. As advised by the protobuf documentation, I will prefix each message with its size in bytes, so that I know how much to read when I open the file in Python.
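A minimal sketch of the Python reading side of that plan, assuming each record is written as a varint length followed by the serialized message. This is the delimited layout that C++ `google::protobuf::util::SerializeDelimitedToOstream` writes in protobuf versions that ship `delimited_message_util.h`; with older versions, the same prefix can be written with `CodedOutputStream::WriteVarint32` before serializing the message. A fixed 4-byte prefix would work just as well, as long as both sides agree. `write_delimited` and `iter_delimited` are hypothetical helper names, and the payloads below are placeholder bytes standing in for serialized messages.

```python
import io


def encode_varint(value):
    """Encode a non-negative int as a base-128 varint (used for the length prefix)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def write_delimited(stream, payload):
    """Write one record: varint length prefix, then the serialized message bytes."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def _read_varint(stream):
    """Read a varint length prefix; return None on a clean end of file."""
    result = 0
    shift = 0
    while True:
        b = stream.read(1)
        if not b:
            if shift == 0:
                return None  # no more records
            raise EOFError('truncated length prefix')
        byte = b[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
        if shift >= 70:
            raise ValueError('length prefix is not a valid varint')


def iter_delimited(stream):
    """Yield the raw bytes of each length-prefixed record in stream."""
    while True:
        size = _read_varint(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError('truncated message')
        yield payload


# Round trip with placeholder payloads standing in for serialized messages.
buf = io.BytesIO()
for payload in (b'\x08\x96\x01', b'', b'\x08\xcc\x01'):
    write_delimited(buf, payload)
buf.seek(0)
print(list(iter_delimited(buf)))  # [b'\x08\x96\x01', b'', b'\x08\xcc\x01']
```

With the real file opened in `'rb'` mode, each payload yielded by `iter_delimited` can be passed to `ParseFromString()` on a fresh `FooBar()`. No padding has to be guessed, because every record says exactly how long it is.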
