Boost Serialization vs Google Protocol Buffers?

Posted 2024-07-26 14:23:56


Comments (11)

压抑⊿情绪 2024-08-02 14:28:59


You can use boost serialization in tight conjunction with your "real" domain objects, and serialize the complete object hierarchy (inheritance). Protobuf does not support inheritance, so you will have to use aggregation. People argue that Protobuf should be used for DTOs (data transfer objects), and not for core domain objects themselves. I have used both boost::serialization and protobuf. The performance of boost::serialization should be taken into account; cereal might be an alternative.
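As a rough sketch of that difference (the class names are invented for illustration, and this is only one way to set it up), Boost.Serialization can save and restore a derived object through a base-class pointer, which has no direct counterpart in a .proto definition:

#include <fstream>
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/access.hpp>
#include <boost/serialization/base_object.hpp>
#include <boost/serialization/export.hpp>
// link with -lboost_serialization

// Hypothetical domain hierarchy, purely for illustration.
class Shape {
public:
    virtual ~Shape() = default;
private:
    friend class boost::serialization::access;
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & id_;
    }
    int id_ = 0;
};

class Circle : public Shape {
private:
    friend class boost::serialization::access;
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        // Serialize the inherited part first, then the derived members.
        ar & boost::serialization::base_object<Shape>(*this);
        ar & radius_;
    }
    double radius_ = 1.0;
};

// Register the derived type so it can be saved and restored through a Shape*.
BOOST_CLASS_EXPORT(Circle)

int main() {
    {
        std::ofstream ofs("shape.txt");
        boost::archive::text_oarchive oa(ofs);
        const Shape* out = new Circle();   // serialized polymorphically
        oa << out;
        delete out;
    }
    {
        std::ifstream ifs("shape.txt");
        boost::archive::text_iarchive ia(ifs);
        Shape* in = nullptr;
        ia >> in;                          // a Circle comes back
        delete in;
    }
}

With Protocol Buffers, a common pattern instead is to give the DTO message an aggregated sub-message (or a oneof) and translate to and from the real domain objects at the boundary.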

混吃等死 2024-08-02 14:28:41


As with almost everything in engineering, my answer is... "it depends."

Both are well tested, vetted technologies. Both will take your data and turn it into something friendly for sending someplace. Both will probably be fast enough, and if you're really counting a byte here or there, you're probably not going to be happy with either (let's face it, the packets either one creates will be a small fraction of the size of XML or JSON).

For me, it really comes down to workflow and whether or not you need something other than C++ on the other end.

If you want to figure out your message contents first and you're building a system from scratch, use Protocol Buffers. You can think of the message in an abstract way and then auto-generate the code in whatever language you want (3rd party plugins are available for just about everything). Also, I find collaboration simplified with Protocol Buffers. I just send over a .proto file and then the other team has a clear idea of what data is being transferred. I also don't impose anything on them. If they want to use Java, go ahead!
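To make the message-first workflow concrete, here is a minimal C++ sketch, assuming a made-up ping.proto containing a single message Ping { int32 sequence = 1; } that has been run through protoc --cpp_out=. :

#include <iostream>
#include <string>

#include "ping.pb.h"   // hypothetical header generated by: protoc --cpp_out=. ping.proto

int main() {
    GOOGLE_PROTOBUF_VERIFY_VERSION;     // catch header/runtime mismatches early

    Ping request;
    request.set_sequence(42);           // accessors are generated per field

    std::string wire;
    request.SerializeToString(&wire);   // compact, language-neutral wire format

    Ping received;
    if (received.ParseFromString(wire)) // Java, Python, etc. bindings parse the same bytes
        std::cout << received.sequence() << "\n";

    google::protobuf::ShutdownProtobufLibrary();
}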

If I've already built a class in C++ (and this has happened more often than not) and I now want to send that data over the wire, Boost Serialization obviously makes a ton of sense (especially where I already have a Boost dependency somewhere else).

乖乖公主 2024-08-02 14:28:25


I know that this is an older question now, but I thought I'd throw my 2 pence in!

With boost you get the opportunity to write some data validation into your classes; this is good because the data definition and the checks for validity are all in one place.

With GPB the best you can do is to put comments in the .proto file and hope against all hope that whoever is using it reads it, pays attention to it, and implements the validity checks themselves.

Needless to say this is unlikely and unreliable if you're relying on someone else at the other end of a network stream to do this with the same vigour as you would yourself. Plus if the constraints on validity change, multiple code changes need to be planned, coordinated and done.

Thus I consider GPB to be inappropriate for developments where there is little opportunity to regularly meet and talk with all team members.

==EDIT==

The kind of thing I mean is this:

message Foo
{
    int32 bearing = 1;
}

Now who's to say what the valid range of bearing is? We can have

message Foo
{
    int32 bearing = 1;  // Valid between 0 and 359
}

But that depends on someone else reading this and writing code for it. For example, if you edit it and the constraint becomes:

message Foo
{
    int32 bearing = 1;  // Valid between -180 and +180
}

you are completely dependent on everyone who has used this .proto updating their code. That is unreliable and expensive.

At least with Boost serialisation you're distributing a single C++ class, and that can have data validity checks built right into it. If those constraints change, then no one else need do any work other than making sure they're using the same version of the source code as you.
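As a sketch of what "checks built right into it" can look like (the class and the 0..359 range are illustrative, not a prescribed pattern):

#include <stdexcept>
#include <boost/serialization/access.hpp>
#include <boost/serialization/split_member.hpp>

// Hypothetical class: the range check travels with the type, so every
// consumer of this source file gets the same validation for free.
class Bearing {
public:
    explicit Bearing(int degrees = 0) { set(degrees); }

    void set(int degrees) {
        if (degrees < 0 || degrees > 359)
            throw std::out_of_range("bearing must be in 0..359");
        degrees_ = degrees;
    }
    int degrees() const { return degrees_; }

private:
    friend class boost::serialization::access;

    template <class Archive>
    void save(Archive& ar, const unsigned int /*version*/) const {
        ar & degrees_;
    }
    template <class Archive>
    void load(Archive& ar, const unsigned int /*version*/) {
        int d = 0;
        ar & d;
        set(d);   // data coming off the wire is checked exactly like local data
    }
    BOOST_SERIALIZATION_SPLIT_MEMBER()

    int degrees_ = 0;
};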

Alternative

There is an alternative: ASN.1. This is ancient, but has some really, really, handy things:

Foo ::= SEQUENCE
{
   bearing INTEGER (0..359)
}

Note the constraint. So whenever anyone consumes this .asn file and generates code, they end up with code that automatically checks that bearing is somewhere between 0 and 359. If you update the .asn file,

Foo ::= SEQUENCE
{
   bearing INTEGER (-180..180)
}

all they need to do is recompile. No other code changes are required.

You can also do:

bearingMin INTEGER ::= 0
bearingMax INTEGER ::= 360

Foo ::= SEQUENCE
{
   bearing INTEGER (bearingMin..<bearingMax)
}

Note the <. And also in most tools the bearingMin and bearingMax can appear as constants in the generated code. That's extremely useful.

Constraints can be quite elaborate:

Garr ::= INTEGER (0..10 | 25..32)

Look at Chapter 13 in this PDF; it's amazing what you can do.

Arrays can be constrained too:

Bar ::= SEQUENCE (SIZE(1..5)) OF Foo
Sna ::= SEQUENCE (SIZE(5)) OF Foo
Fee ::= SEQUENCE 
{
    boo SEQUENCE (SIZE(1..<6)) OF INTEGER (-180<..<180)
}

ASN.1 is old fashioned, but still actively developed, widely used (your mobile phone uses it a lot), and far more flexible than most other serialisation technologies. About the only deficiency that I can see is that there is no decent code generator for Python. If you're using C/C++, C#, Java, ADA then you are well served by a mixture of free (C/C++, ADA) and commercial (C/C++, C#, JAVA) tools.

I especially like the wide choice of binary and text based wireformats. This makes it extremely convenient in some projects. The wireformat list currently includes:

  • BER (binary)
  • PER (binary, aligned and unaligned. This is ultra bit efficient. For example, an INTEGER constrained between 0 and 15 will take up only 4 bits on the wire)
  • OER
  • DER (another binary)
  • XML (also XER)
  • JSON (brand new, tool support is still developing)

plus others.

Note the last two? Yes, you can define data structures in ASN.1, generate code, and emit / consume messages in XML and JSON. Not bad for a technology that started off back in the 1980s.

Versioning is done differently to GPB. You can allow for extensions:

Foo ::= SEQUENCE
{
   bearing INTEGER (-180..180),
   ...
}

This means that at a later date I can add to Foo, and older systems that have this version can still work (but can only access the bearing field).

I rate ASN.1 very highly. It can be a pain to deal with (tools might cost money, the generated code isn't necessarily beautiful, etc). But the constraints are a truly fantastic feature that has saved me a whole ton of heartache time and time again. It makes developers whinge a lot when the encoders / decoders report that they've generated duff data.


Observations

To share data:

  • Code first approaches (e.g. Boost serialisation) restrict you to the original language (e.g. C++), or force you to do a lot of extra work in another language
  • Schema first is better, but
    • A lot of these leave big gaps in the sharing contract (i.e. no constraints). GPB is annoying in this regard, because it is otherwise very good.
    • Some have constraints (e.g. XSD, JSON), but suffer from patchy tool support.
    • For example, Microsoft's xsd.exe actively ignores constraints in xsd files (MS's excuse is truly feeble). XSD is good (from the constraints point of view), but if you cannot trust the other guy to use a good XSD tool that enforces them, then the worth of XSD is diminished.
    • JSON validators are ok, but they do nothing to help you form the JSON in the first place, and aren't automatically called. There's no guarantee that someone sending you a JSON message has run it through a validator. You have to remember to validate it yourself.
    • ASN.1 tools all seem to implement the constraints checking.

So for me, ASN.1 does it. It's the one that is least likely to result in someone else making a mistake, because it's the one with the right features and where the tools all seemingly endeavour to fully implement those features, and it is language neutral enough for most purposes.

To be honest, if GPB added a constraints mechanism that'd be the winner. XSD is close but the tools are almost universally rubbish. If there were decent code generators for other languages, JSON Schema would be pretty good.

If GPB had constraints added (note: this would not change any of the wire formats), that'd be the one I'd recommend to everyone for almost every purpose. Though ASN.1's uPER is very useful for radio links.

喵星人汪星人 2024-08-02 14:28:05


I never implemented anything using boost's library, but I found Google's protobuf library to be more thought-out, and the code is much cleaner and easier to read. I would suggest having a look at the various languages you want to use it with, reading through the code and the documentation, and making up your mind.

The one difficulty I had with protobufs was they named a very commonly used function in their generated code GetMessage(), which of course conflicts with the Win32 GetMessage macro.
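One common workaround for this kind of Win32 macro clash (not specific to protobuf; the generated header name below is made up) is to neutralise the macro before the affected headers are used:

#include <windows.h>

// <windows.h> defines GetMessage as a macro (expanding to GetMessageA/GetMessageW),
// which mangles the GetMessage() methods in protobuf-generated headers.
#ifdef GetMessage
#undef GetMessage
#endif

#include "my_service.pb.h"   // hypothetical generated header, included after the #undef

// Note: Win32 message-pump code in this translation unit must now call
// GetMessageA / GetMessageW explicitly instead of relying on the macro.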

I would still highly recommend protobufs. They're very useful.

微凉徒眸意 2024-08-02 14:27:47


Correction to the above (I'm guessing it's that answer) about Boost Serialization:

It DOES allow supporting data versioning.

If you need compression - use a compressed stream.

Can handle endian swapping between platforms as encoding can be text, binary or XML.
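For example, a compressed stream can be layered in with Boost.Iostreams without the serialization code knowing about it. A minimal sketch, assuming zlib-enabled Boost.Iostreams and linking against boost_serialization, boost_iostreams and z; the payload type is made up:

#include <fstream>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/filtering_stream.hpp>

// Hypothetical payload type.
struct Payload {
    int value = 0;
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) { ar & value; }
};

int main() {
    std::ofstream file("payload.bin.gz", std::ios::binary);

    // Put a gzip compressor between the archive and the file; the archive
    // itself never knows compression is happening.
    boost::iostreams::filtering_ostream out;
    out.push(boost::iostreams::gzip_compressor());
    out.push(file);

    boost::archive::binary_oarchive oa(out);
    const Payload p{123};
    oa << p;
}   // the archive is destroyed before the filtering stream, which flushes before the file closes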

枯叶蝶 2024-08-02 14:27:28


boost.serialization just needs the C++ compiler and gives you some syntax sugar like

output_archive << obj;
// ...
input_archive >> obj;

for saving and loading. If C++ is the only language you use, you should give boost.serialization a serious shot.
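Spelled out, a complete round trip is not much longer than that; the type, values and stream choice here are illustrative:

#include <sstream>
#include <string>
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/string.hpp>   // needed once std::string members appear

// Hypothetical type; serialize() is the only addition the library asks for.
struct Settings {
    int retries = 3;
    std::string host = "localhost";

    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & retries;   // the same member function drives both saving and loading
        ar & host;
    }
};

int main() {
    std::stringstream buffer;
    {
        const Settings out{5, "example.org"};
        boost::archive::text_oarchive oa(buffer);
        oa << out;      // save
    }
    {
        Settings in;
        boost::archive::text_iarchive ia(buffer);
        ia >> in;       // load
    }
}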

I took a quick look at Google Protocol Buffers. From what I see I'd say it's not directly comparable to boost.serialization. You have to add a compiler for the .proto files to your toolchain and maintain the .proto files themselves. The API doesn't integrate into C++ the way boost.serialization does.

boost.serialization does the job it's designed for very well: serializing C++ objects :)
OTOH a query API like Google Protocol Buffers gives you more flexibility.

Since I only used boost.serialization so far I cannot comment on performance comparison.

長街聽風 2024-08-02 14:27:09


I have no experience with boost serialization, but I have used protocol buffers. I like protocol buffers a lot. Keep the following in mind (I say this with no knowledge of boost).

  • Protocol buffers are very efficient so I don't imagine that being a serious issue vs. boost.
  • Protocol buffers provide an intermediate representation that works with other languages (Python and Java... and more in the works). If you know you're only using C++, maybe boost is better, but the option to use other languages is nice.
  • Protocol buffers are more like data containers... there is no object oriented nature, such as inheritance. Think about the structure of what you want to serialize.
  • Protocol buffers are flexible because you can add "optional" fields. This basically means you can change the structure of a protocol buffer without breaking compatibility.

Hope this helps.

断肠人 2024-08-02 14:26:49


Boost Serialisation

  • is a library for writing data into a stream.
  • does not compress data.
  • does not support data versioning automatically.
  • supports STL containers.
  • properties of the data written depend on the streams chosen (e.g. endianness, compression).

Protocol Buffers

  • generates code from interface description (supports C++, Python and Java by default. C, C# and others by 3rd party).
  • optionally compresses data.
  • handles data versioning automatically.
  • handles endian swapping between platforms.
  • does not support STL containers.

Boost serialisation is a library for converting an object into a serialised stream of data. Protocol Buffers do the same thing, but also do other work for you (like versioning and endian swapping). Boost serialisation is simpler for "small simple tasks". Protocol Buffers are probably better for "larger infrastructure".

EDIT:24-11-10: Added "automatically" to BS versioning.
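To illustrate the STL-container point from the lists above: with Boost you pull in one serialization header per container and the container (plus its contents) goes straight into the archive, whereas with Protocol Buffers you would model the same data as repeated fields and copy in and out of the generated message. A minimal sketch of the Boost side (the data is made up):

#include <map>
#include <sstream>
#include <string>
#include <vector>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/map.hpp>      // one header per container type
#include <boost/serialization/string.hpp>
#include <boost/serialization/vector.hpp>

int main() {
    const std::vector<int> readings{1, 2, 3};
    const std::map<std::string, double> limits{{"bearing", 359.0}};

    std::ostringstream buffer;
    boost::archive::text_oarchive oa(buffer);
    oa << readings << limits;   // containers and their contents serialize directly
}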

红墙和绿瓦 2024-08-02 14:26:27


There are a couple of additional concerns with boost.serialization that I'll add to the mix. Caveat: I don't have any direct experience with protocol buffers beyond skimming the docs.

Note that while I think boost, and boost.serialization, is great at what it does, I have come to the conclusion that the default archive formats it comes with are not a great choice for a wire format.

It's important to distinguish between versions of your class (as mentioned in other answers, boost.serialization has some support for data versioning) and compatibility between different versions of the serialization library.
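On the data-versioning side (as opposed to library-version compatibility), a minimal sketch of how a class version is declared and consumed; the type and the added field are invented:

#include <string>
#include <boost/serialization/access.hpp>
#include <boost/serialization/string.hpp>
#include <boost/serialization/version.hpp>

// Hypothetical type whose serialized layout has changed over time.
class Record {
private:
    friend class boost::serialization::access;

    template <class Archive>
    void serialize(Archive& ar, const unsigned int version) {
        ar & id_;
        if (version >= 2)   // field introduced in class version 2
            ar & label_;    // archives written by older code simply leave it defaulted
    }

    int id_ = 0;
    std::string label_;
};

// Bump this whenever the serialized layout of Record changes.
BOOST_CLASS_VERSION(Record, 2)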

Newer versions of boost.serialization may not generate archives that older versions can deserialize. (The reverse is not true: newer versions are always intended to deserialize archives made by older versions.) This has led to the following problems for us:

  • Both our client & server software create serialized objects that the other consumes, so we can only move to a newer boost.serialization if we upgrade both client and server in lockstep. (This is quite a challenge in an environment where you don't have full control of your clients).
  • Boost comes bundled as one big library with shared parts, and both the serialization code and other parts of the boost library (e.g. shared_ptr) may be in use in the same file, so I can't upgrade any part of boost because I can't upgrade boost.serialization. I'm not sure if it's possible/safe/sane to attempt to link multiple versions of boost into a single executable, or whether we have the budget/energy to refactor the bits that need to remain on an older version of boost out into a separate executable (DLL in our case).
  • The old version of boost we're stuck on doesn't support the latest version of the compiler we use, so we're stuck on an old version of the compiler too.

Google seem to actually publish the protocol buffers wire format, and Wikipedia describes them as forwards-compatible, backwards-compatible (although I think Wikipedia is referring to data versioning rather than protocol buffer library versioning). Whilst neither of these is a guarantee of forwards-compatibility, it seems like a stronger indication to me.

In summary, I would prefer a well-known, published wire format like protocol buffers when I don't have the ability to upgrade client & server in lockstep.

Footnote: shameless plug for a related answer by me.

烟织青萝梦 2024-08-02 14:26:02


I've played around a little with both systems, nothing serious, just some simple hackish stuff, but I felt that there's a real difference in how you're supposed to use the libraries.

With boost::serialization, you write your own structs/classes first, and then add the archiving methods, but you're still left with some pretty "slim" classes that can be used as data members, inherited, whatever.

With protocol buffers, the amount of code generated for even a simple structure is pretty substantial; the generated structs and code are meant more to be operated on through protocol buffers' own functionality, transporting data to and from your own internal structures.
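A sketch of that split, assuming a made-up track.proto whose generated class TrackProto is kept strictly as a DTO at the boundary:

#include <string>

#include "track.pb.h"   // hypothetical generated header; assumes a proto file with
                        //   message TrackProto { string name = 1; double heading = 2; }

// Internal domain type: free to carry methods, invariants, inheritance, etc.
struct Track {
    std::string name;
    double heading = 0.0;
};

// Explicit conversions keep the generated DTO at the edge of the system.
TrackProto to_proto(const Track& t) {
    TrackProto p;
    p.set_name(t.name);
    p.set_heading(t.heading);
    return p;
}

Track from_proto(const TrackProto& p) {
    return Track{p.name(), p.heading()};
}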

北恋 2024-08-02 14:25:39


I've been using Boost Serialization for a long time and just dug into protocol buffers, and I think they don't have the exact same purpose. BS (didn't see that coming) saves your C++ objects to a stream, whereas PB is an interchange format that you write to and read from.

PB's datamodel is way simpler: you get all kinds of ints and floats, strings, arrays, basic structures, and that's pretty much it. BS allows you to directly save all of your objects in one step.

That means with BS you get more data on the wire but you don't have to rebuild all of your objects' structure, whereas protocol buffers is more compact but there is more work to be done after reading the archive. As the name says, one is for protocols (language-agnostic, space-efficient data passing), the other is for serialization (no-brainer object saving).

So what is more important to you: speed/space efficiency or clean code?
