How to implement VARIANT in Protobuf
As part of my protobuf protocol I require the ability to send data of a dynamic type, a little bit like VARIANT. Roughly, I require the data to be an integer, string, boolean or "other", where "other" (e.g. `DateTime`) is serialized as a string. I need to be able to use these as a single field and in lists in a number of different locations in the protocol.
How can this best be implemented while keeping message size minimal and performance optimal?
I'm using protobuf-net with C#.
EDIT:
I've posted a proposed answer below which uses what I think is the minimum of memory required.
EDIT2:
Created a github.com project at http://github.com/pvginkel/ProtoVariant with a complete implementation.
Jon's multiple optionals cover the simplest setup, especially if you need cross-platform support. On the .NET side (to ensure you don't serialize unnecessary values), simply return `null` from any property that isn't a match. You can also do the same using the `bool ShouldSerialize*()` pattern if you don't like the nulls.
Wrap that up in a `class` and you should be fine to use it at either the field level or the list level. You mention optimal performance; the only additional thing I can suggest there is to consider treating it as a "group" rather than a "submessage", as this is easier to encode (and just as easy to decode, as long as you expect the data). To do that, use the `Grouped` data-format via `[ProtoMember]`.
However, the difference here can be minimal, although it does avoid some back-tracking in the output stream to fix the lengths. Either way, in terms of overhead a "submessage" takes at least 2 bytes: at least one for the field header (more if the field number is large, say `1234567` rather than `12`) and at least one for the length, which gets bigger for longer messages. A group takes 2 x the field header, so if you use low field numbers this will be 2 bytes regardless of the length of the encapsulated data (it could be 5MB of binary).
A separate trick, useful for more complex scenarios but not as interoperable, is generic inheritance, i.e. an abstract base class that has `ConcreteType<int>`, `ConcreteType<string>` etc. listed as subtypes. This, however, takes an extra 2 bytes (typically), so it is not as frugal.
Taking another step further away from the core spec, if you genuinely can't tell what types you need to support, and don't need interoperability, there is some support for including (optimized) type information in the data; see the `DynamicType` option on `ProtoMember`. This takes more space than the other two options.
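As a rough sketch of the nullable-property and group suggestions above (not the answer author's own code), a protobuf-net contract might look like the following. The type names `Variant` and `Holder`, the property names and the field numbers are assumptions chosen for illustration; `DataFormat.Group` is the protobuf-net enum member used here to request group encoding.

```csharp
using System.Collections.Generic;
using ProtoBuf;

// Hypothetical variant type: at most one property is non-null at a time,
// and null members are not written to the output.
[ProtoContract]
public class Variant
{
    [ProtoMember(1)] public int? Int32Value { get; set; }
    [ProtoMember(2)] public string StringValue { get; set; }
    [ProtoMember(3)] public bool? BoolValue { get; set; }

    // "Other" types (for example DateTime) travel as strings.
    [ProtoMember(4)] public string OtherValue { get; set; }
}

// Hypothetical message that uses the variant as a single field and as a list.
[ProtoContract]
public class Holder
{
    // Group encoding writes a start and an end marker instead of a length prefix.
    [ProtoMember(12, DataFormat = DataFormat.Group)]
    public Variant Single { get; set; }

    [ProtoMember(13, DataFormat = DataFormat.Group)]
    public List<Variant> Many { get; set; } = new List<Variant>();
}
```

With field numbers below 16 the field header fits in one byte, so each grouped `Variant` here should cost the 2 bytes of overhead described above, whatever its payload size.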
You could have a message with one optional field for each type you need to support (integer, string, boolean, and so on).
Then write a helper class - and possibly extension methods - to ensure that you only ever set one field in the variant.
You could optionally include a separate enum value to specify which field is set (to make it more like a tagged union) but the ability to check the optional fields just means the data is already there. It depends on whether you want the speed of finding the right field (in which case add the discriminator) or the space efficiency of only including the data itself (in which case don't add the discriminator).
That's a general Protocol Buffer approach. There may be something more protobuf-net specific, of course.
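A minimal sketch of such a helper, assuming a hand-written C# stand-in for the generated message class. The names `VariantMessage` and `VariantKind` are hypothetical, and the kind is derived from whichever field is set rather than stored as a separate discriminator (the space-efficient choice described above).

```csharp
using System;

public enum VariantKind { None, Int32, String, Bool }

// Stand-in for the message class: one optional (nullable) field per type.
public sealed class VariantMessage
{
    public int? Int32Value { get; private set; }
    public string StringValue { get; private set; }
    public bool? BoolValue { get; private set; }

    private VariantMessage() { }

    // The factory methods are the only way to build an instance,
    // so exactly one field is ever set.
    public static VariantMessage FromInt32(int value) =>
        new VariantMessage { Int32Value = value };

    public static VariantMessage FromString(string value) =>
        new VariantMessage { StringValue = value ?? throw new ArgumentNullException(nameof(value)) };

    public static VariantMessage FromBool(bool value) =>
        new VariantMessage { BoolValue = value };

    // No stored discriminator: the kind is worked out from the field that is present.
    public VariantKind Kind =>
        Int32Value.HasValue ? VariantKind.Int32 :
        StringValue != null ? VariantKind.String :
        BoolValue.HasValue ? VariantKind.Bool :
        VariantKind.None;
}
```

Adding an explicit discriminator would mean storing a `VariantKind` field alongside the values, trading a little message size for a faster lookup.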
Asking questions always helps me think. I found a way to get the number of bytes used for transfer to a bare minimum.
What I've done here is make use of optional properties. Say I want to send an int32. When the value isn't zero, I can just check a property on the message for whether it has a value. Otherwise, I set a type to INT32_ZERO. This way I can correctly store and reconstruct the value. The example below has this implementation for a number of types.
The .proto file:
And accompanying partial .cs file:
EDIT:
Further comments from @Marc Gravell have improved the implementation significantly. See the Git repository for a complete implementation of this concept.
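The zero-value trick described in this answer can be sketched roughly as follows. This is an illustration under assumptions, not the code from the answer or from the repository: the names `VariantType`, `Int32Zero`, `BoolFalse` and the `Value` accessor are hypothetical, and the sketch assumes protobuf-net omits members whose value equals the declared `DefaultValue`.

```csharp
using System.ComponentModel;
using ProtoBuf;

// Hypothetical marker enum: only needed when a value equals its default.
public enum VariantType
{
    None = 0,
    Int32Zero = 1,
    BoolFalse = 2,
}

[ProtoContract]
public class Variant
{
    // Assumption: members equal to their declared default are skipped on the wire.
    [ProtoMember(1), DefaultValue(VariantType.None)]
    public VariantType Type { get; set; }

    [ProtoMember(2), DefaultValue(0)]
    public int Int32Value { get; set; }

    [ProtoMember(3), DefaultValue(false)]
    public bool BoolValue { get; set; }

    [ProtoMember(4)]
    public string StringValue { get; set; }

    // Not a ProtoMember, so it is not serialized; it maps to and from the fields above.
    public object Value
    {
        get
        {
            if (Type == VariantType.Int32Zero) return 0;
            if (Type == VariantType.BoolFalse) return false;
            if (Int32Value != 0) return Int32Value;
            if (BoolValue) return true;
            return StringValue; // null when nothing is set
        }
        set
        {
            // Reset everything, then set exactly one field or marker.
            Type = VariantType.None;
            Int32Value = 0;
            BoolValue = false;
            StringValue = null;

            switch (value)
            {
                case int i when i == 0: Type = VariantType.Int32Zero; break;
                case int i: Int32Value = i; break;
                case bool b when !b: Type = VariantType.BoolFalse; break;
                case bool b: BoolValue = b; break;
                case string s: StringValue = s; break;
                case null: break;
                default: StringValue = value.ToString(); break; // "other" types as strings
            }
        }
    }
}
```

A non-default value is carried by its own field alone, and a default value is carried by the marker alone, so each variant writes only one member.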
Actually, protobuf doesn't support any kind of `VARIANT` type. You can try using unions instead; see more details here.
The main idea is to define a wrapper message with all existing message types as optional fields, and to use the `union` to specify which type this concrete message is. See the example by following the link above.
I use ProtoInclude with an abstract base type and subclasses to get the type and single value statically set. Here's the start of what that could look like for Variant:
Usage:
This answer takes a bit more space, as it encodes the length of the ProtoInclude'd class instance (e.g. 1 byte for an int and for strings under 125 bytes). I am willing to live with this for the benefit of controlling the type statically.
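A sketch of what the ProtoInclude approach could look like, under assumptions: the subclass names, the `Data` and `Value` members and the include tags 10 to 12 are illustrative and not taken from the answer's original listing.

```csharp
using System.IO;
using ProtoBuf;

// Abstract base: each concrete subtype is registered with its own tag.
[ProtoContract]
[ProtoInclude(10, typeof(Int32Variant))]
[ProtoInclude(11, typeof(StringVariant))]
[ProtoInclude(12, typeof(BoolVariant))]
public abstract class Variant
{
    public abstract object Value { get; }
}

[ProtoContract]
public sealed class Int32Variant : Variant
{
    [ProtoMember(1)] public int Data { get; set; }
    public override object Value => Data;
}

[ProtoContract]
public sealed class StringVariant : Variant
{
    [ProtoMember(1)] public string Data { get; set; }
    public override object Value => Data;
}

[ProtoContract]
public sealed class BoolVariant : Variant
{
    [ProtoMember(1)] public bool Data { get; set; }
    public override object Value => Data;
}

public static class VariantDemo
{
    // Round-trips a variant through a memory stream via the base type.
    public static Variant RoundTrip(Variant input)
    {
        using (var stream = new MemoryStream())
        {
            Serializer.Serialize(stream, input);
            stream.Position = 0;
            return Serializer.Deserialize<Variant>(stream);
        }
    }
}
```

Calling `VariantDemo.RoundTrip(new Int32Variant { Data = 42 })` should give back an `Int32Variant`, because the subtype is identified by the include tag on the wire rather than by a separate discriminator field.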