How to implement a VARIANT in Protobuf

Posted 2024-11-17 19:46:36

As part of my protobuf protocol, I need to be able to send data of a dynamic type, a little like a VARIANT. Roughly, the data must be an integer, string, boolean or "other", where "other" (e.g. DateTime) is serialized as a string. I need to be able to use these both as a single field and in lists, in a number of different places in the protocol.

How can this best be implemented while keeping message size minimal and performance optimal?

I'm using protobuf-net with C#.

EDIT:
I've posted a proposed answer below that uses what I think is the minimum number of bytes required.

EDIT2:
I've created a GitHub project at http://github.com/pvginkel/ProtoVariant with a complete implementation.

Answers (5)

混浊又暗下来 2024-11-24 19:46:36

Jon's multiple-optionals approach covers the simplest setup, especially if you need cross-platform support. On the .NET side (to make sure you don't serialize unnecessary values), simply return null from any property that isn't a match, for example:

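using ProtoBuf;

[ProtoContract]
public class MyVariant
{
    // The actual payload; not serialized itself.
    public object Value { get; set; }

    // Returns null (so nothing is written) unless Value holds an int.
    [ProtoMember(1)]
    public int? ValueInt32 {
        get { return (Value is int) ? (int)Value : (int?)null; }
        set { Value = value; }
    }

    [ProtoMember(2)]
    public string ValueString {
        get { return (Value is string) ? (string)Value : null; }
        set { Value = value; }
    }
    // etc
}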

You can also do the same using the bool ShouldSerialize*() pattern if you don't like the nulls.
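A minimal sketch of the ShouldSerialize*() alternative, assuming protobuf-net's convention of consulting a method named ShouldSerialize{PropertyName}() before writing that member. The protobuf-net attributes are left as comments so the block compiles without the package, and the class name VariantValue is illustrative only.

```csharp
// [ProtoContract]  (protobuf-net attribute; commented out so this compiles standalone)
public class VariantValue
{
    // The actual payload; not serialized itself.
    public object Value { get; set; }

    // [ProtoMember(1)]
    // Plain int instead of int?: returns 0 when Value is not an int, and
    // ShouldSerializeValueInt32() is meant to tell the serializer to skip it.
    public int ValueInt32
    {
        get { return (Value is int) ? (int)Value : 0; }
        set { Value = value; }
    }

    public bool ShouldSerializeValueInt32() { return Value is int; }

    // [ProtoMember(2)]
    public string ValueString
    {
        get { return Value as string; }
        set { Value = value; }
    }

    public bool ShouldSerializeValueString() { return Value is string; }
}
```

Compared with the int? version, the property types stay non-nullable, at the cost of one extra method per member.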

Wrap that up in a class and you should be able to use it at either the field level or the list level. You mention optimal performance; the only additional thing I can suggest there is to consider treating it as a "group" rather than a "submessage", as a group is easier to encode (and just as easy to decode, as long as you expect the data). To do that, use the Group data format via [ProtoMember], i.e.

[ProtoMember(12, DataFormat = DataFormat.Group)]
public MyVariant Foo {get;set;}

The difference here is probably small, but it avoids some back-tracking in the output stream to fix up the lengths. Either way, in terms of overhead a "submessage" takes at least 2 bytes: at least one for the field header (more if the 12 is actually 1234567) and at least one for the length, which grows for longer messages. A group takes 2 field headers (start and end) and no length, so if you use low field numbers it costs 2 bytes regardless of the length of the encapsulated data (which could be 5MB of binary).
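The overhead arithmetic above can be checked with a few lines of plain C# (no protobuf-net involved). This sketch computes varint sizes and compares the wrapper cost of a length-prefixed submessage (wire type 2) with that of a group (a start-group tag, wire type 3, plus an end-group tag, wire type 4). WireOverhead is a hypothetical helper name.

```csharp
public static class WireOverhead
{
    // Number of bytes a base-128 varint needs for the given value.
    public static int VarintSize(ulong value)
    {
        int size = 1;
        while (value >= 0x80)
        {
            value >>= 7;
            size++;
        }
        return size;
    }

    // Field header = (fieldNumber << 3) | wireType, written as a varint.
    public static int TagSize(int fieldNumber, int wireType)
    {
        return VarintSize(((ulong)fieldNumber << 3) | (uint)wireType);
    }

    // Submessage: one tag (wire type 2) plus a varint length prefix.
    public static int SubMessageOverhead(int fieldNumber, long payloadLength)
    {
        return TagSize(fieldNumber, 2) + VarintSize((ulong)payloadLength);
    }

    // Group: start tag (wire type 3) and end tag (wire type 4), no length,
    // so the cost does not depend on the payload size.
    public static int GroupOverhead(int fieldNumber)
    {
        return TagSize(fieldNumber, 3) + TagSize(fieldNumber, 4);
    }
}
```

For field 12 both forms cost 2 bytes with a small payload, but a 5 MB payload pushes the submessage to 5 bytes (a 1-byte tag plus a 4-byte length) while the group stays at 2. With field number 1234567 each tag needs 4 bytes, so the group costs 8.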

A separate trick, useful for more complex scenarios but less interoperable, is generic inheritance: an abstract base class that lists ConcreteType<int>, ConcreteType<string>, etc. as subtypes. This typically takes an extra 2 bytes, however, so it is not as frugal.

Taking another step further away from the core spec, if you genuinely can't tell what types you need to support, and don't need interoperability - there is some support for including (optimized) type information in the data; see the DynamicType option on ProtoMember - this takes more space than the other two options.

红玫瑰 2024-11-24 19:46:36

You could have a message like this:

message Variant {
    optional string string_value = 1;
    optional int32 int32_value = 2;
    optional int64 int64_value = 3;
    optional string other_value = 4;
    // etc
}

Then write a helper class - and possibly extension methods - to ensure that you only ever set one field in the variant.
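A minimal sketch of such a helper, assuming the generated class exposes each optional field as a nullable property. VariantMessage and its members are hypothetical, hand-written stand-ins (real generated code will look different) so that the example compiles on its own. A private constructor plus one factory method per field is one way to guarantee that only one field is ever set.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for the class generated from the Variant message above:
// one nullable property per optional field, where null means "not set".
public sealed class VariantMessage
{
    public string StringValue { get; private set; }
    public int? Int32Value { get; private set; }
    public long? Int64Value { get; private set; }
    public string OtherValue { get; private set; }

    private VariantMessage() { }

    // The factory methods are the only way to build an instance,
    // so exactly one field is ever set.
    public static VariantMessage FromString(string value) =>
        new VariantMessage { StringValue = value ?? throw new ArgumentNullException(nameof(value)) };

    public static VariantMessage FromInt32(int value) =>
        new VariantMessage { Int32Value = value };

    public static VariantMessage FromInt64(long value) =>
        new VariantMessage { Int64Value = value };

    // "Other" types are serialized as strings; here a DateTime in round-trip ("o") format.
    public static VariantMessage FromDateTime(DateTime value) =>
        new VariantMessage { OtherValue = value.ToString("o", CultureInfo.InvariantCulture) };

    // Number of fields present; always 1 for instances built by the factories.
    public int SetFieldCount =>
        (StringValue != null ? 1 : 0) + (Int32Value.HasValue ? 1 : 0) +
        (Int64Value.HasValue ? 1 : 0) + (OtherValue != null ? 1 : 0);
}
```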

You could optionally include a separate enum value to specify which field is set (to make it more like a tagged union), but since you can already check which optional field is present, that information is implicit in the data anyway. It depends on whether you want the speed of finding the right field directly (in which case add the discriminator) or the space efficiency of including only the data itself (in which case don't).
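To put numbers on that trade-off, here is a hand-rolled encoder (plain C#, not protobuf-net) for a variant holding int32_value (field 2 in the message above), with and without a discriminator enum. The discriminator's field number 5 and the helper names are assumptions for illustration.

```csharp
using System.Collections.Generic;

// Hand-rolled encoding of a Variant holding int32_value (field 2),
// optionally followed by a hypothetical discriminator enum in field 5.
public static class DiscriminatorCost
{
    // Appends a base-128 varint, low 7 bits first.
    static void WriteVarint(List<byte> buffer, ulong value)
    {
        while (value >= 0x80)
        {
            buffer.Add((byte)(value | 0x80));
            value >>= 7;
        }
        buffer.Add((byte)value);
    }

    // Field header ((fieldNumber << 3) | wire type 0) followed by the varint value.
    static void WriteVarintField(List<byte> buffer, int fieldNumber, ulong value)
    {
        WriteVarint(buffer, (ulong)fieldNumber << 3);
        WriteVarint(buffer, value);
    }

    public static byte[] EncodeInt32(int value, bool withDiscriminator)
    {
        var buffer = new List<byte>();
        // int32 is sign-extended to 64 bits, so negative values take 10 bytes.
        WriteVarintField(buffer, 2, (ulong)(long)value);
        if (withDiscriminator)
            WriteVarintField(buffer, 5, 2); // hypothetical: "field 2 (int32_value) is set"
        return buffer.ToArray();
    }
}
```

Under these assumptions the discriminator adds 2 bytes per value (a 1-byte tag plus a 1-byte enum), which is the price of being able to switch on it instead of probing each optional field.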

That's a general Protocol Buffer approach. There may be something more protobuf-net specific, of course.

智商已欠费 2024-11-24 19:46:36

Asking questions always helps me think. I found a way to get the number of bytes used for transfer to a bare minimum.

What I've done here is make use of optional properties. Say I want to send an int32. When the value isn't zero, I store it in value_int32, and the reader can simply check whether that property has a value. A zero would be indistinguishable from "not set", so in that case I set type to INT32_ZERO instead. This way I can correctly store and reconstruct the value. The example below implements this for a number of types.

The .proto file:

message Variant {
    optional VariantType type = 1 [default = AUTO];
    optional int32 value_int32 = 2;
    optional int64 value_int64 = 3;
    optional float value_float = 4;
    optional double value_double = 5;
    optional string value_string = 6;
    optional bytes value_bytes = 7;
    optional string value_decimal = 8;
    optional string value_datetime = 9;
}

enum VariantType {
    AUTO = 0;
    BOOL_FALSE = 1;
    BOOL_TRUE = 2;
    INT32_ZERO = 3;
    INT64_ZERO = 4;
    FLOAT_ZERO = 5;
    DOUBLE_ZERO = 6;
    NULL = 7;
}

And accompanying partial .cs file:

using System;
using System.Globalization;

namespace ConsoleApplication6
{
    // Hand-written half of the Variant class; the other half is generated from the .proto file.
    partial class Variant
    {
        public static Variant Create(object value)
        {
            var result = new Variant();

            if (value == null)
                result.Type = VariantType.NULL;
            else if (value is string)
                result.ValueString = (string)value;
            else if (value is byte[])
                result.ValueBytes = (byte[])value;
            else if (value is bool)
                result.Type = (bool)value ? VariantType.BOOL_TRUE : VariantType.BOOL_FALSE;
            else if (value is int)
            {
                // A zero in ValueInt32 would look like "not set", so record it in Type instead.
                if ((int)value == 0)
                    result.Type = VariantType.INT32_ZERO;
                else
                    result.ValueInt32 = (int)value;
            }
            else if (value is long)
            {
                if ((long)value == 0L)
                    result.Type = VariantType.INT64_ZERO;
                else
                    result.ValueInt64 = (long)value;
            }
            else if (value is float)
            {
                if ((float)value == 0f)
                    result.Type = VariantType.FLOAT_ZERO;
                else
                    result.ValueFloat = (float)value;
            }
            else if (value is double)
            {
                if ((double)value == 0d)
                    result.Type = VariantType.DOUBLE_ZERO;
                else
                    result.ValueDouble = (double)value;
            }
            else if (value is decimal)
                // decimal does not support the "R" format; the invariant default already round-trips.
                result.ValueDecimal = ((decimal)value).ToString(CultureInfo.InvariantCulture);
            else if (value is DateTime)
                result.ValueDatetime = ((DateTime)value).ToString("o", CultureInfo.InvariantCulture);
            else
                throw new ArgumentException(String.Format("Cannot store data type {0} in Variant", value.GetType().FullName), "value");

            return result;
        }

        public object Value
        {
            get
            {
                switch (Type)
                {
                    case VariantType.BOOL_FALSE:
                        return false;

                    case VariantType.BOOL_TRUE:
                        return true;

                    case VariantType.NULL:
                        return null;

                    case VariantType.DOUBLE_ZERO:
                        return 0d;

                    case VariantType.FLOAT_ZERO:
                        return 0f;

                    case VariantType.INT32_ZERO:
                        return 0;

                    case VariantType.INT64_ZERO:
                        return 0L;

                    default:
                        // AUTO: the value lives in whichever value_* field is non-default.
                        if (ValueInt32 != 0)
                            return ValueInt32;
                        if (ValueInt64 != 0)
                            return ValueInt64;
                        if (ValueFloat != 0f)
                            return ValueFloat;
                        if (ValueDouble != 0d)
                            return ValueDouble;
                        if (ValueString != null)
                            return ValueString;
                        if (ValueBytes != null)
                            return ValueBytes;
                        if (ValueDecimal != null)
                            return Decimal.Parse(ValueDecimal, CultureInfo.InvariantCulture);
                        if (ValueDatetime != null)
                            // RoundtripKind keeps the DateTimeKind written by the "o" format.
                            return DateTime.Parse(ValueDatetime, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind);
                        return null;
                }
            }
        }
    }
}
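To see the zero trick on the wire, here is a hand-rolled sketch (plain C#, not protobuf-net) of just the int32 and null cases. It assumes field 1 is type and field 2 is value_int32, as in the .proto above, and that a field holding its default value is simply not written. ZeroMarkerSketch is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class ZeroMarkerSketch
{
    const int Int32Zero = 3, Null = 7; // subset of VariantType

    static void WriteVarint(List<byte> buffer, ulong value)
    {
        while (value >= 0x80)
        {
            buffer.Add((byte)(value | 0x80));
            value >>= 7;
        }
        buffer.Add((byte)value);
    }

    public static byte[] Encode(int? value)
    {
        var buffer = new List<byte>();
        if (value == null)
        {
            buffer.Add(0x08); WriteVarint(buffer, Null);       // type = NULL
        }
        else if (value == 0)
        {
            buffer.Add(0x08); WriteVarint(buffer, Int32Zero);  // type = INT32_ZERO
        }
        else
        {
            // value_int32; type stays AUTO (its default) and is omitted.
            buffer.Add(0x10); WriteVarint(buffer, (ulong)(long)value.Value);
        }
        return buffer.ToArray();
    }

    // Only handles the byte patterns produced by Encode above.
    public static int? Decode(byte[] data)
    {
        if (data[0] == 0x08)                                   // type field present
            return data[1] == Int32Zero ? 0 : (int?)null;
        ulong raw = 0;
        for (int i = 1, shift = 0; i < data.Length; i++, shift += 7)
            raw |= (ulong)(data[i] & 0x7F) << shift;
        return (int)(long)raw;                                 // AUTO: value is in value_int32
    }
}
```

Both zero and null cost 2 bytes (tag plus enum), and the reader can still tell them apart, which is exactly what an omitted zero in value_int32 could not do.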

EDIT:
Further comments from @Marc Gravell have improved the implementation significantly. See the Git repository for a complete implementation of this concept.

爺獨霸怡葒院 2024-11-24 19:46:36

Protobuf doesn't actually support any kind of VARIANT type.
You can try using unions instead; see more details here.
The main idea is to define a wrapper message that has every possible message type as an optional field, plus a field that specifies which type this particular message actually carries.
See the example at the link above.
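A minimal sketch of that union idea, written as plain C# classes rather than generated protobuf code; Envelope, PayloadType, LoginMessage and ChatMessage are hypothetical names. In a .proto file the same shape would be a message with a type enum field and one optional field per message type.

```csharp
using System;

// Hypothetical message types that the wrapper can carry.
public sealed class LoginMessage { public string User; }
public sealed class ChatMessage { public string Text; }

public enum PayloadType { Login = 1, Chat = 2 }

// Wrapper: one optional member per message type, plus a field saying which one is set.
public sealed class Envelope
{
    public PayloadType Type;
    public LoginMessage Login;
    public ChatMessage Chat;

    public static Envelope Of(LoginMessage m) => new Envelope { Type = PayloadType.Login, Login = m };
    public static Envelope Of(ChatMessage m) => new Envelope { Type = PayloadType.Chat, Chat = m };

    // Readers switch on Type rather than testing each member for null.
    public string Describe()
    {
        switch (Type)
        {
            case PayloadType.Login: return "Login " + Login.User;
            case PayloadType.Chat: return "Chat " + Chat.Text;
            default: throw new InvalidOperationException("Unknown payload type " + Type);
        }
    }
}
```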

走走停停 2024-11-24 19:46:36

I use ProtoInclude with an abstract base type and subclasses to get the type and single value statically set. Here's the start of what that could look like for Variant:

using ProtoBuf;

[ProtoContract]
[ProtoInclude(1, typeof(Variant.Integer))]
[ProtoInclude(2, typeof(Variant.String))]
public abstract class Variant
{
    // Each subtype must derive from Variant for ProtoInclude to apply.
    [ProtoContract]
    public sealed class Integer : Variant
    {
        [ProtoMember(1)]
        public int Value;
    }

    [ProtoContract]
    public sealed class String : Variant
    {
        [ProtoMember(1)]
        public string Value;
    }
}

Usage:

var foo = new Variant.String { Value = "Bar" };
var baz = new Variant.Integer { Value = 10 };

This approach takes a bit more space, because it encodes the length of the ProtoInclude'd class instance (e.g. 1 byte for an int, or for strings of up to 125 bytes). I am willing to live with this for the benefit of controlling the type statically.
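The cost can be worked out by hand. A minimal sketch, assuming each ProtoInclude'd subtype is written as a length-prefixed submessage under its ProtoInclude tag (1 for Integer, 2 for String above) with the subclass's own fields inside; ProtoIncludeCost and its methods are hypothetical names, and no protobuf-net code is used.

```csharp
using System.Collections.Generic;
using System.Text;

public static class ProtoIncludeCost
{
    static void WriteVarint(List<byte> buffer, ulong value)
    {
        while (value >= 0x80)
        {
            buffer.Add((byte)(value | 0x80));
            value >>= 7;
        }
        buffer.Add((byte)value);
    }

    // Body of Variant.Integer: field 1, wire type 0 (varint).
    public static byte[] IntegerBody(int value)
    {
        var buffer = new List<byte> { 0x08 };
        WriteVarint(buffer, (ulong)(long)value);
        return buffer.ToArray();
    }

    // Body of Variant.String: field 1, wire type 2 (length-delimited UTF-8).
    public static byte[] StringBody(string value)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(value);
        var buffer = new List<byte> { 0x0A };
        WriteVarint(buffer, (ulong)utf8.Length);
        buffer.AddRange(utf8);
        return buffer.ToArray();
    }

    // Wraps a subtype body as a submessage under its ProtoInclude tag (wire type 2).
    public static byte[] WrapAsInclude(int includeTag, byte[] body)
    {
        var buffer = new List<byte>();
        WriteVarint(buffer, ((ulong)includeTag << 3) | 2);
        WriteVarint(buffer, (ulong)body.Length);
        buffer.AddRange(body);
        return buffer.ToArray();
    }
}
```

Wrapping adds one tag byte plus one length byte while the subtype body is at most 127 bytes. For a String body (1 tag byte, 1 length byte, then the text) that means at most 125 bytes of UTF-8; at 126 bytes the outer length prefix grows to 2 bytes.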
