Is "serialisation without duplication" possible in C++0x?

Posted 2024-12-01 03:31:07


One of the big uses of code generation in C++ is to support message serialisation. Typically, you want to specify message contents and layout in the same step and produce code for that message type that gives you objects capable of being serialised to/from communication streams. In the past, this has usually resulted in code that looks like:

class MyMessage : public SerialisableObject
{
  // message members
  int myNumber_;
  std::string myString_;
  std::vector<MyOtherSerialisableObject> aBunchOfThingsIWantToSerialise_;

public:
  // ctor, dtor, accessors, mutators, then:

  virtual void Serialise(SerialisationStream & stream)
  {
    stream & myNumber_;
    stream & myString_;
    stream & aBunchOfThingsIWantToSerialise_;
  }
};

The problem with this kind of design is that it violates an important rule of good architecture: you should not have to specify the intent of a design twice. Duplication of intent, like duplicated code and other common development duplication, leaves room for one place in the code to diverge from the other, causing errors.

In the above, the duplication is the list of members. Potential errors include adding a member to the class but forgetting to add it to the serialisation list, serialising a member twice (possibly by not following the order of the member declarations, or through a misspelling of a similarly named member, among other ways), or serialising something that is not a member (which might produce a compiler error, unless name lookup finds a matching name at some scope other than the object's). That kind of mistake is the same reason we no longer try to match every heap allocation with a delete (using smart pointers instead) or every file open with a close (using RAII ctor/dtor mechanisms): we don't want to have to match up our intent in multiple places, because there are times when we, or another engineer less familiar with the intent, make mistakes.

Generally, therefore, this has been one of the things that code generation could take care of. You might create a file MyMessage.cg to specify both layout and members in one step

serialisable MyMessage
{
  int myNumber_;
  std::string myString_;
  std::vector<MyOtherSerialisableObject> aBunchOfThingsIWantToSerialise_;
};

that would be run through a code generation utility and produce the code.

I was wondering whether it is yet possible to do this in C++0x without external code generation. Are there any new language mechanisms that make it possible to specify a class as serialisable once, so that the names and layout of its members are used to lay out the message during serialisation?

To be clear, I know that there are tricks with Boost tuples and Fusion that can come close to this kind of behavior even in the pre-C++0x language. However, because those approaches index into the tuple rather than accessing fields by member name, they have all been brittle to layout changes: other places in the code that access the messages would then also need to be reordered. Some kind of by-member-name access is necessary to avoid duplicating the layout specification in the code that uses the messages.
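As a rough illustration of the tuple approach and its weakness, here is a minimal sketch. It uses plain std::tuple instead of Boost.Fusion to stay self-contained, and it needs C++17 for std::apply and fold expressions. The names MyMessage, serialise and myString are hypothetical. The field list is written only once, but code that uses the message still has to reach each field by its position.

```cpp
#include <sstream>
#include <string>
#include <tuple>

// Hypothetical tuple-based message: the element list is written once and
// fixes both the content and the wire order.
using MyMessage = std::tuple<int, std::string, double>;

// Serialise every element in declaration order (C++17 std::apply + fold).
template <typename... Ts>
std::string serialise(const std::tuple<Ts...>& msg) {
    std::ostringstream os;
    std::apply([&os](const auto&... fields) { ((os << fields << ';'), ...); },
               msg);
    return os.str();
}

// The brittleness described above: code that uses the message reaches
// fields by position. Inserting a new element before the string would
// shift it to index 2, and every std::get<1> call site must be updated.
// (C++14's std::get<std::string> stops working once two fields share a type.)
inline std::string& myString(MyMessage& msg) { return std::get<1>(msg); }
```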

Also, I know it might be nice to take this up to the next level and ask for specifying when some of the members shouldn't be serialised. Other languages that offer serialisation built in often offer some kind of attribute to do this, so
int myNonSerialisedNumber_ [[noserialise]];
might seem natural. However, I personally think it is bad design to have serialisable objects in which not everything is serialised, since the lifetime of a message is its transport to/from the communications layer, separate from other data lifetimes. Also, you could have an object that holds a purely serialisable object as one of its members, so such functionality doesn't buy anything the language doesn't already offer.
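The composition alternative described above can be sketched as follows. PositionMessage and Tracker are hypothetical names. The message struct holds only wire data. The application object holds that struct as a member alongside its local-only state, so no [[noserialise]] marker is needed. The hand-written serialise member exists only to make the sketch runnable, and it still lists the members itself.

```cpp
#include <sstream>
#include <string>

// Hypothetical wire message: every member is part of the message.
struct PositionMessage {
    int x = 0;
    int y = 0;

    // Hand-written for this sketch only; it still repeats the member list.
    std::string serialise() const {
        std::ostringstream os;
        os << x << ',' << y;
        return os.str();
    }
};

// Hypothetical application object: local-only state lives beside the
// message instead of inside it.
class Tracker {
public:
    void moveTo(int x, int y) {
        msg_.x = x;
        msg_.y = y;
        ++updateCount_;
    }
    const PositionMessage& message() const { return msg_; }
    int updateCount() const { return updateCount_; }

private:
    PositionMessage msg_;   // serialised, via message().serialise()
    int updateCount_ = 0;   // never serialised
};
```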

Is this possible? Or did the standards committee leave out this kind of introspective capability? I don't need it to look like the code-gen file above; any simple method for compile-time specification of layout and members in a single step would solve this common problem.


琉璃繁缕 2024-12-08 03:31:07


This is both possible and practical in C++11. In fact it was possible back in C++03, but the syntax was a little too unwieldy. I wrote a small library based around the same idea; see the following:

www.github.com/molw5/framework

Sample syntax:

class Object : serializable <Object,
    value <NAME("Field 1"), int>,
    value <NAME("Field 2"), float>,
    value <NAME("Field 3"), double>>
{
};
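The following is not the molw5/framework implementation. It is a minimal, hypothetical C++17 sketch of how a declaration of the form serializable<Object, value<Name, T>...> can work. Each value<Name, T> becomes a base class, so the field list is written only once. Serialisation walks the bases in the order they are listed, and get_field<Name> selects a field by its name type rather than by an index. The tag structs field1 and field2 stand in for NAME(...) types, and get_field and serialise are assumptions of this sketch.

```cpp
#include <sstream>
#include <string>

// Hypothetical name tags standing in for the NAME("...") types above.
struct field1 { static const char* name() { return "Field 1"; } };
struct field2 { static const char* name() { return "Field 2"; } };

// One named field: the Name tag plays the role of the member name.
template <typename Name, typename T>
struct value {
    using name_type = Name;
    T data{};
};

// Inherits one base per field, so the field list is written exactly once.
// Derived is unused here; it only mirrors the syntax shown above.
template <typename Derived, typename... Fields>
struct serializable : Fields... {
    // Writes "name=value;" for each field, in the order listed (C++17 fold).
    std::string serialise() const {
        std::ostringstream os;
        ((os << Fields::name_type::name() << '='
             << static_cast<const Fields&>(*this).data << ';'), ...);
        return os.str();
    }
};

// Access by name: deduction finds the unique value<Name, T> base of the
// argument, so T is deduced from the field's declared type.
template <typename Name, typename T>
T& get_field(value<Name, T>& v) { return v.data; }

template <typename Name, typename T>
const T& get_field(const value<Name, T>& v) { return v.data; }

struct Object : serializable<Object,
    value<field1, int>,
    value<field2, double>>
{
};
```

In this sketch, adding a field is a one-line change to the base list. Because access goes through the name type, reordering the list changes the wire layout without touching the code that reads the fields.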

Most of the underlying code could, in principle, be reproduced in C++03. Some of the implementation details would have been tricky without variadic templates, but I believe it would have been possible to recover the core functionality. What you could not reproduce in C++03 is the NAME macro above, and the syntax relies fairly heavily on it. The macro provides the machinery necessary to generate a unique typename from a string; that is,

NAME("Field 1")

expands to

 type_string <'F', 'i', 'e', 'l', 'd', ' ', '1'>

through the use of some common macros and constexpr (for character extraction). Back in C++03 something similar to the type_string above would need to be entered manually.
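Below is a minimal C++11 sketch of the NAME idea. It assumes a hypothetical helper, char_at, and a fixed limit of 12 characters; both are assumptions of this sketch, not the library's code. A constexpr function extracts each character from the literal. The macro pastes one call per position into a type_string argument list and pads with '\0' past the end of the string. Unlike the expansion shown in the answer, the resulting type keeps that trailing padding, but it is still unique for each string of up to 12 characters. The constexpr extraction is the part C++03 lacked, which is why the characters had to be spelled out by hand there.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// A distinct type for every character sequence.
template <char... Cs>
struct type_string {
    // Rebuilds the text; std::string(const char*) stops at the first '\0',
    // which drops the padding that NAME adds below.
    static std::string str() {
        const char chars[] = {Cs..., '\0'};
        return std::string(chars);
    }
};

// Hypothetical constexpr character extraction: the i-th character of the
// literal, or '\0' past its end. Limited to 12 characters in this sketch.
template <std::size_t N>
constexpr char char_at(const char (&s)[N], std::size_t i) {
    static_assert(N <= 13, "NAME: string longer than 12 characters");
    return i < N ? s[i] : '\0';
}

// Hypothetical NAME-style macro: one char_at call per position.
#define NAME(s) type_string<                                     \
    char_at(s, 0), char_at(s, 1), char_at(s, 2),  char_at(s, 3),  \
    char_at(s, 4), char_at(s, 5), char_at(s, 6),  char_at(s, 7),  \
    char_at(s, 8), char_at(s, 9), char_at(s, 10), char_at(s, 11)>

// Equal strings name the same type; different strings name different types.
static_assert(std::is_same<NAME("Field 1"), NAME("Field 1")>::value, "same");
static_assert(!std::is_same<NAME("Field 1"), NAME("Field 2")>::value, "distinct");
```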
