Generating parsers for binary structured data: metaprogramming or an external script?
I'm writing a server that interfaces with a proprietary protocol.
Currently most of the code consists of packet handlers that parse all of the fields of a packet, while making sure that the size of the data available is at least the minimum remaining size after each field. In addition, the packet handlers run validity checks on the received data (e.g. a value must lie in a certain range or belong to a set of predefined values).
Certainly this is a lot of boilerplate code when you combine it with the actual logic handling of the packet, so I would like to generate the parsers automatically and invoke the handlers on fully parsed structures.
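To make the boilerplate concrete, here is a minimal sketch of the hand-written style described above. Everything in it is an assumption for illustration: the `Reader` helper, the `LoginPacket` layout, big-endian byte order, and the specific range and value checks are hypothetical and not taken from the actual protocol.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical cursor over a received buffer; every read checks the remaining size.
class Reader {
public:
    Reader(const std::uint8_t* data, std::size_t size) : p_(data), end_(data + size) {
        assert(data != nullptr || size == 0);
    }

    std::size_t remaining() const { return static_cast<std::size_t>(end_ - p_); }

    // Reads a big-endian unsigned integer (byte order is an assumption),
    // or returns nullopt if fewer than sizeof(T) bytes remain.
    template <typename T>
    std::optional<T> read_be() {
        if (remaining() < sizeof(T)) return std::nullopt;
        T v = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i)
            v = static_cast<T>((v << 8) | p_[i]);
        p_ += sizeof(T);
        return v;
    }

private:
    const std::uint8_t* p_;
    const std::uint8_t* end_;
};

// Hypothetical packet layout, invented for this example.
struct LoginPacket {
    std::uint16_t version;  // valid range: 1..3
    std::uint8_t mode;      // valid values: 0, 1, 4
    std::uint32_t session;
};

// Size checks and validity checks are interleaved with the field reads,
// which is the repetition a generated parser would remove.
std::optional<LoginPacket> parse_login(const std::uint8_t* data, std::size_t size) {
    Reader r(data, size);
    auto version = r.read_be<std::uint16_t>();
    if (!version || *version < 1 || *version > 3) return std::nullopt;
    auto mode = r.read_be<std::uint8_t>();
    if (!mode || (*mode != 0 && *mode != 1 && *mode != 4)) return std::nullopt;
    auto session = r.read_be<std::uint32_t>();
    if (!session) return std::nullopt;
    return LoginPacket{*version, *mode, *session};
}
```

A handler would then receive a fully parsed `LoginPacket` or nothing at all, so the packet logic never sees a partially read or out-of-range packet.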
Right now I see two approaches that I could take:
1. Come up with some metaprogramming framework that allows me to describe packet structures and eventually rules for data validation so that I can generate the parsing code at compile time. I guess this would be similar in intent to what Boost.Spirit does.
2. Write my own data description language and an external tool that will generate C++ code from it. Doesn't seem too hard but would certainly clutter up the build process and I generally dislike using large amounts of tool-generated code. Also this wouldn't permit quickly changing data descriptions inside the source code itself.
The metaprogramming way seems superior in theory, but I haven't thought out a flawless way of implementing this yet. Preferably declaring packets would be similar to declaring a class and would not be full of macros. There's also a problem in cases where I have to refer to previous data members (which is the case for fields repeated a variable number of times, where the count is specified earlier in the packet).
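One possible shape for the metaprogramming approach, sketched under several assumptions: C++17, big-endian integers, and hypothetical names (`Cursor`, `Int`, `Repeated`, `Packet`, `ItemList`) invented here. A packet is declared as a list of field types, and a repeated field refers to an earlier field by its index, which is one way (not necessarily the nicest) to handle counts that appear earlier in the packet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <utility>
#include <vector>

struct Cursor { const std::uint8_t* p; std::size_t left; };

// Fixed-width big-endian integer with an inclusive range check.
template <typename T, T Min, T Max>
struct Int {
    using value_type = T;
    template <typename Parsed>
    static bool parse(Cursor& c, const Parsed&, T& out) {
        if (c.left < sizeof(T)) return false;  // size check
        T v = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i)
            v = static_cast<T>((v << 8) | c.p[i]);
        c.p += sizeof(T);
        c.left -= sizeof(T);
        out = v;
        return v >= Min && v <= Max;           // validity check
    }
};

// Elem repeated N times, where N is the already parsed field at index CountIdx.
// CountIdx must refer to a field declared earlier; this sketch does not enforce it.
template <typename Elem, std::size_t CountIdx>
struct Repeated {
    using value_type = std::vector<typename Elem::value_type>;
    template <typename Parsed>
    static bool parse(Cursor& c, const Parsed& parsed, value_type& out) {
        const std::size_t n = std::get<CountIdx>(parsed);
        out.clear();
        for (std::size_t i = 0; i < n; ++i) {
            typename Elem::value_type e{};
            if (!Elem::parse(c, parsed, e)) return false;
            out.push_back(e);
        }
        return true;
    }
};

template <typename... Fields>
struct Packet {
    using values = std::tuple<typename Fields::value_type...>;

    static bool parse(const std::uint8_t* data, std::size_t size, values& out) {
        assert(data != nullptr || size == 0);
        Cursor c{data, size};
        return parse_all(c, out, std::index_sequence_for<Fields...>{});
    }

private:
    template <std::size_t... I>
    static bool parse_all(Cursor& c, values& out, std::index_sequence<I...>) {
        // The && fold evaluates fields in declaration order and stops at the first failure.
        return (Fields::parse(c, out, std::get<I>(out)) && ...);
    }
};

// Hypothetical packet: a count byte (0..16) followed by that many 16-bit ids.
using ItemList = Packet<Int<std::uint8_t, 0, 16>,
                        Repeated<Int<std::uint16_t, 0, 0xFFFF>, 0>>;
```

The declaration is macro-free, but fields are addressed by position (`std::get<1>(values)`) rather than by name, which is the main ergonomic cost compared with declaring an ordinary class.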
Does anyone have experience with similar frameworks, and what would you suggest?
I know about Google Protocol Buffers but that is intrusive in that it requires being in control of the protocol.
Comments (1)
I've gone the route of creating my own language and tooling for binary structured data multiple times in the past, but that was in part driven by the need to support multiple target languages from the data definitions (at the time, C# and C++); I also created a third target to produce HTML reference documentation from the definitions.
The main advantage I can see in using C++ template metaprogramming is that you can directly interact with the compile-time type system if and when that is useful. For typical binary structured data, though, I've never found it to be all that useful. For example, you'd need a way to process the relevant members in a specific order; Boost serialization does that by requiring a serialization method that specifies which members are processed and in what order.
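The ordering idea can be sketched without Boost itself. The code below only imitates the pattern of a member function template that lists members in processing order (Boost.Serialization's real signature also takes a version argument). `BinaryReader`, its `repeat` helper, and the `ItemList` struct are hypothetical names invented for this example, and big-endian byte order is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical input archive: reads big-endian integers and latches failure.
class BinaryReader {
public:
    BinaryReader(const std::uint8_t* data, std::size_t size) : p_(data), left_(size) {
        assert(data != nullptr || size == 0);
    }

    bool ok() const { return ok_; }

    BinaryReader& operator&(std::uint8_t& v)  { return read(v); }
    BinaryReader& operator&(std::uint16_t& v) { return read(v); }
    BinaryReader& operator&(std::uint32_t& v) { return read(v); }

    // Reads `count` elements into `out`, stopping at the first failed read.
    template <typename T>
    BinaryReader& repeat(std::vector<T>& out, std::size_t count) {
        out.clear();
        for (std::size_t i = 0; ok_ && i < count; ++i) {
            T e{};
            *this & e;
            if (ok_) out.push_back(e);
        }
        return *this;
    }

private:
    template <typename T>
    BinaryReader& read(T& v) {
        if (!ok_ || left_ < sizeof(T)) { ok_ = false; return *this; }
        T r = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i)
            r = static_cast<T>((r << 8) | p_[i]);
        p_ += sizeof(T);
        left_ -= sizeof(T);
        v = r;
        return *this;
    }

    const std::uint8_t* p_;
    std::size_t left_;
    bool ok_ = true;
};

// The packet is an ordinary struct with named members; serialize() states
// which members are processed and in what order, so a later member can
// depend on an earlier one (here, `ids` depends on `count`).
struct ItemList {
    std::uint8_t count = 0;
    std::vector<std::uint16_t> ids;

    template <typename Archive>
    void serialize(Archive& ar) {
        ar & count;
        ar.repeat(ids, count);
    }
};
```

Because `serialize` is a template over the archive type, the same member list could in principle drive a writer or a documentation dumper as well, at the price of writing that one method per packet by hand.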