How to interpret binary data in C++?

Posted on 2024-07-19 22:12:11

I am sending and receiving binary data to/from a device in packets (64 bytes). The data has a specific format, parts of which vary between different requests/responses.

Now I am designing an interpreter for the received data. Simply reading the data by positions is OK, but it doesn't look that cool when I have a dozen different response formats. I am currently thinking about creating a few structs for that purpose, but I don't know how that will work with padding.

Maybe there's a better way?
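For comparison, the "read by position" approach can sidestep padding and endianness questions entirely if every field is assembled from bytes at a named offset. This is a minimal sketch assuming a hypothetical layout (big-endian 32-bit packet ID at byte 0, big-endian 32-bit length at byte 4, a flag byte at byte 8); the names `readU32BE`, `Header` and `parseHeader` are illustrative only.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 64-byte packet layout (an assumption for illustration):
//   bytes 0..3  packet ID   (big-endian uint32)
//   bytes 4..7  data length (big-endian uint32)
//   byte  8     receipt flag
//   bytes 9..63 payload
using Packet = std::array<std::uint8_t, 64>;

// Named offsets keep position-based reading readable.
constexpr std::size_t kIdOffset = 0;
constexpr std::size_t kLengthOffset = 4;
constexpr std::size_t kFlagOffset = 8;

// Assemble a big-endian 32-bit value from the four bytes at `offset`.
// The result is the same on any host, independent of endianness or padding.
inline std::uint32_t readU32BE(const Packet& p, std::size_t offset) {
    return (std::uint32_t{p[offset]} << 24) |
           (std::uint32_t{p[offset + 1]} << 16) |
           (std::uint32_t{p[offset + 2]} << 8) |
           std::uint32_t{p[offset + 3]};
}

// Decoded header held in ordinary host-format members.
struct Header {
    std::uint32_t packetId;
    std::uint32_t dataLength;
    bool receiptFlag;
};

inline Header parseHeader(const Packet& p) {
    return Header{readU32BE(p, kIdOffset),
                  readU32BE(p, kLengthOffset),
                  p[kFlagOffset] != 0};
}
```

The struct here is only a destination for decoded values, so its in-memory layout never has to match the wire format.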


逆光下的微笑 2024-07-26 22:12:11

You need to use structs and/or unions. You'll need to make sure your data is packed identically on both sides of the connection, and you may want to convert to and from network byte order on each end if there is any chance that the two sides could be running with different endianness.

As an example:

#include <cstdint>

#pragma pack(push)  /* push current alignment to stack */
#pragma pack(1)     /* set alignment to 1 byte boundary */
typedef struct {
    std::uint32_t   packetID;      // identifies packet in one direction (fixed 32-bit width for htonl/ntohl)
    std::uint32_t   data_length;
    char            receipt_flag;  // indicates whether to ack the packet or keep resending it until acked
    char            data[];        // typically ASCII data with \n-terminated fields, but could also be binary
                                   // (flexible array member: standard in C99, a compiler extension in C++)
} tPacketBuffer;
#pragma pack(pop)   /* restore original alignment from stack */

Then, when filling in a packet to send:

packetBuffer.packetID = htonl(123456);

and when receiving:

packetBuffer.packetID = ntohl(packetBuffer.packetID);

The same conversion applies to data_length and to every other multi-byte field.
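A slightly fuller sketch of the send/receive round trip, converting both 32-bit header fields. It assumes a POSIX system for `<arpa/inet.h>` (on Windows, `htonl`/`ntohl` come from `<winsock2.h>`). The names `PacketHeader`, `encodeHeader` and `decodeHeader` are hypothetical. It copies bytes with `memcpy` rather than casting a raw buffer pointer to the struct, which avoids misaligned-access and aliasing problems.

```cpp
#include <arpa/inet.h>  // htonl / ntohl (POSIX)
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)
struct PacketHeader {
    std::uint32_t packetID;
    std::uint32_t data_length;
    std::uint8_t  receipt_flag;
};
#pragma pack(pop)

static_assert(sizeof(PacketHeader) == 9, "header must be exactly 9 bytes on the wire");

// Sender: convert multi-byte fields to network order, then copy the bytes out.
inline void encodeHeader(std::uint32_t id, std::uint32_t len, bool ack,
                         unsigned char* out) {
    PacketHeader h;
    h.packetID = htonl(id);
    h.data_length = htonl(len);
    h.receipt_flag = ack ? 1 : 0;
    std::memcpy(out, &h, sizeof h);
}

// Receiver: copy the bytes in, then convert every multi-byte field back to host order.
inline PacketHeader decodeHeader(const unsigned char* in) {
    PacketHeader h;
    std::memcpy(&h, in, sizeof h);
    h.packetID = ntohl(h.packetID);
    h.data_length = ntohl(h.data_length);
    return h;
}
```

Because both ends go through `htonl`/`ntohl`, the bytes on the wire are big-endian regardless of either host's native byte order.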

Here are some discussions of Endianness and Alignment and Structure Packing

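If you don't pack the structure, the compiler is free to insert padding so that each member falls on its natural alignment boundary (typically a word boundary), and then the structure's internal layout and size will no longer match the packet format.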
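To make the padding concrete, here is a sketch comparing the same fields with and without `#pragma pack(1)`. The flag is moved to the front (a deliberate change from the struct above) so that interior padding, not just tail padding, shows up. On common ABIs such as x86-64 and AArch64, `UnpackedHeader` typically puts `packetID` at offset 4 and has size 12; only the packed layouts are asserted at compile time, since the unpacked ones are ABI-dependent.

```cpp
#include <cstddef>
#include <cstdint>

// Default alignment: the compiler may add padding between members.
struct UnpackedHeader {
    std::uint8_t  receipt_flag;  // offset 0
    // typically 3 padding bytes here so that packetID is 4-byte aligned
    std::uint32_t packetID;
    std::uint32_t data_length;
};

#pragma pack(push, 1)
// 1-byte packing: members follow each other with no padding.
struct PackedHeader {
    std::uint8_t  receipt_flag;  // offset 0
    std::uint32_t packetID;      // offset 1
    std::uint32_t data_length;   // offset 5
};
#pragma pack(pop)

static_assert(sizeof(PackedHeader) == 9, "packed header matches the 9-byte wire layout");
static_assert(offsetof(PackedHeader, packetID) == 1, "no padding after the flag");
```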

梦中楼上月下 2024-07-26 22:12:11

I've done this innumerable times before: it's a very common scenario. There are a number of things I virtually always do.

Don't worry too much about making it the most efficient thing available.

If we do wind up spending a lot of time packing and unpacking packets, then we can always change it to be more efficient. Whilst I've not yet encountered a case where I've had to, I've not been implementing network routers either!

Whilst using structs/unions is the most efficient approach in terms of runtime, it comes with a number of complications: convincing your compiler to pack the structs/unions to match the octet structure of the packets you need, the work needed to avoid alignment and endianness issues, and a lack of safety, since there is little or no opportunity to do sanity checks in debug builds.

I often wind up with an architecture including the following kinds of things:

  • A packet base class. Any common data fields are accessible (but not modifiable). If the data isn't stored in a packed format, then there's a virtual function which will produce a packed packet.
  • A number of presentation classes for specific packet types, derived from the packet base class. If we're using a packing function, then each presentation class must implement it.
  • Anything which can be inferred from the specific type of the presentation class (e.g. a packet type id stored in a common data field) is dealt with as part of initialisation and is otherwise unmodifiable.
  • Each presentation class can be constructed from an unpacked packet, or will fail gracefully if the packet data is invalid for that type. This can then be wrapped up in a factory for convenience.
  • If we don't have RTTI available, we can get "poor-man's RTTI" using the packet id to determine which specific presentation class an object really is.
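A compressed sketch of that architecture, assuming two hypothetical response types (a status response and a firmware-version response) identified by a type-id byte at offset 0. The layouts, the assumed valid status range, the class names and the `parsePacket` factory are all illustrative, not the author's code.

```cpp
#include <array>
#include <cstdint>
#include <memory>

using RawPacket = std::array<std::uint8_t, 64>;

// Base class: common fields are readable but not modifiable from outside.
class Packet {
public:
    virtual ~Packet() = default;
    std::uint8_t typeId() const { return typeId_; }
    virtual RawPacket pack() const = 0;  // each presentation class packs itself
protected:
    explicit Packet(std::uint8_t typeId) : typeId_(typeId) {}
private:
    std::uint8_t typeId_;  // fixed at construction, never changed afterwards
};

// Hypothetical type 0x01: byte 1 = status code (assumed valid range 0..3).
class StatusResponse : public Packet {
public:
    static constexpr std::uint8_t kTypeId = 0x01;
    static constexpr std::uint8_t kMaxStatus = 3;
    explicit StatusResponse(std::uint8_t status) : Packet(kTypeId), status_(status) {}
    std::uint8_t status() const { return status_; }

    // Graceful failure: nullptr if the bytes are not a valid StatusResponse.
    static std::unique_ptr<StatusResponse> fromRaw(const RawPacket& raw) {
        if (raw[0] != kTypeId || raw[1] > kMaxStatus) return nullptr;
        return std::make_unique<StatusResponse>(raw[1]);
    }
    RawPacket pack() const override {
        RawPacket raw{};
        raw[0] = kTypeId;
        raw[1] = status_;
        return raw;
    }
private:
    std::uint8_t status_;
};

// Hypothetical type 0x02: bytes 1 and 2 = firmware major/minor version.
class VersionResponse : public Packet {
public:
    static constexpr std::uint8_t kTypeId = 0x02;
    VersionResponse(std::uint8_t majorVer, std::uint8_t minorVer)
        : Packet(kTypeId), majorVer_(majorVer), minorVer_(minorVer) {}
    std::uint8_t majorVersion() const { return majorVer_; }
    std::uint8_t minorVersion() const { return minorVer_; }

    static std::unique_ptr<VersionResponse> fromRaw(const RawPacket& raw) {
        if (raw[0] != kTypeId) return nullptr;
        return std::make_unique<VersionResponse>(raw[1], raw[2]);
    }
    RawPacket pack() const override {
        RawPacket raw{};
        raw[0] = kTypeId;
        raw[1] = majorVer_;
        raw[2] = minorVer_;
        return raw;
    }
private:
    std::uint8_t majorVer_;
    std::uint8_t minorVer_;
};

// Factory: dispatch on the common type-id field ("poor man's RTTI").
inline std::unique_ptr<Packet> parsePacket(const RawPacket& raw) {
    switch (raw[0]) {
        case StatusResponse::kTypeId:  return StatusResponse::fromRaw(raw);
        case VersionResponse::kTypeId: return VersionResponse::fromRaw(raw);
        default:                       return nullptr;  // unknown packet type
    }
}
```

Callers that need the concrete type can `dynamic_cast`, or, without RTTI, switch on `typeId()` and `static_cast`.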

In all of this, it's possible (even if just for debug builds) to verify that each modifiable field is being set to a sane value. Whilst it might seem like a lot of work, it makes it very difficult to end up with an invalidly formatted packet, and a pre-packed packet's contents can easily be checked by eye in a debugger (since they're all ordinary platform-native variables).
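A minimal sketch of such a debug-build check, for a hypothetical request class with one modifiable field. The field, its name and its 1..1000 range are assumptions; real limits would come from the device specification. The value lives in an ordinary host-format member, so a debugger shows it directly before packing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical request with a single modifiable field.
class SetRateRequest {
public:
    // Assumed valid range, for illustration only.
    static constexpr std::uint16_t kMinRateHz = 1;
    static constexpr std::uint16_t kMaxRateHz = 1000;

    void setRateHz(std::uint16_t hz) {
        // Checked in debug builds; compiled out when NDEBUG is defined.
        assert(hz >= kMinRateHz && hz <= kMaxRateHz && "rate out of range");
        rateHz_ = hz;
    }
    std::uint16_t rateHz() const { return rateHz_; }

private:
    std::uint16_t rateHz_ = kMinRateHz;  // always starts in a valid state
};
```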

If we do have to implement a more efficient storage scheme, that too can be wrapped in this abstraction with little additional performance cost.

岁月打碎记忆 2024-07-26 22:12:11

It's hard to say what the best solution is without knowing the exact format(s) of the data. Have you considered using unions?
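If unions are used, one caveat matters in C++: reading a union member other than the one most recently written is undefined behavior (unlike in C). A common safe pattern is a tagged union, where an explicit tag records the active member. This sketch assumes two hypothetical response payloads selected by a type byte at offset 0. C++17's `std::variant` packages the same idea with checked access.

```cpp
#include <array>
#include <cstdint>

// Hypothetical response payloads (assumptions for illustration).
struct StatusPayload  { std::uint8_t status; };
struct VersionPayload { std::uint8_t major_version; std::uint8_t minor_version; };

enum class ResponseType : std::uint8_t { Status = 0x01, Version = 0x02 };

// Tagged union: `type` says which member is active, and code only reads that one.
struct Response {
    ResponseType type;
    union {
        StatusPayload  status;
        VersionPayload version;
    };
};

// Fills `out` and returns true for known type ids; returns false otherwise.
inline bool decodeResponse(const std::array<std::uint8_t, 64>& raw, Response& out) {
    switch (raw[0]) {
        case static_cast<std::uint8_t>(ResponseType::Status):
            out.type = ResponseType::Status;
            out.status = StatusPayload{raw[1]};  // assignment makes `status` active
            return true;
        case static_cast<std::uint8_t>(ResponseType::Version):
            out.type = ResponseType::Version;
            out.version = VersionPayload{raw[1], raw[2]};
            return true;
        default:
            return false;
    }
}
```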

别再吹冷风 2024-07-26 22:12:11

I agree with Wuggy. You can also use code generation to do this. Use a simple data-definition file to define all your packet types, then run a Python script over it to generate prototype structures and serialization/deserialization functions for each one.
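The same "define once, generate the rest" idea can also be done without leaving C++, by swapping the Python script for X-macros (a preprocessor technique, not what the answer describes). In this sketch, a single hypothetical field list expands into the struct, the serializer and the deserializer. Field names and byte widths are assumptions, and every field is stored as `std::uint32_t` and written big-endian.

```cpp
#include <cstddef>
#include <cstdint>

// The single "data definition": one line per field (name, width in bytes).
#define STATUS_PACKET_FIELDS(X) \
    X(packet_id, 4)             \
    X(data_length, 4)           \
    X(receipt_flag, 1)

// Expansion 1: the struct, with host-format members.
struct StatusPacket {
#define X(name, width) std::uint32_t name;
    STATUS_PACKET_FIELDS(X)
#undef X
};

// Writes the low `width` bytes of `value` big-endian at `off`; returns the next offset.
inline std::size_t putBE(std::uint8_t* out, std::size_t off, std::uint32_t value, std::size_t width) {
    for (std::size_t i = 0; i < width; ++i)
        out[off + i] = static_cast<std::uint8_t>(value >> (8 * (width - 1 - i)));
    return off + width;
}

// Reads `width` big-endian bytes at `off` into `value`; returns the next offset.
inline std::size_t getBE(const std::uint8_t* in, std::size_t off, std::uint32_t& value, std::size_t width) {
    value = 0;
    for (std::size_t i = 0; i < width; ++i)
        value = (value << 8) | in[off + i];
    return off + width;
}

// Expansion 2: the serializer. Returns the number of bytes written.
inline std::size_t serialize(const StatusPacket& p, std::uint8_t* out) {
    std::size_t off = 0;
#define X(name, width) off = putBE(out, off, p.name, width);
    STATUS_PACKET_FIELDS(X)
#undef X
    return off;
}

// Expansion 3: the deserializer. Returns the number of bytes consumed.
inline std::size_t deserialize(const std::uint8_t* in, StatusPacket& p) {
    std::size_t off = 0;
#define X(name, width) off = getBE(in, off, p.name, width);
    STATUS_PACKET_FIELDS(X)
#undef X
    return off;
}
```

Adding a field means editing one line of `STATUS_PACKET_FIELDS`, and all three expansions stay in sync.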

深海里的那抹蓝 2024-07-26 22:12:11

This is an "out-of-the-box" solution, but I'd suggest taking a look at the Python construct library.

"Construct is a python library for parsing and building of data structures (binary or textual). It is based on the concept of defining data structures in a declarative manner, rather than procedural code: more complex constructs are composed of a hierarchy of simpler ones. It's the first library that makes parsing fun, instead of the usual headache it is today."

Construct is very robust and powerful, and just reading the tutorial will help you understand the problem better. The author also has plans for auto-generating C code from definitions, so it's definitely worth the effort to read about.
