Should network packet payload data be aligned on proper boundaries?
If you have the following class as a network packet payload:
class Payload
{
char field0;
int field1;
char field2;
int field3;
};
Does using a class like Payload leave the recipient of the data susceptible to alignment issues when receiving the data over a socket? I would think that the class would either need to be reordered or have padding added to ensure alignment.
Either reorder:
class Payload
{
int field1;
int field3;
char field0;
char field2;
};
or add padding:
class Payload
{
char field0;
char pad0[3];
int field1;
char field2;
char pad1[3];
int field3;
};
If reordering doesn't make sense for some reason, I would think adding the padding would be preferred since it would avoid alignment issues even though it would increase the size of the class.
What is your experience with such alignment issues in network data?
Correct, blindly ignoring alignment can cause problems, even on the same operating system, if two components were compiled with different compilers or different compiler versions.
It is better to...
1) Pass your data through some sort of serialization process.
2) Or pass each of your primitives individually, while still paying attention to byte ordering (endianness).
A good place to start would be Boost Serialization.
You should look into Google protocol buffers, or Boost::serialize like another poster said.
If you want to roll your own, please do it right.
If you use types from stdint.h (i.e. uint32_t, int8_t, etc.) and make sure every variable has "native alignment" (meaning its address is evenly divisible by its size: int8_ts can go anywhere, uint16_ts are on even addresses, uint32_ts are on addresses divisible by 4), you won't have to worry about alignment or packing.
At a previous job we had all structures sent over our databus (Ethernet or CANbus or byteflight or serial ports) defined in XML. There was a parser that would validate alignment on the variables within the structures (alerting you if someone wrote bad XML), and then generate header files for various platforms and languages to send and receive the structures. This worked really well for us: we never had to worry about hand-writing code to do message parsing or packing, and it was guaranteed that all platforms wouldn't have stupid little coding errors. Some of our datalink layers were pretty bandwidth constrained, so we implemented things like bitfields, with the parser generating the proper code for each platform. We also had enumerations, which was very nice (you'd be surprised how easy it is for a human to screw up coding bitfields on enumerations by hand).
Unless you need to worry about it running on 8051s and HC11s with C, or over data link layers that are very bandwidth constrained, you are not going to come up with something better than protocol buffers; you'll just spend a lot of time trying to be on par with them.
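The "native alignment" rule from this answer can be checked at compile time. The following is only a sketch: StatusMessage, its fields and the natively_aligned helper are hypothetical names made up for illustration, not anything from the XML tooling described above. Since a compiler will silently pad a badly ordered struct, the size check is the one that catches ordering mistakes; the offset checks document the intended wire layout.

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // fixed-width integer types

// Hypothetical message: members are ordered so that each one starts at a
// multiple of its own size.
struct StatusMessage {
    std::uint32_t sequence;   // intended wire offset 0
    std::uint16_t sensor_id;  // intended wire offset 4
    std::int8_t   state;      // intended wire offset 6
    std::int8_t   flags;      // intended wire offset 7
};

// "Native alignment": the offset is evenly divisible by the member's size.
constexpr bool natively_aligned(std::size_t offset, std::size_t size) {
    return offset % size == 0;
}

static_assert(natively_aligned(offsetof(StatusMessage, sequence), sizeof(std::uint32_t)),
              "sequence is not natively aligned");
static_assert(natively_aligned(offsetof(StatusMessage, sensor_id), sizeof(std::uint16_t)),
              "sensor_id is not natively aligned");

// If the compiler had to insert padding, the struct would be larger than the
// sum of its members and would no longer match the intended wire layout.
static_assert(sizeof(StatusMessage) ==
                  sizeof(std::uint32_t) + sizeof(std::uint16_t) + 2 * sizeof(std::int8_t),
              "compiler inserted padding");
```

Moving sensor_id in front of sequence would make the size check fail on common ABIs, because the compiler would then add two bytes of padding before sequence.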
We use packed structures that are overlaid directly over the binary packet in memory today, and I am ruing the day that I decided to do that. The only way that we have gotten this to work is by taking steps such as defining our own fixed-width types (typedef unsigned int uint32_t).
If you are just starting out, I would advise you to skip the whole mess of trying to represent what's on the wire with structures. Just serialize each primitive element separately. If you choose not to use an existing library like Boost Serialize or a middleware like TibCo, then save yourself a lot of headache by writing an abstraction around a binary buffer that hides the details of your serialization method. Aim for an interface built around a ByteBuffer type.
Each of your packet classes would then have a method to serialize to a ByteBuffer or be deserialized from a ByteBuffer and offset. This is one of those things that I absolutely wish that I could go back in time and correct. I cannot count the number of times that I have spent time debugging an issue that was caused by forgetting to swap bytes or not packing a struct.
The other trap to avoid is using a union to represent bytes or memcpying to an unsigned char buffer to extract bytes. If you always use Big-Endian on the wire, then you can use simple shift-based code to write the bytes to the buffer and not worry about the htonl stuff.
Such code remains nicely platform agnostic since the numerical representation is always logically Big-Endian. It also lends itself very nicely to using templates based on the size of the primitive type (think encode<sizeof(val)>((unsigned char const*)&val))... not so pretty, but very, very easy to write and maintain.
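A minimal sketch of the kind of buffer abstraction this answer describes, assuming big-endian on the wire. The answer does not spell out the interface, so the ByteBuffer methods (put_u8, put_u32, get_u8, get_u32), the int32_t field types and the serialize/deserialize signatures are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical ByteBuffer: a growable byte array that stores integers
// big-endian using shifts, so the host byte order never matters.
class ByteBuffer {
public:
    void put_u8(std::uint8_t v) { bytes_.push_back(v); }

    // Append a 32-bit value, most significant byte first.
    void put_u32(std::uint32_t v) {
        bytes_.push_back(static_cast<unsigned char>(v >> 24));
        bytes_.push_back(static_cast<unsigned char>(v >> 16));
        bytes_.push_back(static_cast<unsigned char>(v >> 8));
        bytes_.push_back(static_cast<unsigned char>(v));
    }

    std::uint8_t get_u8(std::size_t offset) const {
        assert(offset < bytes_.size());
        return bytes_[offset];
    }

    // Read a 32-bit big-endian value starting at `offset`.
    std::uint32_t get_u32(std::size_t offset) const {
        assert(offset + 4 <= bytes_.size());
        return (static_cast<std::uint32_t>(bytes_[offset]) << 24) |
               (static_cast<std::uint32_t>(bytes_[offset + 1]) << 16) |
               (static_cast<std::uint32_t>(bytes_[offset + 2]) << 8) |
               static_cast<std::uint32_t>(bytes_[offset + 3]);
    }

    std::size_t size() const { return bytes_.size(); }
    const unsigned char* data() const { return bytes_.data(); }

private:
    std::vector<unsigned char> bytes_;
};

// The question's Payload with fixed-width fields (an assumption), written to
// the wire one primitive at a time: 1 + 4 + 1 + 4 = 10 bytes, no padding.
struct Payload {
    char         field0;
    std::int32_t field1;
    char         field2;
    std::int32_t field3;

    void serialize(ByteBuffer& out) const {
        out.put_u8(static_cast<std::uint8_t>(field0));
        out.put_u32(static_cast<std::uint32_t>(field1));
        out.put_u8(static_cast<std::uint8_t>(field2));
        out.put_u32(static_cast<std::uint32_t>(field3));
    }

    // Converting back to int32_t assumes a two's-complement host.
    static Payload deserialize(const ByteBuffer& in, std::size_t offset) {
        Payload p;
        p.field0 = static_cast<char>(in.get_u8(offset));
        p.field1 = static_cast<std::int32_t>(in.get_u32(offset + 1));
        p.field2 = static_cast<char>(in.get_u8(offset + 5));
        p.field3 = static_cast<std::int32_t>(in.get_u32(offset + 6));
        return p;
    }
};
```

The in-memory layout of Payload is never copied to the wire here, so compiler padding and host endianness stop mattering; only the field order inside serialize and deserialize defines the format.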
My experience is that the following approaches are to be preferred (in order of preference):
1) Use a high level framework like Tibco, CORBA, DCOM or whatever that will manage all these issues for you.
2) Write your own libraries on both sides of the connection that are aware of packing, byte order and other issues.
3) Communicate only using string data.
Trying to send raw binary data without any mediation will almost certainly cause lots of problems.
You practically can't use a class or structure for this if you want any sort of portability. In your example, the ints may be 32-bit or 64-bit depending on your system. You're most likely using a little endian machine, but the older PowerPC-based Apple Macs are big endian. The compiler is free to pad as it likes too.
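In general you'll need a method that writes each field to the buffer a byte at a time, after ensuring you get the byte order right with htonl or htons (and ntohl or ntohs when reading). 64-bit values need a non-standard equivalent such as htonll or htobe64, where available.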
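A rough sketch of that approach for one 32-bit field, assuming a POSIX system where htonl and ntohl come from <arpa/inet.h> (on Windows they are declared in <winsock2.h>). The helper names write_u32 and read_u32 are made up for this example.

```cpp
#include <arpa/inet.h>  // htonl, ntohl (POSIX)
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>      // std::memcpy

// Hypothetical helper: store `host_value` at buf[pos..pos+3] in network
// (big-endian) byte order and return the position just past it.
inline std::size_t write_u32(unsigned char* buf, std::size_t len,
                             std::size_t pos, std::uint32_t host_value) {
    const std::uint32_t net_value = htonl(host_value);     // host -> network order
    assert(pos + sizeof net_value <= len);
    std::memcpy(buf + pos, &net_value, sizeof net_value);  // byte copy, any offset is fine
    return pos + sizeof net_value;
}

// Hypothetical helper: read a network-order 32-bit value from buf[pos..pos+3].
inline std::uint32_t read_u32(const unsigned char* buf, std::size_t len,
                              std::size_t pos) {
    std::uint32_t net_value;
    assert(pos + sizeof net_value <= len);
    std::memcpy(&net_value, buf + pos, sizeof net_value);
    return ntohl(net_value);                               // network -> host order
}
```

Because the bytes are copied rather than accessed through an int pointer, the field can start at any offset in the buffer, including the odd offset it would get right after a single char.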
If you don't have natural alignment in the structures, compilers will usually insert padding so that alignment is proper. If, however, you use pragmas to "pack" the structures (remove the padding), there can be very harmful side effects. On PowerPCs, non-aligned floats generate an exception. If you're working on an embedded system that doesn't handle that exception, you'll get a reset. If there is a routine to handle that interrupt, it can DRASTICALLY slow down your code, because it'll use a software routine to work around the misalignment, which will silently cripple your performance.
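To make the difference concrete, here is a small sketch comparing the default and packed layouts of the question's Payload. It assumes int32_t fields instead of int, a compiler that supports #pragma pack (an extension available in GCC, Clang and MSVC), and an ABI where int32_t is 4-byte aligned; load_i32 is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>   // offsetof
#include <cstdint>   // std::int32_t
#include <cstring>   // std::memcpy

// Default layout: the compiler pads so that each int32_t is properly aligned.
struct Payload {
    char         field0;
    std::int32_t field1;
    char         field2;
    std::int32_t field3;
};
static_assert(offsetof(Payload, field1) % alignof(std::int32_t) == 0,
              "field1 should be aligned in the default layout");
static_assert(offsetof(Payload, field3) % alignof(std::int32_t) == 0,
              "field3 should be aligned in the default layout");

// Packed layout: all padding is removed, so the 32-bit fields are misaligned.
#pragma pack(push, 1)
struct PackedPayload {
    char         field0;
    std::int32_t field1;   // offset 1
    char         field2;
    std::int32_t field3;   // offset 6
};
#pragma pack(pop)
static_assert(sizeof(PackedPayload) == 10, "packing should remove all padding");
static_assert(offsetof(PackedPayload, field1) == 1, "field1 should sit at offset 1");

// Copy a possibly misaligned 32-bit value out of a raw buffer instead of
// dereferencing a misaligned int pointer.
inline std::int32_t load_i32(const unsigned char* p) {
    assert(p != nullptr);
    std::int32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

On common ABIs with a 4-byte int32_t, sizeof(Payload) is 16 while sizeof(PackedPayload) is 10. Going through memcpy keeps the misaligned access out of the source code and leaves it to the compiler to emit whatever load sequence is safe on the target.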