Figuring out how networking, hex, and ASCII interact

Posted on 2024-12-25 22:14:02

I've recently been assigned to a C++ project involving information being sent between computers via UDP. When a packet arrives, I have a program which accepts the data and can display it as a raw hexadecimal string. However, I'm struggling to grasp exactly how this whole process is supposed to work. The hex string supposedly contains several fields (e.g. a 4-char array, some float_32s, and some uint_32s).

How do I translate the sections of this string into the correct variable types? The first value, an ASCII title, was simple enough; the first eight chars in the hex string are a hexadecimal representation of an ASCII word (0x45 hex can be translated directly to the capital letter E). But the next value, a 32-bit float, doesn't really make sense to me. What is the relation between the hex value "42 01 33 33" and the float value "32.3" (a given example)?

I'm a bit in over my head here, I feel I'm missing some essential information regarding the way number systems work.

Comments (3)

眉黛浅 2025-01-01 22:14:02

All types in C have a representation (which for most types is defined by a particular implementation). Most C implementations use IEEE 754 for representing the floating types; this is not actually required by the C or C++ standards, although C's optional Annex F specifies exactly that. The Wikipedia article on IEEE 754 explains how the floating types are represented in memory. In most C and C++ implementations, float is a 32-bit type and double is a 64-bit type. Therefore, in these implementations float is 4 bytes wide and double is 8 bytes wide.
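
If you are on C++11 or later, you can check both assumptions at compile time; a minimal sketch (not part of the original answer), using std::numeric_limits:

#include <limits>

// is_iec559 is true when float uses the IEEE 754 (IEC 60559) format.
static_assert(std::numeric_limits<float>::is_iec559, "float is not IEEE 754");
static_assert(sizeof(float) == 4, "float is not 4 bytes wide");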

Be careful, because the byte order can be different. Some architectures store the floating types in little endian, some in big endian. There is also a Wikipedia article on endianness.

To copy the bytes to the floating type, you have to make sure that the floating type is the same size as the number of bytes you have, and then you can copy the bytes one-by-one ‘into’ the floating type. Something like this will give you the gist of it:

#include <cstring>  // for memcpy

unsigned char rep[] = { 0x42, 0x01, 0x33, 0x33 };
float someFloat;

if (sizeof(someFloat) == 4)
{
    // Copy the raw bytes into the float's storage; memcpy avoids the
    // undefined behaviour of pointer-casting (type punning).
    memcpy(&someFloat, rep, sizeof(someFloat));
}
else
{
    // throw an exception or something
}

There are other ways of copying the bytes to the floating type, but be careful about ‘breaking the rules’ (type-punning etc.). Also, if the resulting value is incorrect, it may be because the byte order is wrong, and therefore you need to copy the bytes in reverse, so that the 4th byte in the representation is the 1st byte of the float.
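
A sketch of such a reversed copy (the helper name is made up; it assumes the incoming bytes are big-endian and the host is little-endian):

#include <cstring>

float decodeBigEndianFloat(const unsigned char rep[4])
{
    unsigned char reversed[4];
    // Reverse the byte order so the 4th wire byte becomes the 1st host byte.
    for (int i = 0; i < 4; ++i)
        reversed[i] = rep[3 - i];

    float someFloat;
    memcpy(&someFloat, reversed, sizeof(someFloat));
    return someFloat;  // ~32.3 for { 0x42, 0x01, 0x33, 0x33 } on a
                       // little-endian IEEE 754 machine
}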

不羁少年 2025-01-01 22:14:02

If you have a hex value:

42 01 33 33

It is the equivalent of

0100 0010 0000 0001 0011 0011 0011 0011

in binary code.

Now, there is a floating point standard called IEEE 754 which tells you how to convert a floating point number to binary and back.

The gist of it is that the first bit is the sign (positive/negative number), the next 8 bits are the exponent, and the last 23 are the mantissa. This is how the computer internally stores floating point numbers, since it can only store 1's and 0's.

If you put it all together in the way IEEE 754 specifies, you get 32.3: the sign bit is 0 (positive), the exponent bits 1000 0100 equal 132, giving an exponent of 132 - 127 = 5, and the mantissa bits give approximately 1.009375, so the value is about 1.009375 * 32 ≈ 32.3.
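
A small sketch of that decomposition (not from the original answer), extracting the three fields from the raw 32-bit pattern:

#include <cmath>
#include <cstdint>
#include <cstdio>

int main()
{
    std::uint32_t bits = 0x42013333;               // the example bit pattern

    std::uint32_t sign     = bits >> 31;           // 1 sign bit
    std::uint32_t exponent = (bits >> 23) & 0xFF;  // 8 exponent bits (biased by 127)
    std::uint32_t mantissa = bits & 0x7FFFFF;      // 23 mantissa bits

    double value = (sign ? -1.0 : 1.0)
                 * (1.0 + mantissa / 8388608.0)    // mantissa / 2^23
                 * std::pow(2.0, int(exponent) - 127);

    std::printf("%f\n", value);                    // prints 32.299999
}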

陌若浮生 2025-01-01 22:14:02

The exact data format is specified by the protocol used, but the common ways to represent numeric data are:

Unsigned integer: This is actually the simplest. Its typical representation works in principle like our normal decimal system, except that the "digits" are bytes, and can have 256 different values.

If you look at a decimal number like 3127, you see the three digits. The least significant digit is the last one (the 7 in this case). Least significant means that if you change it by 1, you get the minimal change of the value (namely 1). The most significant digit in the example is the 3 at the very left: If you change that one by 1, you make the maximal change of the value, namely a change of 1000. Since there are 10 different digits (0 to 9), the number represented by "3127" is 3*10*10*10 + 1*10*10 + 2*10 + 7. Note that it is just a convention that the most significant digit comes first; you could also define that the least significant digit comes first, and then this number would be written as "7213".

Now in most encodings, unsigned numbers work exactly the same, except that the "digits" are bytes, and therefore instead of base 10 we have base 256. Also, unlike in decimal numbers, there's no universal convention whether the most significant byte (MSB) or the least significant byte (LSB) comes first; both conventions are used in different protocols or file formats.

For example, in 4-byte (i.e. 32 bit) unsigned int with MSB first (also called big-endian encoding), the value 1000 = 0*256^3 + 0*256^2 + 3*256 + 232 would be represented by the four byte values 0, 0, 3, 232, or hex 00 00 03 E8. For little-endian encoding (LSB first), it would be E8 03 00 00 instead. And as 16 bit integer, it would be just 03 E8 (big endian) or E8 03 (little endian).
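
A sketch of decoding such a value from received bytes (the helper name is made up); building the value from shifts works the same regardless of the host machine's own endianness:

#include <cstdint>

// Decode 4 bytes sent MSB-first (big-endian) into an unsigned integer.
std::uint32_t readU32BigEndian(const unsigned char *p)
{
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16)
         | (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}

// Example: bytes 00 00 03 E8 decode to 0*256^3 + 0*256^2 + 3*256 + 232 = 1000.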

For signed integers, the most often used representation is two's complement. Basically it means that if the most significant bit is 1 (i.e. the most significant byte is 128 or larger), the byte sequence doesn't encode the number as written above, but instead the negative number you get by subtracting 2^(bits) from it, where (bits) is the number of bits in the number. For example, in a signed 16-bit int, the sequence FF FF is not 65535 as it would be in 16-bit unsigned int, but rather 65535-2^16=-1. As with unsigned ints, you have to distinguish between big-endian and little-endian. For example, -3 would be FF FD in 16-bit big endian, but FD FF in 16-bit little endian.
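
A sketch for the signed case (again with a made-up helper name): decode the unsigned pattern first, then apply the two's-complement rule described above:

#include <cstdint>

// Decode 2 bytes sent MSB-first, then interpret them as two's complement.
std::int32_t readS16BigEndian(const unsigned char *p)
{
    std::uint32_t u = (std::uint32_t(p[0]) << 8) | std::uint32_t(p[1]);
    // If the most significant bit is set, subtract 2^16 to get the
    // negative value: FF FD -> 65533 - 65536 = -3, FF FF -> -1.
    return (u & 0x8000) ? std::int32_t(u) - 65536 : std::int32_t(u);
}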

Floating point is quite a bit more complicated; today usually the format specified by IEEE/IEC is used. Basically, floating point numbers are of the form sign*(1.mantissa)*2^exponent, and sign, mantissa and exponent are stored in different subfields. Again, there are little-endian and big-endian forms.
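
Putting it together for the kind of packet described in the question, here is a sketch of parsing one field of each kind out of a receive buffer. The 12-byte layout (4 ASCII chars, one float, one uint32) and the big-endian wire order are assumptions for illustration; the real layout is whatever your protocol specifies:

#include <cstdint>
#include <cstring>

// Hypothetical layout: bytes 0-3 ASCII title, 4-7 big-endian IEEE 754
// float, 8-11 big-endian unsigned int.
void parsePacket(const unsigned char *buf,
                 char title[5], float &value, std::uint32_t &count)
{
    std::memcpy(title, buf, 4);     // ASCII bytes copy across unchanged
    title[4] = '\0';

    // Assemble the float's bit pattern in host order, then reinterpret it.
    std::uint32_t bits = (std::uint32_t(buf[4]) << 24) | (std::uint32_t(buf[5]) << 16)
                       | (std::uint32_t(buf[6]) << 8)  |  std::uint32_t(buf[7]);
    std::memcpy(&value, &bits, sizeof(value));

    count = (std::uint32_t(buf[8])  << 24) | (std::uint32_t(buf[9])  << 16)
          | (std::uint32_t(buf[10]) << 8)  |  std::uint32_t(buf[11]);
}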
