math c unsigned-integer integer-promotion

Adding unsigned integers in C

Posted 2024-12-03 01:46:58

Here are two very simple programs. I would expect them to produce the same output, but they don't, and I can't figure out why. The first outputs 251. The second outputs -5. I understand the 251, but I don't see why the second program gives me -5.

PROGRAM 1:

#include <stdio.h>

int main()
{
    unsigned char a;
    unsigned char b;
    unsigned int c;

    a = 0;
    b = -5;

    c = (a + b);

    printf("c hex: %x\n", c);
    printf("c dec: %d\n", c);
}

Output:

c hex: fb
c dec: 251

PROGRAM 2:

#include <stdio.h>

int main()
{
    unsigned char a;
    unsigned char b;
    unsigned int c;

    a = 0;
    b = 5;

    c = (a - b);

    printf("c hex: %x\n", c);
    printf("c dec: %d\n", c);
}

Output:

c hex: fffffffb
c dec: -5


初心未许 2024-12-10 01:46:58

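In the first program, b=-5; assigns 251 to b. (Conversions to unsigned types always reduce the value modulo one plus the maximum value of the destination type.)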

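In the second program, b=5; simply assigns 5 to b, and then c = (a - b); performs the subtraction 0-5 in type int because of the default promotions. Put simply, "smaller than int" types are always promoted to int before being used as operands of arithmetic and bitwise operators.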

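Edit: One thing I missed: since c has type unsigned int, the result -5 in the second program is converted to unsigned int when the assignment to c is performed, which yields UINT_MAX-4. That is what you see with printf's %x specifier. When you print c with %d, you get undefined behavior, because %d expects a (signed) int argument and you passed an unsigned int argument whose value is not representable in plain (signed) int.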


嘿嘿嘿 2024-12-10 01:46:58

There are two separate issues here. The first is the fact that you are getting different hex values for what looks like the same operations. The underlying fact that you are missing is that chars are promoted to ints (as are shorts) to do arithmetic. Here is the difference:

a = 0  //0x00
b = -5 //0xfb
c = (int)a + (int)b

Here, a is extended to 0x00000000 and b is extended to 0x000000fb (not sign extended, because it is an unsigned char). Then, the addition is performed, and we get 0x000000fb.

a = 0  //0x00
b = 5  //0x05
c = (int)a - (int)b

Here, a is extended to 0x00000000 and b is extended to 0x00000005. Then, the subtraction is performed, and we get 0xfffffffb.

The solution? Stick with chars or ints; mixing them can cause things you won't expect.

The second problem is that an unsigned int is being printed as -5, clearly a signed value. However, in the format string, you told printf to print its second argument interpreted as a signed int (that's what "%d" means). The trick here is that printf doesn't know the types of the arguments you passed in. It merely interprets them the way the format string tells it to. Here's an example where we tell printf to print a pointer as an int:

#include <stdio.h>

int main()
{
    int a = 0;
    int *p = &a;
    printf("%d\n", p);  /* deliberate mismatch: %d expects an int, p is an int * */
}

When I run this program, I get a different value each time: the memory location of a (or as much of it as fits in an int), printed in base 10. Strictly speaking, passing a pointer for "%d" is undefined behavior, and you may note that this kind of thing causes a compiler warning. You should read all of the warnings your compiler gives you, and only ignore them if you're completely sure you are doing what you intend to.


等风也等你 2024-12-10 01:46:58

You're using the format specifier %d. That treats the argument as a signed decimal number (basically an int).

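You get 251 from the first program because (unsigned char)-5 is 251, and you then print it as a signed decimal number. It is widened to 4 bytes instead of 1, and the added bits are all 0, so the number looks like 0000...251 (where the 251 is in binary; I just didn't convert it).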

You get -5 from the second program because (unsigned int)-5 is some large value, but converted to an int it is -5. It is treated as an int because of the way you call printf.

Use the format specifier %u to print unsigned decimal values.


偏闹i 2024-12-10 01:46:58

What you're seeing is the result of how the C standard defines signed to unsigned type conversions (for the arithmetic) and how the underlying machine represents numbers (for the result of the undefined behavior at the end).

When I originally wrote my response I had assumed that the C standard didn't explicitly define how signed values should be converted to unsigned values, since the standard doesn't define how signed values should be represented or how to convert unsigned values to signed values when the range is outside that of the signed type.

However, it turns out that the standard does explicitly define what happens when a negative signed value is converted to an unsigned type. For unsigned int, a negative value x is converted to UINT_MAX+1+x (so -5 becomes UINT_MAX-4), just as if it were stored as a signed value in two's complement and then interpreted as an unsigned value.

So when you say:

unsigned char  a;
unsigned char  b;
unsigned int c;

a = 0; 
b = -5;
c = a + b;

b's value becomes 251, because the C standard converts -5 to the unsigned char value UCHAR_MAX+1-5 (255+1-5). The addition takes place only after that conversion. That makes a+b the same as 0 + 251, which is then stored in c. However, when you say:

unsigned char  a;
unsigned char  b;
unsigned int c;

a = 0;
b = 5;
c = (a-b);

printf("c dec: %d\n", c);

In this case, a and b are promoted to int (not unsigned int, because int can represent every unsigned char value), so they remain 0 and 5 in value. The subtraction 0 - 5 is therefore ordinary signed arithmetic and yields -5. Assigning that int to the unsigned int c converts it modulo UINT_MAX+1, which gives UINT_MAX+1-5. If the same wraparound had happened at unsigned char width, the value would be UCHAR_MAX+1-5 (i.e. 251 again).

However, the reason you see -5 printed in your output is a combination of the fact that the unsigned integer UINT_MAX-4 and -5 have the same exact binary representation, just like -5 and 251 do with a single-byte datatype, and the fact that when you used "%d" as the formatting string, that told printf to interpret the value of c as a signed integer instead of an unsigned integer.

Converting an out-of-range unsigned value to a signed type is implementation-defined, and passing an unsigned int where printf expects an int for "%d" is, strictly speaking, undefined behavior. In your case, since the underlying machine uses two's complement for signed values, the result is that the unsigned value UINT_MAX-4 is read as the signed value -5.

The only reason this doesn't happen in the first program is that an unsigned int and a signed int can both represent 251, so converting between the two is well defined and using "%d" or "%u" gives the same output. In the second program, however, the value UINT_MAX-4 is outside the range of a signed int, so what "%d" prints depends on the implementation.

What's happening under the hood

It's always good to double check what you think is happening or what should happen with what's actually happening, so let's look at the assembly language output from the compiler now to see exactly what's going on. Here's the meaningful part of the first program:

    mov     BYTE PTR [rbp-1], 0   ; a becomes 0
    mov     BYTE PTR [rbp-2], -5  ; b becomes -5, which as an unsigned char is also 251
    movzx   edx, BYTE PTR [rbp-1] ; promote a by zero-extending to a 32-bit int, which is now 0
    movzx   eax, BYTE PTR [rbp-2] ; promote b by zero-extending to a 32-bit int, which is now 251
    add     eax, edx  ; add a and b, that is, 0 and 251

Notice that although we store a signed value of -5 in the byte b, when the compiler promotes it, it promotes it by zero-extending the number, meaning it's being interpreted as the unsigned value that 11111011 represents instead of the signed value. Then the promoted values are added together to become c. This is also why the C standard defines signed to unsigned conversions the way it does -- it's easy to implement the conversions on architectures that use two's complement for signed values.

Now with program 2:

    mov     BYTE PTR [rbp-1], 0 ; a = 0
    mov     BYTE PTR [rbp-2], 5 ; b = 5
    movzx   edx, BYTE PTR [rbp-1] ; a is promoted to 32-bit integer with value 0
    movzx   eax, BYTE PTR [rbp-2] ; b is promoted to a 32-bit integer with value 5
    mov     ecx, edx 
    sub     ecx, eax ; a - b is now done as 32-bit integers resulting in -5, which is '4294967291' when interpreted as unsigned

We see that a and b are once again zero-extended to 32 bits before any arithmetic. The sub instruction produces the same bit pattern whether its operands are viewed as signed or unsigned: 0xfffffffb, which is -5 as a signed int and UINT_MAX-4 as an unsigned int. So because the machine uses two's complement, the result matches the C standard without any extra conversion instructions.
