Adding unsigned integers in C
Here are two very simple programs. I would expect to get the same output, but I don't. I can't figure out why. The first outputs 251. The second outputs -5. I can understand why the 251. However, I don't see why the second program gives me a -5.
PROGRAM 1:
    #include <stdio.h>

    int main()
    {
        unsigned char a;
        unsigned char b;
        unsigned int c;

        a = 0;
        b = -5;
        c = (a + b);
        printf("c hex: %x\n", c);
        printf("c dec: %d\n", c);
    }
Output:
    c hex: fb
    c dec: 251
PROGRAM 2:
    #include <stdio.h>

    int main()
    {
        unsigned char a;
        unsigned char b;
        unsigned int c;

        a = 0;
        b = 5;
        c = (a - b);
        printf("c hex: %x\n", c);
        printf("c dec: %d\n", c);
    }
Output:
    c hex: fffffffb
    c dec: -5
Comments (5)
In the first program, b = -5; assigns 251 to b. (Conversions to an unsigned type always reduce the value modulo one plus the maximum value of the destination type.)

In the second program, b = 5; simply assigns 5 to b, then c = (a - b); performs the subtraction 0 - 5 in type int because of the integer promotions. Put simply, types "smaller than int" are always promoted to int before being used as operands of arithmetic and bitwise operators.

Edit: One thing I missed: since c has type unsigned int, the result -5 in the second program is converted to unsigned int when the assignment to c is performed, resulting in UINT_MAX-4. This is what you see with the %x specifier to printf. When printing c with %d, you get undefined behavior, because %d expects a (signed) int argument and you passed an unsigned int argument with a value that is not representable in plain (signed) int.
There are two separate issues here. The first is that you are getting different hex values for what look like the same operations. The underlying fact you are missing is that chars are promoted to ints (as are shorts) to do arithmetic. Here is the difference:

In program 1, c = (a + b): a is extended to 0x00000000 and b is extended to 0x000000fb (not sign extended, because it is an unsigned char). Then the addition is performed, and we get 0x000000fb.

In program 2, c = (a - b): a is extended to 0x00000000 and b is extended to 0x00000005. Then the subtraction is performed, and we get 0xfffffffb.

The solution? Stick with chars or ints; mixing them can cause things you won't expect.

The second problem is that an unsigned int is being printed as -5, clearly a signed value. However, in the format string, you told printf to print its second argument interpreted as a signed int (that's what "%d" means). The trick here is that printf doesn't know the types of the variables you passed in. It merely interprets them in the way the format string tells it to. For example, if we tell printf to print a pointer to a variable a as an int, I get a different value each time I run the program, which is the memory location of a, converted to base 10. You may note that this kind of thing causes a warning. You should read all of the warnings your compiler gives you, and only ignore them if you're completely sure you are doing what you intend to.
You're using the format specifier %d. That treats the argument as a signed decimal number (basically int).

You get 251 from the first program because (unsigned char)-5 is 251, and then you print it as a signed decimal number. It gets promoted to 4 bytes instead of 1, and the extra bits are 0, so the number looks like 0000...251 (where the 251 is really in binary, I just didn't convert it).

You get -5 from the second program because (unsigned int)-5 is some large value, but cast to an int, it's -5. It gets treated like an int because of the way you use printf.

Use the format specifier %u to print unsigned decimal values.
What you're seeing is the result of how the C standard defines signed to unsigned type conversions (for the arithmetic) and how the underlying machine represents numbers (for the result of the undefined behavior at the end).

When I originally wrote my response I had assumed that the C standard didn't explicitly define how signed values should be converted to unsigned values, since the standard doesn't define how signed values should be represented or how to convert unsigned values to signed values when the value is outside the range of the signed type.

However, it turns out that the standard does explicitly define the result of converting a negative signed value to an unsigned type. For unsigned int, a negative signed value x is converted to UINT_MAX+1+x (so -5 becomes UINT_MAX+1-5), just as if it were stored as a signed value in two's complement and then interpreted as an unsigned value.

So when you say b = -5; b's value becomes 251, because -5 is converted to the unsigned char value UCHAR_MAX-5+1 (255-5+1), as the C standard requires. Only after that conversion does the addition take place. That makes a+b the same as 0 + 251, which is then stored in c.

However, when you say c = (a - b); a and b are promoted to int (the type of c plays no part in the promotion), so they remain 0 and 5 in value. The subtraction 0 - 5 is done in int and yields -5, which is converted to unsigned int when it is assigned to c, giving UINT_MAX+1-5. If the wraparound had happened at unsigned char width, before the promotion, the value would be UCHAR_MAX+1-5 (i.e. 251 again).

The reason you see -5 printed in your output is a combination of two facts: the unsigned integer UINT_MAX-4 and -5 have exactly the same binary representation, just like -5 and 251 do in a single-byte datatype, and using "%d" as the format string told printf to interpret the value of c as a signed integer instead of an unsigned integer.

Since the standard does not define the result of converting an unsigned value that is out of range for the signed type, the result becomes implementation specific. In your case, since the underlying machine uses two's complement for signed values, the unsigned value UINT_MAX-4 becomes the signed value -5.

The only reason this doesn't happen in the first program is that an unsigned int and a signed int can both represent 251, so converting between the two is well defined and using "%d" or "%u" doesn't matter. In the second program, however, it results in undefined behavior and becomes implementation specific, since the value UINT_MAX-4 is outside the range of a signed int.

What's happening under the hood

It's always good to check what you think is happening, or what should happen, against what's actually happening, so let's look at the assembly language output from the compiler to see exactly what's going on.

In the meaningful part of the first program, notice that although we store a signed value of -5 in the byte b, when the compiler promotes it, it does so by zero-extending the number, meaning it is interpreted as the unsigned value that 11111011 represents instead of the signed value. Then the promoted values are added together to become c. This is also why the C standard defines signed to unsigned conversions the way it does: the conversions are easy to implement on architectures that use two's complement for signed values.

Now with program 2: we see that a and b are once again promoted before any arithmetic, so we end up subtracting two 32-bit values, which wraps around to the bit pattern of UINT_MAX-4, which is also -5 as a signed value. So whether you interpret it as a signed or unsigned subtraction, because the machine uses two's complement, the result matches the C standard without any extra conversions.
Assigning a negative number to an unsigned variable is basically breaking the rules. What you're doing is converting the negative number to a large positive number. The converted value is fixed by the standard, but technically you're not guaranteed that the bit pattern of the original negative number is the same from one processor to another: on a 1's complement system (if any still existed) it would be different, for example.

So you get what you get. You can't expect signed algebra to still apply.