Does the compiler optimize when an int is marked unsigned?

Published 2024-10-16 06:48:16

For an integer that is never expected to take -ve values, one could use unsigned int or int.
From a compiler perspective, or purely in terms of CPU cycles, is there any difference on x86_64?

Comments (5)

坐在坟头思考人生 2024-10-23 06:48:16

It depends. It might go either way, depending on what you are doing with that int as well as on the properties of the underlying hardware.


An obvious example in unsigned ints' favor would be integer division. In C/C++, integer division is required to truncate towards zero, while the cheap machine-level replacements for division by a power of two (arithmetic right shifts and the like) round towards negative infinity. So, in order to satisfy the standard's requirements, the compiler is forced to adjust the result of a signed division with additional machine instructions. In the case of unsigned division this problem does not arise, which is why such divisions generally work much faster for unsigned types than for signed types.

For example, consider this simple expression

rand() / 2

The code generated for this expression by the MSVC compiler will generally look as follows

call        rand
cdq              
sub         eax,edx 
sar         eax,1 

Note that instead of a single shift instruction (sar) we are seeing a whole bunch of instructions here, i.e. our sar is preceded by two extra instructions (cdq and sub). These extra instructions are there just to "adjust" the division in order to force it to generate the "correct" (from the C language's point of view) result. Note that the compiler does not know that your value will always be positive, so it has to generate these instructions always, unconditionally. They will never do anything useful, thus wasting CPU cycles.

Now take a look at the code for

(unsigned) rand() / 2

It is just

call        rand  
shr         eax,1 

In this case a single shift did the trick, thus giving us astronomically faster code (for the division alone).


On the other hand, when you are mixing integer arithmetic and x87 FPU floating-point arithmetic, signed integer types might work faster, since the FPU instruction set contains instructions for loading/storing signed integer values but has none for unsigned integer values.

To illustrate this one can use the following simple function

double zero() { return rand(); }

The generated code will generally be very simple

call        rand 
mov         dword ptr [esp],eax 
fild        dword ptr [esp]

But if we change our function to

double zero() { return (unsigned) rand(); }

the generated code will change to

call        rand
test        eax,eax 
mov         dword ptr [esp],eax 
fild        dword ptr [esp] 
jge         zero+17h 
fadd        qword ptr [__real@41f0000000000000 (4020F8h)] 

This code is noticeably larger because the FPU instruction set cannot load unsigned integer types directly, so an extra adjustment is necessary after loading the value as signed (which is what that conditional fadd does).


There are other contexts and examples that can be used to demonstrate that it works either way. So, again, it all depends. But generally, all this will not matter in the big picture of your program's performance. I generally prefer to use unsigned types to represent unsigned quantities. In my code 99% of integer types are unsigned. But I do it for purely conceptual reasons, not for any performance gains.

动次打次papapa 2024-10-23 06:48:16

It will almost certainly make no difference. Occasionally the compiler can play games with the signedness of types in order to shave off a couple of cycles, but to be honest the change is probably negligible overall.

For example suppose you have an int x and want to write:

if(x >= 10 && x < 200) { /* ... */ }

You (or better yet, the compiler) can transform this a little to do one less comparison:

if((unsigned int)(x - 10) < 190) { /* ... */ }

This assumes that int is represented in two's complement, so that if (x - 10) is less than 0 it becomes a huge value when viewed as an unsigned int. For example, on a typical x86 system, (unsigned int)-1 == 0xffffffff, which is clearly bigger than the 190 being tested.

This is micro-optimization at best and is best left up to the compiler. Instead, you should write code that expresses what you mean, and if it is too slow, profile and decide where it is really necessary to get clever.

秋心╮凉 2024-10-23 06:48:16

I don't imagine it would make much difference in terms of the CPU or the compiler. One possible benefit is that it lets the compiler know the number can never be negative and optimize code away accordingly.

However, it IS useful to a human reading your code, so they know the domain of the variable in question.

野心澎湃 2024-10-23 06:48:16

From the ALU's point of view, adding (or whatever) signed or unsigned values doesn't make any difference, since both are represented by the same group of bits. 0100 + 1011 is always 1111; you choose whether that means 4 + (-5) = -1 or 4 + 11 = 15.
So I agree with @Mark: you should choose the data type that best helps others understand your code.

不及他 2024-10-23 06:48:16

Signed types are inherently more optimizable in most cases because the compiler can ignore the possibility of overflow (it is undefined behavior) and simplify/rearrange arithmetic in whatever way it sees fit. On the other hand, unsigned types are inherently safer because the result is always well-defined (even if it is not what you naively think it should be).

The one case where unsigned types optimize better is division/remainder by a power of two. For unsigned types this translates directly into a bit shift and a bitwise AND. For signed types, unless the compiler can establish that the value is known to be positive, it must generate extra code to compensate for the off-by-one issue with negative numbers (per C, -3/2 is -1, whereas algebraically and via bitwise operations it is -2).
