
Compiler optimization on marking an int unsigned?

Posted 2024-10-16 06:48:16

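For an integer that will never take negative values, one could use either unsigned int or int. From a compiler perspective, or purely in terms of CPU cycles, is there any difference on x86_64?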

坐在坟头思考人生 2024-10-23 06:48:16

It depends. It might go either way, depending on what you are doing with that int as well as on the properties of the underlying hardware.

An obvious example in favor of unsigned ints is integer division. In C/C++ integer division is supposed to round towards zero. The x86 idiv instruction rounds the same way, but it is slow, so compilers replace division by a constant with cheaper "optimized" sequences (shifts, etc.), and those generally round towards negative infinity. So, in order to satisfy the standard's requirements, the compiler is forced to adjust the result of such a signed division with additional machine instructions. With unsigned division this problem does not arise, which is why division by a constant is generally faster for unsigned types than for signed types.

For example, consider this simple expression

rand() / 2

The code generated for this expression by the MSVC compiler will generally look as follows

call        rand
cdq              
sub         eax,edx 
sar         eax,1

Note that instead of a single shift instruction (sar) we are seeing three instructions here, i.e. our sar is preceded by two extra instructions (cdq and sub). cdq fills edx with the sign of eax (0 or -1), and sub eax,edx therefore adds 1 to negative values before the shift. These extra instructions are there just to "adjust" the division in order to force it to generate the "correct" (from the C language point of view) result. Note that the compiler does not know that your value will always be non-negative, so it has to generate these instructions always, unconditionally. For values that are in fact never negative they do nothing useful and just waste CPU cycles.
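As a rough illustration of what that adjustment does, here is a minimal C sketch. The helper names (div2_adjusted, div2_shift_only, check_div2) are made up for this example, and it assumes that >> on a negative int32_t is an arithmetic shift, which the C standard leaves implementation-defined but which holds for the usual x86 compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mirrors the cdq / sub / sar sequence step by step.
   Assumes >> on a negative value is an arithmetic shift (implementation-defined). */
static int32_t div2_adjusted(int32_t x)
{
    int32_t sign = x >> 31;       /* like cdq: 0 if x >= 0, -1 if x < 0 */
    int32_t biased = x - sign;    /* like sub eax,edx: adds 1 only when x < 0 */
    return biased >> 1;           /* like sar eax,1 */
}

/* Hypothetical helper: what a bare sar would give (rounds toward -infinity). */
static int32_t div2_shift_only(int32_t x)
{
    return x >> 1;
}

/* Compares both variants with the language-level x / 2 on a small range. */
static void check_div2(void)
{
    for (int32_t x = -9; x <= 9; ++x) {
        assert(div2_adjusted(x) == x / 2);        /* always agrees */
        if (x >= 0)
            assert(div2_shift_only(x) == x / 2);  /* agrees for non-negative x */
    }
    assert(div2_shift_only(-7) != -7 / 2);        /* differs for negative odd x */
}
```

For non-negative inputs the two helpers agree, which is exactly the case where the extra instructions are wasted work.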

Now take a look at the code for

(unsigned) rand() / 2

It is just

call        rand  
shr         eax,1

In this case a single shift did the trick, giving us noticeably cheaper code for the division itself (one instruction instead of three).

On the other hand, when you mix integer arithmetic with FPU floating-point arithmetic, signed integer types might work faster, since the FPU instruction set contains dedicated instructions for loading/storing signed integer values (fild, fist/fistp) but has no such instructions for unsigned integer values.

To illustrate this one can use the following simple function

double zero() { return rand(); }

The generated code will generally be very simple

call        rand 
mov         dword ptr [esp],eax 
fild        dword ptr [esp]

But if we change our function to

double zero() { return (unsigned) rand(); }

the generated code will change to

call        rand
test        eax,eax 
mov         dword ptr [esp],eax 
fild        dword ptr [esp] 
jge         zero+17h 
fadd        qword ptr [__real@41f0000000000000 (4020F8h)]

This code is noticeably larger because the FPU instruction set does not work with unsigned integer types, so the extra adjustments are necessary after loading an unsigned value (which is what that conditional fadd does).
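The adjustment can be spelled out in C as a sketch. The constant 0x41F0000000000000 in the listing is the double 2^32 (4294967296.0). The helper names below are hypothetical, and the cast from uint32_t to int32_t assumes the usual wrapping behavior, which is implementation-defined for values above INT32_MAX.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the fild / test / jge / fadd sequence. */
static double u32_to_double_via_signed(uint32_t u)
{
    /* Reinterpret the bits as signed; assumes wrapping conversion
       (implementation-defined when u > INT32_MAX). */
    int32_t as_signed = (int32_t)u;

    double d = (double)as_signed;   /* like fild: the load is always signed */
    if (as_signed < 0)              /* like test eax,eax / jge */
        d += 4294967296.0;          /* like fadd with 2^32 */
    return d;
}

/* Checks the emulation against the direct conversion on a few edge values. */
static void check_u32_to_double(void)
{
    const uint32_t samples[] = { 0u, 1u, 2147483647u, 2147483648u, 4294967295u };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(u32_to_double_via_signed(samples[i]) == (double)samples[i]);
}
```

Values up to INT32_MAX skip the addition, so only the upper half of the unsigned range takes the extra fadd.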

There are other contexts and examples that can be used to demonstrate that it works either way. So, again, it all depends. But generally, all this will not matter in the big picture of your program's performance. I generally prefer to use unsigned types to represent unsigned quantities. In my code 99% of integer types are unsigned. But I do it for purely conceptual reasons, not for any performance gains.


动次打次papapa 2024-10-23 06:48:16

It will almost certainly make no difference. Occasionally the compiler can play games with the signedness of types in order to shave a couple of cycles, but to be honest the overall change is probably negligible.

For example suppose you have an int x and want to write:

if(x >= 10 && x < 200) { /* ... */ }

You (or better yet, the compiler) can transform this a little to do one less comparison:

if((unsigned int)(x - 10) < 190) { /* ... */ }

This works because converting a negative int to unsigned int wraps modulo 2^N (which matches the two's complement bit pattern), so if (x - 10) is less than 0 it becomes a huge value when viewed as an unsigned int. For example, on a typical x86 system, (unsigned int)-1 == 0xffffffff, which is clearly bigger than the 190 being tested.
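A small sketch to check that the two forms agree. The function names are invented for this example. It uses a slight variation, casting before subtracting, so that the subtraction happens in unsigned arithmetic and cannot overflow as a signed int when x is within 10 of INT_MIN.

```c
#include <assert.h>
#include <limits.h>

/* The straightforward form: two comparisons. */
static int in_range_two_compares(int x)
{
    return x >= 10 && x < 200;
}

/* Hypothetical one-comparison form. The cast comes first, so the
   subtraction wraps modulo 2^N instead of overflowing as a signed int. */
static int in_range_one_compare(int x)
{
    return (unsigned int)x - 10u < 190u;
}

/* Checks that both forms agree around the boundaries and at the extremes. */
static void check_range_trick(void)
{
    for (int x = -300; x <= 300; ++x)
        assert(in_range_one_compare(x) == in_range_two_compares(x));
    assert(in_range_one_compare(INT_MIN) == in_range_two_compares(INT_MIN));
    assert(in_range_one_compare(INT_MAX) == in_range_two_compares(INT_MAX));
}
```

Optimizing compilers commonly perform this rewrite on the two-comparison form themselves, so the readable version usually costs nothing.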

This is micro-optimization at best and is best left up to the compiler. Instead, you should write code that expresses what you mean and, if it is too slow, profile and decide where it really is necessary to get clever.
