无符号整数与有符号整数的性能

发布于 2024-10-12 09:02:02 字数 63 浏览 5 评论 0 原文

使用无符号整数相对于有符号整数是否会带来性能增益/损失?

如果是这样,这是否也适用于短期和长期?

Is there any performance gain/loss by using unsigned integers over signed integers?

If so, does this goes for short and long as well?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(12

当爱已成负担 2024-10-19 09:02:02

使用unsigned int除以2的幂会更快,因为它可以优化为单个移位指令。对于signed int,它通常需要更多的机器指令,因为除法向零舍入,但向右移位向下舍入。示例:

int foo(int x, unsigned y)
{
    x /= 8;
    y /= 8;
    return x + y;
}

这是相关的 x 部分(有符号除法):

movl 8(%ebp), %eax
leal 7(%eax), %edx
testl %eax, %eax
cmovs %edx, %eax
sarl $3, %eax

这是相关的 y 部分(无符号除法):

movl 12(%ebp), %edx
shrl $3, %edx

Division by powers of 2 is faster with unsigned int, because it can be optimized into a single shift instruction. With signed int, it usually requires more machine instructions, because division rounds towards zero, but shifting to the right rounds down. Example:

int foo(int x, unsigned y)
{
    x /= 8;
    y /= 8;
    return x + y;
}

Here is the relevant x part (signed division):

movl 8(%ebp), %eax
leal 7(%eax), %edx
testl %eax, %eax
cmovs %edx, %eax
sarl $3, %eax

And here is the relevant y part (unsigned division):

movl 12(%ebp), %edx
shrl $3, %edx
笑饮青盏花 2024-10-19 09:02:02

在 C++(和 C)中,有符号整数溢出是未定义的,而无符号整数溢出则定义为回绕。请注意,例如在 gcc 中,您可以使用 -fwrapv 标志来定义有符号溢出(以环绕)。

未定义的有符号整数溢出允许编译器假设不会发生溢出,这可能会带来优化机会。请参阅此博文进行讨论。

In C++ (and C), signed integer overflow is undefined, whereas unsigned integer overflow is defined to wrap around. Notice that e.g. in gcc, you can use the -fwrapv flag to make signed overflow defined (to wrap around).

Undefined signed integer overflow allows the compiler to assume that overflows don't happen, which may introduce optimization opportunities. See e.g. this blog post for discussion.

仙女 2024-10-19 09:02:02

unsigned 的性能与 signed 相同或更好。
一些例子:

  • 除以一个常数,它是 2 的幂(另请参阅 FredOverflow 的答案)
  • 除以一个常数(例如,我的编译器使用 2 个无符号的 asm 指令实现除以 13,和 6 个有符号指令)
  • 检查数字是否为偶数(我不知道为什么我的 MS Visual Studio 编译器用 4 个有符号数字指令来实现它;gcc 用 1 个指令来实现它,就像在unsigned 情况)

short 通常会导致与 int 相同或更差的性能(假设 sizeof(short) )。当您将算术运算的结果(通常是 int,而不是 short)分配给 short 类型的变量时,会发生性能下降,该变量存储在处理器的寄存器中(也是int类型)。所有从 shortint 的转换都需要时间并且很烦人。

注意:一些DSP具有signed Short类型的快速乘法指令;在这种特定情况下,shortint 更快。

至于intlong之间的区别,我只能猜测(我不熟悉64位架构)。当然,如果intlong具有相同的大小(在32位平台上),它们的性能也是相同的。


一些人指出了一个非常重要的补充:

对于大多数应用程序来说真正重要的是内存占用和使用的带宽。对于大型数组,您应该使用最小的必要整数(short,甚至可能是signed/unsigned char)。

这将提供更好的性能,但增益是非线性的(即不是 2 或 4 倍)并且有些不可预测 - 它取决于缓存大小以及应用程序中计算和内存传输之间的关系。

unsigned leads to the same or better performance than signed.
Some examples:

  • Division by a constant which is a power of 2 (see also the answer from FredOverflow)
  • Division by a constant number (for example, my compiler implements division by 13 using 2 asm instructions for unsigned, and 6 instructions for signed)
  • Checking whether a number is even (i have no idea why my MS Visual Studio compiler implements it with 4 instructions for signed numbers; gcc does it with 1 instruction, just like in the unsigned case)

short usually leads to the same or worse performance than int (assuming sizeof(short) < sizeof(int)). Performance degradation happens when you assign a result of an arithmetic operation (which is usually int, never short) to a variable of type short, which is stored in the processor's register (which is also of type int). All the conversions from short to int take time and are annoying.

Note: some DSPs have fast multiplication instructions for the signed short type; in this specific case short is faster than int.

As for the difference between int and long, i can only guess (i am not familiar with 64-bit architectures). Of course, if int and long have the same size (on 32-bit platforms), their performance is also the same.


A very important addition, pointed out by several people:

What really matters for most applications is the memory footprint and utilized bandwidth. You should use the smallest necessary integers (short, maybe even signed/unsigned char) for large arrays.

This will give better performance, but the gain is nonlinear (i.e. not by a factor of 2 or 4) and somewhat unpredictable - it depends on cache size and the relationship between calculations and memory transfers in your application.

灵芸 2024-10-19 09:02:02

这将取决于具体的实施。但在大多数情况下,不会有任何区别。如果你真的很关心,你必须尝试你考虑的所有变体并衡量性能。

This will depend on exact implementation. In most cases there will be no difference however. If you really care you have to try all the variants you consider and measure performance.

权谋诡计 2024-10-19 09:02:02

这很大程度上取决于特定的处理器。

在大多数处理器上,都有用于有符号和无符号算术的指令,因此使用有符号整数和无符号整数之间的区别取决于编译器使用哪一种。

如果两者中的任何一个更快,那么它完全是特定于处理器的,并且很可能差异很小(如果存在的话)。

This is pretty much dependent on the specific processor.

On most processors, there are instructions for both signed and unsigned arithmetic, so the difference between using signed and unsigned integers comes down to which one the compiler uses.

If any of the two is faster, it's completely processor specific, and most likely the difference is miniscule, if it exists at all.

∞梦里开花 2024-10-19 09:02:02

有符号整数和无符号整数之间的性能差异实际上比接受答案所暗示的更为普遍。无符号整数除以任何常量比有符号整数除以常量更快,无论该常量是否为 2 的幂。请参阅http://ridiculousfish.com/blog/posts/labor- of-division-episode-iii.html

在帖子的末尾,他包含以下部分:

一个自然的问题是相同的优化是否可以改善有符号除法;不幸的是,事实似乎并非如此,原因有二:

股息的增量必须变成幅度的增量,即如果n>1则增量。 0,如果 n < 则递减0. 这会带来额外的费用。

不合作除数的惩罚只有有符号除法的一半左右,留下了更小的改进窗口。

因此,向下舍入算法似乎可以在有符号除法中工作,但其性能会低于标准向上舍入算法。

The performance difference between signed and unsigned integers is actually more general than the acceptance answer suggests. Division of an unsigned integer by any constant can be made faster than division of a signed integer by a constant, regardless of whether the constant is a power of two. See http://ridiculousfish.com/blog/posts/labor-of-division-episode-iii.html

At the end of his post, he includes the following section:

A natural question is whether the same optimization could improve signed division; unfortunately it appears that it does not, for two reasons:

The increment of the dividend must become an increase in the magnitude, i.e. increment if n > 0, decrement if n < 0. This introduces an additional expense.

The penalty for an uncooperative divisor is only about half as much in signed division, leaving a smaller window for improvements.

Thus it appears that the round-down algorithm could be made to work in signed division, but will underperform the standard round-up algorithm.

第几種人 2024-10-19 09:02:02

使用无符号类型不仅除以 2 的幂更快,使用无符号类型除以任何其他值也更快。如果您查看Agner Fog 的指令表,您会看到无符号除法 (DIV)与签名版本 (IDIV) 具有相似或更好的性能

例如,使用 AMD K7

指令 操作数 操作 延迟 倒数 吞吐量
DIV r8/m8 32 24 23
DIV r16/m16 47 24 23
DIV r32/m32 79 40 40
IDIV r8 41 17 17
IDIV r16 56 25 25
IDIV r32 88 41 41
IDIV m8 42 17 17
IDIV m16 57 25 25
IDIV m32 89 41 41

同样的情况也适用于 Intel Pentium

指令 操作 数 时钟周期
DIV r8/m8 17
DIV r16/m16 25
DIV r32/m32 41
IDIV r8/m8 22
IDIV r16/m16 30
IDIV r32/m32 46

当然,这些都是相当古老的。具有更多晶体管的新型架构可能会缩小差距,但基本的事情是适用的:通常您需要更多的微操作、更多的逻辑、更多的芯片面积、更多的延迟来进行有符号除法

Not only division by powers of 2 are faster with unsigned type, division by any other values are also faster with unsigned type. If you look at Agner Fog's Instruction tables you'll see that unsigned divisions (DIV) have similar or better performance than signed versions (IDIV)

For example with the AMD K7

Instruction Operands Ops Latency Reciprocal throughput
DIV r8/m8 32 24 23
DIV r16/m16 47 24 23
DIV r32/m32 79 40 40
IDIV r8 41 17 17
IDIV r16 56 25 25
IDIV r32 88 41 41
IDIV m8 42 17 17
IDIV m16 57 25 25
IDIV m32 89 41 41

The same thing applies to Intel Pentium

Instruction Operands Clock cycles
DIV r8/m8 17
DIV r16/m16 25
DIV r32/m32 41
IDIV r8/m8 22
IDIV r16/m16 30
IDIV r32/m32 46

Of course those are quite ancient. Newer architectures with more transistors might close the gap but the basic things apply: generally you need more micro-ops, more logic, more die area, more latency to do a signed division

一片旧的回忆 2024-10-19 09:02:02

简而言之,在事实发生之前不要打扰。但之后就麻烦了。

如果您想要获得性能,您必须使用编译器的性能优化,这可能违反常识。要记住的一件事是,不同的编译器可以以不同的方式编译代码,并且它们本身具有不同类型的优化。如果我们谈论的是 g++ 编译器,并谈论通过使用 -Ofast 或至少使用 -O3 标志来最大化其优化级别,根据我的经验,它可以将 long 类型编译为代码,其性能比任何 unsigned 类型甚至只是 int 更好。

这是我自己的经验,我建议您首先编写完整的程序,然后只有在您手上有实际代码并且可以通过优化对其进行编译以尝试选择实际执行的类型时才关心这些事情最好的。这也是关于代码性能优化的一个很好的非常普遍的建议,首先快速编写,尝试使用优化进行编译,调整一些东西看看什么最有效。您还应该尝试使用不同的编译器来编译您的程序,并选择输出性能最高的机器代码的编译器。

优化的多线程线性代数计算程序可以轻松实现>10倍的性能差异精细优化与未优化。所以这很重要。

在很多情况下,优化器的输出与逻辑相矛盾。例如,我遇到过一个情况,a[x]+=ba[x]=b 之间的差异使程序执行时间几乎增加了 2 倍。不,a[x]=b 并不是更快的。

例如,NVidia 声明用于对其 GPU 进行编程:

注意:正如推荐的最佳实践一样,签名算术
应尽可能优先于无符号算术
SMM 上的最佳吞吐量。 C语言标准放置了更多
对无符号数学溢出行为的限制,限制编译器
优化机会。

In short, don't bother before the fact. But do bother after.

If you want to have performance you have to use performance optimizations of a compiler which may work against common sense. One thing to remember is that different compilers can compile code differently and they themselves have different sorts of optimizations. If we're talking about a g++ compiler and talking about maxing out it's optimization level by using -Ofast, or at least an -O3 flag, in my experience it can compile long type into code with even better performance than any unsigned type, or even just int.

This is from my own experience and I recommend you to first write your full program and care about such things only after that, when you have your actual code on your hands and you can compile it with optimizations to try and pick the types that actually perform best. This is also a good very general suggestion about code optimization for performance, write quickly first, try compiling with optimizations, tweak things to see what works best. And you should also try using different compilers to compile your program and choosing the one that outputs the most performant machine code.

An optimized multi-threaded linear algebra calculation program can easily have a >10x performance difference finely optimized vs unoptimized. So this does matter.

Optimizer output contradicts logic in plenty of cases. For example, I had a case when a difference between a[x]+=b and a[x]=b changed program execution time almost 2x. And no, a[x]=b wasn't the faster one.

Here's for example NVidia stating that for programming their GPUs:

Note: As was already the recommended best practice, signed arithmetic
should be preferred over unsigned arithmetic wherever possible for
best throughput on SMM. The C language standard places more
restrictions on overflow behavior for unsigned math, limiting compiler
optimization opportunities.

清醇 2024-10-19 09:02:02

传统上,int 是目标硬件平台的本机整数格式。任何其他整数类型都可能会导致性能损失。

编辑:

现代系统上的情况略有不同:

  • 出于兼容性原因,

    int 实际上在 64 位系统上可能是 32 位。我相信这种情况会在 Windows 系统上发生。

  • 在某些情况下,现代编译器在执行较短类型的计算时可能会隐式使用 int

Traditionally int is the native integer format of the target hardware platform. Any other integer type may incur performance penalties.

EDIT:

Things are slightly different on modern systems:

  • int may in fact be 32-bit on 64-bit systems for compatibility reasons. I believe this happens on Windows systems.

  • Modern compilers may implicitly use int when performing computations for shorter types in some cases.

み青杉依旧 2024-10-19 09:02:02

IIRC,在 x86 上签名/未签名应该没有任何区别。另一方面,短/长是一个不同的故事,因为对于长来说,必须移入/移出 RAM 的数据量更大(其他原因可能包括转换操作,例如将短扩展到长)。

IIRC, on x86 signed/unsigned shouldn't make any difference. Short/long, on the other hand, is a different story, since the amount of data that has to be moved to/from RAM is bigger for longs (other reasons may include cast operations like extending a short to long).

骄傲 2024-10-19 09:02:02

有符号和无符号整数始终都作为单时钟指令运行,并具有相同的读写性能,但根据 Dr Andrei Alexandrescu 未签名优于签名。这样做的原因是,您可以在相同位数中容纳两倍数量的数字,因为您不会浪费符号位,并且您将使用更少的指令来检查负数,从而通过减少 ROM 来提高性能。根据我使用 Kabuki VM 的经验,它具有超高性能Script 实现,当您实际需要签名号码时,很少见与记忆一起工作。我花了很多年的时间对有符号和无符号的数字进行指针算术,我发现当不需要符号位时,有符号没有任何好处。

当使用位移位来执行 2 的幂的乘法和除法时,有符号可能是首选,因为您可以使用带符号的 2 的补码整数执行 2 的负幂除法。请观看更多来自 Andrei 的 YouTube 视频,了解更多优化技巧。您还可以在我的文章中找到一些关于 世界上最快的整数到字符串转换算法

Signed and unsigned integers will always both operate as single clock instructions and have the same read-write performance but according to Dr Andrei Alexandrescu unsigned is preferred over signed. The reason for this is you can fit twice the amount of numbers in the same number of bits because you're not wasting the sign bit and you will use fewer instructions checking for negative numbers yielding performance increases from the decreased ROM. In my experience with the Kabuki VM, which features an ultra-high-performance Script Implementation, it is rare that you actually require a signed number when working with memory. I've spend may years doing pointer arithmetic with signed and unsigned numbers and I've found no benefit to the signed when no sign bit is needed.

Where signed may be preferred is when using bit shifting to perform multiplication and division of powers of 2 because you may perform negative powers of 2 division with signed 2's complement integers. Please see some more YouTube videos from Andrei for more optimization techniques. You can also find some good info in my article about the the world's fastest Integer-to-String conversion algorithm.

小猫一只 2024-10-19 09:02:02

无符号整数的优点在于,您可以将两者存储和视为比特流,我的意思是只是一个没有符号的数据,因此通过位移操作,乘法、除法变得更容易(更快)

Unsigned integer is advantageous in that you store and treat both as bitstream, I mean just a data, without sign, so multiplication, devision becomes easier (faster) with bit-shift operations

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文