Is i=(i+1)&3 faster than i=(i+1)%4?
I am optimizing some C++ code. At one critical step, I want to implement the following function y=f(x):
f(0)=1
f(1)=2
f(2)=3
f(3)=0
Which is faster: a lookup table, i=(i+1)&3, or i=(i+1)%4? Or is there a better suggestion?
Comments (6)
Almost certainly the lookup table is going to be the slowest. In a lot of cases, the compiler will generate the same assembly for (i+1)&3 and (i+1)%4; however, depending on the type/signedness of i, they may not be strictly equivalent, and then the compiler won't be able to make that optimization. For example, for the code on my system, gcc -O2 generates:
So as you can see, because of the rules about signed modulus results, (i+1)%4 generates a lot more code in the first place.
Bottom line: you're probably best off using the (i+1)&3 version if that expresses what you want, because there's less chance for the compiler to do something you don't expect.

I won't get into the discussion of premature optimization. But the answer is that they will be the same speed.
Any sane compiler will compile them to the same thing. Division/modulus by a power of two will be optimized to bitwise operations anyway.
So use whichever you (or others) find more readable.
EDIT: As Roland has pointed out, it does sometimes behave differently depending on the signedness:
Unsigned &:
Unsigned modulus:
Signed &:
Signed modulus:

Chances are good that you won't find any difference: any reasonably modern compiler knows to optimize both into the same code.

Have you tried benchmarking it? As an offhand guess, I'd assume that the &3 version will be faster, as that's a simple addition and a bitwise AND, both of which should be single-cycle operations on any modern-ish CPU.
The %4 version could go a few different ways, depending on how smart the compiler is. It could be done via division, which is much slower than addition, or it could be translated into a bitwise AND as well and end up being just as fast as the &3 version.

Same as Mystical, but in C and for ARM
creates:
For negative numbers the mask and the modulo are not equivalent; they are equivalent only for non-negative/unsigned numbers. For those cases your compiler should know that %4 is the same as &3 and use the less expensive one (&3), as gcc does above. clang/llc below

Of course & is faster than %, as many of the previous posts show. Also, as i is a local variable, you can use ++i instead of i+1, as it is better implemented by most compilers. i+1 may (or may not) be optimized into ++i.
UPDATE: Perhaps I was not clear; I meant that the function should just be "return((++i)&3);"