How can "bitwise AND with mask equals mask" be optimized?
Example:
#include <stdbool.h>
#include <stdint.h>

bool foo(uint64_t x)
{
    return (x & 0x7ff0000000000000) == 0x7ff0000000000000;
}
leads to (ARM 32-bit):
gcc 12.1 (linux) -O3:
f:
        movs    r3, #0
        movt    r3, 32752
        bics    r3, r3, r1
        ite     eq
        moveq   r0, #1
        movne   r0, #0
        bx      lr
armv7-a clang 11.0.1 -O3:
f:
        mov     r0, #267386880
        orr     r0, r0, #1879048192
        bic     r0, r0, r1
        rsbs    r1, r0, #0
        adc     r0, r0, r1
        bx      lr
Can the C code above be rewritten so that the compiler produces faster assembly?
Perhaps there are relevant bit twiddling hacks, or combinations of them, or something similar?
Comments (2)
One option is
which compiles with gcc to
The saving here comes mostly from not having to convert a flag into a 0/1 result: the 1 bit is generated directly. If this function is inlined and the result is used for a branch, this does not help and might actually result in slower code.
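The code for this answer did not survive on this page, so the following is only a sketch of the general idea of producing the 1 bit arithmetically. It is not necessarily the answerer's code. The name foo_carry is hypothetical. It isolates the 11 mask bits, adds 1, and lets the carry out of the field become the result. Whether it is actually faster depends on the compiler and the target, so it is worth checking the generated assembly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reference form from the question. */
bool foo_ref(uint64_t x)
{
    return (x & 0x7ff0000000000000) == 0x7ff0000000000000;
}

/* Hypothetical rewrite (an assumption, not taken from the answer):
 * shift out the top bit, then move bits 52..62 down to bits 0..10. */
bool foo_carry(uint64_t x)
{
    uint64_t field = (x << 1) >> 53;  /* value in 0..2047 */
    /* field + 1 reaches 2048 only when all 11 bits were set,
     * so bit 11 of the sum is the answer as a plain 0 or 1. */
    return (field + 1) >> 11;
}
```

Both functions return the same value for every input, because the sum can only carry into bit 11 when the field is 2047.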
On clang the code is already as good as it gets:
The first version is especially clever: clearing, from the constant 0x7ff00000...0, every bit that is set in the source register produces zero only if all 11 of those bits are set in the source register. The second version was something I hoped would generate code like
for carry. But this would be on par with the bic method, being essentially just two instructions when the mov x8, constant could be reused. On Arm64, with plenty of predicated operations, it makes no difference whether the result ends up in CF, ZF or any other status flag.
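The bic observation above can also be written directly in C. This is a minimal sketch with hypothetical function names (foo_mask_ref, foo_andnot), and there is no guarantee that a given compiler emits bic for it. It only shows why "mask AND NOT x is zero" is the same test as "x AND mask equals mask".

```c
#include <stdbool.h>
#include <stdint.h>

/* Reference form from the question. */
bool foo_mask_ref(uint64_t x)
{
    return (x & 0x7ff0000000000000) == 0x7ff0000000000000;
}

/* Same test in the shape of a bit-clear: remove from the mask every
 * bit that is set in x. Nothing is left over exactly when x has all
 * of the mask bits set. */
bool foo_andnot(uint64_t x)
{
    const uint64_t mask = 0x7ff0000000000000;
    return (mask & ~x) == 0;
}
```

Compilers often perform this transformation themselves, which is consistent with the bic/bics instructions in the listings from the question.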