How can "bitwise AND with mask equals mask" be optimized?

Posted on 2025-02-11 03:45:38

How can "bitwise AND with mask equals mask" be optimized?

Example:

bool foo(uint64_t x)
{
      return (x & 0x7ff0000000000000) == 0x7ff0000000000000;
}

leads to (ARM 32-bit):

gcc 12.1 (linux) -O3:
f:
        movs    r3, #0
        movt    r3, 32752
        bics    r3, r3, r1
        ite     eq
        moveq   r0, #1
        movne   r0, #0
        bx      lr

armv7-a clang 11.0.1 -O3:
f:
        mov     r0, #267386880
        orr     r0, r0, #1879048192
        bic     r0, r0, r1
        rsbs    r1, r0, #0
        adc     r0, r0, r1
        bx      lr

Can the C code above be rewritten so that the compiler produces faster assembly?

Perhaps there are relevant bit-twiddling hacks, or combinations of them, or something similar?

Comments (2)

慕巷 2025-02-18 03:45:38

One option is

bool foo4(uint64_t x)
{
    return (((x << 1) >> 53) + 1) >> 11;
}

which compiles with gcc to

foo:
        ubfx    r0, r1, #20, #11
        adds    r0, r0, #1
        ubfx    r0, r0, #11, #1
        bx      lr

The saving here comes mostly from not having to convert a condition into a 0/1 result; the 1 bit is generated directly. If this function is inlined and the result is used for a branch, this does not help and might actually result in slower code.
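To see why the shift-and-add form is equivalent, note that `(x << 1) >> 53` isolates the 11-bit field, adding 1 overflows into bit 11 only when all 11 bits are set, and `>> 11` extracts that bit. The following is a minimal sketch that checks this against the original predicate; `EXP_MASK` and `count_mismatches` are hypothetical names introduced only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXP_MASK UINT64_C(0x7ff0000000000000)

/* Original predicate from the question. */
static bool foo(uint64_t x)
{
    return (x & EXP_MASK) == EXP_MASK;
}

/* Shift-and-add variant from this answer. */
static bool foo4(uint64_t x)
{
    return (((x << 1) >> 53) + 1) >> 11;
}

/* Hypothetical helper: tries all 2048 values of the 11-bit field while
   keeping the remaining bits of `other_bits`, and counts disagreements. */
static int count_mismatches(uint64_t other_bits)
{
    int mismatches = 0;
    for (uint64_t e = 0; e < 2048; ++e) {
        uint64_t x = (other_bits & ~EXP_MASK) | (e << 52);
        if (foo(x) != foo4(x))
            ++mismatches;
    }
    return mismatches;
}
```

Calling `count_mismatches` with a few different patterns for the remaining bits (all zeros, all ones) should return 0 each time, since neither function looks at bits outside the field.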

×纯※雪 2025-02-18 03:45:38

With clang targeting AArch64, the code is already as good as it gets:

bool foo(uint64_t x)
{
      return (x & 0x7ff0000000000000) == 0x7ff0000000000000;
}

        mov     x8, #9218868437227405312
        bics    xzr, x8, x0
        cset    w0, eq

bool foo2(uint64_t x)
{
      // check if x*2 overflows (i.e. produces a carry)
      // by adding one at the LSB of the shifted 11-bit field
      return ((x * 2) + 0x0020000000000000) < (x * 2);
}

        ubfx    x8, x0, #52, #11
        cmp     x8, #2046
        cset    w0, hi

The first version is particularly clever: bics computes mask & ~x, so clearing the bits of x from 0x7ff00000...0 produces zero only if all 11 mask bits are set in the source register.
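The same and-not idea can be written directly in C. This is only a sketch of the identity the compiler is exploiting, not a claim about what any particular compiler will emit for it; `has_all_bits` and `foo_andnot` are hypothetical names used for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if every bit of `mask` is also set in `x`. */
static bool has_all_bits(uint64_t x, uint64_t mask)
{
    /* mask & ~x keeps exactly the mask bits that are missing from x,
       so the result is zero only when none are missing. */
    return (mask & ~x) == 0;
}

/* Same predicate as foo() in the question, in and-not form. */
static bool foo_andnot(uint64_t x)
{
    return has_all_bits(x, UINT64_C(0x7ff0000000000000));
}
```

For any `x`, `foo_andnot(x)` should agree with `(x & mask) == mask`, because both say that no mask bit is absent from `x`.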

With the second version, I was hoping for code like

    mov x8, #0x0020000000000000
    adds x8, x8, x0, lsl #1
    cset w0, cs

for the carry. But this would only be on par with the bic method: essentially just two instructions when the constant from mov x8 can be reused.
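One way to state the carry test explicitly in C is the `__builtin_add_overflow` extension available in GCC and Clang. This is a sketch under the assumption that one of those compilers is used (it is not standard C), and it makes no promise about which instructions are selected; `foo2_carry` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Carry-based variant of foo2(): relies on a GCC/Clang builtin. */
static bool foo2_carry(uint64_t x)
{
    uint64_t sum; /* the wrapped sum is not needed, only the overflow flag */

    /* x << 1 drops the top bit and moves the 11-bit field to bits 63..53.
       Adding 1 at bit 53 carries out of bit 63 only if all 11 bits are set. */
    return __builtin_add_overflow(x << 1, UINT64_C(1) << 53, &sum);
}
```

The builtin returns true exactly when the unsigned addition wraps, which is the same condition that `((x * 2) + 0x0020000000000000) < (x * 2)` tests.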

On Arm64, with its many conditional (predicated) operations, it makes no difference whether the result ends up in the carry flag, the zero flag, or any other condition flag.
