Why does 64-bit RISC-V GCC sign-extend unsigned values after a load?

Posted 2025-01-25 20:30:25 · 1951 characters · 2 views · 0 comments

GCC and Clang generate different code for this snippet:

extern volatile unsigned WATCHDOG;

void reset_watchdog() {
    unsigned t = WATCHDOG;
    WATCHDOG = t;
}

Consider WATCHDOG as a memory-mapped I/O register.

When compiling with -O3, Clang 14 generates pretty straightforward code:

reset_watchdog():
        lui     a0, %hi(WATCHDOG)
        lw      a1, %lo(WATCHDOG)(a0)
        sw      a1, %lo(WATCHDOG)(a0)
        ret

However, GCC 10.0.2 inserts a superfluous sign-extend instruction after the load:

reset_watchdog():
        lui     a4,%hi(WATCHDOG)
        lw      a5,%lo(WATCHDOG)(a4)
        sext.w  a5,a5
        sw      a5,%lo(WATCHDOG)(a4)
        ret

See also: https://godbolt.org/z/zK1ss3Wsz

The sext.w is superfluous in two ways:

  1. lw already sign-extends the loaded word into the 64-bit destination register under RV64I (lwu would zero-extend it instead)
  2. for the sw instruction, the content of the 4 most-significant bytes of the source register is irrelevant
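
To make the two points concrete, here is a minimal C sketch that models the instructions involved. The function names (model_lw, model_lwu, model_sext_w, model_sw, check_sext_w_is_redundant) are made up for illustration; registers are modelled as uint64_t and the memory word as uint32_t. It is only a model of the documented RV64I semantics, not anything taken from GCC or Clang:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C models of the RV64I instructions discussed above.
   The uint32_t -> int32_t casts rely on two's-complement wrap-around,
   which mainstream compilers provide. */

/* lw: load a 32-bit word and sign-extend it to 64 bits. */
uint64_t model_lw(const uint32_t *mem) {
    return (uint64_t)(int64_t)(int32_t)*mem;
}

/* lwu: load a 32-bit word and zero-extend it to 64 bits. */
uint64_t model_lwu(const uint32_t *mem) {
    return (uint64_t)*mem;
}

/* sext.w (alias of addiw rd, rs, 0): sign-extend the low 32 bits. */
uint64_t model_sext_w(uint64_t reg) {
    return (uint64_t)(int64_t)(int32_t)(uint32_t)reg;
}

/* sw: store only the low 32 bits of the source register. */
void model_sw(uint32_t *mem, uint64_t reg) {
    *mem = (uint32_t)reg;
}

/* Checks both points for one input word. */
void check_sext_w_is_redundant(uint32_t word) {
    uint32_t mem = word, out_a = 0, out_b = 0;
    uint64_t reg = model_lw(&mem);

    /* Point 1: sext.w after lw leaves the register unchanged. */
    assert(model_sext_w(reg) == reg);

    /* Point 2: sw stores the same word whatever the upper 32 bits are. */
    model_sw(&out_a, reg);             /* upper bits sign-extended */
    model_sw(&out_b, model_lwu(&mem)); /* upper bits zero */
    assert(out_a == word && out_b == word);
}
```

Calling check_sext_w_is_redundant(0x80000001u) exercises the case where the sign bit of the word is set, which is the only case where lw and lwu differ.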

So why does GCC generate such (suboptimal?) code?


GCC gets even more interesting when the function returns the original watchdog value:

extern volatile unsigned WATCHDOG;

unsigned long reset_watchdog() {
    unsigned long t = WATCHDOG;

    WATCHDOG = t;
    return t;
}

As expected, Clang basically just replaces the lw instruction with lwu to avoid the sign extension:

reset_watchdog():
        lui     a1, %hi(WATCHDOG)
        lwu     a0, %lo(WATCHDOG)(a1)
        sw      a0, %lo(WATCHDOG)(a1)
        ret

GCC, however, generates more convoluted code:

reset_watchdog():
        lui     a5,%hi(WATCHDOG)
        lw      a0,%lo(WATCHDOG)(a5)
        sext.w  a0,a0
        sw      a0,%lo(WATCHDOG)(a5)
        slli    a0,a0,32
        srli    a0,a0,32
        ret

Basically, GCC insists on sign-extending the loaded 4-byte value and thus has to emit two additional instructions (slli/srli by 32) to undo a possible sign extension and clear the 4 most-significant bytes. It's as if GCC doesn't know that RV64 has the lwu instruction and that lw already sign-extends the destination.
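
As a sanity check that both sequences compute the same return value, here is a small C sketch. The names (model_zext_w, gcc_return_value, clang_return_value) are hypothetical and only mirror the two listings above; registers are again modelled as uint64_t:

```c
#include <stdint.h>

/* slli a0,a0,32 followed by srli a0,a0,32: clears bits 63..32,
   i.e. zero-extends the low 32 bits of the register. */
uint64_t model_zext_w(uint64_t reg) {
    return (reg << 32) >> 32;
}

/* Models GCC's sequence for the returned value:
   lw, sext.w, slli, srli (the sw does not change a0). */
uint64_t gcc_return_value(uint32_t word) {
    uint64_t a0 = (uint64_t)(int64_t)(int32_t)word; /* lw     */
    a0 = (uint64_t)(int64_t)(int32_t)(uint32_t)a0;  /* sext.w */
    return model_zext_w(a0);                        /* slli + srli */
}

/* Models Clang's sequence for the returned value: a single lwu. */
uint64_t clang_return_value(uint32_t word) {
    return (uint64_t)word;
}
```

For every 32-bit input the two functions agree, so the four instructions GCC spends on a0 amount to what a single lwu provides.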

cf. https://godbolt.org/z/zW9qTzdWh
