Why does GCC on 64-bit RISC-V sign-extend unsigned values after loading them?
GCC and Clang generate different code for this snippet:
extern volatile unsigned WATCHDOG;

void reset_watchdog() {
    unsigned t = WATCHDOG;
    WATCHDOG = t;
}
Consider WATCHDOG as a memory-mapped I/O register.
When compiling with -O3, Clang 14 generates pretty straightforward code:
reset_watchdog():
lui a0, %hi(WATCHDOG)
lw a1, %lo(WATCHDOG)(a0)
sw a1, %lo(WATCHDOG)(a0)
ret
However, GCC 10.0.2 inserts a superfluous sign-extend instruction after the load:
reset_watchdog():
lui a4,%hi(WATCHDOG)
lw a5,%lo(WATCHDOG)(a4)
sext.w a5,a5
sw a5,%lo(WATCHDOG)(a4)
ret
See also: https://godbolt.org/z/zK1ss3Wsz
The sext.w is superfluous in two ways:
- lw already sign-extends the destination register under RV64I (lwu would zero-extend the destination instead)
- for the sw instruction, the content of the 4 most-significant bytes of the source register is irrelevant
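To make the two points concrete, here is a minimal sketch in C that models the register-level behavior of the instructions involved. The function names (rv64_lw, rv64_lwu, rv64_sext_w, rv64_sw, sext_w_is_redundant) are hypothetical helpers invented for illustration, and the code only models the RV64I semantics as I understand them; it is not what either compiler does internally.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 32 bits of a 64-bit value (well-defined unsigned math). */
static uint64_t sign_extend32(uint64_t reg) {
    uint64_t low = reg & 0xFFFFFFFFu;
    return (low ^ 0x80000000u) - 0x80000000u;
}

/* Model of `lw`: load 32 bits, sign-extend into the 64-bit register. */
static uint64_t rv64_lw(const uint32_t *addr) {
    return sign_extend32(*addr);
}

/* Model of `lwu`: load 32 bits, zero-extend into the 64-bit register. */
static uint64_t rv64_lwu(const uint32_t *addr) {
    return (uint64_t)*addr;
}

/* Model of `sext.w rd, rs` (a pseudo-instruction for `addiw rd, rs, 0`). */
static uint64_t rv64_sext_w(uint64_t reg) {
    return sign_extend32(reg);
}

/* Model of `sw`: only the low 32 bits of the source register reach memory. */
static void rv64_sw(uint32_t *addr, uint64_t reg) {
    *addr = (uint32_t)reg;
}

/* Returns 1 if, for this word, sext.w after lw changes nothing and
 * sw stores the same value regardless of the upper register bits. */
static int sext_w_is_redundant(uint32_t word) {
    uint32_t mem = word, out_a = 0, out_b = 0;
    uint64_t reg = rv64_lw(&mem);
    int same_reg = (rv64_sext_w(reg) == reg); /* point 1: lw already extended */
    rv64_sw(&out_a, reg);                     /* upper bits sign-extended */
    rv64_sw(&out_b, rv64_lwu(&mem));          /* upper bits zero */
    return same_reg && out_a == word && out_b == word; /* point 2 */
}
```

Calling sext_w_is_redundant with a value whose bit 31 is set, such as 0x80000000u, exercises the case where lw and lwu leave different upper halves in the register while sw still writes the same word.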
So why does GCC generate such (suboptimal?) code?
GCC gets even more interesting when the function returns the original watchdog value:
extern volatile unsigned WATCHDOG;

unsigned long reset_watchdog() {
    unsigned long t = WATCHDOG;
    WATCHDOG = t;
    return t;
}
As expected, Clang basically just replaces the lw instruction with lwu to avoid the sign extension:
reset_watchdog():
lui a1, %hi(WATCHDOG)
lwu a0, %lo(WATCHDOG)(a1)
sw a0, %lo(WATCHDOG)(a1)
ret
GCC, in contrast, generates more convoluted code:
reset_watchdog():
lui a5,%hi(WATCHDOG)
lw a0,%lo(WATCHDOG)(a5)
sext.w a0,a0
sw a0,%lo(WATCHDOG)(a5)
slli a0,a0,32
srli a0,a0,32
ret
Basically, GCC really sticks with sign-extending the loaded 4-byte value and thus has to emit two additional instructions to undo a possible extension and clear the most significant 4 bytes. It's as if GCC doesn't know that RV64 has the lwu instruction and that lw already sign-extends the destination.
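As a rough check that the two instruction sequences for the returning variant compute the same result, the following C sketch models each one on a 64-bit register. The helpers gcc_sequence and clang_sequence are hypothetical names made up for this illustration; they mimic the listed assembly step by step under my reading of the RV64I semantics and say nothing about why GCC picks its sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 32 bits of a 64-bit value (well-defined unsigned math). */
static uint64_t sign_extend32(uint64_t reg) {
    uint64_t low = reg & 0xFFFFFFFFu;
    return (low ^ 0x80000000u) - 0x80000000u;
}

/* Models the GCC listing: lw, sext.w, then slli/srli by 32. */
static uint64_t gcc_sequence(uint32_t word) {
    uint64_t a0 = sign_extend32(word); /* lw     a0, ...   (sign-extends) */
    a0 = sign_extend32(a0);            /* sext.w a0, a0    (no change)    */
    a0 <<= 32;                         /* slli   a0, a0, 32               */
    a0 >>= 32;                         /* srli   a0, a0, 32 (logical)     */
    return a0;
}

/* Models the Clang listing: a single lwu zero-extends directly. */
static uint64_t clang_sequence(uint32_t word) {
    return (uint64_t)word;             /* lwu    a0, ...                  */
}
```

For any 32-bit input, both functions should return the zero-extended word, which is why the shift pair can be seen as an expensive substitute for a single lwu.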