Why is Clang unwilling or unable to eliminate the duplicate loads here?
Consider the following C program:
typedef struct { int x; } Foo;
void original(Foo***** xs, Foo* foo) {
    xs[0][1][2][3] = foo;
    xs[0][1][2][3]->x = 42;
}
As far as I understand, per the C standard Foo** cannot alias Foo* etc., as their types are not compatible. However, compiling the program with clang 14.0 and -O3 results in duplicate loads:
mov rax, qword ptr [rdi]
mov rax, qword ptr [rax + 8]
mov rax, qword ptr [rax + 16]
mov qword ptr [rax + 24], rsi
mov rax, qword ptr [rdi]
mov rax, qword ptr [rax + 8]
mov rax, qword ptr [rax + 16]
mov rax, qword ptr [rax + 24]
mov dword ptr [rax], 42
ret
I would expect an optimising compiler to either:
(A) Assign to x on foo directly and assign foo to xs (in any order)
(B) Perform address calculations for xs once and use them for assigning foo and x.
Clang correctly compiles B:
void fixed(Foo***** xs, Foo* foo) {
    Foo** ix = &xs[0][1][2][3];
    *ix = foo;
    (*ix)->x = 42;
}
as follows: (actually turning it into A)
mov rax, qword ptr [rdi]
mov rax, qword ptr [rax + 8]
mov rax, qword ptr [rax + 16]
mov qword ptr [rax + 24], rsi
mov dword ptr [rsi], 42
ret
Interestingly, gcc compiles both definitions into A. Why is clang unwilling or unable to optimise the address calculation in the original definition?
Comments (2)
This is a partial answer.
The loads are performed twice because the optimizer missed the optimization. It succeeds in detecting this specific case, but then fails, reporting the following errors:
You can see that by opening the "optimization output" window in Godbolt.
This optimization is performed by the Global Value Numbering (GVN) pass in LLVM, and the specific error appears to be reported from the function reportMayClobberedLoad. The code states that the missed load elimination is due to an intervening store (again). For more information, one would need to delve into the algorithm of this optimization pass. A good starting point seems to be the GVNPass::AnalyzeLoadAvailability function. Fortunately, the code is commented.
Note that a simplified Foo** use case is optimized, while a simplified Foo*** use case is not optimized by default, but using restrict fixes the missed optimization (it looks like the optimizer wrongly assumes that aliasing can be an issue here due to the store).
I am wondering if this could be due to the produced LLVM-IR, which seems to make no distinction between Foo** and Foo*** pointer types: they are apparently all considered raw pointers. Thus, the store-forwarding optimization could fail because the store may affect any pointer in the chain, and the optimizer cannot know which one because of aliasing (itself due to the loss of pointer type).
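The simplified cases mentioned above can be sketched roughly as follows. This is only an illustration: the function names and the check_variants helper are hypothetical, and putting restrict on the outermost pointer is an assumption about where the qualifier was placed in the experiment. The comments describe the behaviour reported in this answer, not output verified for a particular Clang version.

```c
#include <assert.h>

typedef struct { int x; } Foo;

/* Two levels: the value stored to xs[0] can be forwarded straight to the
   following load of xs[0] (reported as optimized). */
void two_levels(Foo **xs, Foo *foo) {
    xs[0] = foo;
    xs[0]->x = 42;
}

/* Three levels: xs[0] is loaded, then a pointer is stored through it.
   Reported as not optimized by default: the store is treated as possibly
   overwriting xs[0] itself, so xs[0] is loaded again. */
void three_levels(Foo ***xs, Foo *foo) {
    xs[0][1] = foo;
    xs[0][1]->x = 42;
}

/* Same body, with restrict on xs (assumed placement). This promises that
   the object *xs is only accessed through xs, so the store made through
   the loaded pointer xs[0] cannot modify xs[0]. */
void three_levels_restrict(Foo ***restrict xs, Foo *foo) {
    xs[0][1] = foo;
    xs[0][1]->x = 42;
}

/* Hypothetical helper: builds a small chain without any aliasing and
   checks that all three variants have the same observable effect. */
int check_variants(void) {
    Foo a = {0}, b = {0}, c = {0};
    Foo *inner[2] = {0, 0};
    Foo **outer[1] = {inner};

    two_levels(inner, &a);             /* inner[0] = &a, a.x = 42 */
    three_levels(outer, &b);           /* inner[1] = &b, b.x = 42 */
    assert(inner[1] == &b);
    three_levels_restrict(outer, &c);  /* inner[1] = &c, c.x = 42 */
    assert(inner[1] == &c);
    return a.x + b.x + c.x;
}
```

All three functions behave identically on well-formed input; only the generated code is expected to differ. Comparing three_levels and three_levels_restrict at -O3 in Compiler Explorer should show whether the reload disappears.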
The answer seems to be that this is an open LLVM issue: [TBAA] Emit distinct TBAA tags for pointers with different depths,types.
Jérôme's answer tipped me off that this might have something to do with Type-Based Alias Analysis (TBAA): I noticed that all the loads use the same TBAA metadata.
Right now clang only emits* the following TBAA:
Looking at the LLVM revision I figured eventually clang might be able to emit something along the lines of:
(I'm still not sure I fully grok the TBAA metadata format so please excuse any mistakes)
Together with the code below LLVM produces the expected assembly.
Compiler Explorer Playground
* Compiler Explorer's LLVM IR view filters these out by default, but you can see them by using -emit-llvm and disabling "Directives" filtering.
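To make the role of the shared pointer tag more concrete, here is a minimal C sketch contrasting an intervening int store with an intervening pointer store. The Pair type, the function names and the run_tbaa_contrast helper are hypothetical and not part of the question. The comments describe what TBAA permits in principle, assuming all pointer accesses share one tag as observed above; the assembly for this sketch has not been checked here.

```c
#include <assert.h>

/* Hypothetical type: two int fields so that an int store can sit between
   two walks of the same pointer chain. */
typedef struct { int x; int y; } Pair;

/* The store between the two walks writes an int. int accesses and pointer
   accesses carry different TBAA tags, so this store need not be assumed to
   change ps[0] or ps[0][1], and the second walk may reuse the first. */
void int_store_between(Pair ***ps) {
    ps[0][1]->x = 1;
    ps[0][1]->y = 2;
}

/* The store between the two walks writes a pointer. With one shared tag
   for every pointer type, TBAA alone cannot rule out that it overwrote
   ps[0], which matches the situation in the question. */
void ptr_store_between(Pair ***ps, Pair *p) {
    ps[0][1] = p;
    ps[0][1]->x = 1;
}

/* Hypothetical helper: runs both functions on a chain with no aliasing. */
int run_tbaa_contrast(void) {
    Pair a = {0, 0}, b = {0, 0};
    Pair *inner[2] = {0, &a};
    Pair **outer[1] = {inner};

    int_store_between(outer);      /* a.x = 1, a.y = 2 */
    ptr_store_between(outer, &b);  /* inner[1] = &b, b.x = 1 */
    assert(inner[1] == &b);
    return a.x + a.y + b.x;
}
```

If pointer types of different depths received distinct tags, as the linked LLVM change proposes, the second function could in principle be treated like the first.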