Why is clang unwilling or unable to eliminate duplicate loads here?

Posted 2025-02-02 18:25:30

Consider the following C program:

typedef struct { int x; } Foo;

void original(Foo***** xs, Foo* foo) {
    xs[0][1][2][3] = foo;
    xs[0][1][2][3]->x = 42;
}

As far as I understand, per the C standard a Foo** cannot alias a Foo*, and so on, because their types are not compatible. However, compiling the program with clang 14.0 and -O3 results in duplicate loads:

    mov     rax, qword ptr [rdi]
    mov     rax, qword ptr [rax + 8]
    mov     rax, qword ptr [rax + 16]
    mov     qword ptr [rax + 24], rsi
    mov     rax, qword ptr [rdi]
    mov     rax, qword ptr [rax + 8]
    mov     rax, qword ptr [rax + 16]
    mov     rax, qword ptr [rax + 24]
    mov     dword ptr [rax], 42
    ret

I would expect an optimising compiler to either:

(A) Assign to x on foo directly and store foo into xs[0][1][2][3] (in any order).
(B) Perform the address calculations for xs once and use the result for assigning both foo and x.
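
Written out by hand, option A would look roughly like the sketch below. The name option_a is hypothetical and does not appear in the original code; the rewrite is only equivalent under the no-aliasing assumption stated above, namely that the store into the Foo* slot cannot change any of the intermediate pointers.

```c
typedef struct { int x; } Foo;

/* Hypothetical hand-written form of option A: store foo into the slot,
   then write through foo directly instead of re-reading the slot. */
void option_a(Foo***** xs, Foo* foo) {
    xs[0][1][2][3] = foo;
    foo->x = 42;
}
```
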

Clang correctly compiles B:

void fixed(Foo***** xs, Foo* foo) {
    Foo** ix = &xs[0][1][2][3];
    *ix = foo;
    (*ix)->x = 42;
}

as follows (actually turning it into A):

    mov     rax, qword ptr [rdi]
    mov     rax, qword ptr [rax + 8]
    mov     rax, qword ptr [rax + 16]
    mov     qword ptr [rax + 24], rsi
    mov     dword ptr [rsi], 42
    ret

Interestingly, GCC compiles both definitions into A. Why is clang unwilling or unable to optimise the address calculation in the original definition?

Compiler Explorer Playground

Comments (2)

烛影斜 2025-02-09 18:25:30

This is a partial answer.

The loads are performed twice because the optimizer misses the optimization. It does detect this specific case, but gives up and reports the following missed-optimization remarks:

Missed - load of type ptr not eliminated in favor of load because it is clobbered by store
Missed - load of type ptr not eliminated because it is clobbered by store
Missed - load of type ptr not eliminated because it is clobbered by store
Missed - load of type ptr not eliminated because it is clobbered by store

You can see that by opening the "optimization output" window in Godbolt.

This optimization is performed by the Global Value Numbering (GVN) pass in LLVM, and the remark appears to be reported from the function reportMayClobberedLoad. The code states that the missed load elimination is due to an intervening store (again). For more information, one would need to delve into the algorithm of this optimization pass. A good starting point seems to be the GVNPass::AnalyzeLoadAvailability function. Fortunately, the code is commented.

Note that a simplified Foo** use case is optimized, while a simplified Foo*** use case is not optimized by default; using restrict fixes the missed optimization (it looks like the optimizer wrongly assumes that aliasing can be an issue here because of the store).
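
The reduced cases are not shown in this answer, so the following is only a sketch of what they might look like. The function names are hypothetical, and putting restrict on the xs parameter is an assumption about where the qualifier was placed.

```c
typedef struct { int x; } Foo;

/* Two levels: the shape reported above as optimized. */
void two_level(Foo** xs, Foo* foo) {
    xs[0] = foo;
    xs[0]->x = 42;
}

/* Three levels: the shape reported above as not optimized by default. */
void three_level(Foo*** xs, Foo* foo) {
    xs[0][1] = foo;
    xs[0][1]->x = 42;
}

/* Assumed placement of restrict: it promises that the object xs points to
   (xs[0]) is only accessed through xs during the call, so the store into
   xs[0][1] cannot have changed xs[0]. */
void three_level_restrict(Foo*** restrict xs, Foo* foo) {
    xs[0][1] = foo;
    xs[0][1]->x = 42;
}
```

All three functions behave identically for valid inputs; the only difference is how much the compiler is allowed to assume about the store.
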

I wonder whether this is due to the LLVM IR, which seems to make no distinction between Foo** and Foo*** pointer types: they are apparently all treated as raw pointers. The store-forwarding optimization could then fail because the store might affect any pointer in the chain, and the optimizer cannot tell which one because of the possible aliasing (itself a consequence of the lost pointer types). Here is the generated LLVM IR:

define dso_local void @original(ptr nocapture noundef readonly %0, ptr noundef %1) local_unnamed_addr #0 !dbg !9 {
  call void @llvm.dbg.value(metadata ptr %0, metadata !24, metadata !DIExpression()), !dbg !26
  call void @llvm.dbg.value(metadata ptr %1, metadata !25, metadata !DIExpression()), !dbg !26
  %3 = load ptr, ptr %0, align 8, !dbg !27, !tbaa !28
  %4 = getelementptr inbounds ptr, ptr %3, i64 1, !dbg !27
  %5 = load ptr, ptr %4, align 8, !dbg !27, !tbaa !28
  %6 = getelementptr inbounds ptr, ptr %5, i64 2, !dbg !27
  %7 = load ptr, ptr %6, align 8, !dbg !27, !tbaa !28
  %8 = getelementptr inbounds ptr, ptr %7, i64 3, !dbg !27
  store ptr %1, ptr %8, align 8, !dbg !32, !tbaa !28
  %9 = load ptr, ptr %0, align 8, !dbg !33, !tbaa !28
  %10 = getelementptr inbounds ptr, ptr %9, i64 1, !dbg !33
  %11 = load ptr, ptr %10, align 8, !dbg !33, !tbaa !28
  %12 = getelementptr inbounds ptr, ptr %11, i64 2, !dbg !33
  %13 = load ptr, ptr %12, align 8, !dbg !33, !tbaa !28
  %14 = getelementptr inbounds ptr, ptr %13, i64 3, !dbg !33
  %15 = load ptr, ptr %14, align 8, !dbg !33, !tbaa !28
  store i32 42, ptr %15, align 4, !dbg !34, !tbaa !35
  ret void, !dbg !38
}

那一片橙海 2025-02-09 18:25:30

The answer seems to be that this is an open LLVM issue: "[TBAA] Emit distinct TBAA tags for pointers with different depths,types."

Jérôme's answer tipped me off that this might have something to do with Type Based Alias Analysis (TBAA) when I noticed all loads use the same TBAA metadata.

Right now, clang emits* only the following TBAA metadata:

; Descriptors
!15 = !{!"Simple C/C++ TBAA"}
!14 = !{!"omnipotent char", !15, i64 0}
!13 = !{!"any pointer", !14, i64 0}
!21 = !{!"int", !14, i64 0}
!20 = !{!"", !21, i64 0}
; Tags
!12 = !{!13, !13, i64 0}
!19 = !{!20, !21, i64 0}
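
In these descriptors "int" and "any pointer" are separate children of "omnipotent char", while every pointer access shares the single "any pointer" tag. The C sketch below illustrates the two situations this creates. The function names are hypothetical, and the comments describe only what the metadata permits, not what a particular compiler version emits.

```c
typedef struct { int x; } Foo;

/* int access vs. pointer store: the two types sit in different TBAA
   branches, so the compiler may assume *slot = foo leaves *n unchanged. */
int int_vs_pointer(int* n, Foo** slot, Foo* foo) {
    *n = 1;
    *slot = foo;
    return *n;
}

/* Pointer load vs. pointer store: both are tagged "any pointer", so TBAA
   alone cannot rule out that *slot = foo changed *outer.
   Returns 1 if *outer is the same before and after the store. */
int pointer_vs_pointer(Foo*** outer, Foo** slot, Foo* foo) {
    Foo** before = *outer;
    *slot = foo;
    return before == *outer;
}
```
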

Looking at the LLVM revision, I figured that clang might eventually be able to emit something along the lines of:

; Type descriptors
!0 = !{!"TBAA Root"}
!1 = !{!"omnipotent char", !0, i64 0}
!3 = !{!"int", !0, i64 0}
!2 = !{!"any pointer", !1, i64 0}
!11 = !{!"p1 foo", !2, i64 0} ; Foo*
!12 = !{!"p2 foo", !2, i64 0} ; Foo**
!13 = !{!"p3 foo", !2, i64 0} ; Foo***
!14 = !{!"p4 foo", !2, i64 0} ; Foo****
!10 = !{!"foo", !3, i64 0} ; struct {int x}

; Access tags
!20 = !{!14, !14, i64 0} ; Foo****
!21 = !{!13, !13, i64 0} ; Foo***
!22 = !{!12, !12, i64 0} ; Foo**
!23 = !{!11, !11, i64 0} ; Foo*
!24 = !{!10, !3, i64 0}  ; Foo.x

(I'm still not sure I fully grok the TBAA metadata format so please excuse any mistakes)
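
To see why distinct descriptors would help, here is a toy C model of the scalar TBAA rule: two accesses may alias only if one access type is the other or an ancestor of it in the descriptor tree. This is a simplification that ignores struct-path offsets; TypeNode, is_ancestor_or_self and may_alias are hypothetical names for illustration, not LLVM APIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a scalar TBAA type descriptor: a name plus a parent link. */
typedef struct TypeNode {
    const char* name;
    const struct TypeNode* parent;
} TypeNode;

/* True if `ancestor` is `node` itself or appears on node's parent chain. */
bool is_ancestor_or_self(const TypeNode* ancestor, const TypeNode* node) {
    for (; node != NULL; node = node->parent) {
        if (node == ancestor) {
            return true;
        }
    }
    return false;
}

/* Simplified scalar rule: may alias only if one type is the other
   or an ancestor of the other. */
bool may_alias(const TypeNode* a, const TypeNode* b) {
    return is_ancestor_or_self(a, b) || is_ancestor_or_self(b, a);
}

/* Pointer part of the proposed hierarchy listed above. */
const TypeNode tbaa_root       = { "TBAA Root", NULL };
const TypeNode omnipotent_char = { "omnipotent char", &tbaa_root };
const TypeNode any_pointer     = { "any pointer", &omnipotent_char };
const TypeNode p1_foo          = { "p1 foo", &any_pointer }; /* Foo*    */
const TypeNode p2_foo          = { "p2 foo", &any_pointer }; /* Foo**   */
const TypeNode p3_foo          = { "p3 foo", &any_pointer }; /* Foo***  */
const TypeNode p4_foo          = { "p4 foo", &any_pointer }; /* Foo**** */
```

In this model the store tagged p1_foo and the reloads tagged p4_foo, p3_foo and p2_foo are siblings, so may_alias returns false for each pair. If every access is tagged any_pointer, as clang emits today, may_alias returns true, which corresponds to the "clobbered by store" remarks.
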

Together with the code below, LLVM produces the expected assembly.

define void @original(ptr %0, ptr %1) {
  %3 = load ptr, ptr %0, !tbaa !20
  %4 = getelementptr ptr, ptr %3, i64 1
  %5 = load ptr, ptr %4, !tbaa !21
  %6 = getelementptr ptr, ptr %5, i64 2
  %7 = load ptr, ptr %6, !tbaa !22
  %8 = getelementptr ptr, ptr %7, i64 3
  store ptr %1, ptr %8, !tbaa !23

  %9 = load ptr, ptr %0, !tbaa !20
  %10 = getelementptr ptr, ptr %9, i64 1
  %11 = load ptr, ptr %10, !tbaa !21
  %12 = getelementptr ptr, ptr %11, i64 2
  %13 = load ptr, ptr %12, !tbaa !22
  %14 = getelementptr ptr, ptr %13, i64 3
  %15 = load ptr, ptr %14, !tbaa !23 ; : Foo*
  store i32 42, ptr %15, !tbaa !24

  ret void
}

Compiler Explorer Playground

* Compiler Explorer's LLVM IR view filters these out by default, but you can see them by using -emit-llvm and disabling "Directives" filtering.
