Tail calls generated by clang 1.1 and 1.0 (llvm 2.7 and 2.6)

Posted on 2024-09-02 16:06:59

After compiling the following code snippet with clang -O2 (or with the online demo):

#include <stdio.h>
#include <stdlib.h>

int flop(int x);
int flip(int x) {
  if (x == 0) return 1;
  return (x+1)*flop(x-1);
}
int flop(int x) {
  if (x == 0) return 1;
  return (x+0)*flip(x-1);
}

int main(int argc, char **argv) {
  printf("%d\n", flip(atoi(argv[1])));
}

I get the following snippet of LLVM assembly in flip:

bb1.i:                                            ; preds = %bb1
  %4 = add nsw i32 %x, -2                         ; <i32> [#uses=1]
  %5 = tail call i32 @flip(i32 %4) nounwind       ; <i32> [#uses=1]
  %6 = mul nsw i32 %5, %2                         ; <i32> [#uses=1]
  br label %flop.exit

I thought that a tail call means dropping the current stack frame (i.e. the callee returns directly to the caller's caller, so the next instruction should be ret %5), but according to this code the result of the call is then multiplied (mul). In the native assembly there is also just a plain call, without tail-call optimisation (even with the appropriate flag for llc).
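To make the expectation above concrete, here is a minimal sketch of what a call in true tail position looks like in C, next to the original pair. The names flip_acc, flop_acc and flip_tail are hypothetical and not part of the code in the question. The pending multiplication is moved into an accumulator argument, so the recursive call is the last thing each function does and its result is returned unchanged. Whether a particular clang or llc version actually turns these calls into jumps is not something this sketch guarantees.

```c
/* Original shape: the call result is multiplied afterwards,
   so the recursive call is NOT in tail position. */
int flop(int x);
int flip(int x) {
  if (x == 0) return 1;
  return (x + 1) * flop(x - 1);
}
int flop(int x) {
  if (x == 0) return 1;
  return x * flip(x - 1);
}

/* Hypothetical accumulator variant: the multiplication happens
   before the call, so the call result is returned as-is. */
int flop_acc(int x, int acc);
int flip_acc(int x, int acc) {
  if (x == 0) return acc;
  return flop_acc(x - 1, acc * (x + 1)); /* tail position */
}
int flop_acc(int x, int acc) {
  if (x == 0) return acc;
  return flip_acc(x - 1, acc * x);       /* tail position */
}

/* Hypothetical wrapper with the same signature as flip. */
int flip_tail(int x) {
  return flip_acc(x, 1);
}
```

For small non-negative inputs flip_tail(x) should give the same value as flip(x), since only the order of the multiplications changes. Like the original, it does not terminate for negative x.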

Can somebody explain why clang generates such code?

Also, why does LLVM have a tail call marker at all, if it could simply check that the next ret uses the result of the preceding call and then perform the appropriate optimisation or generate the native equivalent of a tail-call instruction?
