Emitting LLVM bitcode from clang: the 'byval' attribute for passing objects with a nontrivial destructor into a function
I have C++ source code which I parse using clang, producing LLVM bitcode. From this point I want to process the file myself...
However, I encountered a problem. Consider the following scenario:
- I create a class with a nontrivial destructor or copy constructor.
- I define a function, where an object of this class is passed as a parameter, by value (no reference or pointer).
In the produced bitcode, I get a pointer instead. For classes without a destructor, the parameter is annotated as 'byval', but that is not the case here.
As a result, I cannot tell whether the parameter is passed by value or really by pointer.
Consider the following example:
Input file - cpass.cpp:
class C {
public:
int x;
~C() {}
};
void set(C val, int x) {val.x=x;};
void set(C *ptr, int x) {ptr->x=x;}
Compilation command line:
clang++ -c cpass.cpp -emit-llvm -o cpass.bc; llvm-dis cpass.bc
Produced output file (cpass.ll):
; ModuleID = 'cpass.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"
%class.C = type { i32 }
define void @_Z3set1Ci(%class.C* %val, i32 %x) nounwind {
%1 = alloca i32, align 4
store i32 %x, i32* %1, align 4
%2 = load i32* %1, align 4
%3 = getelementptr inbounds %class.C* %val, i32 0, i32 0
store i32 %2, i32* %3, align 4
ret void
}
define void @_Z3setP1Ci(%class.C* %ptr, i32 %x) nounwind {
%1 = alloca %class.C*, align 8
%2 = alloca i32, align 4
store %class.C* %ptr, %class.C** %1, align 8
store i32 %x, i32* %2, align 4
%3 = load i32* %2, align 4
%4 = load %class.C** %1, align 8
%5 = getelementptr inbounds %class.C* %4, i32 0, i32 0
store i32 %3, i32* %5, align 4
ret void
}
As you can see, the parameters of both set functions look exactly the same. So how can I tell that the first function was meant to take the parameter by value rather than by pointer?
One solution could be to somehow parse the mangled function name, but that may not always be viable. What if somebody puts extern "C" before the function?
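To make the mangled-name idea concrete, here is a minimal sketch that demangles the two symbols from cpass.ll. It assumes a GCC or clang toolchain that provides abi::__cxa_demangle in <cxxabi.h> (Itanium C++ ABI); the helper name demangle and its "_Z" prefix check are choices made for this sketch only.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <cxxabi.h>  // abi::__cxa_demangle, available with GCC and clang
#include <string>

// Hypothetical helper: returns the demangled signature, or an empty string
// when the symbol is not an Itanium-mangled name (no "_Z" prefix) or when
// demangling fails. An extern "C" symbol such as "set" ends up here with
// no type information at all.
std::string demangle(const char* symbol) {
    assert(symbol != nullptr);
    if (std::strncmp(symbol, "_Z", 2) != 0) {
        return std::string();
    }
    int status = 0;
    char* text = abi::__cxa_demangle(symbol, nullptr, nullptr, &status);
    std::string result = (status == 0 && text != nullptr) ? text : "";
    std::free(text);  // __cxa_demangle allocates the buffer with malloc
    return result;
}
```

With this helper, demangle("_Z3set1Ci") gives set(C, int) and demangle("_Z3setP1Ci") gives set(C*, int), so the two definitions can be told apart. For an unmangled extern "C" name the helper returns an empty string, which is exactly the limitation raised above.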
Is there a way to tell clang to keep the byval annotation, or to produce an extra annotation for each function parameter passed by value?
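One possible shape for such an extra annotation, as a sketch only: clang's annotate attribute attaches an arbitrary string to a declaration, and for functions clang records those strings in the emitted IR (in @llvm.global.annotations), where a normal compilation pipeline ignores them. The macro name IR_NOTE and the "byvalue-params:0" string format are invented here, and the source has to be marked by hand, so this does not answer the question for unmodified code.

```cpp
#include <cassert>

// Hypothetical marker macro (name and string format are made up for this
// sketch). Compilers other than clang get an empty expansion.
#if defined(__clang__)
#define IR_NOTE(text) __attribute__((annotate(text)))
#else
#define IR_NOTE(text)
#endif

class C {
public:
    int x;
    ~C() {}
};

// Parameter 0 is a C passed by value; the annotation string records that.
IR_NOTE("byvalue-params:0") void set(C val, int x) { val.x = x; }

// No marker: this parameter really is a pointer.
void set(C* ptr, int x) { ptr->x = x; }

// The marker does not change behavior: the by-value overload still works
// on a copy, the pointer overload on the caller's object.
void check_semantics() {
    C c;
    c.x = 1;
    set(c, 5);
    assert(c.x == 1);
    set(&c, 5);
    assert(c.x == 5);
}
```

A tool reading the .ll file could then look up each function in @llvm.global.annotations and treat the listed parameter indices as by-value, independent of name mangling.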
Anton Korobeynikov suggests that I should dig into clang's LLVM IR emission. Unfortunately, I know almost nothing about clang internals, and the documentation is rather sparse. The clang Internals Manual does not cover IR emission. So I don't really know where to start or where to go to get the problem solved, hopefully without actually going through all of the clang source code. Any pointers? Hints? Further reading?
In response to Anton Korobeynikov:
I know more or less what the C++ ABI looks like with respect to parameter passing. Found some good reading here: http://agner.org./optimize/calling_conventions.pdf. But this is very platform dependent! This approach might not be feasible on different architectures or in some special circumstances.
In my case, for example, the function is going to run on a different device from the one it is called from. The two devices don't share memory, so they don't even share the stack. Unless the user is passing a pointer (in which case we assume they know what they are doing), an object should always be passed within the function-parameters message. If it has a nontrivial copy constructor, the caller should execute it, but the object should still be created in the parameter area.
So, what I would like to do is to somehow override the ABI in clang, without too much intrusion into its source code. Or maybe add some additional annotation, which would be ignored in a normal compilation pipeline, but which I could detect when parsing the .bc/.ll file. Or reconstruct the function signature in some other way.
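The parameter-area requirement described above can be sketched in plain C++ with placement new. Everything here is an assumption made for illustration: ParamArea stands in for the function-parameters message, marshal for the caller-side code, and C is given an explicit copy constructor. Actually shipping the bytes of a non-trivially-copyable object to another device is a separate problem that this sketch does not address.

```cpp
#include <cassert>
#include <new>  // placement new

class C {
public:
    int x;
    C() : x(0) {}
    C(const C& other) : x(other.x) {}  // nontrivial copy constructor
    ~C() {}
};

// Hypothetical stand-in for the "function-parameters message": raw,
// suitably aligned storage for the object plus the int argument.
struct ParamArea {
    alignas(C) unsigned char object_bytes[sizeof(C)];
    int x;
};

// Caller side: run the copy constructor directly inside the parameter
// area, so the copy is created where the callee expects to find it.
C* marshal(ParamArea& area, const C& arg, int x) {
    area.x = x;
    return new (area.object_bytes) C(arg);
}

// Caller side, after the call: end the lifetime of the copy.
void release(C* copy) {
    assert(copy != nullptr);
    copy->~C();
}
```

The caller would call marshal, send the area, and call release once the remote function has finished; the original object is never touched by the callee.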
Comments (1)
Unfortunately, "byval" is not just an "annotation"; it's a parameter attribute which means a lot to the optimizers and backends. Basically, the rules for how to pass small structs / classes with and without non-trivial functions are governed by the platform C++ ABI, so you cannot just always use byval here.
In fact, byval here is just the result of a minor optimization at the frontend level. When you pass something by value, a temporary object should be constructed on the stack (via the default copy ctor). When the class is something POD-like, clang can deduce that the copy ctor will be trivial and will optimize the ctor / dtor pair out, passing just the "contents".
For non-trivial classes (as in your case) clang cannot perform such an optimization and has to call both the ctor and the dtor. Thus you see that a pointer to the temporary object is created.
Try calling your set() functions and you'll see what's going on there.
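A small way to try this without reading IR is to count the copy constructor and destructor calls around each call to set(). This is a sketch, not the code from the question: the Counters struct, the global counters object, and the explicit constructors on C are additions made only so the calls can be observed.

```cpp
#include <cassert>

// Hypothetical bookkeeping, added only to observe the calls.
struct Counters {
    int copies = 0;
    int destroyed = 0;
};
Counters counters;

class C {
public:
    int x;
    C() : x(0) {}
    C(const C& other) : x(other.x) { ++counters.copies; }
    ~C() { ++counters.destroyed; }
};

void set(C val, int x) { val.x = x; }    // by value: works on a copy
void set(C* ptr, int x) { ptr->x = x; }  // by pointer: works on the original

// Calls both overloads and checks what happened to the temporary.
void observe_calls() {
    C c;
    set(c, 7);                        // one temporary is copy-constructed
    assert(counters.copies == 1);
    assert(counters.destroyed == 1);  // and destroyed again after the call
    assert(c.x == 0);                 // the original is untouched
    set(&c, 7);                       // no copy is made here
    assert(counters.copies == 1);
    assert(c.x == 7);
}
```

Compiling the call site with -emit-llvm as in the question should then show the temporary being set up before the call to the by-value overload, which is what the pointer parameter in the IR refers to.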