Does passing __m128i objects by reference to an inline function cause them to be moved to the stack?
I'm writing a transpose function for 8x16-bit vectors using SSE2 intrinsics. Since the function takes 8 arguments (an 8x8 matrix of 16-bit elements), I can't do anything but pass them by reference. Will the compiler optimize that (I mean, will these __m128i objects be passed in registers instead of on the stack)?
Code snippet:
inline void transpose (__m128i &a0, __m128i &a1, __m128i &a2, __m128i &a3,
__m128i &a4, __m128i &a5, __m128i &a6, __m128i &a7) {
....
}
Comments (3)
Who can say?
Why not compile it and look at the disassembly? That is the only way to be sure.
Chances are that they will not be pushed to the stack. If the function is inlined, the compiler effectively copies the operations (code) of the called function into the caller, instead of passing the data from the caller to the callee.
Now, inline is only a hint, so the compiler can decide not to actually inline the call, in which case you would have to follow Zan's advice and check what the compiled code looks like.
Note that this limitation (not being able to pass all eight __m128i arguments by value) only applies to Windows and MSVC(++) (you should probably tag your question accordingly).
I haven't tried this with C++ and references, but with MSVC, using pointers with inline functions like this, the compiler does appear to optimise away the indirection. Presumably the same will apply to C++ references, but as another poster pointed out, you should look at the generated code to check.
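For anyone who wants to run that check, here is a minimal sketch of a test case to compile and inspect. The body of transpose is elided in the question, so the unpack-based body below is only an assumed, commonly used formulation and not the asker's code. The caller transpose8x8 is a hypothetical helper added purely to give the compiler a call site to inline into.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (__m128i, _mm_unpack*)
#include <cstdint>

// Illustrative body only: the question elides the real one.
// Transposes an 8x8 matrix of 16-bit elements held in eight rows.
static inline void transpose(__m128i &a0, __m128i &a1, __m128i &a2, __m128i &a3,
                             __m128i &a4, __m128i &a5, __m128i &a6, __m128i &a7) {
    // Step 1: interleave 16-bit elements of adjacent row pairs.
    __m128i t0 = _mm_unpacklo_epi16(a0, a1), t4 = _mm_unpackhi_epi16(a0, a1);
    __m128i t1 = _mm_unpacklo_epi16(a2, a3), t5 = _mm_unpackhi_epi16(a2, a3);
    __m128i t2 = _mm_unpacklo_epi16(a4, a5), t6 = _mm_unpackhi_epi16(a4, a5);
    __m128i t3 = _mm_unpacklo_epi16(a6, a7), t7 = _mm_unpackhi_epi16(a6, a7);

    // Step 2: interleave 32-bit pairs, giving half columns (4 rows each).
    __m128i u0 = _mm_unpacklo_epi32(t0, t1), u1 = _mm_unpackhi_epi32(t0, t1);
    __m128i u2 = _mm_unpacklo_epi32(t2, t3), u3 = _mm_unpackhi_epi32(t2, t3);
    __m128i u4 = _mm_unpacklo_epi32(t4, t5), u5 = _mm_unpackhi_epi32(t4, t5);
    __m128i u6 = _mm_unpacklo_epi32(t6, t7), u7 = _mm_unpackhi_epi32(t6, t7);

    // Step 3: join 64-bit halves into full columns, written back as rows.
    a0 = _mm_unpacklo_epi64(u0, u2);  a1 = _mm_unpackhi_epi64(u0, u2);
    a2 = _mm_unpacklo_epi64(u1, u3);  a3 = _mm_unpackhi_epi64(u1, u3);
    a4 = _mm_unpacklo_epi64(u4, u6);  a5 = _mm_unpackhi_epi64(u4, u6);
    a6 = _mm_unpacklo_epi64(u5, u7);  a7 = _mm_unpackhi_epi64(u5, u7);
}

// Hypothetical caller: loads eight rows, transposes them, stores them back.
void transpose8x8(int16_t m[8][8]) {
    __m128i r[8];
    for (int i = 0; i < 8; ++i)
        r[i] = _mm_loadu_si128(reinterpret_cast<const __m128i *>(m[i]));
    transpose(r[0], r[1], r[2], r[3], r[4], r[5], r[6], r[7]);
    for (int i = 0; i < 8; ++i)
        _mm_storeu_si128(reinterpret_cast<__m128i *>(m[i]), r[i]);
}
```

Compile it with optimization and ask for an assembly listing (for example g++ -O2 -S, or /O2 /FA with MSVC), then look at transpose8x8. If the call was inlined and the references were optimised away, you would expect the unpack instructions to work on xmm registers between the loads and the stores, with no separate call to transpose. The result depends on the compiler, its version and the target, so treat this as a way to check rather than as a guarantee.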