C: Trying to store a variable in a specific XMM register
I'm working on a project involving SSE intrinsics and XMM registers, and I would like to use all 16 registers offered. I'm trying to explicitly tell the compiler to do this, but it doesn't seem to be working. For instance, I might write a line like this:
register __m128 foo __asm__("xmm12") = _mm_setzero_ps();
where foo would be stored in register xmm12 and initialized to zero (I would later be adding to foo, etc.).
The thing is, when I look at the assembly code, xmm12 isn't being used anywhere, even though the code actually needs it and I told the compiler to use that register.
I'm having a hard time figuring out what I'm doing wrong. Is my syntax incorrect? Is the compiler ignoring what I'm saying, and if so why?
Any help would be really appreciated!
Comments (2)
As it turns out, the real problem wasn't with the 'register' keyword. The compiler was right to ignore that; it was a silly idea. Ultimately what I had to do was unroll my 'for' loop a few more times than I already had. In the end this made my code faster and just happened to use more registers. I made the mistake of thinking "using more registers would lead to faster code", when register usage is more a side effect than anything else.
Thank you for the help though!
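For anyone curious what "unrolling a few more times" can look like with SSE intrinsics, here is a minimal sketch. It is not the code from my project: the function name sum_unrolled4 and the choice of four accumulators are assumptions made for illustration. Each accumulator is independent of the others, so the compiler is free to keep them in separate XMM registers without being told which ones.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Sum n floats using four independent vector accumulators.
   sum_unrolled4 is a hypothetical name used only for this sketch. */
float sum_unrolled4(const float *x, size_t n)
{
    assert(x != NULL || n == 0);

    __m128 acc0 = _mm_setzero_ps();
    __m128 acc1 = _mm_setzero_ps();
    __m128 acc2 = _mm_setzero_ps();
    __m128 acc3 = _mm_setzero_ps();

    size_t i = 0;
    /* Unrolled by four vectors (16 floats) per iteration. The four adds
       do not depend on each other, so they can overlap in the pipeline. */
    for (; i + 16 <= n; i += 16) {
        acc0 = _mm_add_ps(acc0, _mm_loadu_ps(x + i));
        acc1 = _mm_add_ps(acc1, _mm_loadu_ps(x + i + 4));
        acc2 = _mm_add_ps(acc2, _mm_loadu_ps(x + i + 8));
        acc3 = _mm_add_ps(acc3, _mm_loadu_ps(x + i + 12));
    }

    /* Combine the accumulators, then reduce the four lanes to a scalar. */
    __m128 acc = _mm_add_ps(_mm_add_ps(acc0, acc1), _mm_add_ps(acc2, acc3));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float total = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Scalar tail for the last n % 16 elements. */
    for (; i < n; ++i)
        total += x[i];
    return total;
}
```

Whether four accumulators is the right number depends on the loop body and the CPU, so it is worth measuring rather than assuming.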
I just tried an experiment with gcc 4.2, and it looks like you can only specify the XMM registers successfully with -O0. As soon as you turn on optimisation, gcc changes the register allocation. So it looks like you can either have complete control and do all the optimisation manually, if you really want to, by using gcc -O0, or else let gcc take care of optimisation and register allocation for you.
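As far as I understand the GCC documentation, the only supported use of a local register variable is as an input or output operand of an extended asm statement. Anywhere else, the optimiser may keep the value wherever it likes, which would fit what I saw. Here is a small sketch, assuming x86-64 (xmm12 does not exist in 32-bit mode) and GCC-style inline asm; the function name lane0_via_xmm12 is made up for this example.

```c
#include <xmmintrin.h>

/* Hypothetical helper: returns lane 0 of a vector that is forced through
   xmm12 at one point. Assumes x86-64 and GCC-style extended asm. */
float lane0_via_xmm12(void)
{
    register __m128 foo __asm__("xmm12") = _mm_set1_ps(1.5f);

    /* Empty asm statement with foo as a read-write "x" (SSE register)
       operand. At this statement the compiler has to place foo in xmm12;
       before and after it, the value may live in any register. */
    __asm__ volatile("" : "+x"(foo));

    foo = _mm_add_ps(foo, foo);   /* 1.5 + 1.5 in every lane */
    return _mm_cvtss_f32(foo);    /* extract lane 0 */
}
```

This only pins the register at the asm statement itself, so it is useful for feeding hand-written assembly, not for steering the register allocator in ordinary intrinsic code.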