Inline assembler in gcc going wrong
I have successfully written some inline assembler in gcc to rotate right one bit, following some nice instructions: http://www.cs.dartmouth.edu/~sergey/cs108/2009/gcc-inline-asm.pdf
Here's an example:
static inline int ror(int v) {
    asm ("ror %0;" :"=r"(v) /* output */ :"0"(v) /* input */ );
    return v;
}
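To check what the ror() wrapper above actually does, here is a minimal sketch, assuming x86 or x86-64 with GCC or Clang, that pairs it with a portable C rotate-right-by-one for comparison. The helper name ror_c is hypothetical, it assumes a 32-bit unsigned int, and `__asm__` is used in place of `asm` only so the block also compiles in strict ISO C modes.

```c
/* The question's wrapper. GAS treats a single-operand "ror" as a
   rotate right by one bit; x86/x86-64 with GCC or Clang assumed. */
static inline int ror(int v) {
    __asm__ ("ror %0;" : "=r"(v) /* output */ : "0"(v) /* input */ );
    return v;
}

/* Hypothetical portable cross-check: rotate a 32-bit value right by one.
   The bit shifted out at the bottom reappears at bit 31. */
static inline unsigned ror_c(unsigned v) {
    return (v >> 1) | (v << 31);
}
```

For example, ror(1) should give 0x80000000 when viewed as unsigned, and ror(2) should give 1.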
However, I want code to count clock cycles, and the examples I have found are in the wrong (probably Microsoft) format. I don't know how to do these things in gcc. Any help?
unsigned __int64 inline GetRDTSC() {
    __asm {
        ; Flush the pipeline
        XOR eax, eax
        CPUID
        ; Get RDTSC counter in edx:eax
        RDTSC
    }
}
I tried:
static inline unsigned long long getClocks() {
    asm("xor %%eax, %%eax" );
    asm(CPUID);
    asm(RDTSC : : %%edx %%eax); //Get RDTSC counter in edx:eax
but I don't know how to get the edx:eax pair to return as 64 bits cleanly, and don't know how to really flush the pipeline.
Also, the best source code I found was at: http://www.strchr.com/performance_measurements_with_rdtsc
and that mentions the Pentium, so if there are different ways of doing this on different Intel/AMD variants, please let me know. I would prefer something that works on all x86 platforms, even if it's a bit ugly, to a range of solutions for each variant, but I wouldn't mind knowing about them.
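One vendor-neutral way to see whether TSC readings can be trusted across cores and power states is the "invariant TSC" feature bit, which both Intel and AMD report in CPUID leaf 0x80000007, EDX bit 8. A minimal sketch, assuming GCC or Clang's `<cpuid.h>` helper `__get_cpuid`; the function name has_invariant_tsc is hypothetical.

```c
#include <cpuid.h>

/* Hypothetical helper: returns 1 if CPUID reports an invariant TSC
   (leaf 0x80000007, EDX bit 8), 0 otherwise. __get_cpuid() returns 0
   when the requested extended leaf is not supported by the CPU. */
static int has_invariant_tsc(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 8) & 1u);
}
```

On CPUs without this bit, the TSC rate may change with frequency scaling, so cycle counts from RDTSC should be interpreted with more care there.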
The following does what you want:
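A minimal sketch consistent with the description that follows, assuming x86 or x86-64 and GCC/Clang extended asm: CPUID is used to serialize, RDTSC leaves the counter in edx:eax, eax is zeroed through an "a" input of 0, ebx and ecx are listed as clobbers, and the halves are combined in C. The names get_clocks, lo, hi and value are assumptions, and `__volatile__` is added so the compiler does not merge or drop repeated calls.

```c
/* Sketch: serialize with CPUID, then read the time-stamp counter.
   x86/x86-64, GCC or Clang extended asm assumed. */
static inline unsigned long long get_clocks(void) {
    unsigned int lo, hi;
    unsigned long long value;
    /* "a"(0):  compiler puts 0 in eax (CPUID leaf 0) however it likes.
       "=a"/"=d": RDTSC leaves the low/high 32 bits in eax/edx.
       "ebx", "ecx": also overwritten by CPUID, so declared as clobbers. */
    __asm__ __volatile__ ("cpuid\n\t"
                          "rdtsc"
                          : "=a"(lo), "=d"(hi)
                          : "a"(0)
                          : "ebx", "ecx");
    /* Combine the two halves in C, where the compiler can optimize it. */
    value = ((unsigned long long)hi << 32) | lo;
    return value;
}
```

Usage would look like `unsigned long long t1 = get_clocks(); /* code under test */ unsigned long long t2 = get_clocks();`, with `t2 - t1` as the (slightly inflated) cycle count.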
It is important to put as little inline ASM as possible in your code, because the compiler cannot optimize anything inside an asm statement. That's why I've done the shifting and ORing of the result in C rather than in ASM. Similarly, I use the "a" input of 0 to let the compiler decide when and how to zero out eax: some other code in your program may already have zeroed it, and if the compiler knows that, it can save an instruction.
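Taking the "as little inline ASM as possible" advice one step further, GCC and Clang expose RDTSC as the intrinsic `__rdtsc()` in `<x86intrin.h>`, which returns the full 64-bit value directly. This sketch swaps in `_mm_lfence()` instead of CPUID to keep RDTSC from running ahead of earlier instructions; that is a different technique from the answer's (Intel documents LFENCE before RDTSC for this purpose; behavior on other vendors may differ). The name read_tsc_lfence is hypothetical.

```c
#include <x86intrin.h>

/* Hypothetical helper: read the TSC with no hand-written asm.
   _mm_lfence() waits for earlier instructions to complete before
   __rdtsc() executes (Intel's documented use of LFENCE with RDTSC). */
static inline unsigned long long read_tsc_lfence(void) {
    _mm_lfence();
    return __rdtsc();
}
```

Because there is no asm statement, there are also no clobbers to get wrong.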
Also, the "clobbers" in the asm statement are very important. CPUID overwrites everything in eax, ebx, ecx, and edx. You need to tell the compiler that you're changing these registers so that it knows not to keep anything important there. You don't have to list eax and edx because you're using them as outputs. If you don't list the clobbers, there's a serious chance your program will crash, and you will find it extremely difficult to track down the issue.
This will store the result in value. Combining the results takes extra cycles, so the actual number of cycles elapsed between two calls to this code will be a few less than the difference between the two results.
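One common way to account for that overhead (an assumption here, not something the answer prescribes) is to time an empty interval many times and subtract the smallest result from later measurements. A sketch that repeats the CPUID+RDTSC read described in the answer so the block stands alone; the names read_clocks and timer_overhead, and the trial count, are hypothetical.

```c
/* CPUID+RDTSC read as described in the answer (x86/x86-64, GCC/Clang). */
static inline unsigned long long read_clocks(void) {
    unsigned int lo, hi;
    __asm__ __volatile__ ("cpuid\n\trdtsc"
                          : "=a"(lo), "=d"(hi)
                          : "a"(0)
                          : "ebx", "ecx");
    return ((unsigned long long)hi << 32) | lo;
}

/* Hypothetical helper: smallest observed cost of two back-to-back reads.
   Trials where the counter appears to go backwards (for example, after
   the thread moves to a core whose TSC is not synchronized) are skipped. */
static unsigned long long timer_overhead(int trials) {
    unsigned long long best = ~0ULL;
    for (int i = 0; i < trials; i++) {
        unsigned long long start = read_clocks();
        unsigned long long end = read_clocks();
        if (end >= start && end - start < best)
            best = end - start;
    }
    return best;
}
```

A measurement would then be corrected as `(t2 - t1) - timer_overhead(1000)`; taking the minimum rather than the average filters out interrupts and other noise.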