How can I measure the execution time of x86 and x86-64 assembly commands in processor cycles?
I want to write a bunch of optimizations for gcc using genetic algorithms.
I need to measure the execution time of assembly functions for some stats and fitness functions.
The usual time measurement can't be used, because it is influenced by the cache size.
So I need a table where I can see something like this:
command | operands | operand sizes | execution cycles
Am I misunderstanding something?
Sorry for bad English.
Comments (3)
With modern CPUs, there are no simple tables to look up how long an instruction will take to complete (although such tables exist for some old processors, e.g. the 486). Your best information on what each instruction does and how long it might take comes from the chip manufacturer. E.g. Intel's documentation manuals are quite good (there's also an optimisation manual on that page).
On pretty much all modern CPUs there's also the RDTSC instruction, which reads the time stamp counter for the processor on which the code is running into EDX:EAX. There are pitfalls with this too, but essentially, if the code you are profiling is representative of a real use situation and its execution doesn't get interrupted or shifted to another CPU core, then you can use this instruction to get the timings you want. I.e. surround the code you are optimising with two RDTSC instructions and take the difference in TSC as the timing. (Variance in timings across different tests/situations can be great; statistics is your friend.)
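A minimal sketch of the "two RDTSC reads plus statistics" idea, assuming GCC or Clang on x86/x86-64, where `<x86intrin.h>` provides the `__rdtsc()` intrinsic. The names `code_under_test`, `time_once`, `median_ticks` and `SAMPLES` are made up for illustration and are not from the answer above. Note that RDTSC is not a serializing instruction, and on many CPUs the TSC ticks at a fixed rate rather than at the current core clock, so treat the result as a relative figure, not an exact cycle count.

```c
#include <stdint.h>
#include <stdlib.h>
#include <x86intrin.h>  /* __rdtsc(), available with GCC/Clang on x86 and x86-64 */

#define SAMPLES 101     /* odd, so the median is a single element */

/* Hypothetical stand-in for the code being measured. */
void code_under_test(void)
{
    volatile uint32_t acc = 0;   /* volatile keeps the loop from being optimised away */
    for (uint32_t i = 0; i < 1000; i++)
        acc += i;
}

/* One measurement: TSC ticks elapsed across a single call to fn. */
static uint64_t time_once(void (*fn)(void))
{
    uint64_t start = __rdtsc();
    fn();
    uint64_t end = __rdtsc();
    return end - start;
}

/* Three-way comparison of two uint64_t values for qsort. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Repeat the measurement SAMPLES times and return the median,
 * which is less sensitive to interrupts and cold caches than a single run. */
uint64_t median_ticks(void (*fn)(void))
{
    uint64_t samples[SAMPLES];
    for (int i = 0; i < SAMPLES; i++)
        samples[i] = time_once(fn);
    qsort(samples, SAMPLES, sizeof samples[0], cmp_u64);
    return samples[SAMPLES / 2];
}
```

Calling `median_ticks(code_under_test)` gives one number per candidate function. The median is only one possible choice here; the minimum over many runs is another common one.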
Reading the system clock value?
You can instrument your code using assembly (rdtsc and friends) or using an instrumentation API like PAPI. Accurately measuring the clock cycles spent executing a single instruction is not possible, however - you can refer to your architecture's developer manuals for the best estimates.
In both cases, you should be careful to take into account the effects of running in an SMP environment.
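One way to reduce the SMP effects mentioned above is to keep the measuring thread on a single core, so both counter reads come from the same CPU. Below is a minimal sketch, assuming Linux with glibc (`sched_getaffinity` and `sched_setaffinity` are Linux-specific, and `_GNU_SOURCE` must be defined before any include). The function name `pin_to_first_allowed_cpu` is made up for illustration.

```c
#define _GNU_SOURCE     /* must come before any #include to expose the CPU_* macros */
#include <sched.h>

/* Restrict the calling thread to the first CPU in its current affinity mask.
 * Returns the chosen CPU number on success, -1 on failure. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed;

    /* pid 0 means "the calling thread". */
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            if (sched_setaffinity(0, sizeof one, &one) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;  /* no CPU found in the mask */
}
```

Calling this once before the timing loop keeps the scheduler from migrating the thread between cores. It does not stop interrupts or other processes from running on that core, so repeated measurements are still needed.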