Outliers during performance measurement
I am trying to do some performance measurements using Intel's RDTSC, and the variation I get across different test runs is quite odd. In most cases my C benchmark needs about 3 million cycles, yet exactly the same execution can in some cases take 5 million, almost twice as much. I tried to have no intense workloads running in parallel so that I would get good performance estimates. Does anyone have an idea where these huge timing variations could come from? I know that interrupts and the like can happen, but I did not expect such large differences in timing!
PS: I am running it on a Pentium processor under Linux.
Thanks for the feedback,
John
I think the answer is in:
You have insufficient control over this in a modern OS.
According to this Wikipedia article, the RDTSC (time stamp counter) cannot be used reliably for benchmarking on multi-core systems. There is no guarantee that all cores have the same value in the time stamp register.
On Linux, it is better to use the POSIX clock_gettime function.
You have to take the caches of modern processors into account. Another process may have evicted your program's data from the cache during the runs where you measured the long execution time.
As Henk pointed out, lots of things happen in a modern OS that you can't control.