Measuring the CPU clock cycles consumed by a process

Posted on 2024-08-27 21:16:09

I have written a program in C as part of a research project. I want to compute the exact number of CPU cycles the program consumes. The exact number of cycles.
Any idea how I can find that?

Comments (5)

你在我安 2024-09-03 21:16:09

The valgrind tool cachegrind (valgrind --tool=cachegrind) will give you a detailed output including the number of instructions executed, cache misses and branch prediction misses. These can be accounted down to individual lines of assembler, so in principle (with knowledge of your exact architecture) you could derive precise cycle counts from this output.

Be aware that the counts will vary from run to run due to cache effects.

The documentation for the cachegrind tool is in the Cachegrind chapter of the Valgrind User Manual.

小草泠泠 2024-09-03 21:16:09

No you can't. The concept of a 'CPU cycle' is not well defined. Modern chips can run at multiple clock rates, and different parts of them can be doing different things at different times.

The question of 'how many total pipeline steps' might in some cases be meaningful, but there is not likely to be a way to get it.

要走就滚别墨迹 2024-09-03 21:16:09

Sorry, but no, at least not for most practical purposes: it's simply not possible with most normal OSes. For example, quite a few OSes don't do a full context switch to handle an interrupt, so the time spent servicing an interrupt can, and often will, show up as time spent in whatever process was executing when the interrupt occurred.

The qualifier "for most practical purposes" leaves one possibility open: running your program under a cycle-accurate simulator. These are available, but mostly for CPUs used primarily in real-time embedded systems, NOT for anything like a full-blown PC. Worse, they (generally) aren't meant for running anything like a full-blown OS, but for code that runs on "bare metal."

In theory, you might be able to do something with a virtual machine running something like Windows or Linux, but I don't know of any existing virtual machine that attempts this. It would be decidedly non-trivial and would probably carry a pretty serious performance cost as well (to put it mildly).

仅此而已 2024-09-03 21:16:09

Try OProfile. It uses various hardware counters on the CPU to measure the number of instructions executed and how many cycles have passed. You can see an example of its use in the article "Memory part 7: Memory performance tools".
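OProfile itself is driven from the command line, but on Linux the same kind of hardware cycle and instruction counters can also be read from inside a program through the `perf_event_open(2)` system call. The following is a minimal sketch under these assumptions: Linux, and a CPU and kernel that expose hardware counters to user space (they are often unavailable in VMs and containers, or blocked by `/proc/sys/kernel/perf_event_paranoid`). `open_counter()`, `measure()` and `workload()` are hypothetical names for illustration, with `workload()` standing in for the code you want to measure.

```c
/* Sketch (Linux only): count CPU cycles and retired instructions around a
 * region of code with perf_event_open(2). glibc has no wrapper for this
 * system call, so it is invoked through syscall(2). */
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Open one hardware counter for the calling thread, on any CPU. */
static int open_counter(uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = config;       /* e.g. PERF_COUNT_HW_CPU_CYCLES */
    attr.disabled = 1;          /* created stopped; enabled explicitly below */
    attr.exclude_kernel = 1;    /* count user-space execution only */
    attr.exclude_hv = 1;        /* ignore hypervisor time */
    /* pid = 0 (this thread), cpu = -1 (any CPU), no group leader, no flags */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Hypothetical stand-in for the code being measured. */
static volatile uint64_t sink;
static void workload(void)
{
    uint64_t acc = 0;
    for (uint64_t i = 0; i < 1000000; i++)
        acc += i * i;
    sink = acc;   /* volatile store keeps the result from being optimised away */
}

/* Run fn() under the two counters. Returns 0 on success, or -1 if the
 * counters cannot be opened or read. The counters are enabled one after
 * the other, not atomically, so the counts are slightly skewed. */
static int measure(void (*fn)(void), uint64_t *cycles, uint64_t *instructions)
{
    int fd_cyc = open_counter(PERF_COUNT_HW_CPU_CYCLES);
    int fd_ins = open_counter(PERF_COUNT_HW_INSTRUCTIONS);
    if (fd_cyc < 0 || fd_ins < 0) {
        if (fd_cyc >= 0) close(fd_cyc);
        if (fd_ins >= 0) close(fd_ins);
        return -1;
    }
    ioctl(fd_cyc, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd_ins, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd_cyc, PERF_EVENT_IOC_ENABLE, 0);
    ioctl(fd_ins, PERF_EVENT_IOC_ENABLE, 0);

    fn();

    ioctl(fd_cyc, PERF_EVENT_IOC_DISABLE, 0);
    ioctl(fd_ins, PERF_EVENT_IOC_DISABLE, 0);

    /* With the default read_format, read() yields a single 64-bit count. */
    int ok = read(fd_cyc, cycles, sizeof *cycles) == (ssize_t)sizeof *cycles
          && read(fd_ins, instructions, sizeof *instructions) == (ssize_t)sizeof *instructions;
    close(fd_cyc);
    close(fd_ins);
    return ok ? 0 : -1;
}
```

Call `measure(workload, &cycles, &instructions)` and print both counts if it returns 0. Because `exclude_kernel` is set, kernel work done on the program's behalf is not included, and, as the other answers point out, the cycle count will still differ between runs.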

菊凝晚露 2024-09-03 21:16:09

I am not entirely sure that I know exactly what you're trying to do, but what you can do on modern x86 processors is read the time stamp counter (TSC) before and after the block of code you're interested in. At the assembly level, this is done with the RDTSC instruction, which gives you the value of the TSC in the edx:eax register pair.
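A minimal sketch of reading the TSC around a block of code, assuming GCC or Clang on x86/x86-64 (the inline-assembly syntax is GCC's). `read_tsc()`, `tsc_ticks()` and `workload()` are hypothetical names for illustration. Note that on most modern x86 CPUs the TSC ticks at a constant rate regardless of the current core frequency, so the difference is a count of reference ticks (including any time the thread spent descheduled), not necessarily core clock cycles.

```c
#include <stdint.h>

/* Read the time stamp counter. RDTSC puts the low 32 bits in EAX and the
 * high 32 bits in EDX; they are combined into one 64-bit value here. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical stand-in for the code being measured. */
static volatile uint64_t sink;
static void workload(void)
{
    uint64_t acc = 0;
    for (uint64_t i = 0; i < 1000000; i++)
        acc += i * i;
    sink = acc;   /* volatile store keeps the result from being optimised away */
}

/* TSC ticks elapsed between the reads taken before and after fn(). */
static uint64_t tsc_ticks(void (*fn)(void))
{
    uint64_t start = read_tsc();
    fn();
    uint64_t end = read_tsc();
    return end - start;
}
```

For example, `printf("%llu ticks\n", (unsigned long long)tsc_ticks(workload));`. GCC and Clang also provide the `__rdtsc()` intrinsic in `<x86intrin.h>`, which performs the same read without inline assembly.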

Note, however, that this approach has certain caveats. For example, if your process starts out on CPU0 and ends up on CPU1, each RDTSC result refers to the specific processor core that executed the instruction, so the two values may not be comparable. (RDTSC also does not serialise instruction execution, but in this context I don't think that's much of an issue.)
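Both caveats can be reduced in code. A minimal sketch, assuming Linux with glibc and GCC or Clang on x86-64: the thread is pinned to the CPU it is currently running on with `sched_setaffinity()`, each TSC read is bracketed by LFENCE (the pattern Intel's documentation for RDTSC suggests for keeping the read from running before earlier instructions finish or after later ones start; on AMD CPUs this ordering depends on the CPU and OS configuration), and `sched_getcpu()` is compared before and after as a cheap migration check. `pin_to_current_cpu()`, `fenced_rdtsc()`, `fenced_ticks()` and `workload()` are hypothetical names.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc(), _mm_lfence() */

/* Pin the calling thread to the CPU it is running on now.
 * Returns 0 on success, -1 on failure. */
static int pin_to_current_cpu(void)
{
    int cpu = sched_getcpu();
    if (cpu < 0)
        return -1;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);
}

/* LFENCE before RDTSC: earlier instructions complete before the read.
 * LFENCE after RDTSC: later instructions do not start before the read. */
static inline uint64_t fenced_rdtsc(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* Hypothetical stand-in for the code being measured. */
static volatile uint64_t sink;
static void workload(void)
{
    uint64_t acc = 0;
    for (uint64_t i = 0; i < 1000000; i++)
        acc += i * i;
    sink = acc;
}

/* Ticks elapsed around fn(). *migrated is set to 1 if the thread was on a
 * different CPU afterwards than before (a round trip back to the same CPU
 * is not detected), in which case the result is suspect. */
static uint64_t fenced_ticks(void (*fn)(void), int *migrated)
{
    int cpu_before = sched_getcpu();
    uint64_t start = fenced_rdtsc();
    fn();
    uint64_t end = fenced_rdtsc();
    int cpu_after = sched_getcpu();
    *migrated = (cpu_before != cpu_after);
    return end - start;
}
```

One common way to use this is to call `pin_to_current_cpu()` once at start-up, then call `fenced_ticks(workload, &migrated)` several times and keep the smallest result, discarding any run where `migrated` is set.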
