Microsecond-accurate (or better) process timing in Linux
I need a very accurate way to time parts of my program. I could use the regular high-resolution clock for this, but that returns wall-clock time, which is not what I need: I need the time spent running only my process.
I distinctly remember seeing a Linux kernel patch that would allow me to time my processes to nanosecond accuracy, except I forgot to bookmark it and I forgot the name of the patch as well :(.
I remember how it works though:
On every context switch, it will read out the value of a high-resolution clock, and add the delta of the last two values to the process time of the running process. This produces a high-resolution accurate view of the process' actual process time.
The regular process time is kept using the regular clock, which I believe is millisecond accurate (1000 Hz), and that is much too coarse for my purposes.
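The accounting scheme described above can be sketched as a small user-space model. This is not the kernel patch itself; `struct task`, `struct cpu_acct` and `on_context_switch` are hypothetical names invented for illustration, and the clock values are passed in by the caller rather than read from real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task record: accumulated CPU time in nanoseconds. */
struct task {
    uint64_t cpu_ns;
};

/* Hypothetical per-CPU state: who is running and when it was switched in. */
struct cpu_acct {
    struct task *current;     /* NULL while the CPU is idle */
    uint64_t last_switch_ns;  /* high-resolution clock value at the last switch */
};

/* Charge the outgoing task for the time since the previous switch,
 * then start the clock for the incoming task. */
void on_context_switch(struct cpu_acct *cpu, struct task *next, uint64_t now_ns)
{
    assert(now_ns >= cpu->last_switch_ns);  /* the clock must not run backwards */
    if (cpu->current != NULL)
        cpu->current->cpu_ns += now_ns - cpu->last_switch_ns;
    cpu->last_switch_ns = now_ns;
    cpu->current = next;
}
```

Because the delta is taken from a high-resolution clock at every switch, the accumulated `cpu_ns` is as fine-grained as that clock, rather than being rounded to scheduler ticks.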
Does anyone know what kernel patch I'm talking about? I also remember it was like a word with a letter before or after it -- something like 'rtimer' or something, but I don't remember exactly.
(Other suggestions are welcome too)
The Completely Fair Scheduler suggested by Marko is not what I was looking for, but it looks promising. The problem I have with it is that the calls I can use to get process time are still not returning values that are granular enough.
- times() is returning values 21, 22, in milliseconds.
- clock() is returning values 21000, 22000, same granularity.
- getrusage() is returning values like 210002, 22001 (and somesuch); they appear to have slightly better accuracy, but the values look conspicuously the same.
So now the problem I'm probably having is that the kernel has the information I need, I just don't know the system call that will return it.
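For reference, here is a minimal sketch of how the three calls listed above can be read and converted to a common unit (microseconds) so their granularity can be compared side by side. The helper names are made up for this example, and `_DEFAULT_SOURCE` is a glibc feature-test macro that other systems may not need.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/times.h>
#include <sys/resource.h>

/* CPU time via times(): user + system clock ticks, converted to microseconds. */
long long cpu_us_times(void)
{
    struct tms t;
    long ticks_per_sec = sysconf(_SC_CLK_TCK);
    if (ticks_per_sec <= 0 || times(&t) == (clock_t)-1)
        return -1;
    return (long long)(t.tms_utime + t.tms_stime) * 1000000LL / ticks_per_sec;
}

/* CPU time via clock(), converted from CLOCKS_PER_SEC units to microseconds. */
long long cpu_us_clock(void)
{
    clock_t c = clock();
    if (c == (clock_t)-1)
        return -1;
    return (long long)c * 1000000LL / CLOCKS_PER_SEC;
}

/* CPU time via getrusage(): user + system time in microseconds. */
long long cpu_us_getrusage(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return (long long)(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000000LL
         + ru.ru_utime.tv_usec + ru.ru_stime.tv_usec;
}

/* Print all three readings on one line for comparison. */
void print_cpu_readings(const char *label)
{
    printf("%s: times=%lld us, clock=%lld us, getrusage=%lld us\n",
           label, cpu_us_times(), cpu_us_clock(), cpu_us_getrusage());
}
```

Calling `print_cpu_readings()` before and after the region of interest shows how coarsely each interface steps on a given kernel.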
Comments (8)
If you are looking for this level of timing resolution, you are probably trying to do some micro-optimization. If that's the case, you should look at PAPI. Not only does it provide both wall-clock and virtual (process only) timing information, it also provides access to CPU event counters, which can be indispensable when you are trying to improve performance.
http://icl.cs.utk.edu/papi/
See this question for some more info.
Something I've used for such things is gettimeofday(). It provides a structure with seconds and microseconds. Call it before the code, and again after. Then just subtract the two structs using timersub; the elapsed time is the result's tv_sec field (seconds) plus its tv_usec field (microseconds).
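A minimal sketch of that approach follows. Note that gettimeofday() reports wall-clock time, so the result includes any time the process spent descheduled. `elapsed_us` and `wall_us_since` are names chosen for this example; `timersub` is a BSD macro that glibc exposes under `_DEFAULT_SOURCE`.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Elapsed microseconds between two gettimeofday() samples. */
long long elapsed_us(const struct timeval *start, const struct timeval *end)
{
    struct timeval diff;
    assert(start != NULL && end != NULL);
    timersub(end, start, &diff);  /* diff = end - start, normalised */
    return (long long)diff.tv_sec * 1000000LL + diff.tv_usec;
}

/* Wall-clock microseconds elapsed since *start, or -1 if the clock read fails. */
long long wall_us_since(const struct timeval *start)
{
    struct timeval now;
    if (gettimeofday(&now, NULL) != 0)
        return -1;
    return elapsed_us(start, &now);
}
```

Typical use is to call `gettimeofday(&start, NULL)` just before the code being timed and `wall_us_since(&start)` just after it.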
If you need very small time units for (I assume) testing the speed of your software, I would recommend just running the parts you want to time in a loop millions of times, taking the time before and after the loop, and calculating the average. A nice side effect of doing this (apart from not needing to figure out how to use nanoseconds) is that you would get more consistent results, because the random overhead caused by the OS scheduler will be averaged out.
Of course, unless your program needs to run millions of times a second, it is probably fast enough if its running time is too short to measure in milliseconds.
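The loop-and-average idea might look like the sketch below. It deliberately uses the coarse clock() call to show that a low-resolution timer is sufficient once the total run is long enough. `sample_work` is a hypothetical stand-in for the code under test, and `avg_cpu_ns_per_call` is a name invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the code being measured. */
void sample_work(void)
{
    static volatile unsigned long sink;  /* volatile keeps the loop from being optimised away */
    for (unsigned long i = 0; i < 100; i++)
        sink += i;
}

/* Average CPU nanoseconds per call of fn over `iterations` runs, or -1.0 on error. */
double avg_cpu_ns_per_call(void (*fn)(void), long iterations)
{
    assert(fn != NULL);
    if (iterations <= 0)
        return -1.0;

    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        fn();
    clock_t end = clock();
    if (start == (clock_t)-1 || end == (clock_t)-1)
        return -1.0;

    double total_ns = (double)(end - start) / CLOCKS_PER_SEC * 1e9;
    return total_ns / (double)iterations;
}
```

The result includes the loop and function-call overhead, so for very short routines it is worth timing an empty function the same way and subtracting that baseline.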
I believe CFS (Completely Fair Scheduler) is what you're looking for.
You can use the High Precision Event Timer (HPET) if you have a fairly recent 2.6 kernel. Check out Documentation/hpet.txt on how to use it. This solution is platform dependent, though, and I believe it is only available on newer x86 systems. HPET has at least a 10 MHz timer, so it should fit your requirements easily.
I believe several PowerPC implementations from Freescale support a cycle-exact instruction counter as well. I used this a number of years ago to profile highly optimized code, but I can't remember what it is called. I believe Freescale has a kernel patch you have to apply in order to access it from user space.
http://allmybrain.com/2008/06/10/timing-cc-code-on-linux/
might be of help to you (directly if you are doing it in C/C++, but I hope it will give you pointers even if you're not)... It claims to provide microsecond accuracy, which just passes your criterion. :)
I think I found the kernel patch I was looking for. Posting it here so I don't forget the link:
http://user.it.uu.se/~mikpe/linux/perfctr/
http://sourceforge.net/projects/perfctr/
Edit: It works for my purposes, though not very user-friendly.
Try the CPU's timestamp counter? Wikipedia seems to suggest using clock_gettime().
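As a sketch of the clock_gettime() route: POSIX defines the clock ID `CLOCK_PROCESS_CPUTIME_ID`, which reports CPU time consumed by the calling process rather than wall-clock time, in a `struct timespec` with a nanosecond field. The helper names below are invented for this example, and the actual resolution depends on the kernel and hardware, so it is queried rather than assumed. On glibc versions before 2.17 the program must be linked with `-lrt`.

```c
#define _DEFAULT_SOURCE
#include <time.h>

/* CPU time consumed by the calling process, in nanoseconds, or -1 on error. */
long long process_cpu_ns(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0)
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Resolution the system reports for that clock, in nanoseconds, or -1 on error. */
long long process_cpu_resolution_ns(void)
{
    struct timespec res;
    if (clock_getres(CLOCK_PROCESS_CPUTIME_ID, &res) != 0)
        return -1;
    return (long long)res.tv_sec * 1000000000LL + res.tv_nsec;
}
```

Reading `process_cpu_ns()` before and after a section of code and subtracting the two values gives the CPU time charged to the process for that section; `CLOCK_THREAD_CPUTIME_ID` does the same per thread.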