How can you measure the amount of time a function will take to execute?
This is a relatively short function and the execution time would probably be in the millisecond range.
This particular question relates to an embedded system, programmed in C or C++.
There are three potential solutions:
Hardware solution:
Use a free output pin on the processor and hook an oscilloscope or logic analyzer to the pin. Initialize the pin to a low state. Just before calling the function you want to measure, drive the pin high, and just after returning from the function, drive it low again. The width of the resulting pulse is the execution time.
Bookworm solution:
If the function is fairly small, and you can manage the disassembled code, you can crack open the processor architecture databook and count the cycles it will take the processor to execute each instruction. This will give you the number of cycles required.
Time = # cycles / processor clock frequency (if you counted instruction cycles rather than clock ticks, first multiply by the number of clock ticks per instruction cycle)
This is easier to do for smaller functions, or for code written in assembler (for a PIC microcontroller, for example).
Timestamp counter solution:
Some processors have a timestamp counter which increments at a rapid rate (every few processor clock ticks). Simply read the timestamp before and after the function.
This will give you the elapsed time, but beware that you might have to deal with counter rollover.
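The timestamp counter approach, including the rollover caveat, could be sketched roughly as follows. This is only an illustration: timestamp_counter is a plain variable standing in for whatever free-running 32-bit counter your processor provides, and read_timestamp, elapsed_ticks, ticks_to_us and measure_ticks are hypothetical names, not part of any vendor API.

```c
#include <stdint.h>

/* Hypothetical stand-in for a free-running 32-bit hardware counter.
   On real hardware this would be a register read taken from the datasheet. */
static volatile uint32_t timestamp_counter;

uint32_t read_timestamp(void)
{
    return timestamp_counter;
}

/* Unsigned subtraction wraps modulo 2^32, so the result stays correct
   as long as the counter rolled over at most once between the two reads. */
uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a tick count to microseconds, given the counter rate in Hz
   (an assumed value you would take from your clock configuration). */
double ticks_to_us(uint32_t ticks, uint32_t counter_hz)
{
    return (double)ticks * 1e6 / (double)counter_hz;
}

/* Read the counter before and after the call, as described above. */
uint32_t measure_ticks(void (*f)(void))
{
    uint32_t start = read_timestamp();
    f();
    return elapsed_ticks(start, read_timestamp());
}
```

If the function can run longer than one full counter period, this subtraction is no longer enough and the rollovers have to be counted separately.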
The best way to do that on an embedded system is to set an external hardware pin when you enter the function and clear it when you leave the function. Preferably do this with a single assembly instruction, so that toggling the pin doesn't skew your results too much.
Edit: One of the benefits is that you can do it in your actual application and you don't need any special test code. External debug pins like that are (should be!) standard practice for every embedded system.
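A rough sketch of the debug-pin idea in C, under stated assumptions: debug_port is an ordinary variable standing in for a memory-mapped GPIO output register, DEBUG_PIN_MASK picks an arbitrary pin, and function_under_test is a placeholder. On a real part you would use the register address and pin from the datasheet, and many architectures offer a single set-bit/clear-bit instruction for this.

```c
#include <stdint.h>

/* Hypothetical: stands in for a memory-mapped GPIO output register. */
static volatile uint32_t debug_port;

/* Assumed free pin; the bit number is arbitrary for this sketch. */
#define DEBUG_PIN_MASK (1u << 3)

static inline void debug_pin_high(void) { debug_port |= DEBUG_PIN_MASK; }
static inline void debug_pin_low(void)  { debug_port &= ~DEBUG_PIN_MASK; }

/* Placeholder for the function being measured. */
static volatile uint32_t work;
void function_under_test(void) { work++; }

/* The pin is high exactly while the function runs, so the pulse width
   seen on a scope or logic analyzer is the execution time. */
void timed_call(void)
{
    debug_pin_high();
    function_under_test();
    debug_pin_low();
}
```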
I repeat the function call many times (millions) but also use the following method to discount the loop overhead:
Time one loop that calls function() twice per iteration and a second loop that calls it once per iteration; the difference between the two totals is the cost of the extra calls, with the loop overhead cancelled out. Instead of calling function() twice in the first loop and once in the second, you could call it once in the first loop and not at all (i.e. an empty loop) in the second; however, the empty loop could be optimized out by the compiler, giving you negative timing results :)
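One way this two-loop differencing might look in C is sketched below. It assumes the standard clock() function is available and has usable resolution on your platform, which is not a given on a bare-metal target; function_under_test, time_loop and per_call_seconds are hypothetical names.

```c
#include <time.h>

/* Placeholder for the function being measured; the volatile sink keeps
   the compiler from removing the work. */
static volatile unsigned long sink;
void function_under_test(void) { sink += 1; }

/* Time n iterations of a loop that calls f once or twice per iteration.
   Both variants execute the same loop control and the same branch, so
   that overhead cancels when the two totals are subtracted. */
clock_t time_loop(void (*f)(void), long n, int calls_per_iter)
{
    clock_t start = clock();
    for (long i = 0; i < n; i++) {
        f();
        if (calls_per_iter == 2) {
            f();
        }
    }
    return clock() - start;
}

/* Per-call time in seconds: the difference between the two loop totals
   is the cost of n extra calls. */
double per_call_seconds(clock_t two_calls, clock_t one_call, long n)
{
    return ((double)two_calls - (double)one_call) / CLOCKS_PER_SEC / (double)n;
}
```

A call such as per_call_seconds(time_loop(function_under_test, n, 2), time_loop(function_under_test, n, 1), n) then gives the estimate; with small n the result is dominated by timer noise.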
If you're using Linux, you can measure a program's runtime by running it under the time command on the command line.
If you run only the function in main() (assuming C++), the rest of the app's time should be negligible.
Invoke it in a loop with a very large number of invocations, then divide the total elapsed time by the number of invocations to get the average time.
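A minimal sketch of that averaging loop, assuming the standard clock() function is usable on the target; function_under_test and average_seconds_per_call are hypothetical names. Note that the figure includes the loop overhead itself.

```c
#include <time.h>

/* Placeholder for the function being measured; the volatile counter
   keeps the call from being optimized away. */
static volatile unsigned long sink;
void function_under_test(void) { sink++; }

/* Call f n times and return the average time per call in seconds. */
double average_seconds_per_call(void (*f)(void), long n)
{
    clock_t start = clock();
    for (long i = 0; i < n; i++) {
        f();
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / (double)n;
}
```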
Windows XP/NT Embedded or Windows CE/Mobile
You can use QueryPerformanceCounter() to get the value of a VERY FAST counter before and after your function. Then you subtract those 64-bit values to get a delta in "ticks". Using QueryPerformanceFrequency() you can convert the "delta ticks" to an actual time unit. You can refer to the MSDN documentation about those WIN32 calls.
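As a sketch of how the two calls fit together: QueryPerformanceCounter() and QueryPerformanceFrequency() are the real Win32 functions, while ticks_to_microseconds and time_call_microseconds are hypothetical helpers written for this illustration. The Windows-specific part is guarded so the conversion helper can be compiled anywhere.

```c
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Convert a tick delta to microseconds, given the counter frequency
   in ticks per second. */
double ticks_to_microseconds(int64_t delta_ticks, int64_t ticks_per_second)
{
    return (double)delta_ticks * 1e6 / (double)ticks_per_second;
}

#ifdef _WIN32
/* Read the counter before and after the call and convert the delta. */
double time_call_microseconds(void (*f)(void))
{
    LARGE_INTEGER freq, start, end;
    QueryPerformanceFrequency(&freq);   /* ticks per second */
    QueryPerformanceCounter(&start);
    f();
    QueryPerformanceCounter(&end);
    return ticks_to_microseconds(end.QuadPart - start.QuadPart, freq.QuadPart);
}
#endif
```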
Other embedded systems
Without operating systems or with only basic OSes you will have to:
VERY IMPORTANT: Do not forget to disable interrupts before, and restore them after, reading those timer values (both the carry and the register value); otherwise you risk saving inconsistent values.
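The interrupt caveat might look like this in C. Everything here is an assumption made for illustration: timer_count stands in for a 16-bit hardware timer register, timer_overflows for a software "carry" count maintained by an overflow interrupt, and disable_interrupts / restore_interrupts for whatever intrinsics your compiler provides.

```c
#include <stdint.h>

/* Hypothetical stand-ins for hardware state. */
static volatile uint16_t timer_count;      /* 16-bit hardware timer register */
static volatile uint32_t timer_overflows;  /* "carry": bumped by the overflow ISR */
static int interrupts_enabled = 1;

/* Hypothetical wrappers; real code would use the compiler's intrinsics. */
static int disable_interrupts(void)
{
    int previous = interrupts_enabled;
    interrupts_enabled = 0;
    return previous;
}

static void restore_interrupts(int previous)
{
    interrupts_enabled = previous;
}

/* Read carry and register as one consistent pair. Without the critical
   section, the overflow ISR could run between the two reads. */
uint32_t read_timer_ticks(void)
{
    int previous = disable_interrupts();
    uint32_t high = timer_overflows;
    uint16_t low = timer_count;
    restore_interrupts(previous);
    /* Real hardware may also need a check of the pending-overflow flag
       here, since the timer keeps counting while interrupts are off. */
    return (high << 16) | low;
}
```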
NOTES
In the OS X terminal (and probably Unix, too), run the program under "time".
I always implement an interrupt-driven ticker routine. It updates a counter that counts the number of milliseconds since start-up. This counter is then accessed with a GetTickCount() function.
Example:
In your code you would time the code as follows:
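A minimal sketch of such a ticker and its use, assuming a periodic 1 ms timer interrupt. Only the name GetTickCount() comes from the answer; timer_tick_isr and elapsed_ms are hypothetical, and how the ISR is attached to the timer is platform specific.

```c
#include <stdint.h>

/* Milliseconds since start-up, updated only by the tick interrupt. */
static volatile uint32_t tick_ms;

/* Hypothetical ISR, assumed to be wired to a 1 ms periodic timer. */
void timer_tick_isr(void)
{
    tick_ms++;
}

/* On 8- or 16-bit MCUs a 32-bit read is not atomic, so real code may
   need to disable interrupts around this read. */
uint32_t GetTickCount(void)
{
    return tick_ms;
}

/* Elapsed milliseconds since 'start'; unsigned subtraction keeps this
   correct across a single counter wraparound. */
uint32_t elapsed_ms(uint32_t start)
{
    return GetTickCount() - start;
}
```

Timing a call then amounts to taking start = GetTickCount() before the function and reading elapsed_ms(start) after it. The resolution is one tick, so a function in the low-millisecond range should be called repeatedly and the total divided by the call count.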
It depends on your embedded platform and what type of timing you are looking for. For embedded Linux, there are several ways to accomplish this. If you wish to measure the amount of CPU time used by your function, you can do the following:
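A minimal sketch of that measurement, assuming a POSIX system where clock_gettime() and CLOCK_THREAD_CPUTIME_ID are available; function_under_test, timespec_diff_ns and cpu_time_ns are hypothetical names used only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Placeholder for the function being measured. */
static volatile unsigned long sink;
void function_under_test(void)
{
    for (unsigned i = 0; i < 1000; i++) {
        sink += i;
    }
}

/* Difference end - start in nanoseconds. */
long long timespec_diff_ns(struct timespec start, struct timespec end)
{
    return (long long)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}

/* CPU time consumed by the calling thread while f runs. */
long long cpu_time_ns(void (*f)(void))
{
    struct timespec start, end;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start);
    f();
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &end);
    return timespec_diff_ns(start, end);
}
```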
You will need to link this with the realtime library (-lrt); just use the following to compile your code:
You may also want to read the man page on clock_gettime. There are some issues with running this code on an SMP-based system that could invalidate your testing. You could use something like sched_setaffinity() or the command-line cpuset to force the code onto only one core.
If you are looking to measure user and system time, then you could use times(NULL), which returns something like jiffies. Or you can change the parameter for clock_gettime() from CLOCK_THREAD_CPUTIME_ID to CLOCK_MONOTONIC ...but be careful of wraparound with CLOCK_MONOTONIC.
For other platforms, you are on your own.
Drew
If the code is .NET, use the Stopwatch class (.NET 2.0+), NOT DateTime.Now. DateTime.Now isn't updated frequently enough and will give you crazy results.
If you're looking for sub-millisecond resolution, try one of these timing methods. They'll all get you resolution in at least the tens or hundreds of microseconds:
If it's embedded Linux, look at Linux timers:
http://linux.die.net/man/3/clock_gettime
For embedded Java, look at nanoTime(), though I'm not sure this is in the embedded edition:
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#nanoTime()
If you want to get at the hardware counters, try PAPI:
http://icl.cs.utk.edu/papi/
Otherwise you can always go to assembler. You could look at the PAPI source for your architecture if you need some help with this.