Benchmarking considerations and deterministic data collection
I am writing a C++ benchmarking program that times a number of function calls. The functions are called repeatedly, and each time is recorded for statistical analysis later. The functions must run simultaneously on multiple threads, so to keep the benchmark accurate and fair it runs on a real-time OS with the scheduling behavior controlled. These are my concerns:
Are there deterministic ways of collecting the timing data? I have looked at printf and stringstream, but neither seems to behave deterministically because of memory and buffer operations. For the same reason they do not run in O(1), am I right? Currently I am using a large char array and a custom strcat function so that each time value can be collected in O(1). The array is printed at the end of the test, once all data has been collected.
I am using clock_gettime for timings, and clock_getres gives me a resolution of 1 ns. Can this value be trusted?
Am I doing things right so far, and are there any other issues I should be aware of when writing the benchmark?
Comments (2)
Calling high-frequency timers and writing samples into an output stream is a perfectly sensible way to get performance data. But there are a few tricky gotchas to be careful of.
clock_gettime (with CLOCK_PROCESS_CPUTIME_ID) should be reliable if the person who wrote your kernel wasn't a dunce. You can look into the Performance Application Programming Interface (PAPI) library if you want to query the CPU timers directly, but that shouldn't be necessary.
Or, if you truly need 100% determinism, you'll need to ensure that your threads are scheduled in the same order, run for the same quanta, and put their data at the same memory addresses on each run.
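As a minimal sketch of reading these clocks (the helper names read_clock_ns and elapsed_ns are hypothetical, not from the answer above): on POSIX systems, CLOCK_PROCESS_CPUTIME_ID counts CPU time consumed by every thread in the process, so in a multi-threaded benchmark it may make sense to compare it against CLOCK_THREAD_CPUTIME_ID (calling thread only) or CLOCK_MONOTONIC (wall time). Which clock fits is an assumption you would need to check against your own setup.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>   // POSIX: clock_gettime, clockid_t, timespec

// Read a POSIX clock and return its value in nanoseconds.
inline std::int64_t read_clock_ns(clockid_t id) {
    timespec ts{};
    const int rc = clock_gettime(id, &ts);
    assert(rc == 0 && "clock_gettime failed");
    (void)rc;  // avoids an unused-variable warning when NDEBUG is defined
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Hypothetical helper: time a single invocation of fn on the chosen clock.
// The result includes the cost of the two clock reads themselves.
template <typename F>
std::int64_t elapsed_ns(clockid_t id, F&& fn) {
    const std::int64_t start = read_clock_ns(id);
    fn();
    return read_clock_ns(id) - start;
}
```

Calling elapsed_ns with CLOCK_MONOTONIC and then with a CPU-time clock on the same workload gives a rough idea of how much of the wall time the thread actually spent running.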
Do not use big-O notation for real-life performance considerations.
That said, on to the rest of the question:
Gathering the performance data will itself take time (O(1) can still be a meaningful amount of time; it just doesn't depend on your data), so you need to make it as efficient as possible.
That means:
Don't use printf and the like; instead, write to a dedicated memory area from which you'll extract the data later.
For the same reason, don't use strcat; use structs of binary data instead, and parse them at the end when you're done.
Instead of measuring each call, consider measuring averages (i.e. time not each call but each batch of 1000 calls, and divide to get the approximate cost of a single call). That cuts the measurement overhead many times over. It is not always possible, but consider it.
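The points above could be sketched roughly as follows. Everything named here (Sample, SampleLog, now_ns, average_ns_per_call, the call_id field) is hypothetical and only illustrates the idea: records are fixed-size binary structs written into storage allocated before the measurement starts, and a separate helper times a whole batch instead of each call. The sketch assumes one log per thread, so no locking is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <time.h>   // POSIX: clock_gettime
#include <vector>

// One fixed-size binary record; formatting to text happens after the run.
struct Sample {
    std::uint32_t call_id;     // hypothetical tag for the function being timed
    std::int64_t  elapsed_ns;  // measured duration in nanoseconds
};

// Preallocated log: record() never allocates and never formats.
class SampleLog {
public:
    explicit SampleLog(std::size_t capacity) : buf_(capacity), used_(0) {}

    // Returns false (and drops the sample) once the buffer is full.
    bool record(std::uint32_t call_id, std::int64_t elapsed_ns) {
        if (used_ == buf_.size()) return false;
        buf_[used_++] = Sample{call_id, elapsed_ns};
        return true;
    }
    std::size_t size() const { return used_; }
    const Sample& operator[](std::size_t i) const {
        assert(i < used_);
        return buf_[i];
    }

private:
    std::vector<Sample> buf_;  // value-initialized up front, so pages are touched early
    std::size_t used_;
};

inline std::int64_t now_ns() {
    timespec ts{};
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Time `batch` back-to-back calls and return the mean cost per call.
template <typename F>
double average_ns_per_call(F&& fn, std::size_t batch) {
    assert(batch > 0);
    const std::int64_t start = now_ns();
    for (std::size_t i = 0; i < batch; ++i) fn();
    return static_cast<double>(now_ns() - start) / static_cast<double>(batch);
}
```

Batching trades away the per-call distribution: you get a mean per batch, so outliers within a batch are no longer visible. Whether that is acceptable depends on the statistics you plan to compute afterwards.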
clock_gettime can usually be trusted, but it depends on your OS and hardware. Check them: sometimes the hardware clock resolution is not as fine as you'd wish.
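One way to check, sketched under the assumption of a POSIX system (the helper names are hypothetical): compare the value reported by clock_getres with the smallest nonzero step actually observed between back-to-back clock_gettime readings. The observed step reflects both the clock granularity and the cost of the call itself, so it is only a rough indication.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>   // POSIX: clock_getres, clock_gettime

inline std::int64_t timespec_to_ns(const timespec& ts) {
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Resolution the OS claims for this clock, in nanoseconds.
inline std::int64_t reported_resolution_ns(clockid_t id) {
    timespec res{};
    const int rc = clock_getres(id, &res);
    assert(rc == 0 && "clock_getres failed");
    (void)rc;  // avoids an unused-variable warning when NDEBUG is defined
    return timespec_to_ns(res);
}

// Smallest nonzero difference seen between consecutive readings.
// Returns -1 if the clock never advanced during `reads` readings.
inline std::int64_t observed_min_step_ns(clockid_t id, int reads) {
    std::int64_t best = -1;
    timespec prev{};
    clock_gettime(id, &prev);
    for (int i = 0; i < reads; ++i) {
        timespec cur{};
        clock_gettime(id, &cur);
        const std::int64_t delta = timespec_to_ns(cur) - timespec_to_ns(prev);
        if (delta > 0 && (best < 0 || delta < best)) best = delta;
        prev = cur;
    }
    return best;
}
```

If the observed step is much larger than the reported resolution, intervals shorter than that step cannot be measured meaningfully one call at a time.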