How to benchmark on multi-core processors

Posted 2024-08-31 23:38:52

I am looking for ways to perform microbenchmarks on multi-core processors.

Context:

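Around the same time that desktop processors introduced out-of-order execution, which made performance hard to predict, they (perhaps not coincidentally) also introduced special instructions for obtaining very precise timings. Examples of these instructions are rdtsc on x86 and mftb on PowerPC. They gave more precise timings than system calls could, allowing programmers to microbenchmark to their heart's content, for better or for worse.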
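A minimal sketch of such a timing read on x86, assuming GCC or Clang targeting x86-64 (the <x86intrin.h> intrinsics __rdtsc and _mm_lfence). Placing LFENCE on both sides of the read is one common way to keep out-of-order execution from moving work into or out of the measured region; read_tsc_serialized and ticks_for_loop are hypothetical names used only for illustration. The result is in TSC ticks, not seconds, and it is subject to the multi-core caveat raised in the question.

```c
#include <stdint.h>
#include <x86intrin.h> /* GCC/Clang on x86-64 only: __rdtsc, _mm_lfence */

/* Read the time-stamp counter with LFENCE before and after it, so the
   read is not executed ahead of earlier instructions and later
   instructions do not start before the read has happened. */
static inline uint64_t read_tsc_serialized(void) {
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* Hypothetical helper: TSC ticks spent in a simple busy loop. */
uint64_t ticks_for_loop(unsigned iterations) {
    volatile unsigned sink = 0; /* volatile keeps the loop from being optimized away */
    uint64_t start = read_tsc_serialized();
    for (unsigned i = 0; i < iterations; i++) {
        sink += i;
    }
    uint64_t end = read_tsc_serialized();
    /* Only meaningful if both reads ran on the same core (see the caveat below). */
    return end - start;
}
```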

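On more modern processors with several cores, some of which are put to sleep at times, the counters are not synchronized between cores. We are told that rdtsc can no longer be used safely for benchmarking, but I must have been dozing off when the alternative solutions were explained to us.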
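The alternative usually recommended is to let the operating system supply the timestamp. A minimal sketch for Linux and other POSIX systems, assuming clock_gettime( ) with CLOCK_MONOTONIC, which Linux documents as a system-wide clock, so readings taken on different cores can be compared; monotonic_ns is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Current value of the system-wide monotonic clock, in nanoseconds.
   The kernel chooses the underlying hardware counter. */
uint64_t monotonic_ns(void) {
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        abort(); /* no monotonic clock: nothing sensible to report */
    }
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}
```

To time a region, subtract a reading taken before it from one taken after it. Each call has some overhead of its own, so for very short regions it is usual to time many repetitions and divide.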

Questions:

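Some systems may save and restore the performance counters and provide an API call to read the correct sum. If you know what this call is for any operating system, please tell us in an answer.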
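On Linux, one interface that matches this description is perf_event_open(2): a hardware counter opened for a thread (pid 0, cpu -1) is saved and restored by the kernel across context switches and migrations, and read(2) returns the accumulated count. A minimal sketch, assuming Linux with kernel headers installed and permission to use hardware counters (containers and virtual machines often deny it). It counts CPU cycles rather than elapsed time, and open_thread_cycle_counter and cycles_for_loop are hypothetical names.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open a user-space CPU-cycle counter for the calling thread on any CPU.
   Returns -1 if the counter is unavailable (for example, restricted by
   /proc/sys/kernel/perf_event_paranoid or missing in a VM). */
int open_thread_cycle_counter(void) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;       /* start stopped; enabled explicitly later */
    attr.exclude_kernel = 1; /* count user-space cycles only */
    attr.exclude_hv = 1;
    /* glibc has no wrapper for perf_event_open, so call it via syscall(2).
       pid 0 = this thread, cpu -1 = whichever CPU it runs on. */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Hypothetical helper: cycles spent in a busy loop, or -1 on a read error. */
int64_t cycles_for_loop(int fd, unsigned iterations) {
    volatile unsigned sink = 0; /* volatile keeps the loop from being optimized away */
    uint64_t count = 0;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    for (unsigned i = 0; i < iterations; i++) {
        sink += i;
    }
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    if (read(fd, &count, sizeof count) != (ssize_t)sizeof count) {
        return -1;
    }
    return (int64_t)count;
}
```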

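Some systems may allow cores to be switched off, leaving only one running. I know Mac OS X Leopard does this when the right preference pane is installed from the Developer Tools. Do you think this makes rdtsc safe to use again?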
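Short of switching cores off system-wide, a process can restrict itself to a single core. A minimal Linux-only sketch, assuming glibc's sched_setaffinity( ); pin_to_cpu is a hypothetical name. Pinning keeps every counter read on the same core, but it does not by itself turn rdtsc into a clock: on processors without a constant-rate TSC, the counter rate can still change with the core frequency.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Restrict the calling thread to one CPU so that all counter reads in the
   benchmark are taken on the same core. Returns 0 on success, -1 on failure
   (for example, if the CPU is not in this process's allowed set). */
int pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* A pid of 0 means the calling thread. */
    if (sched_setaffinity(0, sizeof set, &set) != 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}
```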

More context:

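Please assume I know what I am doing when I try to microbenchmark. If you think an optimization is not worth doing unless its gains can be measured by timing the whole application, I agree with you, but:

- I cannot time the whole application until the alternative data structure is finished, which will take a long time. In fact, if the microbenchmark were not promising, I could decide to give up on the implementation now;
- I need to provide figures in a publication whose deadline I have no control over.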


Answer by 过去的过去, 2024-09-07 23:38:52

On OSX (ARM, Intel and PowerPC), you want to use mach_absolute_time( ):

#include <mach/mach_time.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Utility function for converting mach_absolute_time( ) units to nanoseconds.
double machTimeUnitsToNanoseconds(uint64_t mtu) {
    // mach_timebase_info( ) describes one time unit as numer/denom nanoseconds,
    // so this factor is nanoseconds per unit. It is computed once and cached.
    static double nanosecondsPerMtu = 0.0;
    if (0.0 == nanosecondsPerMtu) {
        mach_timebase_info_data_t info;
        if (mach_timebase_info(&info)) {
            // A nonzero return means failure and info is not valid. If you do
            // get an error, something is seriously wrong, so report it and exit( ).
            fprintf(stderr, "mach_timebase_info failed\n");
            exit(EXIT_FAILURE);
        }
        nanosecondsPerMtu = (double)info.numer / info.denom;
    }
    return mtu * nanosecondsPerMtu;
}

// In your code, for example:
int main(void) {
    uint64_t startTime = mach_absolute_time( );
    // Stuff that you want to time.
    uint64_t endTime = mach_absolute_time( );
    double elapsedNanoseconds = machTimeUnitsToNanoseconds(endTime - startTime);
    printf("%.0f ns\n", elapsedNanoseconds);
    return 0;
}

Note that there's no need to limit yourself to one core for this. The OS handles the fix-up required behind the scenes for mach_absolute_time( ) to give meaningful results in a multi-core (and multi-socket) environment.
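In machTimeUnitsToNanoseconds above, numer/denom is the length of one Mach time unit in nanoseconds. If you prefer to stay in integer arithmetic, for example for long runs where tick counts exceed the 53-bit mantissa of a double, here is a sketch assuming GCC or Clang on a 64-bit target (unsigned __int128 is a compiler extension). ticks_to_ns is a hypothetical name; on OS X you would pass the numer and denom fields filled in by mach_timebase_info( ).

```c
#include <stdint.h>

/* Convert Mach time units to nanoseconds given the timebase fraction
   (nanoseconds per unit = numer / denom). The 128-bit intermediate keeps
   ticks * numer from overflowing; the result is truncated toward zero.
   denom must be nonzero, which a successful mach_timebase_info( ) ensures. */
uint64_t ticks_to_ns(uint64_t ticks, uint32_t numer, uint32_t denom) {
    unsigned __int128 product = (unsigned __int128)ticks * numer;
    return (uint64_t)(product / denom);
}
```

For example, after a successful mach_timebase_info(&info) call, ticks_to_ns(endTime - startTime, info.numer, info.denom) gives the elapsed nanoseconds.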
