如何读取 i5、i7 CPU 上的性能计数器

发布于 2024-12-15 08:16:51 字数 365 浏览 1 评论 0 原文

现代 CPU 有相当多的性能计数器 - http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-system-programming-manual-325384。 html 如何读取它们? 我对缓存未命中和分支预测错误感兴趣。

Modern CPUs have quite a lot of performance counters - http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-system-programming-manual-325384.html how to read them?
I'm interested in cache misses and branch mispredictions.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

枫以 2024-12-22 08:16:51

看起来 PAPI 有非常干净的 API 并且工作得很好乌班图11.04。
安装后,以下应用程序将执行我想要的操作:

#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

#define NUM_EVENTS 4

void matmul(const double *A, const double *B,
        double *C, int m, int n, int p)
{
    int i, j, k;
    for (i = 0; i < m; ++i)
        for (j = 0; j < p; ++j) {
            double sum = 0;
            for (k = 0; k < n; ++k)
                sum += A[i*n + k] * B[k*p + j];
            C[i*p + j] = sum;
        }
}

int main(int /* argc */, char ** /* argv[] */)
{
    const int size = 300;
    double a[size][size];
    double b[size][size];
    double c[size][size];

    int event[NUM_EVENTS] = {PAPI_TOT_INS, PAPI_TOT_CYC, PAPI_BR_MSP, PAPI_L1_DCM };
    long long values[NUM_EVENTS];

    /* Start counting events */
    if (PAPI_start_counters(event, NUM_EVENTS) != PAPI_OK) {
        fprintf(stderr, "PAPI_start_counters - FAILED\n");
        exit(1);
    }

    matmul((double *)a, (double *)b, (double *)c, size, size, size);

    /* Read the counters */
    if (PAPI_read_counters(values, NUM_EVENTS) != PAPI_OK) {
        fprintf(stderr, "PAPI_read_counters - FAILED\n");
        exit(1);
    }

    printf("Total instructions: %lld\n", values[0]);
    printf("Total cycles: %lld\n", values[1]);
    printf("Instr per cycle: %2.3f\n", (double)values[0] / (double) values[1]);
    printf("Branches mispredicted: %lld\n", values[2]);
    printf("L1 Cache misses: %lld\n", values[3]);

    /* Stop counting events */
    if (PAPI_stop_counters(values, NUM_EVENTS) != PAPI_OK) {
        fprintf(stderr, "PAPI_stoped_counters - FAILED\n");
        exit(1);
    }

    return 0;
}

在 Intel Q6600 上对此进行了测试,它最多支持 4 个性能事件。您的处理器可能支持或多或少。

Looks like PAPI has very clean API and works just fine on Ubuntu 11.04.
Once it's installed, following app will do what I wanted:

#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

#define NUM_EVENTS 4

void matmul(const double *A, const double *B,
        double *C, int m, int n, int p)
{
    int i, j, k;
    for (i = 0; i < m; ++i)
        for (j = 0; j < p; ++j) {
            double sum = 0;
            for (k = 0; k < n; ++k)
                sum += A[i*n + k] * B[k*p + j];
            C[i*p + j] = sum;
        }
}

int main(int /* argc */, char ** /* argv[] */)
{
    const int size = 300;
    double a[size][size];
    double b[size][size];
    double c[size][size];

    int event[NUM_EVENTS] = {PAPI_TOT_INS, PAPI_TOT_CYC, PAPI_BR_MSP, PAPI_L1_DCM };
    long long values[NUM_EVENTS];

    /* Start counting events */
    if (PAPI_start_counters(event, NUM_EVENTS) != PAPI_OK) {
        fprintf(stderr, "PAPI_start_counters - FAILED\n");
        exit(1);
    }

    matmul((double *)a, (double *)b, (double *)c, size, size, size);

    /* Read the counters */
    if (PAPI_read_counters(values, NUM_EVENTS) != PAPI_OK) {
        fprintf(stderr, "PAPI_read_counters - FAILED\n");
        exit(1);
    }

    printf("Total instructions: %lld\n", values[0]);
    printf("Total cycles: %lld\n", values[1]);
    printf("Instr per cycle: %2.3f\n", (double)values[0] / (double) values[1]);
    printf("Branches mispredicted: %lld\n", values[2]);
    printf("L1 Cache misses: %lld\n", values[3]);

    /* Stop counting events */
    if (PAPI_stop_counters(values, NUM_EVENTS) != PAPI_OK) {
        fprintf(stderr, "PAPI_stoped_counters - FAILED\n");
        exit(1);
    }

    return 0;
}

Tested this on Intel Q6600, it supports up to 4 performance events. Your processor may support more or less.

浅唱ヾ落雨殇 2024-12-22 08:16:51

perf 怎么样? perf list hw cache 显示 33 个不同的事件,手册页显示如何使用原始性能计数器描述符。

What about perf? perf list hw cache shows 33 different events and the man page shows how to use raw performance counter descriptors.

没有伤那来痛 2024-12-22 08:16:51

性能计数器通过 RDPMC insn 读取。

编辑:要添加更多信息,读取性能计数器并不是很容易,如果我们要在这里描述它,则需要一页又一页,此外它还涉及对模型特定寄存器的写入,这需要特权指示。相反,我建议使用现成的分析器 - oprofile 或 Intel VTune,它们基于性能计数器构建。

Performance counters are read with the RDPMC insn.

EDIT: To add a bit more info, reading performance counters is not very easy and it would take pages upon pages if we are to describe it here, besides it involves writes to Model Specific Registers, which require privileged instructions. I would instead advise to use ready profilers - oprofile or Intel VTune, which are built upon performance counters.

白馒头 2024-12-22 08:16:51

我认为有一个可用的库可以使用,称为perfmon2,http://perfmon2.sourceforge.net/,文档可在 http://www.hpl.hp.com/research/linux/perfmon/perfmon.php4http://www.hpl.hp.com/techreports/2004/HPL-2004-200R1.html,我最近正在挖掘这个库,我会尽快发布示例代码我明白了~

I think there is a available library that can be used, called perfmon2, http://perfmon2.sourceforge.net/, and documentations are available at http://www.hpl.hp.com/research/linux/perfmon/perfmon.php4 and http://www.hpl.hp.com/techreports/2004/HPL-2004-200R1.html, I am recently digging this lib out, I would post example code as soon as I figure it out~

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文