Is kcachegrind/callgrind inaccurate for dispatcher functions?

Posted 2024-12-05 08:01:43

I have a model program for which kcachegrind/callgrind reports strange results. It is built around a dispatcher function. The dispatcher is called from 4 places; each call says which actual do_J function to run (so first2 calls only do_1 and do_2, and so on).

Source (this is a model of the actual code):

#define N 1000000

int a[N];
/* Each do_J touches a different fraction of the array, so their costs differ. */
void do_1(int *a) { int i; for(i=0;i<N/4;i++) a[i]+=1; }
void do_2(int *a) { int i; for(i=0;i<N/2;i++) a[i]+=2; }
void do_3(int *a) { int i; for(i=0;i<N*3/4;i++) a[i]+=3; }
void do_4(int *a) { int i; for(i=0;i<N;i++) a[i]+=4; }

/* Single entry point: j selects which do_J to run. */
void dispatcher(int *a, int j) {
    if(j==1) do_1(a);
    else if(j==2) do_2(a);
    else if(j==3) do_3(a);
    else do_4(a);
}

/* Four callers; each one reaches exactly two of the do_J functions. */
void first2(int *a) { dispatcher(a,1); dispatcher(a,2); }
void last2(int *a) { dispatcher(a,4); dispatcher(a,3); }
void inner2(int *a) { dispatcher(a,2); dispatcher(a,3); }
void outer2(int *a) { dispatcher(a,1); dispatcher(a,4); }

int main(){
    first2(a);
    last2(a);
    inner2(a);
    outer2(a);
}

Compiled with gcc -O0; profiled with valgrind --tool=callgrind; viewed with kcachegrind and qcachegrind-0.7.

Here is the full call graph of the application. All paths to do_J go through dispatcher, which is correct (do_1 is hidden because it is too fast, but it is really there, just to the left of do_2).

[Figure: full call graph]

Let's focus on do_1 and check who called it (this picture is incorrect):

[Figure: callers of do_1 as shown by kcachegrind]

This is very strange, I think: only first2 and outer2 called do_1, not all four functions.

Is this a limitation of callgrind/kcachegrind? How can I get an accurate call graph with weights (proportional to the running time of every function, with and without its children)?

友谊不毕业 2024-12-12 08:01:43

Yes, this is a limitation of the callgrind format. It does not store full traces; it only stores parent-child call information.

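There is the google-perftools project with its pprof/libprofiler.so CPU profiler, http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html. libprofiler.so can collect a profile with call traces, and it stores every trace event with its full backtrace. pprof is a converter from libprofiler's output to graphic formats or to callgrind format. In the full view the result will be the same as in kcachegrind; but if you focus on a function such as do_1 with pprof's focus option, it will show an accurate call tree for that function.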
