Strange profiler behavior: same functions, different performance
I was learning to use gprof when I got weird results for this code:
int one(int a, int b)
{
    int i, r = 0;
    for (i = 0; i < 1000; i++)
    {
        r += b / (a + 1);
    }
    return r;
}

int two(int a, int b)
{
    int i, r = 0;
    for (i = 0; i < 1000; i++)
    {
        r += b / (a + 1);
    }
    return r;
}

int main()
{
    for (int i = 1; i < 50000; i++)
    {
        one(i, i * 2);
        two(i, i * 2);
    }
    return 0;
}
This is the profiler output:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 50.67      1.14     1.14    49999    22.80    22.80  two(int, int)
 49.33      2.25     1.11    49999    22.20    22.20  one(int, int)
If I call one then two, the result is the inverse: two takes more time than one.
Both functions are identical, but the first call always takes less time than the second.
Why is that?
Note: the assembly code is exactly the same, and the code is compiled with no optimizations.
Comments (3)
My guess: it is an artifact of the way mcount data gets interpreted. The granularity for mcount (monitor.h) is on the order of a 32-bit longword, which is 4 bytes on my system. So you would not expect this: I get different reports from prof vs gprof on the EXACT same mon.out file.
solaris 9 -
Is it always the first one called that is slightly slower? If that's the case, I would guess it is the CPU cache doing its thing. Or it could be lazy paging by the operating system.
BTW: what optimization flags are you compiling with?
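One way to test the call-order idea without the profiler is to time the two functions directly and then swap the order. The following is only a sketch: one and two are copied from the question, while time_calls is a hypothetical helper that is not part of gprof or of the original code.

```cpp
#include <chrono>

// Same bodies as in the question.
int one(int a, int b)
{
    int i, r = 0;
    for (i = 0; i < 1000; i++)
    {
        r += b / (a + 1);
    }
    return r;
}

int two(int a, int b)
{
    int i, r = 0;
    for (i = 0; i < 1000; i++)
    {
        r += b / (a + 1);
    }
    return r;
}

// Hypothetical helper: runs fn over the same argument range as main()
// in the question and returns the elapsed wall-clock time in seconds.
template <typename Fn>
double time_calls(Fn fn, int iterations = 50000)
{
    volatile int sink = 0; // keeps the result live so the calls are not dropped
    auto start = std::chrono::steady_clock::now();
    for (int i = 1; i < iterations; i++)
    {
        sink = fn(i, i * 2);
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling time_calls(one) followed by time_calls(two), and then repeating the pair in the opposite order, should show whether the gap follows the call order or simply varies from run to run.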
I'd guess it is some fluke in run-time optimisation: one uses a register and the other doesn't, or something minor like that.
The system clock probably runs to a precision of 100 nsec. The average call time of 30 nsec or 25 nsec is less than one clock tick. A rounding error of 5% of a clock tick is pretty small. Both times are near enough to zero.
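The granularity argument can also be checked against the self-seconds figures in the question. This is a rough sketch, assuming gprof's usual sampling period of 0.01 seconds; the question does not show the header line that states the period, so that value is an assumption. The helper names are hypothetical, and treating the sample counts as Poisson distributed is a simplification.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Assumed sampling period in seconds (not shown in the question's output).
constexpr double kAssumedSamplePeriod = 0.01;

// Convert a "self seconds" figure into an approximate number of samples.
long samples_from_seconds(double self_seconds,
                          double period = kAssumedSamplePeriod)
{
    assert(period > 0.0);
    return std::lround(self_seconds / period);
}

// Rough noise estimate for the difference of two sample counts,
// treating each count as Poisson distributed (a simplifying assumption).
double expected_noise(long samples_a, long samples_b)
{
    return std::sqrt(static_cast<double>(samples_a + samples_b));
}

// True if the gap between the two timings is no larger than that noise.
bool within_noise(double seconds_a, double seconds_b)
{
    long a = samples_from_seconds(seconds_a);
    long b = samples_from_seconds(seconds_b);
    return static_cast<double>(std::labs(a - b)) <= expected_noise(a, b);
}
```

Under that assumption, 1.14 s and 1.11 s correspond to 114 and 111 samples. The gap is 3 samples, against a noise estimate of 15, so within_noise(1.14, 1.11) returns true.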