Using gcc, I compile and link with -pg (as explained e.g. here), then continue by running the program (according to the principles also suggested at that URL) and using gprof. The tools will vary if you're using a different compiler, but the URL is still recommended even then, for the parts that cover the general ideas of how and why to profile your code.
If you are using Linux, then I recommend the combination of Valgrind, Callgrind, and KCachegrind. Valgrind is a superb tool for finding memory leaks, and its Callgrind tool makes for a good profiler.
NOTE: I just learned that Valgrind now also works on Mac OS X. However, Callgrind and KCachegrind haven't been updated since 2005. You might want to look at other front-ends.
Glad you asked :-) If you don't mind a contrarian view, check these answers:
Let me try to put it in a nutshell:
Does the program wait for you, or do you wait for it? If it doesn't make you wait for it, then you don't have a problem, so leave it alone.
If it does make you wait, then proceed.
I recommend sampling, which means getting stroboscopic X-rays of what the program is doing when it's busy (not waiting for you). Get samples of at least the call stack, not just the program counter. If you only sample the program counter, the results will be meaningless if your program spends significant time in I/O or in library routines, so don't settle for that.
If you want to get a lot of samples, you need a profiler. If you only need a few, the pause button in the debugger works just fine. In my experience, 20 is more than enough, and 5 is often sufficient.
Why? Suppose you have 1000 samples of the call stack. Each sample represents a sliver of wall-clock time that is being spent only because every single line of code on the stack requested it. So, if a line of code appears on 557 samples out of 1000, you can assume it is responsible for 557/1000 of the time, give or take about 15 samples. That means, if the entire execution time was costing you $100, that line by itself is costing $55.70, give or take $1.50 **, so you should look to see if you really need it.
But do you need 1000 samples? If that line is costing about 55.7% of the time, then if you only took 10 samples, you would see it on 6 of them, give or take 1.5 samples. So if you do see a statement on 6 out of 10 samples, you know it is costing you roughly between $45 and $75 out of that $100. Even if it's only costing as little as $45, wouldn't you want to see if you really need it?
That's why you don't need a lot of samples - you don't need a lot of accuracy. What you do need is what the stack samples give you - they point you precisely at the most valuable lines to optimize.
** The standard deviation of the number of samples is sqrt( f * (1-f) * nsamp ), where f is the fraction of samples containing the line.
For the sake of completeness, I would add OProfile. It is especially interesting if you want to benchmark the kernel.
Shark / Instruments (using DTrace) are the profilers available on a Mac. They're pretty good.
Visual Studio Team System comes with a good profiler.
Also, Intel VTune is not bad.