为什么 NYTProf 的逐行统计数据与总数不匹配?

发布于 2024-09-29 05:33:07 字数 857 浏览 4 评论 0原文

我有一个 Perl 脚本,需要大约 30 分钟才能运行,所以我当然运行 Devel::NYTProf。很棒的分析器。对于我的许多潜艇来说,我得到了一些对我来说没有意义的数据。

我使用默认的 NYTProf 设置在 Linux 上运行 perl 5.10.0。

在 HTML 输出中,每个子进程都有一个摘要部分,说明子进程及其子进程花费了多少时间,然后继续提供行信息。

行统计数据不等于该函数的总花费。什么给?

例如,我有一个函数,据报告使用 233s (57+166)。逐行数字报告有一行使用了 20 秒,另一行使用了 4 秒,另一行使用了 2 秒。其他行都 <1 秒,函数也没有那么长。

我可以采取什么措施来解决这种不匹配问题?

我可以迁移到 Perl 5.12,但这需要一些工作来安装依赖项。我很高兴以较慢的模式运行它。有没有办法提高采样频率?在较慢的机器上运行?

单击此处获取示例:我的 NYTProf 输出。在本例中,据报告该子系统使用了 225 秒,但将所有数字相加得出 56 秒。此运行已关闭优化:

setenv NYTPROF optimize=0:file=nytprof.optout

更新 我已按照建议使用 findcaller=1 选项标志使用 Perl 5.12 重新运行,结果或多或少相同。 (我在不同的数据集上运行)

更新 Tim B 是对的。我已经更改了一些关键子程序来自行缓存,而不是使用 memoize,并且 NYTProf 结果再次有用。谢谢蒂姆。

I have a Perl script that takes about 30 minutes to run, so of course I run Devel::NYTProf. Great profiler. For many of my subs, I'm getting some data that doesn't make sense to me.

I'm running with perl 5.10.0 on Linux using the default NYTProf settings.

In the HTML output, each of the subs has a summary section stating how much time is spent in the sub and its children and then goes on to give me line information.

The line statistics don't add up to the total spent in the function. What gives?

For example, I have a function that's reported to use 233s (57+166). The line-by-line number report has one line that uses 20s, another that uses 4 and one that uses 2. The other lines are <1s and the function is not that long.

What can I do to resolve this mismatch?

I could move to Perl 5.12 but that would take some work to install the dependencies. I'm happy to run it in a slower mode. Is there a way to increase the sampling frequency? Run on a slower machine?

Click here for a sample: my NYTProf output. In this case, the sub is reported to use 225 seconds, but adding all of the numbers yields 56 seconds. This run had optimization turned off:

setenv NYTPROF optimize=0:file=nytprof.optout

Update I've rerun with Perl 5.12 using the findcaller=1 option flag as suggested with more or less the same results. (I ran on a different dataset)

Update Tim B is right. I have changed some of my key subs to do caching themselves instead of using memoize and the NYTProf results are useful again. Thank you Tim.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

最偏执的依靠 2024-10-06 05:33:07

我刚刚将其添加到 NYTProf 文档中:

=head2 If 语句和子例程时序不匹配

NYTProf 有两个分析器:一个声明
perl 时调用的探查器
从一个 Perl 语句移动到
另一个,还有一个子程序分析器
当 perl 调用或
从子例程返回。

单个语句的时间
子程序通常加起来略
小于独占时间
子程序。那是因为
子程序调用的处理和
返回间接费用包含在
子程序的独占时间。这
差异可能只是一个新的
微秒,但这可能会变成
对于子程序来说值得注意的是
调用了数十万次。

语句分析器跟踪如何
大量时间花在管理费用上,例如
将语句配置文件数据写入
磁盘。子程序分析器
减去间接费用
进入和之间累积
离开子程序以便
提供更准确的个人资料。这
语句分析器通常非常
速度快,因为大多数写入都会被缓冲
对于 zip 压缩,所以分析器
每个语句的开销往往是
非常小,通常是一个“滴答声”。这
结果就是累计的
头顶上很吵。这变成了
对于子程序来说更重要
被频繁调用并且也
快速地。这可能是另一个更小的
对差异的贡献
声明时间和独占时间之间
次。

这可能解释了语句时间列的总和(31.7 秒)与为子例程报告的独占时间(57.2 秒)之间的差异。每次调用的差异约为 100 微秒(这似乎有点高,但并非不合理)。

语句分析器跟踪开销上花费了多少时间,例如将语句分析数据写入磁盘。子例程分析器减去进入和离开子例程之间的开销差异,以便给出更准确的分析。

语句探查器通常非常快,因为大多数写入都会为 zip 压缩进行缓冲,因此每个语句的探查器开销往往非常小,通常是单个“滴答”。结果是累积的开销相当嘈杂。对于频繁调用且速度很快的子例程(在本例中,每次调用 899μs 调用 250303 个),这一点变得更加重要。因此,我怀疑这是造成语句时间和独占时间之间差异的另一个较小的原因。

更重要的是,我还添加了这一部分:

=head2 如果标题子例程时序与调用的子例程不匹配

报告总体子程序时间
标题为“花费 10 秒 (2+8)
在......”内。在此示例中,10
几秒钟内花费了
子程序(“包含时间”)以及,
其中,8 秒花费在
由这个调用的子例程。那
剩下 2 秒作为花费的时间
子程序代码本身(
“专属时间”,有时也
称为“自拍时间”)。

报告显示了源代码
子程序。拨打电话的线路
对其他子程序进行注释
以及所花费时间的详细信息
这些电话。

有时,时间的总和
由以下代码行进行的调用
子程序小于
报告的包含时间和不包含时间
标题(10-2 = 8 秒)
上面的例子)。

这里发生的事情是调用
其他子程序正在制作中,但是
NYTProf 无法确定
正确调用位置,以便
呼叫不会出现在报告中
正确的地方。

使用旧版本的 perl 是其中之一
原因 (见下文)。另一个正在打电话
通过“goto”退出的子例程
⊂” - 最常遇到
在 AUTOLOAD 子程序和代码中使用 Memoize
模块。

总的来说,整个子程序
时间是准确的并且应该是
可信度超过声明的总和
或嵌套子调用计时。

Memoize 模块是报告中出现差异的主要原因。对 Memoize::__ANON__[...] 的调用执行由 Memoize 生成的子函数,如下所示 sub { unshift @_, $cref;转到&_memoizer; }。 goto &_memoizer 是由 perl 实现的,作为一种返回给调用者,然后调用指定的子函数,这就是 NYTProf 对其进行分析的方式。

造成混乱的原因是,尽管 add_bit_to_map 被记录为 _memoizer 的调用者,因此调用时间会添加到 add_bit_to_map ,调用的文件和行号位置被记录为goto的位置。

在未来的版本中可能会改进这一点。

感谢您促使我调查此问题并改进文档。

蒂姆·邦斯.

ps 我建议在邮件列表上询问有关 NYTProf 的问题。

I've just added this to the NYTProf documentation:

=head2 If Statement and Subroutine Timings Don't Match

NYTProf has two profilers: a statement
profiler that's invoked when perl
moves from one perl statement to
another, and a subroutine profiler
that's invoked when perl calls or
returns from a subroutine.

The individual statement timings for a
subroutine usually add up to slightly
less than the exclusive time for the
subroutine. That's because the
handling of the subroutine call and
return overheads is included in the
exclusive time for the subroutine. The
difference may only be a new
microseconds but that may become
noticeable for subroutines that are
called hundreds of thousands of times.

The statement profiler keeps track how
much time was spent on overheads, like
writing statement profile data to
disk. The subroutine profiler
subtracts the overheads that have
accumulated between entering and
leaving the subroutine in order to
give a more accurate profile. The
statement profiler is generally very
fast because most writes get buffered
for zip compression so the profiler
overhead per statement tends to be
very small, often a single 'tick'. The
result is that the accumulated
overhead is quite noisy. This becomes
more significant for subroutines that
are called frequently and are also
fast. This may be another, smaller,
contribution to the discrepancy
between statement time and exclusive
times.

That probably explains the difference between the sum of the statement time column (31.7s) and the exclusive time reported for the subroutine (57.2s). The difference amounts to approximately 100 microseconds per call (which seems a little high, but not unreasonably so).

The statement profiler keeps track of how much time was spent on overheads, like writing statement profile data to disk. The subroutine profiler subtracts the difference in overheads between entering and leaving the subroutine in order to give a more accurate profile.

The statement profiler is generally very fast because most writes get buffered for zip compression so the profiler overhead per statement tends to be very small, often a single 'tick'. The result is that the accumulated overhead is quite noisy. This becomes more significant for subroutines that are called frequently and are also fast (in this case 250303 calls at 899µs/call). So I suspect this is another, smaller, contribution to the discrepancy between statement time and exclusive times.

More importantly, I've also added this section:

=head2 If Headline Subroutine Timings Don't Match the Called Subs

Overall subroutine times are reported
with a headline like "spent 10s (2+8)
within ...". In this example, 10
seconds were spent inside the
subroutine (the "inclusive time") and,
of that, 8 seconds were spent in
subroutines called by this one. That
leaves 2 seconds as the time spent in
the subroutine code itself (the
"exclusive time", sometimes also
called the "self time").

The report shows the source code of
the subroutine. Lines that make calls
to other subroutines are annotated
with details of the time spent in
those calls.

Sometimes the sum of the times for
calls made by the lines of code in the
subroutine is less than the
inclusive-exclusive time reported in
the headline (10-2 = 8 seconds in the
example above).

What's happening here is that calls to
other subroutines are being made but
NYTProf isn't able to determine the
calling location correctly so the
calls don't appear in the report in
the correct place.

Using an old version of perl is one
cause (see below). Another is calling
subroutines that exit via "goto
⊂" - most frequently encountered
in AUTOLOAD subs and code using the Memoize
module.

In general the overall subroutine
timing is accurate and should be
trusted more than the sum of statement
or nested sub call timings.

The Memoize module is primary the cause of the discrepancy in your report. The calls to Memoize::__ANON__[...] execute a sub generated by Memoize that looks like sub { unshift @_, $cref; goto &_memoizer; }. That goto &_memoizer is implemented by perl as a kind of return to the caller followed by a call to the specified sub, and that's the way NYTProf profiles it.

The confusion is caused by the fact that, although add_bit_to_map is being recorded as the caller of _memoizer so the time in the call gets added to add_bit_to_map, the file and line number location of the call is recorded as the location of the goto.

It may be possible to improve this in a future release.

Thank you for prompting me to investigate this and improve the documentation.

Tim Bunce.

p.s. I recommend asking questions about NYTProf on the mailing list.

风流物 2024-10-06 05:33:07

尝试禁用 Perl 的优化器。来自 CPAN 的 Devel::NYTProf 文档

优化=0

禁用 perl 优化器。

默认情况下 NYTProf 离开 perl
优化器已启用。这给你更多
整体上准确的配置文件计时,但是
可能会导致奇数语句计数
单独的线组。那是
因为perl的窥视孔优化器
有效地改写了
声明,但你看不到什么
重写版本看起来像。

例如:

<前><代码>1 if (...) {
2返回;
3 } 可以重写为

1 如果 (...) 返回

所以个人资料
不会显示行的语句计数
2 在你的源代码中,因为
return 被合并到 if 中
前一行的声明。

使用optimize=0选项禁用
优化器,所以你会变得更低
整体表现但更多
准确分配的语句计数。

如果您发现任何其他示例
优化器对 NYTProf 输出的影响
(显然除了性能之外)
请告诉我们。

Try disabling perl's optimizer. From CPAN's Devel::NYTProf docs:

optimize=0

Disable the perl optimizer.

By default NYTProf leaves perl's
optimizer enabled. That gives you more
accurate profile timing overall, but
can lead to odd statement counts for
individual sets of lines. That's
because the perl's peephole optimizer
has effectively rewritten the
statements but you can't see what the
rewritten version looks like.

For example:

1     if (...) {   
2         return;
3     } may be rewritten as

1    return if (...)

so the profile
won't show a statement count for line
2 in your source code because the
return was merged into the if
statement on the preceding line.

Using the optimize=0 option disables
the optimizer so you'll get lower
overall performance but more
accurately assigned statement counts.

If you find any other examples of the
effect of optimizer on NYTProf output
(other than performance, obviously)
please let us know.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文