How do I locate idle time (and network I/O time, etc.) in XPerf?

Posted on 2024-10-02 11:18:06

Let's say I have a contrived program:

#include <Windows.h>

void useless_function()
{
    Sleep(5000);
}

void useful_function()
{
    // ... do some work
    useless_function();
    // ... do some more work
}

int main()
{
    useful_function();
    return 0;
}

Objective: I want the profiler to tell me that useful_function() is needlessly calling useless_function(), which waits for no obvious reason. Under XPerf, this doesn't show up in any of the graphs I have, because the call to WaitForMultipleObjects() seems to be attributed to Idle.exe instead of my own program.

And here's the xperf command line that I currently run:

xperf -on Latency -stackwalk Profile

Any ideas?

(This is not restricted to wait functions. The case above could perhaps be solved by placing breakpoints at NtWaitForMultipleObjects. Ideally there would be a way to see the stacks that take up a lot of wall-clock time, as opposed to only CPU time.)

Comments (3)

余罪 2024-10-09 11:18:06

I think what you are looking for is the Wait Analysis with Ready Thread functionality in Xperf. It captures every context switch and gives you the call stack of the thread once it wakes up from sleep (or from any other blocking operation). In your case, you would see the stack just after the call to Sleep(5000), as well as the time spent sleeping.

The functionality is a bit obscure to use, but fortunately it is well described here:

Use Xperf's Wait Analysis for Application-Performance Troubleshooting

疧_╮線 2024-10-09 11:18:06

Wait Analysis is the way to do this. You should:

  • Record the CSWITCH provider, in order to get all context switches
  • Record call stacks on context switches by adding +CSWITCH to your -stackwalk argument
  • Probably record call stacks on the ready thread to get more information on who readied you (i.e., who released the mutex, critical section, or semaphore, and where) by adding +READYTHREAD to your -stackwalk argument

Then you use CPU Usage (Precise) in WPA (or xperfview, but that's ancient) to look at the context switches and find where your TimeSinceLast is high on a thread that shouldn't be going idle. You'll typically want the columns in CPU Usage (Precise) in this sort of order:

  • NewProcess (your process being switched in)
  • NewThreadId
  • NewThreadStack
  • ReadyingProcess (who made your thread ready to run)
  • ReadyingThreadId (optional)
  • ReadyThreadStack (optional, requires +ReadyThread on -stackwalk)
  • Orange bar
  • Count
  • TimeSinceLast (us) - sort by this column, usually
  • Whatever other columns you want

For details see these particular articles from my blog:
- https://randomascii.wordpress.com/2014/08/19/etw-training-videos-available-now/
- https://randomascii.wordpress.com/2012/06/19/wpaxperf-trace-analysis-reimagined/
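
As a rough illustration of what the TimeSinceLast column represents, the following C++ sketch computes, for each switch-in, how long a thread had been off the CPU since it was last switched out, then sorts the longest gaps first. The ContextSwitch and WaitRecord structs and the longest_waits function are hypothetical names invented for this example; they are not part of xperf, WPA, or ETW, and real context-switch events carry many more fields. For ReadyThread stacks to be recorded, the trace likely also needs the DISPATCHER kernel flag (for example xperf -on Latency+DISPATCHER -stackwalk Profile+CSwitch+ReadyThread); treat that command line as an assumption to verify against your xperf version.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified context-switch record (not the real ETW layout).
struct ContextSwitch {
    std::uint32_t thread_id;       // thread being switched in
    std::int64_t switch_in_us;     // time the thread started running
    std::int64_t switch_out_us;    // time the thread was switched out again
    std::string new_thread_stack;  // stack the thread resumes on
};

// One row of the simplified "wait" view.
struct WaitRecord {
    std::uint32_t thread_id;
    std::int64_t time_since_last_us;  // gap between last switch-out and this switch-in
    std::string new_thread_stack;
};

// For every switch-in, measure how long the thread was off the CPU since it
// last ran. Events for a given thread must appear in time order.
// The result is sorted with the longest gaps first.
std::vector<WaitRecord> longest_waits(const std::vector<ContextSwitch>& events) {
    std::map<std::uint32_t, std::int64_t> last_switch_out;
    std::vector<WaitRecord> waits;
    for (const auto& e : events) {
        auto it = last_switch_out.find(e.thread_id);
        if (it != last_switch_out.end()) {
            waits.push_back({e.thread_id, e.switch_in_us - it->second,
                             e.new_thread_stack});
        }
        last_switch_out[e.thread_id] = e.switch_out_us;
    }
    std::stable_sort(waits.begin(), waits.end(),
                     [](const WaitRecord& a, const WaitRecord& b) {
                         return a.time_since_last_us > b.time_since_last_us;
                     });
    return waits;
}
```

With the Sleep(5000) from the question, the switch-in that follows the sleep would show a gap of about 5,000,000 us and sort to the top, with its stack pointing into useless_function().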

梅倚清风 2024-10-09 11:18:06

This "profiler" will tell you: just randomly pause the program a few times and look at the stack. If "do some work" takes 5 seconds and "do some more work" takes 5 seconds, then, counting the 5-second Sleep, about 33% of the time the stack will look like this:

main: calling useful_function
useful_function: calling useless_function
useless_function: calling Sleep

So roughly 33% of your stack samples will show exactly that. Any line of code that's costing some fraction of wall-clock time will appear on roughly that fraction of samples.

On the rest of the samples you will see it doing the other things.
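
The 33% figure can be checked with a small simulation. The C++ sketch below is only a model of the idea, assuming a 15-second run split into 5 s of work, 5 s in Sleep, and 5 s of more work; stack_at and fraction_in_sleep are hypothetical helpers made up for this example, and no real debugger or profiler is involved.

```cpp
#include <random>
#include <string>

// Hypothetical timeline matching the example above:
// 0-5 s "do some work", 5-10 s Sleep(5000), 10-15 s "do some more work".
std::string stack_at(double t_seconds) {
    if (t_seconds < 5.0) return "useful_function: doing some work";
    if (t_seconds < 10.0) return "useless_function: calling Sleep";
    return "useful_function: doing some more work";
}

// Pause the modeled program at `samples` uniformly random wall-clock instants
// and return the fraction of pauses that land inside the Sleep call.
double fraction_in_sleep(int samples, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> pause_time(0.0, 15.0);
    int hits = 0;
    for (int i = 0; i < samples; ++i) {
        if (stack_at(pause_time(rng)) == "useless_function: calling Sleep") {
            ++hits;
        }
    }
    return static_cast<double>(hits) / samples;
}
```

With many samples the returned fraction settles near one third. With only a handful of pauses it varies more, but a call that costs a third of the wall-clock time is still very likely to appear at least once.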

There are automated profilers that do the same thing in a prettier way, such as Zoom and LTProf, although they don't actually show you the samples.

I looked at the xperf documentation, trying to figure out whether you could get stack samples on wall-clock time and get percentages at line-level resolution. It seems you have to be on Windows 7 or Vista. The tools only report functions, not lines, and line-level detail matters if you have realistically big functions. I couldn't figure out how to get access to the individual samples, which I think is important for seeing why the program is spending its time.
