
C# visual-studio-2010 refactoring profiling

Function profiling problem - Visual Studio 2010 Ultimate

Posted on 2024-10-29 07:49:45

I am trying to profile my application to measure the effect of a function, both before and after refactoring. I ran an analysis of my application and, looking at the Summary, I noticed that the Hot Path list does not mention any of my own functions; it only lists functions up to Application.Run().

I'm fairly new to profiling and would like to know how to get more information about the Hot Path, as demonstrated in the MSDN documentation.

MSDN Example:

[Image: MSDN Example]

My Results:

[Image: Hot Path Summary]

I've noticed that the Output window shows a lot of messages about failures to load symbols. A few of them are below:

Failed to load symbols for C:\Windows\system32\USP10.dll.  
Failed to load symbols for C:\Windows\system32\CRYPTSP.dll.
Failed to load symbols for (Omitted)\WindowsFormsApplication1\bin\Debug\System.Data.SQLite.dll.
Failed to load symbols for C:\Windows\system32\GDI32.dll.  
Failed to load symbols for C:\Windows\WinSxS\x86_microsoft.windows.common-controls_6595b64144ccf1df_6.0.7601.17514_none_41e6975e2bd6f2b2\comctl32.dll.
Failed to load symbols for C:\Windows\system32\msvcrt.dll. 
Failed to load symbols for C:\Windows\Microsoft.NET\Framework\v4.0.30319\nlssorting.dll.
Failed to load symbols for C:\Windows\Microsoft.Net\assembly\GAC_32\System.Data\v4.0_4.0.0.0__b77a5c561934e089\System.Data.dll.
Failed to load symbols for C:\Windows\Microsoft.Net\assembly\GAC_32\System.Transactions\v4.0_4.0.0.0__b77a5c561934e089\System.Transactions.dll.
Unable to open file to serialize symbols: Error VSP1737: File could not be opened due to sharing violation: - D:\(Omitted)\WindowsFormsApplication1110402.vsp

(Formatted using code tool so it's readable)

Thanks for any pointers.


拧巴小姐 2024-11-05 07:49:45

The "Hot Path" shown on the summary view is the most expensive call path based on the number of inclusive samples (samples from the function and also samples from functions called by the function) and exclusive samples (samples only from the function). A function receives an exclusive sample when it is at the top of the stack at the moment the profiler's driver captures the stack (this occurs at very short timed intervals), and an inclusive sample when it appears anywhere on that stack. Thus, the more samples a function has, the more time it spent executing.
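To make the inclusive/exclusive distinction concrete, here is a minimal sketch in Python. It assumes each captured stack is simply a list of function names; the stacks and names are made up for illustration, and this is not how the Visual Studio profiler stores its data.

```python
from collections import Counter

def count_samples(stacks):
    """Return (inclusive, exclusive) sample counts per function.

    Each stack is a list of function names, outermost caller first,
    so the last element is the function at the top of the stack.
    """
    inclusive = Counter()
    exclusive = Counter()
    for stack in stacks:
        # A function gets at most one inclusive sample per stack,
        # even if it appears several times (recursion).
        for name in set(stack):
            inclusive[name] += 1
        # Only the function on top of the stack gets the exclusive sample.
        exclusive[stack[-1]] += 1
    return inclusive, exclusive

# Hypothetical captured stacks (made-up names, not from the question).
stacks = [
    ["Main", "LoadData", "ParseRow"],
    ["Main", "LoadData", "ParseRow"],
    ["Main", "LoadData"],
    ["Main", "Render"],
]
inclusive, exclusive = count_samples(stacks)
print(inclusive["LoadData"], exclusive["LoadData"])  # 3 1
```

Here LoadData is on the stack in three samples but on top of it in only one, so most of its cost comes from what it calls.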

By default for sampling analysis, a feature called "Just My Code" is enabled. It hides functions on the stack that come from non-user modules (it still shows one level of non-user functions when they are called by a user function; in your case, Application.Run). Functions from modules without loaded symbols, or from modules known to be from Microsoft, are excluded. Your "Hot Path" on the summary view indicates that the most expensive stack didn't have anything from what the profiler considers to be your code (other than Main). The example from MSDN shows more functions because the PeopleTrax.* and PeopleNS.* functions are coming from "user code". "Just My Code" can be turned off by clicking the "Show All Code" link on the summary view, but I would not recommend doing so here.
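As a rough illustration of that rule, the following Python sketch applies a toy version of the filter to one stack. It is only a model of the behaviour described above, not the profiler's actual algorithm; the module name WindowsFormsApplication1.exe and every frame below Application.Run are assumptions made up for the example.

```python
def just_my_code(stack, user_modules):
    """Toy filter: keep user frames, plus any non-user frame called directly by a user frame.

    `stack` is a list of (module, function) pairs, outermost caller first.
    """
    visible = []
    previous_was_user = False
    for module, function in stack:
        is_user = module in user_modules
        if is_user or previous_was_user:
            visible.append(function)
        previous_was_user = is_user
    return visible

# Hypothetical stack; the frames below Application.Run are illustrative only.
stack = [
    ("WindowsFormsApplication1.exe", "Program.Main"),
    ("System.Windows.Forms.dll", "Application.Run"),
    ("System.Windows.Forms.dll", "SomeInternalMessageLoop"),
    ("user32.dll", "SomeNativeCall"),
]
print(just_my_code(stack, {"WindowsFormsApplication1.exe"}))
# ['Program.Main', 'Application.Run']
```

With a stack like this, everything below Application.Run is hidden, which matches a Hot Path that stops at Application.Run.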

Take a look at the "Functions Doing The Most Individual Work" section on the summary view. It lists the functions with the highest exclusive sample counts, which are therefore, based on the profiling scenario, the most expensive functions to call. You should see more of your functions (or functions called by your functions) here. Additionally, the "Functions" and "Call Tree" views might show you more details (there's a drop-down at the top of the report to select the current view).

As for your symbol warnings, most of those are expected because they are Microsoft modules (not including System.Data.SQLite.dll). While you don't need the symbols for these modules to properly analyze your report, if you checked "Microsoft Symbol Servers" in "Tools -> Options -> Debugging -> Symbols" and reopened the report, the symbols for these modules should load. Note that it'll take much longer to open the report the first time because the symbols need to be downloaded and cached.

The other warning, about the failure to serialize symbols into the report file, means the file could not be written because something else had it open in a way that prevents writing. Symbol serialization is an optimization that lets the profiler load symbol information directly from the report file on the next analysis. Without it, each analysis simply has to do the same amount of work as when the report was first opened.

And finally, you may also want to try instrumentation instead of sampling in your profiling session settings. Instrumentation modifies modules that you specify to capture data on each and every function call (be aware that this can result in a much, much larger .vsp file). Instrumentation is ideal for focusing in on the timing of specific pieces of code, whereas sampling is ideal for general low-overhead profiling data collection.
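The idea behind instrumentation can be sketched in a few lines of Python: wrap a function so that every call is counted and timed. This is only an analogy under my own assumptions (the decorator, the dictionaries and parse_row are hypothetical); Visual Studio instruments compiled modules rather than wrapping functions like this.

```python
import time
from collections import defaultdict
from functools import wraps

call_counts = defaultdict(int)
total_seconds = defaultdict(float)

def instrumented(func):
    """Wrap func so every call is counted and timed (wall-clock)."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the call even if func raises.
            call_counts[func.__name__] += 1
            total_seconds[func.__name__] += time.perf_counter() - start
    return wrapper

@instrumented
def parse_row(row):  # hypothetical function under study
    return [field.strip() for field in row.split(",")]

for line in ["a, b", "c, d", "e, f"]:
    parse_row(line)

print(call_counts["parse_row"])  # 3
```

Because data is recorded on every call rather than at timed intervals, the amount of data grows with the number of calls, which is why instrumented reports can get so large.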


帅的被狗咬 2024-11-05 07:49:45

Do you mind too much if I talk a bit about profiling, what works and what doesn't?

Let's make up an artificial program, some of whose statements are doing work that can be optimized away - i.e. they are not really necessary.
They are "bottlenecks".

Subroutine foo runs a CPU-bound loop that takes one second.
Also assume subroutine CALL and RETURN instructions take insignificant or zero time, compared to everything else.

Subroutine bar calls foo 10 times, but 9 of those times are unnecessary, which you don't know in advance and can't tell until your attention is directed there.

Subroutines A, B, C, ..., J are 10 subroutines, and they each call bar once.

The top-level routine main calls each of A through J once.

So the total call tree looks like this:

main
  A
    bar
      foo
      foo
      ... total 10 times for 10 seconds
  B
    bar
      foo
      foo
      ...
  ...
  J
    ...
(finished)

How long does it all take? 100 seconds, obviously.

Now let's look at profiling strategies.
Stack samples (like say 1000 samples) are taken at uniform intervals.

Is there any self time? Yes. foo takes 100% of the self time.
It's a genuine "hot spot".
Does that help you find the bottleneck? No. Because it is not in foo.
What is the hot path? Well, the stack samples look like this:

main -> A -> bar -> foo (100 samples, or 10%)
main -> B -> bar -> foo (100 samples, or 10%)
...
main -> J -> bar -> foo (100 samples, or 10%)

There are 10 hot paths, and none of them look big enough to gain you much speedup.
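Those counts can be reproduced with a small simulation. This is a sketch in Python that models the artificial program by time alone (it does not actually run for 100 seconds); the function name sample_stacks and the choice to sample at the middle of each interval are my own assumptions.

```python
from collections import Counter

def sample_stacks(total_samples=1000):
    """Return the stack seen at each of `total_samples` evenly spaced sample points.

    Models the program above: main calls A..J in turn, each calls bar,
    and bar calls foo 10 times at one second per call (100 seconds total).
    """
    callers = "ABCDEFGHIJ"
    total_seconds = len(callers) * 10
    stacks = []
    for i in range(total_samples):
        # Sample in the middle of each interval to avoid boundary ties.
        t = (i + 0.5) * total_seconds / total_samples
        caller = callers[int(t // 10)]  # each caller owns a 10-second slice
        stacks.append(("main", caller, "bar", "foo"))
    return stacks

stacks = sample_stacks()
paths = Counter(stacks)
for path, count in sorted(paths.items()):
    print(" -> ".join(path), f"({count} samples, or {100 * count / len(stacks):.0f}%)")
```

Every sample lands in foo, yet the full paths split the time ten ways, so no single path stands out.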

IF YOU HAPPEN TO GUESS, and IF THE PROFILER ALLOWS, you could make bar the "root" of your call tree. Then you would see this:

bar -> foo (1000 samples, or 100%)

Then you would know that foo and bar were each independently responsible for 100% of the time and therefore are places to look for optimization.
You look at foo, but of course you know the problem isn't there.
Then you look at bar and you see the 10 calls to foo, and you see that 9 of them are unnecessary. Problem solved.
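Re-rooting is easy to sketch if you have the raw stack samples. The Python below assumes the 1000 samples described above (100 per caller) and a hypothetical helper named reroot; real profilers expose this through their UI rather than a function like this.

```python
from collections import Counter

def reroot(stacks, root):
    """Trim each stack so it starts at `root`; drop stacks that never reach it."""
    trimmed = []
    for stack in stacks:
        if root in stack:
            trimmed.append(stack[stack.index(root):])
    return trimmed

# 1000 stack samples for the artificial program: 100 per caller A..J.
stacks = [("main", caller, "bar", "foo") for caller in "ABCDEFGHIJ" for _ in range(100)]

rerooted = Counter(reroot(stacks, "bar"))
for path, count in rerooted.items():
    print(" -> ".join(path), f"({count} samples, or {100 * count / len(stacks):.0f}%)")
# bar -> foo (1000 samples, or 100%)
```

Dropping the callers above bar merges the ten separate paths into one.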

IF YOU DIDN'T HAPPEN TO GUESS, and instead the profiler simply showed you the percent of samples containing each routine, you would see this:

main 100%
bar  100%
foo  100%
A    10%
B    10%
...
J    10%

That tells you to look at main, bar, and foo. You see that main and foo are innocent. You look at where bar calls foo and you see the problem, so it's solved.
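For reference, that per-routine table can be computed from the same kind of stack samples. This is a minimal Python sketch, again assuming 100 samples per caller; percent_containing is a name I made up.

```python
from collections import Counter

# 1000 stack samples for the artificial program: 100 per caller A..J.
stacks = [("main", caller, "bar", "foo") for caller in "ABCDEFGHIJ" for _ in range(100)]

def percent_containing(stacks):
    """Percent of samples whose stack contains each routine, at any depth."""
    counts = Counter()
    for stack in stacks:
        counts.update(set(stack))  # count a routine once per sample
    return {name: 100 * n / len(stacks) for name, n in counts.items()}

percents = percent_containing(stacks)
# Highest percentage first; ties broken alphabetically.
for name, pct in sorted(percents.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{name:<5}{pct:.0f}%")
```

No guess about where to root the tree is needed here; the routines at 100% are simply the ones worth reading.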

It's even clearer if in addition to showing you the functions, you can be shown the lines where the functions are called. That way, you can find the problem no matter how large the functions are in terms of source text.

NOW, let's change foo so that it does sleep(oneSecond) rather than be CPU bound. How does that change things?

What it means is it still takes 100 seconds by the wall clock, but the CPU time is zero. Sampling in a CPU-only sampler will show nothing.
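The gap between wall-clock time and CPU time is easy to see for yourself. This Python sketch uses a shortened sleep so it runs quickly; foo_sleeping is a hypothetical stand-in for the blocking version of foo.

```python
import time

def foo_sleeping(seconds):
    """Stand-in for the blocking version of foo: waits without using the CPU."""
    time.sleep(seconds)

wall_start = time.perf_counter()   # wall-clock time
cpu_start = time.process_time()    # CPU time of this process; excludes sleep

foo_sleeping(0.2)  # shortened from one second so the example runs quickly

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start
print(f"wall: {wall_elapsed:.3f}s, cpu: {cpu_elapsed:.3f}s")
```

The wall-clock figure comes out close to the sleep duration while the CPU figure stays near zero, which is all a CPU-only sampler would have to go on.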

So now you are told to try instrumentation instead of sampling. Among all the other things it tells you, it also gives you the percentages shown above, so in this case you could find the problem, assuming bar was not very big. (There may be reasons to write small functions, but should satisfying the profiler be one of them?)

Actually, the main things wrong with the sampler were that it can't sample during sleep (or I/O or other blocking), and that it doesn't show you percentages by line of code, only by function.

By the way, 1000 samples give you nice precise-looking percents. Suppose you took fewer samples. How many do you actually need to find the bottleneck? Well, since the bottleneck is on the stack 90% of the time, if you took only 10 samples, it would be on about 9 of them, so you'd still see it.
Even if you took as few as 3 samples, the probability it would appear on two or more of them is 97.2%.**

High sample rates are way overrated, when your goal is to find bottlenecks.

Anyway, that's why I rely on random-pausing.

** How did I get 97.2 percent? Think of it as tossing a coin 3 times, a very unfair coin, where "1" means seeing the bottleneck. There are 8 possibilities:

       #1s  probability
0 0 0  0    0.1^3 * 0.9^0 = 0.001
0 0 1  1    0.1^2 * 0.9^1 = 0.009
0 1 0  1    0.1^2 * 0.9^1 = 0.009
0 1 1  2    0.1^1 * 0.9^2 = 0.081
1 0 0  1    0.1^2 * 0.9^1 = 0.009
1 0 1  2    0.1^1 * 0.9^2 = 0.081
1 1 0  2    0.1^1 * 0.9^2 = 0.081
1 1 1  3    0.1^0 * 0.9^3 = 0.729

so the probability of seeing it 2 or 3 times is .081*3 + .729 = .972
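The same figure falls out of the binomial formula, which saves enumerating the cases by hand for larger sample counts. This is a small Python check; prob_at_least is a helper name I made up, and it assumes the samples are independent.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of seeing the bottleneck on at least k of n independent samples."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Bottleneck on the stack 90% of the time, 3 samples, seen at least twice.
print(round(prob_at_least(2, 3, 0.9), 3))  # 0.972
```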
