How do I profile a .NET application while taking the effect of the CPU cache into account?



All the .NET profilers I know of fail to take the effect of the CPU cache into account.

Given that reading a field from the CPU cache can be 100 times faster than reading it from main memory, it can be a big factor. (I just had to explain this in an answer.)
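
A minimal illustrative sketch of that gap (nothing clever, just standard library timing): both loops below perform the same number of array reads, but the first walks memory sequentially while the second follows a shuffled visiting order, so on an array much larger than the cache the second loop misses on almost every read.

using System;
using System.Diagnostics;

class CacheEffectDemo
{
    static void Main()
    {
        const int n = 1 << 24;                  // 16M ints = 64 MB, far larger than typical caches
        var data = new int[n];
        var order = new int[n];
        var rng = new Random(42);
        for (int i = 0; i < n; i++) order[i] = i;
        for (int i = n - 1; i > 0; i--)         // shuffle the visiting order (Fisher-Yates)
        {
            int j = rng.Next(i + 1);
            (order[i], order[j]) = (order[j], order[i]);
        }
        long sum = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++) sum += data[i];          // sequential: prefetcher and cache hits
        Console.WriteLine($"sequential: {sw.ElapsedMilliseconds} ms");
        sw.Restart();
        for (int i = 0; i < n; i++) sum += data[order[i]];   // shuffled: mostly cache misses
        Console.WriteLine($"shuffled:   {sw.ElapsedMilliseconds} ms (same number of reads)");
        Console.WriteLine(sum);                 // use the result so the loops are not optimized away
    }
}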

I have seen too many people spend a long time speeding up loops that a profiler says are slow, when in real life the CPU cache makes them fast.


E.g. I wish to be able to see whether a data access is missing the CPU cache a lot, as well as just getting basic profiling results I can trust more.

In the past I have found that making my data more compact, so it all fits in the CPU cache, or changing the order the data is accessed in, can have a big effect. E.g.

AccessArrayFromStartAndDoSomething()
AccessArrayFromEndAndDoSomethingElse()

is better than

AccessArrayFromStartAndDoSomething()
AccessArrayStartEndAndDoSomethingElse()

if the array will not fit in the CPU cache, but it is very hard to find that type of improvement.
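
To make the two call sequences concrete, here is a rough sketch (method names are made up for illustration, and the second pair is read as two start-to-end passes): after one forward pass over an array too big for the cache, the end of the array is what is still cached, so a second pass that starts from the end reuses those lines, while another forward pass starts on data that has already been evicted.

// Hypothetical method names, only to illustrate the access-order effect above.
static long ForwardThenBackward(int[] a)
{
    long sum = 0;
    for (int i = 0; i < a.Length; i++) sum += a[i];              // forward pass
    for (int i = a.Length - 1; i >= 0; i--) sum += a[i] * 3;     // backward pass: the tail is still cached
    return sum;
}

static long ForwardThenForward(int[] a)
{
    long sum = 0;
    for (int i = 0; i < a.Length; i++) sum += a[i];              // forward pass
    for (int i = 0; i < a.Length; i++) sum += a[i] * 3;          // forward again: the head was already evicted
    return sum;
}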


Spending more CPU cycles to make the data smaller, so that it fits in the CPU cache better, can speed up a lot of systems, but most profilers will point you in the other direction.
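
A toy illustration of that trade-off (my own example, not tied to any particular system): packing three fields into a single 32-bit word costs a shift and a mask on every read, but it triples how many records fit in each cache level.

struct WideRecord                     // roughly 12 bytes per element (with padding)
{
    public int Value;
    public int CategoryId;
    public bool IsActive;
}

struct PackedRecord                   // 4 bytes per element: same information, three times the cache density
{
    private readonly uint _bits;      // layout: value:23 | categoryId:8 | isActive:1
    // assumes 0 <= value < 2^23 and 0 <= categoryId < 256
    public PackedRecord(int value, int categoryId, bool isActive)
        => _bits = ((uint)value << 9) | ((uint)categoryId << 1) | (isActive ? 1u : 0u);
    public int Value => (int)(_bits >> 9);              // a shift per read...
    public int CategoryId => (int)((_bits >> 1) & 0xFF);
    public bool IsActive => (_bits & 1) != 0;            // ...in exchange for far fewer cache misses
}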


Comments (2)

ヤ经典坏疍 2024-09-19 02:53:45


I may be misunderstanding your question, but I think the answer is simply to switch your profiler into a high-accuracy, low-detail mode. An example would be using ANTS Performance Profiler's new Sampling Mode:

http://www.simple-talk.com/community/blogs/andrewh/archive/2009/11/13/76420.aspx

岁月打碎记忆 2024-09-19 02:53:44


"I have seen too many people spend a long time speeding up loops that a profiler says are slow, when in real life the CPU cache makes them fast."

Some profilers are really good at nonsense like that.

What's your overall goal? Do you want the computations to complete in less wall-clock time?

If not, ignore this answer.

If so, you need to know what's causing wall-clock time to be spent that you can get rid of.

It's not about accuracy of timing. It's about accuracy of location. I suggest what you really need to know is which lines of code are 1) responsible for a reasonable fraction of the time being spent, and 2) could be done better or not at all. That's what you need to know, because if there are no such lines of code, then what are you going to optimize?

An excellent way to find such lines of code is any profiler that 1) takes samples of the call stack on wall-clock time (not CPU time), and 2) tells you, for each line of code (not function) that appears on call stacks, what percentage of stacks it appears on. Your candidate lines for optimization are among the lines having a large percentage. (A couple of non-.NET examples: Zoom and LTProf.)

Frankly, the profiler I use is one you already have. I just pause the program while it's being slow and look at the stack. I don't need a lot of samples. In fact, if there's a line of code I could do without and it appears on as few as two samples, I know it's worth fixing, and the fewer samples it took to get to that point, the bigger it is. Here's a more thorough explanation.
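
A back-of-the-envelope check of that claim: if a removable line of code is on the stack for a fraction f of the wall-clock time, each random pause sees it with probability f, so even a handful of pauses makes two or more sightings likely when f is large.

using System;

class SamplingOdds
{
    static void Main()
    {
        const double f = 0.30;                               // assumed: the line is on the stack 30% of the time
        for (int n = 2; n <= 10; n += 2)
        {
            double none = Math.Pow(1 - f, n);                // probability it is seen 0 times
            double once = n * f * Math.Pow(1 - f, n - 1);    // probability it is seen exactly once
            double twoOrMore = 1 - none - once;              // probability it is seen on >= 2 pauses
            Console.WriteLine($"{n} pauses: P(seen on >= 2) = {twoOrMore:P1}");
        }
    }
}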

There are almost always multiple "bottlenecks". So I find a big one, fix it, and do it all again. What fixing a bottleneck does to the remaining bottlenecks is - it makes them bigger. This "magnification effect" allows you to keep going until there is simply no more speed to squeeze out.
