Benchmarking small code samples in C#, can this implementation be improved?
I frequently find myself benchmarking small chunks of code to see which implementation is fastest.
I often see comments that benchmarking code does not take the jitter or the garbage collector into account.
I have the following simple benchmarking function, which I have slowly evolved:
static void Profile(string description, int iterations, Action func) {
    // warm up
    func();
    // clean up
    GC.Collect();

    var watch = new Stopwatch();
    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();
    }
    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.ElapsedMilliseconds);
}
Usage:
Profile("a description", how_many_iterations_to_run, () =>
{
    // ... code being profiled
});
Does this implementation have any flaws? Is it good enough to show that implementation X is faster than implementation Y over Z iterations? Can you think of any ways you would improve this?
EDIT: It is pretty clear that a time-based approach (as opposed to a fixed number of iterations) is preferred. Does anyone have any implementations where the time checks do not impact performance?
11 Answers
Here is the modified function: as recommended by the community, feel free to amend this; it's a community wiki.
Make sure you compile in Release with optimizations enabled, and run the tests outside of Visual Studio. This last part is important because the JIT stints its optimizations with a debugger attached, even in Release mode.
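The community-wiki code itself is missing from this copy. Purely as a sketch, assuming the wiki version folded in the suggestions from the answers below (a GC.Collect / GC.WaitForPendingFinalizers / GC.Collect sequence, warming up before the collection, and raising process and thread priority), it might look roughly like this:

using System;
using System.Diagnostics;
using System.Threading;

static void Profile(string description, int iterations, Action func)
{
    // Run at high priority to minimize interference from other processes and threads.
    Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
    Thread.CurrentThread.Priority = ThreadPriority.Highest;

    // Warm up: make sure the delegate has been jitted before timing.
    func();

    var watch = new Stopwatch();

    // Clean up: collect, wait for finalizers, then collect what finalization freed.
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Start();
    for (int i = 0; i < iterations; i++)
    {
        func();
    }
    watch.Stop();

    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.ElapsedMilliseconds);
}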
Finalisation won't necessarily be completed before GC.Collect returns. The finalisation is queued and then run on a separate thread. This thread could still be active during your tests, affecting the results.
If you want to ensure that finalisation has completed before starting your tests, then you might want to call GC.WaitForPendingFinalizers, which will block until the finalisation queue is cleared:
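The snippet that followed this colon did not survive extraction; the pattern it most likely showed is along these lines:

// Collect, then block until the finalizer thread has drained its queue,
// then collect again to reclaim anything those finalizers released.
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();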
If you want to take GC interactions out of the equation, you may want to run your 'warm up' call after the GC.Collect call, not before. That way you know .NET will already have enough memory allocated from the OS for the working set of your function.
Keep in mind that you're making a non-inlined method call for each iteration, so make sure you compare the things you're testing to an empty body. You'll also have to accept that you can only reliably time things that are several times longer than a method call.
Also, depending on what kind of stuff you're profiling, you may want to base your timing on running for a certain amount of time rather than for a certain number of iterations; that tends to lead to more easily comparable numbers without needing a very short run for the best implementation and/or a very long one for the worst.
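A minimal sketch of that time-budget idea, which also speaks to the question's edit about keeping the time checks cheap: consult the Stopwatch only once per batch of iterations rather than on every call (the function name and batch size here are invented for illustration):

using System;
using System.Diagnostics;

static void ProfileForDuration(string description, TimeSpan duration, Action func)
{
    const int batchSize = 1000;   // consult the Stopwatch only once per batch
    long iterations = 0;
    var watch = Stopwatch.StartNew();

    while (watch.Elapsed < duration)
    {
        for (int i = 0; i < batchSize; i++)
        {
            func();
        }
        iterations += batchSize;
    }

    watch.Stop();
    Console.WriteLine("{0}: {1} iterations in {2} ms ({3:F1} ns/call)",
        description,
        iterations,
        watch.ElapsedMilliseconds,
        watch.Elapsed.TotalMilliseconds * 1000000.0 / iterations);
}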
I think the most difficult problem to overcome with benchmarking methods like this is accounting for edge cases and the unexpected. For example: "How do the two code snippets behave under high CPU load, heavy network usage, disk thrashing, etc.?" They're great for basic logic checks to see if a particular algorithm works significantly faster than another. But to properly test most code performance you'd have to create a test that measures the specific bottlenecks of that particular code.
I'd still say that testing small blocks of code often has little return on investment and can encourage using overly complex code instead of simple maintainable code. Writing clear code that other developers, or myself six months down the line, can understand quickly will have more performance benefits than highly optimized code.
I'd avoid passing the delegate at all:
An example of code leading to closure usage:
If you're not aware of closures, take a look at this method in .NET Reflector.
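The example code is missing from this copy. A hedged illustration of the kind of snippet meant here, reusing the Profile function from the question (the names are invented): capturing a local variable in the lambda makes the compiler hoist it into a heap-allocated closure class, so the measured work now includes closure field accesses.

class ClosureDemo
{
    static int _field;

    // A plain static method captures nothing; only the (non-inlined) delegate
    // call itself is added to the measurement.
    static void DoWork()
    {
        _field++;
    }

    static void Run()
    {
        int counter = 0;

        // 'counter' is captured by the lambda, so the compiler generates a hidden
        // closure class; every increment goes through a field of that heap object.
        Profile("captured local", 1000000, () => counter++);

        // Passing a method group avoids the closure entirely.
        Profile("method group", 1000000, DoWork);
    }
}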
I'd call func() several times for the warm-up, not just once.
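For instance, a tiny sketch of such a warm-up (the iteration count is arbitrary):

// Warm up: run the delegate a few times before the measured loop so that jitting
// and any one-time initialization are out of the way.
const int warmupIterations = 10;
for (int i = 0; i < warmupIterations; i++)
{
    func();
}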
Suggestions for improvement:
1. Detecting if the execution environment is good for benchmarking (such as detecting if a debugger is attached or if JIT optimization is disabled, which would result in incorrect measurements).
2. Measuring parts of the code independently (to see exactly where the bottleneck is).
3. Reporting the results in a way that can be consumed in different contexts, not only written to the console.
Regarding #1:
To detect if a debugger is attached, read the property System.Diagnostics.Debugger.IsAttached (remember to also handle the case where the debugger is initially not attached, but is attached after some time). To detect if JIT optimization is disabled, read the DebuggableAttribute.IsJITOptimizerDisabled property of the relevant assemblies:
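The code that followed is not included here; a minimal sketch of such an environment check, using only the two properties named above, could look like this (the helper name is invented):

using System.Diagnostics;
using System.Linq;
using System.Reflection;

static bool IsEnvironmentSuitableForBenchmarking(params Assembly[] assemblies)
{
    // An attached debugger changes what the JIT is allowed to do.
    if (Debugger.IsAttached)
        return false;

    // Assemblies built (or launched) with JIT optimizations disabled will not
    // behave like Release code.
    foreach (var assembly in assemblies)
    {
        var debuggable = assembly
            .GetCustomAttributes(typeof(DebuggableAttribute), inherit: false)
            .Cast<DebuggableAttribute>();

        if (debuggable.Any(a => a.IsJITOptimizerDisabled))
            return false;
    }

    return true;
}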
Regarding #2:
This can be done in many ways. One way is to allow several delegates to be supplied and then measure those delegates individually.
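As a rough sketch, not taken from the original answer, of what measuring several delegates individually could look like (the API shape is invented):

using System;
using System.Collections.Generic;
using System.Diagnostics;

static IDictionary<string, TimeSpan> ProfileParts(
    int iterations, params (string Name, Action Part)[] parts)
{
    var results = new Dictionary<string, TimeSpan>();

    foreach (var (name, part) in parts)
    {
        part();                       // warm up each part separately
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            part();
        }
        watch.Stop();

        results[name] = watch.Elapsed;
    }

    return results;
}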
Regarding #3:
This could also be done in many ways, and different use-cases would demand very different solutions. If the benchmark is invoked manually, then writing to the console might be fine. However if the benchmark is performed automatically by the build system, then writing to the console is probably not so fine.
One way to do this is to return the benchmark result as a strongly typed object that can easily be consumed in different contexts.
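For example, a sketch of such a strongly typed result; the type below is invented for illustration and is not part of any library mentioned in this thread:

using System;

// A plain result object lets the caller decide what to do with the numbers:
// print them, assert against a budget in a build pipeline, serialize them, etc.
sealed class BenchmarkResult
{
    public string Description { get; }
    public int Iterations { get; }
    public TimeSpan Elapsed { get; }

    public double NanosecondsPerCall =>
        Elapsed.TotalMilliseconds * 1000000.0 / Iterations;

    public BenchmarkResult(string description, int iterations, TimeSpan elapsed)
    {
        Description = description;
        Iterations = iterations;
        Elapsed = elapsed;
    }
}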
Another approach is to use an existing component, Etimo.Benchmarks, to perform the benchmarks. Actually, at my company we decided to release our benchmark tool to the public domain. At its core, it manages the garbage collector, jitter, warm-ups, etc., just like some of the other answers here suggest. It also has the three features I suggested above. It handles several of the issues discussed in Eric Lippert's blog.
This is an example output where two components are compared and the results are written to the console. In this case the two components compared are called 'KeyedCollection' and 'MultiplyIndexedKeyedCollection':
There is a NuGet package, a sample NuGet package, and the source code is available on GitHub. There is also a blog post.
If you're in a hurry, I suggest you get the sample package and simply modify the sample delegates as needed. If you're not in a hurry, it might be a good idea to read the blog post to understand the details.
You must also run a "warm up" pass prior to the actual measurement to exclude the time the JIT compiler spends on jitting your code.
Depending on the code you are benchmarking and the platform it runs on, you may need to account for how code alignment affects performance. To do so would probably require an outer wrapper that runs the test multiple times (in separate app domains or processes?), some of the times first calling "padding code" to force it to be JIT-compiled, so as to cause the code being benchmarked to be aligned differently. A complete test result would give the best-case and worst-case timings for the various code alignments.
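This does not demonstrate the alignment padding itself, but a minimal sketch of the outer-wrapper part, re-running a benchmark executable in separate processes so that each run gets a fresh JIT session, might look like the following (the executable path and argument are placeholders):

using System;
using System.Diagnostics;

// Re-run the benchmark executable several times in fresh processes so that each
// run gets its own JIT session (and therefore potentially different code layout).
static void RunInSeparateProcesses(string benchmarkExePath, int runs)
{
    for (int run = 0; run < runs; run++)
    {
        var info = new ProcessStartInfo(benchmarkExePath, $"--run {run}")
        {
            UseShellExecute = false,
            RedirectStandardOutput = true
        };

        using (var process = Process.Start(info))
        {
            Console.Write(process.StandardOutput.ReadToEnd());
            process.WaitForExit();
        }
    }
}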
If you're trying to eliminate the impact of garbage collection from the benchmark completely, is it worth setting GCSettings.LatencyMode?
If not, and you want the impact of garbage created in func to be part of the benchmark, then shouldn't you also force collection at the end of the test (inside the timer)?
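A small sketch of the two alternatives; GCSettings.LatencyMode and GCLatencyMode are real .NET APIs, but how they are combined below is only an illustration of the answer's questions, not a recommendation from the thread:

using System;
using System.Diagnostics;
using System.Runtime;

static TimeSpan ProfileIncludingCollection(int iterations, Action func)
{
    var watch = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
    {
        func();
    }

    // Make the cost of the garbage that func created part of the measurement
    // by collecting before the timer is stopped.
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Stop();
    return watch.Elapsed;
}

// To push GC impact out of the measurement instead, raise the latency mode around
// the timed loop and restore it afterwards:
//
//     GCLatencyMode oldMode = GCSettings.LatencyMode;
//     GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
//     try { /* timed loop */ } finally { GCSettings.LatencyMode = oldMode; }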
The basic problem with your question is the assumption that a single measurement can answer all your questions. You need to measure multiple times to get an effective picture of the situation, especially in a garbage-collected language like C#.
Another answer gives an okay way of measuring the basic performance.
However, this single measurement does not account for garbage collection. A proper profile additionally accounts for the worst-case performance of garbage collection spread out over many calls (this number is sort of useless, as the VM can terminate without ever collecting leftover garbage, but it is still useful for comparing two different implementations of func). One might also want to measure the worst-case performance of garbage collection for a method that is only called once.
But more important than recommending any specific additional measurements to profile is the idea that one should measure multiple different statistics, not just one kind of statistic.
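As a rough, purely illustrative sketch of that point: run the whole benchmark several times and report more than one number, for example the minimum, median and maximum:

using System;
using System.Diagnostics;

static void ProfileStatistics(string description, int runs, int iterations, Action func)
{
    var samples = new double[runs];

    for (int r = 0; r < runs; r++)
    {
        func();                        // warm up each run
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            func();
        }
        watch.Stop();

        samples[r] = watch.Elapsed.TotalMilliseconds;
    }

    Array.Sort(samples);
    Console.WriteLine("{0}: min {1:F2} ms, median {2:F2} ms, max {3:F2} ms over {4} runs",
        description, samples[0], samples[samples.Length / 2], samples[samples.Length - 1], runs);
}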