当前位置：文江博客话题详情

C# .NET performance profiling

对 C# 中的小代码示例进行基准测试，可以改进此实现吗？

发布于 2024-07-25 10:57:25 字数 851 浏览 7 评论 0 原文

我经常发现自己对小块代码进行基准测试，以查看哪种实现速度最快。

我经常看到评论说基准测试代码没有考虑抖动或垃圾收集器。

我有以下我慢慢发展的简单基准测试功能：

  static void Profile(string description, int iterations, Action func) {
        // warm up 
        func();
        // clean up
        GC.Collect();

        var watch = new Stopwatch();
        watch.Start();
        for (int i = 0; i < iterations; i++) {
            func();
        }
        watch.Stop();
        Console.Write(description);
        Console.WriteLine(" Time Elapsed {0} ms", watch.ElapsedMilliseconds);
    }

用法：

Profile("a descriptions", how_many_iterations_to_run, () =>
{
   // ... code being profiled
});

这个实现有任何缺陷吗？是否足以证明在 Z 次迭代中实现 X 比实现 Y 更快？您能想出什么方法可以改进这一点吗？

编辑很明显，基于时间的方法（而不是迭代）是首选，是否有人有任何时间检查不会影响性能的实现？

原文

Quite often on SO I find myself benchmarking small chunks of code to see which implemnetation is fastest.

Quite often I see comments that benchmarking code does not take into account jitting or the garbage collector.

I have the following simple benchmarking function which I have slowly evolved:

  static void Profile(string description, int iterations, Action func) {
        // warm up 
        func();
        // clean up
        GC.Collect();

        var watch = new Stopwatch();
        watch.Start();
        for (int i = 0; i < iterations; i++) {
            func();
        }
        watch.Stop();
        Console.Write(description);
        Console.WriteLine(" Time Elapsed {0} ms", watch.ElapsedMilliseconds);
    }

Usage:

Profile("a descriptions", how_many_iterations_to_run, () =>
{
   // ... code being profiled
});

Does this implementation have any flaws? Is it good enough to show that implementaion X is faster than implementation Y over Z iterations? Can you think of any ways you would improve this?

EDIT
Its pretty clear that a time based approach (as opposed to iterations), is preferred, does anyone have any implementations where the time checks do not impact performance?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

翻了热茶 2024-08-01 10:57:25

这是修改后的功能：根据社区的建议，请随意修改它，这是一个社区wiki。

static double Profile(string description, int iterations, Action func) {
    //Run at highest priority to minimize fluctuations caused by other processes/threads
    Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
    Thread.CurrentThread.Priority = ThreadPriority.Highest;

    // warm up 
    func();

    var watch = new Stopwatch(); 

    // clean up
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();
    }
    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.Elapsed.TotalMilliseconds);
    return watch.Elapsed.TotalMilliseconds;
}

确保在启用优化的情况下在发布版中进行编译，并在 Visual Studio 外部运行测试。最后一部分很重要，因为即使在发布模式下，JIT 也会通过附加的调试器来限制其优化。

Here is the modified function: as recommended by the community, feel free to amend this its a community wiki.

static double Profile(string description, int iterations, Action func) {
    //Run at highest priority to minimize fluctuations caused by other processes/threads
    Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
    Thread.CurrentThread.Priority = ThreadPriority.Highest;

    // warm up 
    func();

    var watch = new Stopwatch(); 

    // clean up
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();
    }
    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.Elapsed.TotalMilliseconds);
    return watch.Elapsed.TotalMilliseconds;
}

Make sure you compile in Release with optimizations enabled, and run the tests outside of Visual Studio. This last part is important because the JIT stints its optimizations with a debugger attached, even in Release mode.

回复收藏 0 原文

少年亿悲伤 2024-08-01 10:57:25

终结不一定会在 GC.Collect 返回之前完成。终结将排队，然后在单独的线程上运行。该线程在您的测试期间可能仍然处于活动状态，从而影响结果。

如果您想确保在开始测试之前完成最终确定，那么您可能需要调用 GC.WaitForPendingFinalizers，它将阻塞，直到清除终结队列：

GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();

Finalisation won't necessarily be completed before GC.Collect returns. The finalisation is queued and then run on a separate thread. This thread could still be active during your tests, affecting the results.

If you want to ensure that finalisation has completed before starting your tests then you might want to call GC.WaitForPendingFinalizers, which will block until the finalisation queue is cleared:

GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();

回复收藏 0 原文

霞映澄塘 2024-08-01 10:57:25

如果您想将 GC 交互排除在外，您可能需要在 GC.Collect 调用之后（而不是之前）运行“预热”调用。这样您就知道 .NET 已经从操作系统中为您的函数的工作集分配了足够的内存。

请记住，您正在为每次迭代进行非内联方法调用，因此请确保将正在测试的内容与空主体进行比较。您还必须接受这样的事实：您只能可靠地对比方法调用长几倍的事情进行计时。

此外，根据您要分析的内容，您可能希望基于计时运行一定的时间而不是一定数量的迭代 - 它往往会导致更容易比较的数字，而无需必须有一个非常短的运行以获得最佳实施和/或一个很长的运行以达到最坏的效果。

回复收藏 0 原文

孤独患者 2024-08-01 10:57:25

我认为像这样的基准测试方法最难克服的问题是考虑边缘情况和意外情况。例如 - “这两个代码片段在高 CPU 负载/网络使用/磁盘抖动/等情况下如何工作。” 它们非常适合进行基本逻辑检查，以查看特定算法的运行速度是否明显快于其他算法。但要正确测试大多数代码性能，您必须创建一个测试来测量特定代码的特定瓶颈。

我仍然想说，测试小代码块通常没有什么投资回报，并且会鼓励使用过于复杂的代码而不是简单的可维护代码。编写其他开发人员或我自己 6 个月后能够快速理解的清晰代码将比高度优化的代码具有更多的性能优势。

回复收藏 0 原文

迟月 2024-08-01 10:57:25

我完全避免传递委托：

委托调用是〜虚拟方法调用。不便宜：~ .NET 中最小内存分配的 25%。如果您对详细信息感兴趣，请参阅例如此链接。
匿名委托可能会导致使用闭包，而您甚至不会注意到。同样，访问闭包字段明显比访问堆栈上的变量要重要。

导致使用闭包的示例代码：

public void Test()
{
  int someNumber = 1;
  Profiler.Profile("Closure access", 1000000, 
    () => someNumber + someNumber);
}

如果您不了解闭包，请查看 .NET Reflector 中的此方法。

I'd avoid passing the delegate at all:

Delegate call is ~ virtual method call. Not cheap: ~ 25% of smallest memory allocation in .NET. If you're interested in details, see e.g. this link.
Anonymous delegates may lead to usage of closures, that you won't even notice. Again, accessing closure fields is noticeably than e.g. accessing a variable on the stack.

An example code leading to closure usage:

public void Test()
{
  int someNumber = 1;
  Profiler.Profile("Closure access", 1000000, 
    () => someNumber + someNumber);
}

If you're not aware about closures, take a look at this method in .NET Reflector.

回复收藏 0 原文

∞梦里开花 2024-08-01 10:57:25

我会多次调用 func() 进行热身，而不只是一次。

回复收藏 0 原文

沉睡月亮 2024-08-01 10:57:25

改进建议

检测执行环境是否适合基准测试（例如检测是否附加了调试器或是否禁用 jit 优化，这会导致测量不正确）。
独立测量部分代码（以准确了解瓶颈所在）。
比较不同版本/组件/代码块（在你的第一句话中，你说“......对小块代码进行基准测试，看看哪种实现速度最快。”）。

关于#1：

要检测是否附加了调试器，请读取属性System.Diagnostics.Debugger.IsAttached（记住还要处理调试器附加的情况）最初未附加，但在一段时间后附加）。

要检测 jit 优化是否已禁用，请读取相关程序集的属性 DebuggableAttribute.IsJITOptimizerDisabled：

private bool IsJitOptimizerDisabled（Assembly 程序集） 
  { 
      返回 assembly.GetCustomAttributes(typeof (DebuggableAttribute), false) 
          .Select(customAttribute => (DebuggableAttribute) customAttribute) 
          .Any(属性 => 属性.IsJITOptimizerDisabled); 
  }

关于 #2：

这可以在许多情况下完成方法。一种方法是允许提供多个代表，然后单独测量这些代表。

关于#3：

这也可以通过多种方式完成，不同的用例将需要非常不同的解决方案。如果手动调用基准测试，则写入控制台可能没问题。但是，如果基准测试是由构建系统自动执行的，那么写入控制台可能不太好。

实现此目的的一种方法是将基准测试结果作为强类型对象返回，以便可以在不同的上下文中轻松使用。

Etimo.Benchmarks

另一种方法是使用现有组件来执行基准测试。实际上，在我的公司，我们决定将我们的基准测试工具发布到公共领域。它的核心是管理垃圾收集器、抖动、预热等，就像这里的其他一些答案所建议的那样。它还具有我上面建议的三个功能。它解决了 Eric Lippert 博客。

这是一个示例输出，其中比较两个组件并将结果写入控制台。在本例中，比较的两个组件称为“KeyedCollection”和“MultiplyIndexedKeyedCollection”：

Etimo.Benchmarks - Sample Console Output

有一个 NuGet 包，一个示例 NuGet 包，源代码可在 GitHub。还有一篇博客文章。

如果您很着急，我建议您获取示例包并根据需要简单地修改示例委托。如果您不着急，最好阅读博客文章以了解详细信息。

Suggestions for improvement

Detecting if the execution environment is good for benchmarking (such as detecting if a debugger is attached or if jit optimization is disabled which would result in incorrect measurements).
Measuring parts of the code independently (to see exactly where the bottleneck is).
Comparing different versions/components/chunks of code (In your first sentence you say '... benchmarking small chunks of code to see which implementation is fastest.').

Regarding #1:

To detect if a debugger is attached, read the property System.Diagnostics.Debugger.IsAttached (Remember to also handle the case where the debugger is initially not attached, but is attached after some time).

To detect if jit optimization is disabled, read property DebuggableAttribute.IsJITOptimizerDisabled of the relevant assemblies:

private bool IsJitOptimizerDisabled(Assembly assembly)
{
    return assembly.GetCustomAttributes(typeof (DebuggableAttribute), false)
        .Select(customAttribute => (DebuggableAttribute) customAttribute)
        .Any(attribute => attribute.IsJITOptimizerDisabled);
}

Regarding #2:

This can be done in many ways. One way is to allow several delegates to be supplied and then measure those delegates individually.

Regarding #3:

This could also be done in many ways, and different use-cases would demand very different solutions. If the benchmark is invoked manually, then writing to the console might be fine. However if the benchmark is performed automatically by the build system, then writing to the console is probably not so fine.

One way to do this is to return the benchmark result as a strongly typed object that can easily be consumed in different contexts.

Etimo.Benchmarks

Another approach is to use an existing component to perform the benchmarks. Actually, at my company we decided to release our benchmark tool to public domain. At it's core, it manages the garbage collector, jitter, warmups etc, just like some of the other answers here suggest. It also has the three features I suggested above. It manages several of the issues discussed in Eric Lippert blog.

This is an example output where two components are compared and the results are written to the console. In this case the two components compared are called 'KeyedCollection' and 'MultiplyIndexedKeyedCollection':

Etimo.Benchmarks - Sample Console Output

There is a NuGet package, a sample NuGet package and the source code is available at GitHub. There is also a blog post.

If you're in a hurry, I suggest you get the sample package and simply modify the sample delegates as needed. If you're not in a hurry, it might be a good idea to read the blog post to understand the details.

回复收藏 0 原文

遇到 2024-08-01 10:57:25

您还必须在实际测量之前运行“预热”过程，以排除 JIT 编译器在抖动代码上花费的时间。

回复收藏 0 原文

各自安好 2024-08-01 10:57:25

根据您进行基准测试的代码及其运行的平台，您可能需要考虑代码对齐如何影响性能。为此，可能需要一个多次运行测试的外部包装器（在单独的应用程序域或进程中？），有时首先调用“填充代码”以强制对其进行 JIT 编译，从而导致代码被基准以不同方式对齐。完整的测试结果将给出各种代码对齐的最佳情况和最坏情况时序。

回复收藏 0 原文

冰葑 2024-08-01 10:57:25

如果您试图完全消除基准测试中垃圾收集的影响，是否值得设置 GCSettings.LatencyMode ？

如果不是，并且您希望 func 中创建的垃圾的影响成为基准测试的一部分，那么您是否也应该在测试结束时（在计时器内）强制进行收集？

回复收藏 0 原文

祁梦 2024-08-01 10:57:25

你的问题的基本问题是假设单个
测量可以回答您的所有问题。你需要测量
多次进行，以有效了解情况并
尤其是在像 C# 这样的垃圾收集语言中。

另一个答案给出了衡量基本性能的好方法。

static void Profile(string description, int iterations, Action func) {
    // warm up 
    func();

    var watch = new Stopwatch(); 

    // clean up
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();
    }
    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.Elapsed.TotalMilliseconds);
}

然而，这个单一的测量并没有考虑到垃圾
收藏。正确的配置文件还可以考虑最坏情况下的性能
垃圾收集分布在许多调用中（这个数字排序
无用的，因为虚拟机可以终止而不收集剩余的
垃圾，但对于比较两个不同的仍然有用
func 的实现。）

static void ProfileGarbageMany(string description, int iterations, Action func) {
    // warm up 
    func();

    var watch = new Stopwatch(); 

    // clean up
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();
    }
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.Elapsed.TotalMilliseconds);
}

并且人们可能还想测量最坏情况下的性能
仅调用一次的方法的垃圾回收。

static void ProfileGarbage(string description, int iterations, Action func) {
    // warm up 
    func();

    var watch = new Stopwatch(); 

    // clean up
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
    }
    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.Elapsed.TotalMilliseconds);
}

但比推荐任何具体的可能的附加措施更重要
测量轮廓的想法是应该测量多个
不同的统计数据，而不仅仅是一种统计数据。

The basic problem with your question is the assumption that a single
measurement can answer all your questions. You need to measure
multiple times to get an effective picture of the situation and
especially in a garbage collected langauge like C#.

Another answer gives an okay way of measuring the basic performance.

static void Profile(string description, int iterations, Action func) {
    // warm up 
    func();

    var watch = new Stopwatch(); 

    // clean up
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();
    }
    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.Elapsed.TotalMilliseconds);
}

However, this single measurement does not account for garbage
collection. A proper profile additionally accounts for the worst case performance
of garbage collection spread out over many calls (this number is sort
of useless as the VM can terminate without ever collecting left over
garbage but is still useful for comparing two different
implementations of func.)

static void ProfileGarbageMany(string description, int iterations, Action func) {
    // warm up 
    func();

    var watch = new Stopwatch(); 

    // clean up
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();
    }
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.Elapsed.TotalMilliseconds);
}

And one might also want to measure the worst case performance of
garbage collection for a method that is only called once.

static void ProfileGarbage(string description, int iterations, Action func) {
    // warm up 
    func();

    var watch = new Stopwatch(); 

    // clean up
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
    }
    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.Elapsed.TotalMilliseconds);
}

But more important than recommending any specific possible additional
measurements to profile is the idea that one should measure multiple
different statistics and not just one kind of statistic.

回复收藏 0 原文

~没有更多了~

关于作者

昨迟人

暂无简介

0 文章

0 评论

21 人气

关注发私信

友情链接

文江博客

对 C# 中的小代码示例进行基准测试，可以改进此实现吗？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（11）

改进建议

Etimo.Benchmarks

Suggestions for improvement

Etimo.Benchmarks

关于作者

相关话题

热门标签

推荐作者

玍銹的英雄夢

我不会写诗

十六岁半

浸婚纱

qq_kJ6XkX

旧伤还要旧人安

友情链接

对 C# 中的小代码示例进行基准测试，可以改进此实现吗？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（11）

改进建议

Etimo.Benchmarks

Suggestions for improvement

Etimo.Benchmarks

关于作者

相关话题

热门标签

推荐作者

玍銹的英雄夢

我不会写诗

十六岁半

浸婚纱

qq_kJ6XkX

旧伤还要旧人安

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。