Unbalanced benchmark results for BitOperations.IsPow2

Posted on 2025-01-25 07:09:20


I am benchmarking the method BitOperations.IsPow2 with BenchmarkDotNet, but the results I got were far from my expectations. Here is the minimal code you need to reproduce the benchmark:

PowerOf2Benchmark.cs

using System.Numerics;
using BenchmarkDotNet.Attributes;

public class PowerOf2Benchmark {
    [Params(2048, 10003457, 20000123, 16777216)]
    public int n;

    [Benchmark]
    public bool CheckWithBitOperationsBuiltIn() 
    {
        return BitOperations.IsPow2(n);
    }
}

Program.cs

using BenchmarkDotNet.Running;

BenchmarkRunner.Run<PowerOf2Benchmark>();

And here is the summary of the benchmark:

BenchmarkDotNet=v0.13.1, OS=ubuntu 20.04
Intel Core i5-6200U CPU 2.30GHz (Skylake), 1 CPU, 4 logical and 2 physical cores
.NET SDK=6.0.202
  [Host]     : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT
  DefaultJob : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT


|                        Method |        n |      Mean |     Error |    StdDev | Code Size |
|------------------------------ |--------- |----------:|----------:|----------:|----------:|
| CheckWithBitOperationsBuiltIn |     2048 | 0.0955 ns | 0.0092 ns | 0.0081 ns |      28 B |
| CheckWithBitOperationsBuiltIn | 10003457 | 1.1815 ns | 0.0046 ns | 0.0040 ns |      28 B |
| CheckWithBitOperationsBuiltIn | 16777216 | 0.1000 ns | 0.0054 ns | 0.0051 ns |      28 B |
| CheckWithBitOperationsBuiltIn | 20000123 | 1.1750 ns | 0.0126 ns | 0.0112 ns |      28 B |

// * Hints *
Outliers
  PowerOf2Benchmark.CheckWithBitOperationsBuiltIn: Default -> 1 outlier  was  removed (2.33 ns)
  PowerOf2Benchmark.CheckWithBitOperationsBuiltIn: Default -> 1 outlier  was  removed (3.38 ns)
  PowerOf2Benchmark.CheckWithBitOperationsBuiltIn: Default -> 1 outlier  was  removed (3.42 ns)

If I am reading the results correctly, BitOperations.IsPow2 is more than 10x faster when n is a power of 2 (2048, 16777216) than when it is not (10003457, 20000123). Why is that?

The source of BitOperations.IsPow2 should be the following:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool IsPow2(int value) => (value & (value - 1)) == 0 && value > 0;
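
To make the two operands of the && concrete, here is a minimal sketch that evaluates the same expression on the four benchmark inputs. The class and helper names (IsPow2Walkthrough, ClearLowestSetBit) are hypothetical and only for illustration; the expression itself is copied from the source quoted above.

```csharp
using System;

public static class IsPow2Walkthrough
{
    // Same expression as the IsPow2 source quoted above.
    public static bool IsPow2(int value) => (value & (value - 1)) == 0 && value > 0;

    // Left-hand operand of the && on its own (hypothetical helper):
    // clears the lowest set bit, so the result is zero only when at most one bit was set.
    public static int ClearLowestSetBit(int value) => value & (value - 1);

    public static void Main()
    {
        foreach (int n in new[] { 2048, 10003457, 20000123, 16777216 })
        {
            Console.WriteLine($"{n,9}: n & (n - 1) = {ClearLowestSetBit(n),9}, IsPow2 = {IsPow2(n)}");
        }
    }
}
```

For 2048 and 16777216 the mask is 0, so the second operand (value > 0) is also evaluated. For the two odd inputs the mask is n - 1, which is nonzero, so the && short-circuits and the result is false without evaluating the second operand.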

I assume the AggressiveInlining optimization is not the reason for the unbalanced results. I am not an expert on hardware optimizations, but the generated assembly is quite simple (the call is inlined because of MethodImplOptions.AggressiveInlining):

; CheckIfNumberIsPowerOf2.PowerOf2Benchmark.CheckWithBitOperationsBuiltIn()
       push      rbp
       mov       rbp,rsp
       mov       eax,[rdi+8]
       lea       edi,[rax-1]
       test      edi,eax
       jne       short M00_L01
       test      eax,eax
       setg      al
       movzx     eax,al
M00_L00:
       pop       rbp
       ret
M00_L01:
       xor       eax,eax
       jmp       short M00_L00
; Total bytes of code 28
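
A listing like this one, together with the Code Size column in the summary, is the kind of output BenchmarkDotNet's DisassemblyDiagnoser produces. The minimal code above does not show how it was enabled, so the following is only a sketch, assuming the attribute form was used on the benchmark class:

```csharp
using System.Numerics;
using BenchmarkDotNet.Attributes;

// Assumption: the diagnoser was enabled with this attribute; the post does not show it.
[DisassemblyDiagnoser]
public class PowerOf2Benchmark
{
    [Params(2048, 10003457, 20000123, 16777216)]
    public int n;

    [Benchmark]
    public bool CheckWithBitOperationsBuiltIn() => BitOperations.IsPow2(n);
}
```

It runs with the same Program.cs as before and requires the BenchmarkDotNet package.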

EDIT

Out of curiosity, I created a class containing a method with the same implementation as BitOperations.IsPow2:

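using System.Runtime.CompilerServices; // needed for MethodImpl / MethodImplOptions

public static class PowerOf2Verifier
{
    // Same expression as BitOperations.IsPow2(int).
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool CheckWithBitMaskV3(int n) => (n & (n - 1)) == 0 && n > 0;
}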

Then, in the PowerOf2Benchmark class, I have added this method:

[Benchmark]
public bool CheckWithBitMaskV3() 
{
    return PowerOf2Verifier.CheckWithBitMaskV3(n);
}

This is the updated summary:

|                        Method |        n |      Mean |     Error |    StdDev | Code Size |
|------------------------------ |--------- |----------:|----------:|----------:|----------:|
|            CheckWithBitMaskV3 |     2048 | 0.5141 ns | 0.0098 ns | 0.0087 ns |      28 B |
| CheckWithBitOperationsBuiltIn |     2048 | 0.1040 ns | 0.0085 ns | 0.0079 ns |      28 B |
|            CheckWithBitMaskV3 | 10003457 | 0.3589 ns | 0.0091 ns | 0.0081 ns |      28 B |
| CheckWithBitOperationsBuiltIn | 10003457 | 1.1824 ns | 0.0091 ns | 0.0081 ns |      28 B |
|            CheckWithBitMaskV3 | 16777216 | 0.5143 ns | 0.0063 ns | 0.0059 ns |      28 B |
| CheckWithBitOperationsBuiltIn | 16777216 | 0.0991 ns | 0.0076 ns | 0.0071 ns |      28 B |
|            CheckWithBitMaskV3 | 20000123 | 0.4513 ns | 0.0190 ns | 0.0177 ns |      28 B |
| CheckWithBitOperationsBuiltIn | 20000123 | 1.1257 ns | 0.0108 ns | 0.0090 ns |      28 B |

In this case, the CheckWithBitMaskV3 results are far more consistent across inputs (roughly 0.36 to 0.51 ns), which surprises me even more because the two methods I am benchmarking now share the same implementation. What could be the explanation?

EDIT 2

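For some reason, the assembly of CheckWithBitMaskV3 is slightly different from that of IsPow2. The instructions are the same, but the blocks are laid out differently: in the IsPow2 version the non-power-of-2 path jumps to a block placed after ret and then jumps back to the epilogue, while here the power-of-2 path jumps over the xor and the non-power-of-2 path falls through to the epilogue: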

; CheckIfNumberIsPowerOf2.PowerOf2Benchmark.CheckWithBitMaskV3()
       push      rbp
       mov       rbp,rsp
       mov       eax,[rdi+8]
       lea       edi,[rax-1]
       test      edi,eax
       jne       short M00_L00
       test      eax,eax
       setg      al
       movzx     eax,al
       jmp       short M00_L01
M00_L00:
       xor       eax,eax
M00_L01:
       pop       rbp
       ret
; Total bytes of code 28
