Why is it slower to compare a nullable value type to null in a generic method with no constraints?

Posted 2024-11-02 09:51:57

I came across a very funny situation where comparing a nullable type to null inside a generic method is 234x slower than comparing a value type or a reference type. The code is as follows:

static bool IsNull<T>(T instance)
{
    return instance == null;
}

The execution code is:

int? a = 0;
string b = "A";
int c = 0;

var watch = Stopwatch.StartNew();

for (int i = 0; i < 1000000; i++)
{
    var r1 = IsNull(a);
}

Console.WriteLine(watch.Elapsed.ToString());

watch.Restart();

for (int i = 0; i < 1000000; i++)
{
    var r2 = IsNull(b);
}

Console.WriteLine(watch.Elapsed.ToString());

watch.Restart();

for (int i = 0; i < 1000000; i++)
{
    var r3 = IsNull(c);
}

watch.Stop();

Console.WriteLine(watch.Elapsed.ToString());
Console.ReadKey();

The output for the code above is:

00:00:00.1879827

00:00:00.0008779

00:00:00.0008532

As you can see, comparing a nullable int to null is 234x slower than comparing an int or a string. If I add a second overload with the right constraints, the results change dramatically:

static bool IsNull<T>(T? instance) where T : struct
{
    return instance == null;
}

Now the results are:

00:00:00.0006040

00:00:00.0006017

00:00:00.0006014

Why is that? I didn't check the byte code because I'm not fluent in it, but even if the byte code were a little different, I would expect the JIT to optimize this, and it does not (I'm running with optimizations).

浅唱々樱花落 2024-11-09 09:51:57

Here's what you should do to investigate this.

Start by rewriting the program so that it does everything twice. Put a message box between the two iterations. Compile the program with optimizations on, and run it outside the debugger. This ensures that the jitter generates the most optimal code it can. The jitter knows when a debugger is attached and can generate worse code to make debugging easier if it thinks that's what you're doing.

When the message box pops up, attach the debugger and then trace at the assembly code level into the three different versions of the code, if in fact there even are three different versions. I'd be willing to bet as much as a dollar that no code whatsoever is generated for the first one, because the jitter knows that the whole thing can be optimized away to "return false", and then that return false can be inlined, and perhaps even the loop can be removed.

(In the future, you should probably consider this when writing performance tests. Remember that if you don't use the result then the jitter is free to completely optimize away everything that produces that result, as long as it has no side effect.)
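The advice above can be sketched as follows. This is only a minimal illustration, not a rigorous benchmark: the class name `NullCheckBenchmark` and the helper `CountNulls` are hypothetical, and the message box is replaced by simply running every measurement twice so that the second pass excludes JIT compilation time. The point is that each loop accumulates its result and prints it, which makes it harder for the jitter to discard the calls as dead code.

```csharp
using System;
using System.Diagnostics;

static class NullCheckBenchmark
{
    // Same unconstrained method as in the question.
    public static bool IsNull<T>(T instance)
    {
        return instance == null;
    }

    // Hypothetical helper: times `iterations` calls and returns how many
    // reported null, so the result of every call is actually consumed.
    public static int CountNulls<T>(T value, int iterations, out TimeSpan elapsed)
    {
        int nullCount = 0;
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            if (IsNull(value)) nullCount++;
        }
        watch.Stop();
        elapsed = watch.Elapsed;
        return nullCount;
    }

    public static void Main()
    {
        const int iterations = 1000000;
        int? a = 0;
        string b = "A";
        int c = 0;

        // Do everything twice: the first pass includes JIT compilation cost.
        for (int pass = 1; pass <= 2; pass++)
        {
            TimeSpan ta, tb, tc;
            int na = CountNulls(a, iterations, out ta);
            int nb = CountNulls(b, iterations, out tb);
            int nc = CountNulls(c, iterations, out tc);
            Console.WriteLine("pass {0}: int? {1} ({2} nulls), string {3} ({4} nulls), int {5} ({6} nulls)",
                pass, ta, na, tb, nb, tc, nc);
        }
    }
}
```

As in the original test, this should be built with optimizations on and run without a debugger attached; the absolute timings will vary by runtime and machine.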

Once you can look at the assembly code you'll see what's going on.

I have not investigated this myself personally, but odds are good that what is going on is this:

In the int codepath, the jitter realizes that a boxed int is never null and turns the method into "return false".
In the string codepath, the jitter realizes that testing a string for nullity is equivalent to testing whether the managed pointer to the string is zero, so it generates a single instruction that tests whether a register is zero.
In the int? codepath, the jitter probably realizes that testing an int? for nullity can be accomplished by boxing the int?: since a boxed null int? is a null reference, that reduces to the earlier problem of testing a managed pointer against zero. But you take on the cost of the boxing.

If that's the case then the jitter could be more sophisticated here and realize that testing an int? for null can be accomplished by returning the inverse of the HasValue bool inside the int?.
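To make the two strategies concrete, here is a small sketch that spells them out for int?. The class and method names (`NullableBoxingDemo`, `IsNullViaBoxing`, `IsNullViaHasValue`) are made up for illustration; the first method mimics what the guess above says the generic path does, and the second is the cheaper check the jitter could use instead. Both give the same answer; only the first one boxes.

```csharp
using System;

static class NullableBoxingDemo
{
    // Hypothetical stand-in for the generic path: box, then compare the reference.
    public static bool IsNullViaBoxing(int? value)
    {
        object boxed = value; // an int? without a value boxes to a null reference
        return boxed == null;
    }

    // The cheaper equivalent: just read the HasValue flag, no allocation.
    public static bool IsNullViaHasValue(int? value)
    {
        return !value.HasValue;
    }

    public static void Main()
    {
        int? empty = null;
        int? zero = 0;

        // An int? that has a value boxes to a plain boxed int.
        object boxedZero = zero;
        Console.WriteLine(boxedZero.GetType());       // System.Int32

        Console.WriteLine(IsNullViaBoxing(empty));    // True
        Console.WriteLine(IsNullViaHasValue(empty));  // True
        Console.WriteLine(IsNullViaBoxing(zero));     // False
        Console.WriteLine(IsNullViaHasValue(zero));   // False
    }
}
```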

But like I said, that's just a guess. Generate the code yourself and see what it's doing if you're interested.

咽泪装欢 2024-11-09 09:51:57

If you compare the IL produced by the two overloads, you can see that there is boxing involved:

The first looks like:

.method private hidebysig static bool IsNull<T>(!!T instance) cil managed
{
    .maxstack 2
    .locals init (
        [0] bool CS$1$0000)
    L_0000: nop 
    L_0001: ldarg.0 
    L_0002: box !!T
    L_0007: ldnull 
    L_0008: ceq 
    L_000a: stloc.0 
    L_000b: br.s L_000d
    L_000d: ldloc.0 
    L_000e: ret 
}

While the second looks like:

.method private hidebysig static bool IsNull<valuetype ([mscorlib]System.ValueType) .ctor T>(valuetype [mscorlib]System.Nullable`1<!!T> instance) cil managed
{
    .maxstack 2
    .locals init (
        [0] bool CS$1$0000)
    L_0000: nop 
    L_0001: ldarga.s instance
    L_0003: call instance bool [mscorlib]System.Nullable`1<!!T>::get_HasValue()
    L_0008: ldc.i4.0 
    L_0009: ceq 
    L_000b: stloc.0 
    L_000c: br.s L_000e
    L_000e: ldloc.0 
    L_000f: ret 
}

In the second case, the compiler knows the type is a Nullable so it can optimize for that. In the first case, it has to handle any type, both reference and value types. So it has to jump through some extra hoops.
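It may also help to see which overload the compiler picks for each argument once both exist. The sketch below uses a hypothetical `Describe` method in place of `IsNull` so that each overload can report itself; the names are illustrative only. With both overloads present, an int? argument binds to the Nullable<T> overload because its parameter type is more specific, while string and int still bind to the unconstrained one.

```csharp
using System;

static class OverloadDemo
{
    // Hypothetical counterpart of the unconstrained IsNull<T>(T instance).
    public static string Describe<T>(T instance)
    {
        return "unconstrained, T = " + typeof(T).Name;
    }

    // Hypothetical counterpart of IsNull<T>(T? instance) where T : struct.
    public static string Describe<T>(T? instance) where T : struct
    {
        return "Nullable<T>, T = " + typeof(T).Name;
    }

    public static void Main()
    {
        int? a = 0;
        string b = "A";
        int c = 0;

        Console.WriteLine(Describe(a)); // Nullable<T>, T = Int32
        Console.WriteLine(Describe(b)); // unconstrained, T = String
        Console.WriteLine(Describe(c)); // unconstrained, T = Int32
    }
}
```

So after the second overload is added, the int? call no longer goes through the `box !!T` path at all, which is consistent with the timings in the question.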

As for why int is faster than int?, I'd imagine there are some JIT optimizations involved there.
