.NET: ThreadStatic vs lock { }. Why does ThreadStaticAttribute degrade performance?

Posted on 2024-11-30 11:11:35

I wrote a small test program and was surprised to find that the lock {} solution runs faster than the lock-free one that uses the [ThreadStatic] attribute on a static variable.

[ThreadStatic] snippet:

[ThreadStatic]
private static long ms_Acc;
public static void RunTest()
{
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    int one = 1;
    for (int i = 0; i < 100 * 1000 * 1000; ++i) {
        ms_Acc += one;
        ms_Acc /= one;
    }
    stopwatch.Stop();
    Console.WriteLine("Time taken: {0}", stopwatch.Elapsed.TotalSeconds);
}

lock {} snippet:

private static long ms_Acc;
private static object ms_Lock = new object();
public static void RunTest()
{
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    int one = 1;
    for (int i = 0; i < 100 * 1000 * 1000; ++i) {
        lock (ms_Lock) {
            ms_Acc += one;
            ms_Acc /= one;
        }
    }
    stopwatch.Stop();
    Console.WriteLine("Time taken: {0}", stopwatch.Elapsed.TotalSeconds);
}

On my machine the first snippet takes 4.2 seconds and the second takes 3.2 seconds, which is 1 second faster. Without both ThreadStatic and lock it takes 1.2 seconds.

I'm curious why the [ThreadStatic] attribute adds so much to the execution time in this simple example.

UPDATE: My apologies, these results were for a DEBUG build. For a RELEASE build I got completely different numbers, in the order [ThreadStatic]; lock; neither: (1.2; 2.4; 1.2). For DEBUG the numbers were (4.2; 3.2; 1.2).

So for a RELEASE build there seems to be no [ThreadStatic] performance penalty.
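
Because the numbers changed so much between DEBUG and RELEASE, it can help for a timing harness to report the conditions it ran under. The following is only a minimal sketch under assumptions: the names `BenchGuard`, `IsDebugBuild`, and `Measure` are hypothetical and not from the original post, and it assumes the DEBUG symbol is defined only in Debug configurations (the usual project default).

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of the original test program.
public static class BenchGuard
{
    // Returns true when the code was compiled with the DEBUG symbol defined.
    public static bool IsDebugBuild()
    {
#if DEBUG
        return true;
#else
        return false;
#endif
    }

    // Runs the body once as a warm-up (so JIT compilation is not timed),
    // then runs it again and returns the elapsed seconds of the second run.
    public static double Measure(Action body)
    {
        body();
        Stopwatch stopwatch = Stopwatch.StartNew();
        body();
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalSeconds;
    }

    public static void Main()
    {
        // An attached debugger can also disable JIT optimizations, so report both.
        Console.WriteLine("DEBUG build: {0}, debugger attached: {1}",
            IsDebugBuild(), Debugger.IsAttached);

        long acc = 0;
        double seconds = Measure(() =>
        {
            for (int i = 0; i < 1000 * 1000; ++i) { acc += 1; }
        });
        Console.WriteLine("Time taken: {0}", seconds);
    }
}
```

Printing the build flavor next to each timing makes it harder to compare a DEBUG number against a RELEASE number by accident.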

Comments (2)

北风几吹夏 2024-12-07 11:11:35

For a RELEASE build there seems to be almost no [ThreadStatic] performance penalty (only a slight penalty on modern CPUs).

Here is the disassembly for ms_Acc += one; for RELEASE, optimization is enabled:

No [ThreadStatic], DEBUG:

00000060  mov         eax,dword ptr [ebp-40h] 
00000063  add         dword ptr ds:[00511718h],eax 

No [ThreadStatic], RELEASE:

00000051  mov         eax,dword ptr [00040750h]
00000057  add         eax,dword ptr [rsp+20h]
0000005b  mov         dword ptr [00040750h],eax

[ThreadStatic], DEBUG:

00000066  mov         edx,1 
0000006b  mov         ecx,4616E0h 
00000070  call        664F7450 
00000075  mov         edx,1 
0000007a  mov         ecx,4616E0h 
0000007f  mov         dword ptr [ebp-50h],eax 
00000082  call        664F7450 
00000087  mov         edx,dword ptr [eax+18h] 
0000008a  add         edx,dword ptr [ebp-40h] 
0000008d  mov         eax,dword ptr [ebp-50h] 
00000090  mov         dword ptr [eax+18h],edx 

[ThreadStatic], RELEASE:

00000058  mov         edx,1 
0000005d  mov         rcx,7FF001A3F28h 
00000067  call        FFFFFFFFF6F9F740 
0000006c  mov         qword ptr [rsp+30h],rax 
00000071  mov         rbx,qword ptr [rsp+30h] 
00000076  mov         ebx,dword ptr [rbx+20h] 
00000079  add         ebx,dword ptr [rsp+20h] 
0000007d  mov         edx,1 
00000082  mov         rcx,7FF001A3F28h 
0000008c  call        FFFFFFFFF6F9F740 
00000091  mov         qword ptr [rsp+38h],rax 
00000096  mov         rax,qword ptr [rsp+38h] 
0000009b  mov         dword ptr [rax+20h],ebx 

苦妄 2024-12-07 11:11:35

You have two lines of code that update ms_Acc. In the lock case, you take a single lock around both of them, while in the ThreadStatic case the thread-static overhead is paid once for each access to ms_Acc, i.e. twice for each iteration of your loop. This is generally the benefit of using lock: you get to choose the granularity you want. I am guessing that the RELEASE build optimised this difference away.

I would be interested to see whether the performance becomes very similar, or identical, if you change the for loop to make a single access to ms_Acc.
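
One way to try this is to copy the thread-static field into a local before the loop and write it back afterwards, so the field is touched once for the read and once for the write instead of on every iteration. This is a sketch under assumptions, not a measured result: the class name `ThreadStaticLocalCopy` and the method `RunWithLocalCopy` are hypothetical, and the JIT may optimize a loop over a local much more aggressively, so the timing may no longer measure the same work.

```csharp
using System;
using System.Diagnostics;

// Hypothetical variant of the original test, for illustration only.
public static class ThreadStaticLocalCopy
{
    [ThreadStatic]
    private static long ms_Acc;

    // Same arithmetic as RunTest, but ms_Acc is read once and written once.
    public static long RunWithLocalCopy(int iterations)
    {
        int one = 1;
        long acc = ms_Acc;              // single thread-static read
        for (int i = 0; i < iterations; ++i)
        {
            acc += one;
            acc /= one;
        }
        ms_Acc = acc;                   // single thread-static write
        return acc;
    }

    public static void Main()
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        long result = RunWithLocalCopy(100 * 1000 * 1000);
        stopwatch.Stop();
        Console.WriteLine("Result: {0}, time taken: {1}",
            result, stopwatch.Elapsed.TotalSeconds);
    }
}
```

Comparing this against the original [ThreadStatic] snippet in the same build configuration would show how much of the cost comes from the per-access lookup rather than from the arithmetic.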
