.NET: ThreadStatic vs. lock { }. Why does ThreadStaticAttribute degrade performance?
I wrote a small test program and was surprised that the lock {} solution runs faster than the lock-free one that uses the [ThreadStatic] attribute on a static variable.
[ThreadStatic] snippet:
[ThreadStatic]
private static long ms_Acc;
public static void RunTest()
{
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    int one = 1;
    for (int i = 0; i < 100 * 1000 * 1000; ++i) {
        ms_Acc += one;
        ms_Acc /= one;
    }
    stopwatch.Stop();
    Console.WriteLine("Time taken: {0}", stopwatch.Elapsed.TotalSeconds);
}
lock {} snippet:
private static long ms_Acc;
private static object ms_Lock = new object();
public static void RunTest()
{
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    int one = 1;
    for (int i = 0; i < 100 * 1000 * 1000; ++i) {
        lock (ms_Lock) {
            ms_Acc += one;
            ms_Acc /= one;
        }
    }
    stopwatch.Stop();
    Console.WriteLine("Time taken: {0}", stopwatch.Elapsed.TotalSeconds);
}
On my machine the first snippet takes 4.2 seconds and the second takes 3.2 seconds, which is 1 second faster. With neither ThreadStatic nor lock, the loop takes 1.2 seconds.
I'm curious why the [ThreadStatic] attribute adds so much to the execution time in this simple example.
UPDATE: I'm very sorry, but these results were for a DEBUG build. For a RELEASE build I got completely different numbers, listed as (ThreadStatic; lock; neither): (1.2; 2.4; 1.2). For DEBUG the numbers were (4.2; 3.2; 1.2).
So for a RELEASE build there seems to be no [ThreadStatic] performance penalty.
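Since the numbers above depend so heavily on the build configuration, it may help to time all three variants from one program. The following is a minimal sketch, not the original test program: the class name `ThreadStaticVsLockBench`, the `Time` helper, the field names `ms_TlsAcc` and `ms_SharedAcc`, and the warm-up calls are all assumptions added for illustration.

```csharp
using System;
using System.Diagnostics;

public static class ThreadStaticVsLockBench
{
    [ThreadStatic]
    private static long ms_TlsAcc;                 // one copy per thread (hypothetical name)
    private static long ms_SharedAcc;              // ordinary shared static (hypothetical name)
    private static readonly object ms_Lock = new object();

    // Variant 1: [ThreadStatic] field, no lock.
    public static long RunThreadStatic(int iterations)
    {
        int one = 1;
        ms_TlsAcc = 0;
        for (int i = 0; i < iterations; ++i) {
            ms_TlsAcc += one;
            ms_TlsAcc /= one;
        }
        return ms_TlsAcc;
    }

    // Variant 2: shared static field, one lock around both updates.
    public static long RunLock(int iterations)
    {
        int one = 1;
        lock (ms_Lock) { ms_SharedAcc = 0; }
        for (int i = 0; i < iterations; ++i) {
            lock (ms_Lock) {
                ms_SharedAcc += one;
                ms_SharedAcc /= one;
            }
        }
        lock (ms_Lock) { return ms_SharedAcc; }
    }

    // Variant 3: shared static field, neither ThreadStatic nor lock
    // (only meaningful when a single thread runs it).
    public static long RunPlain(int iterations)
    {
        int one = 1;
        ms_SharedAcc = 0;
        for (int i = 0; i < iterations; ++i) {
            ms_SharedAcc += one;
            ms_SharedAcc /= one;
        }
        return ms_SharedAcc;
    }

    // Runs the body once and returns the elapsed wall-clock seconds.
    public static double Time(Action body)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        body();
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalSeconds;
    }

    public static void Main()
    {
        const int N = 100 * 1000 * 1000;

        // Short warm-up so JIT compilation is not included in the timings.
        RunThreadStatic(1000);
        RunLock(1000);
        RunPlain(1000);

        Console.WriteLine("ThreadStatic: {0:F2} s", Time(() => RunThreadStatic(N)));
        Console.WriteLine("lock:         {0:F2} s", Time(() => RunLock(N)));
        Console.WriteLine("neither:      {0:F2} s", Time(() => RunPlain(N)));
    }
}
```

To reproduce the RELEASE figures rather than the DEBUG ones, build with optimizations enabled and run without a debugger attached. The absolute timings will vary with the machine and runtime.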
Comments (2)
For a RELEASE build there seems to be almost no [ThreadStatic] performance penalty (only a slight penalty on modern CPUs).
Here is the disassembly for ms_Acc += one; optimization is enabled for RELEASE:
No [ThreadStatic], DEBUG:
No [ThreadStatic], RELEASE:
[ThreadStatic], DEBUG:
[ThreadStatic], RELEASE:
You have two lines of code that update ms_Acc. In the lock case, a single lock surrounds both of them, while in the ThreadStatic case the [ThreadStatic] overhead is paid once for each access to ms_Acc, i.e. twice for each iteration of your loop. This is generally the benefit of using lock: you get to choose the granularity you want. I am guessing that the RELEASE build optimised this difference away.
I would be interested to see whether the performance becomes very similar, or identical, if you change the for loop to make a single access to ms_Acc.
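The experiment suggested above could be sketched roughly as follows. This is an illustration only, not code from the thread: the class `SingleAccessSketch` and its method names are hypothetical, and whether the JIT produces measurably different code for each variant is exactly what the experiment would have to show.

```csharp
using System;

public static class SingleAccessSketch
{
    [ThreadStatic]
    private static long ms_Acc;

    // Original shape: two read-modify-write statements per iteration.
    public static long TwoAccesses(int iterations)
    {
        int one = 1;
        ms_Acc = 0;
        for (int i = 0; i < iterations; ++i) {
            ms_Acc += one;
            ms_Acc /= one;
        }
        return ms_Acc;
    }

    // Single statement per iteration: one read and one write of ms_Acc.
    public static long SingleAccess(int iterations)
    {
        int one = 1;
        ms_Acc = 0;
        for (int i = 0; i < iterations; ++i) {
            ms_Acc = (ms_Acc + one) / one;
        }
        return ms_Acc;
    }

    // Local accumulator: the thread-static field is written only once,
    // after the loop has finished.
    public static long LocalAccumulator(int iterations)
    {
        int one = 1;
        long acc = 0;
        for (int i = 0; i < iterations; ++i) {
            acc = (acc + one) / one;
        }
        ms_Acc = acc;
        return ms_Acc;
    }

    public static void Main()
    {
        // All three variants compute the same value; only the number of
        // accesses to the thread-static field differs.
        Console.WriteLine(TwoAccesses(1000));
        Console.WriteLine(SingleAccess(1000));
        Console.WriteLine(LocalAccumulator(1000));
    }
}
```

Timing each method with a Stopwatch in both DEBUG and RELEASE builds would show whether the per-access cost accounts for the gap reported in the question.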