Why is ConcurrentBag<T> in .NET (4.0) so slow? Am I doing it wrong?
Before I started a project, I wrote a simple test to compare the performance of ConcurrentBag (from System.Collections.Concurrent) against locking a plain List. I am extremely surprised that ConcurrentBag is over 10 times slower than locking a simple List. From what I understand, ConcurrentBag works best when the reader and writer are the same thread. However, I hadn't thought its performance would be so much worse than traditional locks.
I have run a test with two Parallel.For loops writing to and reading from a list/bag. However, the write by itself already shows a huge difference:
private static void ConcurrentBagTest()
{
int collSize = 10000000;
Stopwatch stopWatch = new Stopwatch();
ConcurrentBag<int> bag1 = new ConcurrentBag<int>();
stopWatch.Start();
Parallel.For(0, collSize, delegate(int i)
{
bag1.Add(i);
});
stopWatch.Stop();
Console.WriteLine("Elapsed Time = {0}",
stopWatch.Elapsed.TotalSeconds);
}
On my box, this takes 3-4 seconds to run, compared to 0.5-0.9 seconds for this code:
private static void LockCollTest()
{
int collSize = 10000000;
object list1_lock=new object();
List<int> lst1 = new List<int>(collSize);
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
Parallel.For(0, collSize, delegate(int i)
{
lock(list1_lock)
{
lst1.Add(i);
}
});
stopWatch.Stop();
Console.WriteLine("Elapsed = {0}",
stopWatch.Elapsed.TotalSeconds);
}
As I mentioned, doing concurrent reads and writes doesn't help the concurrent bag test. Am I doing something wrong or is this data structure just really slow?
[EDIT] - I removed the Tasks because I don't need them here (Full code had another task reading)
[EDIT]
Thanks a lot for the answers. I am having a hard time picking "the right answer" since it seems to be a mix of a few answers.
As Michael Goldshteyn pointed out, the speed really depends on the data.
Darin pointed out that there should be more contention for ConcurrentBag to be faster, and that Parallel.For doesn't necessarily start the same number of threads. One point to take away is not to do anything you don't have to inside a lock. In the above case, I don't see myself doing anything inside the lock except maybe assigning the value to a temp variable.
Additionally, sixlettervariables pointed out that the number of threads that happen to be running may also affect results, although I tried running the original test in reverse order and ConcurrentBag was still slower.
I ran some tests starting 15 Tasks, and the results depended on the collection size among other things. However, ConcurrentBag performed almost as well as, or better than, locking a list for up to 1 million insertions. Above 1 million, locking sometimes seemed to be much faster, but I'll probably never have a larger data structure for my project.
Here's the code I ran:
int collSize = 1000000;
object list1_lock=new object();
List<int> lst1 = new List<int>();
ConcurrentBag<int> concBag = new ConcurrentBag<int>();
int numTasks = 15;
Stopwatch sWatch = new Stopwatch();
sWatch.Start();
//First, try locks
Task.WaitAll(Enumerable.Range(1, numTasks)
.Select(x => Task.Factory.StartNew(() =>
{
for (int i = 0; i < collSize / numTasks; i++) // each task uses its own counter
{
lock (list1_lock)
{
lst1.Add(x);
}
}
})).ToArray());
sWatch.Stop();
Console.WriteLine("lock test. Elapsed = {0}",
sWatch.Elapsed.TotalSeconds);
// now try concurrentBag
sWatch.Restart();
Task.WaitAll(Enumerable.Range(1, numTasks).
Select(x => Task.Factory.StartNew(() =>
{
for (int i = 0; i < collSize / numTasks; i++) // each task uses its own counter
{
concBag.Add(x);
}
})).ToArray());
sWatch.Stop();
Console.WriteLine("Conc Bag test. Elapsed = {0}",
sWatch.Elapsed.TotalSeconds);
11 Answers
Let me ask you this: how realistic is it that you'd have an application which is constantly adding to a collection and never reading from it? What's the use of such a collection? (This is not a purely rhetorical question. I could imagine there being uses where, e.g., you only read from the collection on shutdown (for logging) or when requested by the user. I believe these scenarios are fairly rare, though.)
This is what your code is simulating. Calling List<T>.Add is going to be lightning-fast in all but the occasional case where the list has to resize its internal array; but this is smoothed out by all the other adds that happen quite quickly. So you're not likely to see a significant amount of contention in this context, especially testing on a personal PC with, e.g., even 8 cores (as you stated you have in a comment somewhere). Maybe you might see more contention on something like a 24-core machine, where many cores can be trying to add to the list literally at the same time.
Contention is much more likely to creep in where you read from your collection, especially in foreach loops (or LINQ queries which amount to foreach loops under the hood), which require locking the entire operation so that you aren't modifying your collection while iterating over it. If you can realistically reproduce this scenario, I believe you will see ConcurrentBag<T> scale much better than your current test is showing.
Update: Here is a program I wrote to compare these collections in the scenario I described above (multiple writers, many readers). Running 25 trials with a collection size of 10000 and 8 reader threads, I got the following results:
So clearly it depends on exactly what you're doing with these collections.
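A minimal sketch of that kind of mixed-workload comparison (multiple writers and readers running at once) follows; the item counts, writer/reader counts, and the TryTake-based read loop are illustrative assumptions, not the original program's parameters:
// Sketch: writers add while readers drain, for both collections.
// Item counts and thread counts are illustrative assumptions only.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
static class MixedWorkloadSketch
{
    const int ItemsPerWriter = 10000;
    const int Writers = 4;
    const int Readers = 8;
    static double TimeBag()
    {
        var bag = new ConcurrentBag<int>();
        var sw = Stopwatch.StartNew();
        var writers = Enumerable.Range(0, Writers).Select(_ =>
            Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < ItemsPerWriter; i++) bag.Add(i);
            }));
        var readers = Enumerable.Range(0, Readers).Select(_ =>
            Task.Factory.StartNew(() =>
            {
                // each reader spins until it has taken its share of the items
                int taken = 0, value;
                while (taken < Writers * ItemsPerWriter / Readers)
                    if (bag.TryTake(out value)) taken++;
            }));
        Task.WaitAll(writers.Concat(readers).ToArray());
        sw.Stop();
        return sw.Elapsed.TotalSeconds;
    }
    static double TimeLockedList()
    {
        var list = new List<int>();
        var gate = new object();
        var sw = Stopwatch.StartNew();
        var writers = Enumerable.Range(0, Writers).Select(_ =>
            Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < ItemsPerWriter; i++) lock (gate) list.Add(i);
            }));
        var readers = Enumerable.Range(0, Readers).Select(_ =>
            Task.Factory.StartNew(() =>
            {
                // readers remove from the tail, holding the same lock as the writers
                int taken = 0;
                while (taken < Writers * ItemsPerWriter / Readers)
                    lock (gate)
                        if (list.Count > 0) { list.RemoveAt(list.Count - 1); taken++; }
            }));
        Task.WaitAll(writers.Concat(readers).ToArray());
        sw.Stop();
        return sw.Elapsed.TotalSeconds;
    }
    static void Main()
    {
        Console.WriteLine("ConcurrentBag: {0:F2}s", TimeBag());
        Console.WriteLine("Locked List:   {0:F2}s", TimeLockedList());
    }
}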
There seems to be a bug in the .NET Framework 4 that Microsoft fixed in 4.5; it seems they didn't expect ConcurrentBag to be used a lot.
See the following Ayende post for more info:
http://ayende.com/blog/156097/the-high-cost-of-concurrentbag-in-net-4-0
As a general answer:
To summarize, the decision of which way is faster is highly dependent on the data structures employed and the amount of contention for the locks, among other issues (e.g., the number of readers vs. writers in a shared/exclusive type arrangement).
Your particular example has a very high degree of contention, so I must say I am surprised by the behavior. On the other hand, the amount of work done while the lock is held is very small, so maybe there is little contention for the lock itself after all. There could also be deficiencies in the implementation of ConcurrentBag's concurrency handling which make your particular example (frequent inserts and no reads) a bad use case for it.
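As an illustration of the shared/exclusive arrangement mentioned above (a sketch only, not something proposed in the answer), ReaderWriterLockSlim lets many readers proceed concurrently while each writer takes the lock exclusively:
// Sketch only: a List<int> guarded by ReaderWriterLockSlim so that reads
// are shared and writes are exclusive. Names here are illustrative.
using System.Collections.Generic;
using System.Threading;
class SharedExclusiveList
{
    private readonly List<int> _items = new List<int>();
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();
    public void Add(int value)
    {
        _rw.EnterWriteLock();          // exclusive: blocks readers and other writers
        try { _items.Add(value); }
        finally { _rw.ExitWriteLock(); }
    }
    public int Count
    {
        get
        {
            _rw.EnterReadLock();       // shared: many readers at once
            try { return _items.Count; }
            finally { _rw.ExitReadLock(); }
        }
    }
}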
Looking at the program using MS's contention visualizer shows that ConcurrentBag<T> has a much higher cost associated with parallel insertion than simply locking on a List<T>. One thing I noticed is there appears to be a cost associated with spinning up the 6 threads (used on my machine) to begin the first ConcurrentBag<T> run (cold run). 5 or 6 threads are then used with the List<T> code, which is faster (warm run). Adding another ConcurrentBag<T> run after the list shows it takes less time than the first (warm run).
From what I'm seeing in the contention, a lot of time is spent in the ConcurrentBag<T> implementation allocating memory. Removing the explicit allocation of size from the List<T> code slows it down, but not enough to make a difference.
EDIT: it appears that ConcurrentBag<T> internally keeps a list per Thread.CurrentThread, locks 2-4 times depending on whether it is running on a new thread, and performs at least one Interlocked.Exchange. As noted in MSDN: "optimized for scenarios where the same thread will be both producing and consuming data stored in the bag." This is the most likely explanation for your performance decrease versus a raw list.
This is already resolved in .NET 4.5. The underlying issue was that ThreadLocal, which ConcurrentBag uses, didn't expect to have a lot of instances. That has been fixed, and it now runs fairly fast.
Source: The HIGH cost of ConcurrentBag in .NET 4.0
As @Darin-Dimitrov said, I suspect that your Parallel.For isn't actually spawning the same number of threads in each of the two results. Try manually creating N threads to ensure that you are actually seeing thread contention in both cases.
You basically have very few concurrent writes and no contention (Parallel.For doesn't necessarily mean many threads). Try parallelizing the writes and you will observe different results:
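A minimal sketch of one way to do that (not the answer's original sample): explicit threads released at the same moment, each doing a burst of adds against either collection. The thread and iteration counts are arbitrary assumptions.
// Sketch: N explicit threads, released together, each hammering Add on the
// same collection. Run once against ConcurrentBag and once against a locked
// List to compare. Thread and iteration counts are arbitrary assumptions.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
class ContendedWriteSketch
{
    static double TimeAdds(Action<int> add, int threads, int addsPerThread)
    {
        using (var barrier = new Barrier(threads + 1))
        {
            var workers = new Thread[threads];
            for (int t = 0; t < threads; t++)
            {
                workers[t] = new Thread(() =>
                {
                    barrier.SignalAndWait();            // wait for the starting gun
                    for (int i = 0; i < addsPerThread; i++)
                        add(i);
                });
                workers[t].Start();
            }
            var sw = Stopwatch.StartNew();
            barrier.SignalAndWait();                    // release all writers at once
            foreach (var w in workers) w.Join();
            sw.Stop();
            return sw.Elapsed.TotalSeconds;
        }
    }
    static void Main()
    {
        const int threads = 8, addsPerThread = 1000000;
        var bag = new ConcurrentBag<int>();
        Console.WriteLine("ConcurrentBag: {0:F2}s",
            TimeAdds(bag.Add, threads, addsPerThread));
        var list = new List<int>();
        var gate = new object();
        Console.WriteLine("Locked List:   {0:F2}s",
            TimeAdds(i => { lock (gate) list.Add(i); }, threads, addsPerThread));
    }
}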
My guess is that the locks don't experience much contention. I would recommend reading the following article: Java theory and practice: Anatomy of a flawed microbenchmark. The article discusses lock microbenchmarks. As stated in the article, there are a lot of things to take into consideration in this kind of situation.
It would be interesting to see scaling between the two of them.
Two questions:
1) How fast is bag vs. list for reading? Remember to put a lock on the list.
2) How fast is bag vs. list for reading while another thread is writing?
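A sketch of how one might measure the second case. The counts are arbitrary assumptions; the bag side relies on the fact that enumerating a ConcurrentBag works on a snapshot, while the List has to stay locked for the whole enumeration:
// Sketch: compare enumerating a ConcurrentBag vs a locked List while a
// writer task keeps adding. All counts are arbitrary assumptions.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
class ReadWhileWriteSketch
{
    static double Measure(Action<int> add, Func<long> sumAll,
                          int writes, int readPasses)
    {
        var sw = Stopwatch.StartNew();
        var writer = Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < writes; i++) add(i);
        });
        var reader = Task.Factory.StartNew(() =>
        {
            long total = 0;
            for (int p = 0; p < readPasses; p++) total += sumAll();
            return total;
        });
        Task.WaitAll(writer, reader);
        sw.Stop();
        return sw.Elapsed.TotalSeconds;
    }
    static void Main()
    {
        const int writes = 500000, readPasses = 200;
        var bag = new ConcurrentBag<int>();
        // Bag enumeration works on a snapshot; no lock required.
        double bagTime = Measure(bag.Add,
            () => { long s = 0; foreach (int v in bag) s += v; return s; },
            writes, readPasses);
        var list = new List<int>();
        var gate = new object();
        // The entire List enumeration has to happen inside the lock.
        double listTime = Measure(
            i => { lock (gate) list.Add(i); },
            () => { lock (gate) { long s = 0; foreach (int v in list) s += v; return s; } },
            writes, readPasses);
        Console.WriteLine("Bag: {0:F2}s, List: {1:F2}s", bagTime, listTime);
    }
}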
Because the loop body is small, you could try using the Partitioner class Create method...
How to: Speed Up Small Loop Bodies
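A sketch of what that looks like for the question's locked-List case: Partitioner.Create hands each task a contiguous range, so the per-element delegate call of Parallel.For is replaced by a tight inner loop:
// Sketch: range partitioning so each task runs a tight inner loop instead
// of invoking a delegate per element. Applied to the locked-List case.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
class PartitionerSketch
{
    static void Main()
    {
        const int collSize = 10000000;
        var lst1 = new List<int>(collSize);
        var list1_lock = new object();
        var sw = Stopwatch.StartNew();
        // Partitioner.Create(0, collSize) yields Tuple<int, int> ranges.
        Parallel.ForEach(Partitioner.Create(0, collSize), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                lock (list1_lock)
                {
                    lst1.Add(i);
                }
            }
        });
        sw.Stop();
        Console.WriteLine("Elapsed = {0}", sw.Elapsed.TotalSeconds);
    }
}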
It appears that ConcurrentBag is just slower than the other concurrent collections.
I think it's an implementation problem: ANTS Profiler shows that it gets bogged down in a couple of places, including an array copy.
Using ConcurrentDictionary is thousands of times faster.
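For reference, a sketch of dropping ConcurrentDictionary into the same insertion test; keying by the loop index is an assumption of this sketch, not something stated in the answer:
// Sketch: the question's Parallel.For insertion test run against
// ConcurrentDictionary, keyed by the loop index (unique keys, so TryAdd
// always succeeds here).
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading.Tasks;
class ConcurrentDictionaryTest
{
    static void Main()
    {
        const int collSize = 10000000;
        var dict = new ConcurrentDictionary<int, int>();
        var sw = Stopwatch.StartNew();
        Parallel.For(0, collSize, i =>
        {
            dict.TryAdd(i, i);   // internal locks are sharded, so adds rarely collide
        });
        sw.Stop();
        Console.WriteLine("Elapsed = {0}", sw.Elapsed.TotalSeconds);
    }
}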