Why is ConcurrentBag<T> so slow in .NET 4.0? Am I doing it wrong?


Before starting a project, I wrote a simple test to compare the performance of ConcurrentBag (from System.Collections.Concurrent) against locking a plain List. I was extremely surprised that ConcurrentBag was more than 10 times slower than locking with a simple List. From what I understand, ConcurrentBag works best when the reader and the writer are the same thread. However, I hadn't expected its performance to be so much worse than a traditional lock.

I ran a test with two Parallel.For loops, one writing to and one reading from the list/bag. However, the write by itself already shows a huge difference:

private static void ConcurrentBagTest()
{
    int collSize = 10000000;
    Stopwatch stopWatch = new Stopwatch();
    ConcurrentBag<int> bag1 = new ConcurrentBag<int>();

    stopWatch.Start();

    Parallel.For(0, collSize, delegate(int i)
    {
        bag1.Add(i);
    });

    stopWatch.Stop();
    Console.WriteLine("Elapsed Time = {0}",
                      stopWatch.Elapsed.TotalSeconds);
}

On my box, this takes 3-4 seconds to run, compared to 0.5-0.9 seconds for this code:

private static void LockCollTest()
{
    int collSize = 10000000;
    object list1_lock = new object();
    List<int> lst1 = new List<int>(collSize);

    Stopwatch stopWatch = new Stopwatch();
    stopWatch.Start();

    Parallel.For(0, collSize, delegate(int i)
    {
        lock (list1_lock)
        {
            lst1.Add(i);
        }
    });

    stopWatch.Stop();
    Console.WriteLine("Elapsed = {0}",
                      stopWatch.Elapsed.TotalSeconds);
}

As I mentioned, doing concurrent reads and writes doesn't help the concurrent bag test. Am I doing something wrong or is this data structure just really slow?

[EDIT] - I removed the Tasks because I don't need them here (Full code had another task reading)

[EDIT]
Thanks a lot for the answers. I am having a hard time picking "the right answer" since it seems to be a mix of a few answers.

As Michael Goldshteyn pointed out, the speed really depends on the data.
Darin pointed out that there should be more contention for ConcurrentBag to be faster, and that Parallel.For doesn't necessarily start the same number of threads. One point to take away is not to do anything you don't have to inside a lock. In the above case, I don't see myself doing anything inside the lock except maybe assigning the value to a temp variable.

Additionally, sixlettervariables pointed out that the number of threads that happen to be running may also affect results, although I tried running the original test in reverse order and ConcurrentBag was still slower.

I ran some tests starting 15 Tasks, and the results depended on the collection size among other things. However, ConcurrentBag performed almost as well as, or better than, locking a list for up to 1 million insertions. Above 1 million, locking sometimes seemed to be much faster, but my project will probably never have a larger data structure.
Here's the code I ran:

    int collSize = 1000000;
    object list1_lock = new object();
    List<int> lst1 = new List<int>();
    ConcurrentBag<int> concBag = new ConcurrentBag<int>();
    int numTasks = 15;

    Stopwatch sWatch = new Stopwatch();
    sWatch.Start();

    // First, try locks
    Task.WaitAll(Enumerable.Range(1, numTasks)
        .Select(x => Task.Factory.StartNew(() =>
        {
            // each task uses its own loop counter; sharing one across tasks is a race
            for (int i = 0; i < collSize / numTasks; i++)
            {
                lock (list1_lock)
                {
                    lst1.Add(x);
                }
            }
        })).ToArray());

    sWatch.Stop();
    Console.WriteLine("lock test. Elapsed = {0}",
        sWatch.Elapsed.TotalSeconds);

    // Now try ConcurrentBag
    sWatch.Restart();
    Task.WaitAll(Enumerable.Range(1, numTasks)
        .Select(x => Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < collSize / numTasks; i++)
            {
                concBag.Add(x);
            }
        })).ToArray());

    sWatch.Stop();
    Console.WriteLine("Conc Bag test. Elapsed = {0}",
        sWatch.Elapsed.TotalSeconds);


娇纵 2024-10-21 04:38:13


Let me ask you this: how realistic is it that you'd have an application which is constantly adding to a collection and never reading from it? What's the use of such a collection? (This is not a purely rhetorical question. I could imagine there being uses where, e.g., you only read from the collection on shutdown (for logging) or when requested by the user. I believe these scenarios are fairly rare, though.)

This is what your code is simulating. Calling List<T>.Add is going to be lightning-fast in all but the occasional case where the list has to resize its internal array; but this is smoothed out by all the other adds that happen quite quickly. So you're not likely to see a significant amount of contention in this context, especially testing on a personal PC with, e.g., even 8 cores (as you stated you have in a comment somewhere). Maybe you might see more contention on something like a 24-core machine, where many cores can be trying to add to the list literally at the same time.

Contention is much more likely to creep in where you read from your collection, especially in foreach loops (or LINQ queries, which amount to foreach loops under the hood), which require locking the entire operation so that you aren't modifying the collection while iterating over it.

If you can realistically reproduce this scenario, I believe you will see ConcurrentBag<T> scale much better than your current test is showing.


Update: Here is a program I wrote to compare these collections in the scenario I described above (multiple writers, many readers). Running 25 trials with a collection size of 10000 and 8 reader threads, I got the following results:

Took 529.0095 ms to add 10000 elements to a List<double> with 8 reader threads.
Took 39.5237 ms to add 10000 elements to a ConcurrentBag<double> with 8 reader threads.
Took 309.4475 ms to add 10000 elements to a List<double> with 8 reader threads.
Took 81.1967 ms to add 10000 elements to a ConcurrentBag<double> with 8 reader threads.
Took 228.7669 ms to add 10000 elements to a List<double> with 8 reader threads.
Took 164.8376 ms to add 10000 elements to a ConcurrentBag<double> with 8 reader threads.
[ ... ]
Average list time: 176.072456 ms.
Average bag time: 59.603656 ms.

So clearly it depends on exactly what you're doing with these collections.
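
The program referenced above isn't reproduced here, so the following is only a rough, hypothetical sketch of a benchmark along those lines: writers add elements while several reader threads repeatedly snapshot the collection, locking the List<double> for every read but enumerating the ConcurrentBag<double> directly. All names, sizes, and the timing approach are my assumptions, not the original program.

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;

    class BagVsListReadersBenchmark
    {
        private const int CollSize = 10000;
        private const int ReaderThreads = 8;

        static void Main()
        {
            Console.WriteLine("List took {0:F4} ms with {1} reader threads.", TimeListRun(), ReaderThreads);
            Console.WriteLine("Bag  took {0:F4} ms with {1} reader threads.", TimeBagRun(), ReaderThreads);
        }

        private static double TimeListRun()
        {
            var list = new List<double>();
            var gate = new object();
            using (var cts = new CancellationTokenSource())
            {
                // Readers snapshot the list under the same lock the writers use.
                var readers = StartReaders(() => { lock (gate) { var copy = list.ToArray(); } }, cts.Token);

                var sw = Stopwatch.StartNew();
                Parallel.For(0, CollSize, i => { lock (gate) { list.Add(i); } });
                sw.Stop();

                cts.Cancel();
                Task.WaitAll(readers);
                return sw.Elapsed.TotalMilliseconds;
            }
        }

        private static double TimeBagRun()
        {
            var bag = new ConcurrentBag<double>();
            using (var cts = new CancellationTokenSource())
            {
                // ConcurrentBag can be snapshotted without an external lock.
                var readers = StartReaders(() => { var copy = bag.ToArray(); }, cts.Token);

                var sw = Stopwatch.StartNew();
                Parallel.For(0, CollSize, i => bag.Add(i));
                sw.Stop();

                cts.Cancel();
                Task.WaitAll(readers);
                return sw.Elapsed.TotalMilliseconds;
            }
        }

        private static Task[] StartReaders(Action readOnce, CancellationToken token)
        {
            // Long-running tasks so the readers don't compete for pool threads with the writers.
            return Enumerable.Range(0, ReaderThreads)
                .Select(_ => Task.Factory.StartNew(() =>
                {
                    while (!token.IsCancellationRequested)
                        readOnce();
                }, TaskCreationOptions.LongRunning))
                .ToArray();
        }
    }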

空心↖ 2024-10-21 04:38:13


There seems to be a bug in the .NET Framework 4 that Microsoft fixed in 4.5, it seems they didn't expect ConcurrentBag to be used a lot.

See the following Ayende post for more info

http://ayende.com/blog/156097/the-high-cost-of-concurrentbag-in-net-4-0

白馒头 2024-10-21 04:38:13


As a general answer:

  • Concurrent collections that use locking can be very fast if there is little or no contention for their data (i.e., for their locks). This is because such collection classes are often built with very inexpensive locking primitives, which are especially cheap when uncontended.
  • Lock-free collections can be slower because of the tricks used to avoid locks and because of other bottlenecks such as false sharing, or the complexity required to implement their lock-free nature leading to cache misses, etc.

To summarize, which approach is faster depends heavily on the data structures employed and on the amount of contention for the locks, among other issues (e.g., the number of readers vs. writers in a shared/exclusive arrangement).

Your particular example has a very high degree of contention, so I must say I am surprised by the behavior. On the other hand, the amount of work done while the lock is held is very small, so maybe there is little contention for the lock itself after all. There could also be deficiencies in the implementation of ConcurrentBag's concurrency handling that make your particular example (frequent inserts and no reads) a bad use case for it.
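
To make the first point concrete, here is a small, informal sketch (my own, not from the answer) that times the same number of lock acquisitions on one thread and then under Parallel.For; on most machines the uncontended run is dramatically cheaper. The iteration count is arbitrary, and the usual System.Diagnostics / System.Threading.Tasks usings are assumed.

    object gate = new object();
    long counter = 0;
    int iterations = 10000000;

    // Uncontended: a single thread takes and releases the lock each iteration.
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
    {
        lock (gate) { counter++; }
    }
    sw.Stop();
    Console.WriteLine("Uncontended lock: {0:F3} s", sw.Elapsed.TotalSeconds);

    // Contended: many worker threads fight over the same lock.
    sw.Restart();
    Parallel.For(0, iterations, i =>
    {
        lock (gate) { counter++; }
    });
    sw.Stop();
    Console.WriteLine("Contended lock:   {0:F3} s", sw.Elapsed.TotalSeconds);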

怀念你的温柔 2024-10-21 04:38:13


Looking at the program using MS's contention visualizer shows that ConcurrentBag<T> has a much higher cost associated with parallel insertion than simply locking on a List<T>. One thing I noticed is there appears to be a cost associated with spinning up the 6 threads (used on my machine) to begin the first ConcurrentBag<T> run (cold run). 5 or 6 threads are then used with the List<T> code, which is faster (warm run). Adding another ConcurrentBag<T> run after the list shows it takes less time than the first (warm run).

From what I'm seeing in the contention, a lot of time is spent in the ConcurrentBag<T> implementation allocating memory. Removing the explicit allocation of size from the List<T> code slows it down, but not enough to make a difference.

EDIT: it appears that ConcurrentBag<T> internally keeps a list per thread (per Thread.CurrentThread), takes a lock 2-4 times depending on whether it is running on a new thread, and performs at least one Interlocked.Exchange. As noted in MSDN, it is "optimized for scenarios where the same thread will be both producing and consuming data stored in the bag." This is the most likely explanation for the performance decrease versus a raw list.
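
As an illustration of the scenario that MSDN quote describes (my sketch, not code from the answer), the snippet below has each worker add items and then take them back on the same thread, which is the access pattern ConcurrentBag<T>'s per-thread lists are designed for. Counts are arbitrary, and the usual System.Collections.Concurrent / System.Threading.Tasks usings are assumed.

    var bag = new ConcurrentBag<int>();

    Parallel.For(0, 4, worker =>
    {
        // Each worker produces into the bag; the items land in this thread's own internal list.
        for (int i = 0; i < 1000; i++)
            bag.Add(worker * 1000 + i);

        // The same worker then consumes, which mostly pops from that per-thread list,
        // so little cross-thread synchronization ("stealing") is needed.
        int item;
        while (bag.TryTake(out item))
        {
            // process item
        }
    });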

清欢 2024-10-21 04:38:13


This is already resolved in .NET 4.5. The underlying issue was that ThreadLocal, which ConcurrentBag uses, didn't expect to have a lot of instances. That has been fixed, and it now runs fairly fast.

source - The HIGH cost of ConcurrentBag in .NET 4.0

不羁少年 2024-10-21 04:38:13


As @Darin-Dimitrov said, I suspect that your Parallel.For isn't actually spawning the same number of threads in each of the two results. Try manually creating N threads to ensure that you are actually seeing thread contention in both cases.
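
A rough sketch of what that could look like (illustrative only, not from the answer): pin the writer count with explicit Thread objects so both the bag test and the locked-list test run with exactly the same number of concurrent writers. The thread count and item count are arbitrary, and the usual System.Collections.Concurrent, System.Diagnostics, System.Linq, and System.Threading usings are assumed.

    int threadCount = 8;
    int itemsPerThread = 1000000;
    var bag = new ConcurrentBag<int>();

    // One explicit thread per writer, so the degree of contention is known up front.
    var threads = Enumerable.Range(0, threadCount)
        .Select(t => new Thread(() =>
        {
            for (int i = 0; i < itemsPerThread; i++)
                bag.Add(i);
        }))
        .ToArray();

    var sw = Stopwatch.StartNew();
    foreach (var thread in threads) thread.Start();
    foreach (var thread in threads) thread.Join();
    sw.Stop();

    Console.WriteLine("{0} writer threads: {1:F3} s", threadCount, sw.Elapsed.TotalSeconds);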

孤独患者 2024-10-21 04:38:13


You basically have very few concurrent writes and no contention (Parallel.For doesn't necessarily mean many threads). Try parallelizing the writes and you will observe different results:

class Program
{
    private static object list1_lock = new object();
    private const int collSize = 1000;

    static void Main()
    {
        ConcurrentBagTest();
        LockCollTest();
    }

    private static void ConcurrentBagTest()
    {
        var bag1 = new ConcurrentBag<int>();
        var stopWatch = Stopwatch.StartNew();
        Task.WaitAll(Enumerable.Range(1, collSize).Select(x => Task.Factory.StartNew(() =>
        {
            Thread.Sleep(5);
            bag1.Add(x);
        })).ToArray());
        stopWatch.Stop();
        Console.WriteLine("Elapsed Time = {0}", stopWatch.Elapsed.TotalSeconds);
    }

    private static void LockCollTest()
    {
        var lst1 = new List<int>(collSize);
        var stopWatch = Stopwatch.StartNew();
        Task.WaitAll(Enumerable.Range(1, collSize).Select(x => Task.Factory.StartNew(() =>
        {
            lock (list1_lock)
            {
                Thread.Sleep(5);
                lst1.Add(x);
            }
        })).ToArray());
        stopWatch.Stop();
        Console.WriteLine("Elapsed = {0}", stopWatch.Elapsed.TotalSeconds);
    }
}
想你的星星会说话 2024-10-21 04:38:13


My guess is that the locks don't experience much contention. I would recommend reading the following article: Java theory and practice: Anatomy of a flawed microbenchmark. The article discusses a lock microbenchmark. As stated in the article, there are a lot of things to take into consideration in this kind of situation.

雨的味道风的声音 2024-10-21 04:38:13


It would be interesting to see how the two of them scale.

Two questions

1) How fast is the bag vs. the list for reading? Remember to put a lock on the list (see the sketch below).

2) How fast is the bag vs. the list for reading while another thread is writing?
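
For the first question, a minimal sketch (my own, not the answerer's) of a single read pass over each collection might look like the following; lst1, list1_lock, and concBag are assumed to be the already-populated collections and lock object from the question's code.

    // Reading the List<int> requires holding the lock for the whole enumeration,
    // otherwise a concurrent writer could invalidate the enumerator.
    long listSum = 0;
    lock (list1_lock)
    {
        foreach (int x in lst1)
            listSum += x;
    }

    // ConcurrentBag<int> enumeration works on an internal snapshot, so no external
    // lock is needed even while writers are active.
    long bagSum = 0;
    foreach (int x in concBag)
        bagSum += x;

    Console.WriteLine("List sum = {0}, Bag sum = {1}", listSum, bagSum);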

半枫 2024-10-21 04:38:13


Because the loop body is small, you could try using the Partitioner class's Create method...

which enables you to provide a sequential loop for the delegate body, so that the delegate is invoked only once per partition, instead of once per iteration

How to: Speed Up Small Loop Bodies
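
A minimal sketch of that approach applied to the question's write test (my illustration, not code from the answer): Partitioner.Create(0, collSize) produces index ranges, and the delegate runs once per range instead of once per element, amortizing the per-iteration overhead. The usual System.Collections.Concurrent and System.Threading.Tasks usings are assumed.

    int collSize = 10000000;
    var bag = new ConcurrentBag<int>();

    Parallel.ForEach(Partitioner.Create(0, collSize), range =>
    {
        // range is a Tuple<int, int> covering [Item1, Item2).
        for (int i = range.Item1; i < range.Item2; i++)
            bag.Add(i);
    });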

可遇━不可求 2024-10-21 04:38:13


It appears that ConcurrentBag is just slower than the other concurrent collections.

I think it's an implementation problem - ANTS Profiler shows that it gets bogged down in a couple of places, including an array copy.

Using a concurrent dictionary is thousands of times faster.
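
For comparison, a hypothetical sketch of the ConcurrentDictionary alternative the answer alludes to, using the loop index as the key. It is not a drop-in replacement for a bag (keys must be unique, and duplicate/ordering semantics differ); it only illustrates the write path being compared. The usual System.Collections.Concurrent and System.Threading.Tasks usings are assumed.

    var dict = new ConcurrentDictionary<int, int>();

    Parallel.For(0, 10000000, i =>
    {
        // Writes are striped across internal locks; with unique keys every TryAdd succeeds.
        dict.TryAdd(i, i);
    });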
