I find myself very intrigued by the existence of a ConcurrentBag<T> class in the upcoming .NET 4.0 framework:
Bags are useful for storing objects when ordering doesn't matter, and unlike sets, bags support duplicates.
My question is: how might this idea be implemented? Most collections I'm familiar with essentially amount to (under the hood) some form of array, in which order may not "matter," but there is an order (which is why, even though it doesn't need to, enumeration will pretty much always go through an unchanged collection, be it a List<T>, Queue<T>, Stack<T>, etc., in the same sequence).
If I had to guess, I might suggest that internally it could be a Dictionary<T, LinkedList<T>>; but that actually seems quite dubious, considering it wouldn't make sense to use just any type T as a key.
What I'm expecting/hoping is that this is actually an established object type that has already been "figured out" somewhere, and that somebody who knows of this established type can tell me about it. It's just so unusual to me--one of those concepts that's easy to understand in real life, but is difficult to translate into a usable class as a developer--which is why I'm curious as to the possibilities.
EDIT:
Some responders have suggested that a Bag could be a form of a hashtable internally. This was my initial thought as well, but I foresaw two problems with this idea:
- A hashtable is not all that useful when you don't have a suitable hashcode function for the type in question.
- Simply tracking an object's "count" in a collection is not the same as storing the object.
As Meta-Knight suggested, perhaps an example would make this more clear:
    using System.Collections.Concurrent;
    using System.Threading;

    public class ExpensiveObject {
        private ExpensiveObject() {
            // very intense operations happening in here
        }

        // static factory: the constructor is private, so callers go through this
        public static ExpensiveObject CreateExpensiveObject() {
            return new ExpensiveObject();
        }
    }

    static class Program {
        static void Main() {
            var expensiveObjects = new ConcurrentBag<ExpensiveObject>();
            for (int i = 0; i < 5; i++) {
                expensiveObjects.Add(ExpensiveObject.CreateExpensiveObject());
            }

            // after this point in the code, I want to believe I have 5 new
            // expensive objects in my collection
            while (expensiveObjects.Count > 0) {
                ExpensiveObject expObj = null;
                bool objectTaken = expensiveObjects.TryTake(out expObj);
                if (objectTaken) {
                    // here I THINK I am queueing a particular operation to be
                    // executed on 5 separate threads for 5 separate objects,
                    // but if ConcurrentBag is a hashtable then I've just received
                    // the object 5 times and so I am working on the same object
                    // from 5 threads at the same time!
                    ThreadPool.QueueUserWorkItem(DoWorkOnExpensiveObject, expObj);
                } else {
                    break;
                }
            }
        }

        static void DoWorkOnExpensiveObject(object obj) {
            ExpensiveObject expObj = obj as ExpensiveObject;
            if (expObj != null) {
                // some work to be done
            }
        }
    }
If you look at the details of ConcurrentBag<T>, you'll find that it is, internally, basically a customized linked list. Since bags can contain duplicates and are not accessible by index, a doubly linked list is a very good option for implementation. This allows locking to be fairly fine-grained for insertion and removal (you don't have to lock the entire collection, just the nodes around where you're inserting or removing). Since you're not worried about duplicates, no hashing is involved. This makes a doubly linked list a good fit.
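To make the linked-list idea concrete, here is a minimal sketch. It is not the actual ConcurrentBag<T> source: `SimpleLinkedBag<T>` and its members are hypothetical names, and for brevity it uses one coarse lock rather than the fine-grained per-node locking described above. The point it illustrates is that each Add gets its own node, so duplicates are stored as separate entries and no hash code is ever needed.

```csharp
// Hypothetical illustration only; not the real ConcurrentBag<T> implementation.
public class SimpleLinkedBag<T>
{
    // One node per added item, linked in both directions.
    private sealed class Node
    {
        public T Value;
        public Node Prev;
        public Node Next;
    }

    // Assumption: a single coarse lock keeps the sketch short.
    private readonly object _gate = new object();
    private Node _head;
    private int _count;

    public int Count
    {
        get { lock (_gate) { return _count; } }
    }

    // Every call allocates a new node, so equal items are kept separately.
    public void Add(T item)
    {
        var node = new Node { Value = item };
        lock (_gate)
        {
            node.Next = _head;
            if (_head != null) _head.Prev = node;
            _head = node;
            _count++;
        }
    }

    // Removes and returns some item; this sketch takes the newest node.
    public bool TryTake(out T item)
    {
        lock (_gate)
        {
            if (_head == null)
            {
                item = default(T);
                return false;
            }
            Node node = _head;
            _head = node.Next;
            if (_head != null) _head.Prev = null;
            _count--;
            item = node.Value;
            return true;
        }
    }
}
```

With a structure like this, adding five distinct objects and calling TryTake five times hands back five distinct references, which is the behaviour the question is hoping for.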
There's some good info on ConcurrentBag here: http://geekswithblogs.net/BlackRabbitCoder/archive/2011/03/03/c.net-little-wonders-concurrentbag-and-blockingcollection.aspx
Since ordering doesn't matter, a ConcurrentBag could be using a hashtable behind the scenes to allow for fast retrieval of data. But unlike a HashSet, a bag accepts duplicates. Maybe each item could be paired with a Count property, which is set to 1 when the item is first added. If you add the same item a second time, you could just increment that item's Count.
Then, to remove an item whose count is greater than one, you could just decrement its Count. If the count was one, you would remove the item-count pair from the hashtable.
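As a rough sketch of the counting scheme just described (again hypothetical: `CountingBag<T>`, `CountOf` and the single lock are my own choices, and this is not how ConcurrentBag<T> is documented to work), something like the following would do. Note that it depends on the type's Equals/GetHashCode, and that only the first of several equal instances is actually retained as the dictionary key, which is the concern raised in the question's edit.

```csharp
using System.Collections.Generic;

// Hypothetical illustration of a hashtable-plus-count bag.
public class CountingBag<T>
{
    // Maps each distinct item to its number of occurrences.
    private readonly Dictionary<T, int> _counts = new Dictionary<T, int>();
    private readonly object _gate = new object();

    // First add stores the item with count 1; later adds only bump the count.
    public void Add(T item)
    {
        lock (_gate)
        {
            int n;
            _counts.TryGetValue(item, out n); // n is 0 when the item is absent
            _counts[item] = n + 1;
        }
    }

    // Decrements the count, dropping the entry when the last occurrence goes.
    public bool Remove(T item)
    {
        lock (_gate)
        {
            int n;
            if (!_counts.TryGetValue(item, out n)) return false;
            if (n > 1) _counts[item] = n - 1;
            else _counts.Remove(item);
            return true;
        }
    }

    // Number of occurrences currently recorded for the item.
    public int CountOf(T item)
    {
        lock (_gate)
        {
            int n;
            _counts.TryGetValue(item, out n);
            return n;
        }
    }
}
```

This works well for value-like items such as strings or integers, but for something like ExpensiveObject it would only ever hand back one instance per group of equal objects.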
Well, in Smalltalk (where the notion of a Bag came from), the collection is basically the same as a hash, albeit one that allows duplicates. Instead of storing the duplicate object, though, it maintains an "occurrence count", i.e., a refcount for each object. If ConcurrentBag is a faithful implementation, this should give you a starting point.
I believe the concept of a 'Bag' is synonymous with 'Multiset'.
There are a number of "Bag"/"Multiset" implementations (these happen to be java) that are open source if you are interested in how they are implemented.
These implementations show that a 'Bag' can be implemented in any number of ways depending on your needs. There are examples of TreeMultiset, HashMultiset, LinkedHashMultiset, ConcurrentHashMultiset.
Google Collections
Google has a number of "Multiset" implementations, one being a ConcurrentHashMultiset.
Apache Commons
Apache has a number of "Bag" implementations.
The System.Collections.Concurrent namespace is now open source, and the implementation for ConcurrentBag can now be found here:
https://github.com/dotnet/runtime/blob/main/src/libraries/System.Collections.Concurrent/src/System/Collections/Concurrent/ConcurrentBag.cs
Below is the implementation as of Jan 30, 2022. It is MIT licensed.