.NET collection classes

Posted on 2024-10-04 11:28:38

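Groups of related data (such as a list of parts) can be handled either with an array (an array of parts) or with a collection. I understand that, compared with collections, arrays carry a performance cost for insertions, deletions, and some other operations. Does this mean collections don't use arrays internally? If so, what data structures do collections such as List and Collection use?

How are collections handled internally?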

ゞ记忆︶ㄣ 2024-10-11 11:28:38

List<T> uses an internal array. Removing or inserting items near the beginning of the list is more expensive than doing the same near the end, because every element after the insertion or removal point has to be shifted one slot within the internal array. Also, when you add an item while the internal array is full, a new array of roughly double the capacity is allocated, the contents are copied over, and the old array is discarded.
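To make the shifting and growth concrete, here is a minimal sketch of an array-backed list of ints, written as C# top-level statements. It is a simplified model, not the actual List<T> source: the starting capacity of 4, the doubling rule, and the local `Insert` function and `shifted` counter are illustrative assumptions.

```csharp
using System;

// Hypothetical, simplified model of an array-backed list (not the real List<T> source).
int[] items = new int[4]; // backing array; this sketch starts with capacity 4
int count = 0;            // number of slots actually in use
int shifted = 0;          // running total of elements moved by Insert

void Insert(int index, int value)
{
    if (index < 0 || index > count) throw new ArgumentOutOfRangeException(nameof(index));
    if (count == items.Length)
    {
        // Backing array is full: allocate one twice as big, copy, drop the old one.
        int[] bigger = new int[items.Length * 2];
        Array.Copy(items, bigger, count);
        items = bigger;
    }
    // Shift every element from index onward one slot to the right.
    Array.Copy(items, index, items, index + 1, count - index);
    shifted += count - index;
    items[index] = value;
    count++;
}

for (int i = 0; i < 5; i++) Insert(count, i); // append at the end: nothing shifts
Insert(0, 99);                                // insert at the front: every element moves

Console.WriteLine($"count={count}, capacity={items.Length}, shifted={shifted}");
// prints "count=6, capacity=8, shifted=5"
```

Appending never shifts anything (the fifth append only triggers the grow-and-copy), while the single insert at the front moves all five existing elements.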

The Collection<T> class, when used with the parameterless constructor, uses a List<T> internally. So performance-wise they will be identical, with the exception of overhead caused by wrapping. (Essentially one more level of indirection, which is going to be negligible in most scenarios.)
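The wrapping relationship is visible through the public API: the `Collection<T>(IList<T>)` constructor wraps the list you pass in rather than copying it. A short sketch (the part names are just sample data):

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Collection<T>(IList<T>) wraps the given list instead of copying it,
// so both variables see the same underlying storage.
var backing = new List<string> { "bolt", "nut" };
var wrapper = new Collection<string>(backing);

wrapper.Add("washer");     // goes through the wrapper into backing
backing.Insert(0, "gear"); // a change made directly to the list shows up in the wrapper

Console.WriteLine(string.Join(", ", wrapper));     // gear, bolt, nut, washer
Console.WriteLine(wrapper.Count == backing.Count); // True

// With the parameterless constructor, Collection<T> creates its own List<T> internally.
var parts = new Collection<string>();
parts.Add("bolt");
Console.WriteLine(parts.Count); // 1
```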

LinkedList<T> is, as its name implies, a linked list. It trades iteration speed for insertion/removal speed. Iterating means following a reference from each node to the next, one hop at a time, which costs more overall than walking an array. On top of that, nodes are separate objects that may not be allocated anywhere near each other in memory, which reduces the effectiveness of the CPU caches.

However, the amount of time required to insert or remove a node is constant, since it requires the same number of operations no matter the state of the list. (This does not take into account any work that must be done to actually locate the item to remove, or to traverse the list to find the insertion point!)
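A sketch using the real LinkedList<T> API shows where the cost actually sits: `Find` has to walk the list from the head, while `AddAfter` and `Remove`, given a node, only rewire links. The values are arbitrary sample data.

```csharp
using System;
using System.Collections.Generic;

var list = new LinkedList<int>(new[] { 10, 20, 30, 40 });

// Locating the node is the O(n) part: Find walks the list from the head.
var node = list.Find(20);
if (node is null) throw new InvalidOperationException("20 not found");

// Once you hold the node, inserting next to it or removing it is O(1):
// only neighbouring Next/Previous links are rewired, nothing is shifted.
list.AddAfter(node, 25); // 10, 20, 25, 30, 40
list.Remove(node);       // 10, 25, 30, 40

Console.WriteLine(string.Join(", ", list)); // 10, 25, 30, 40
```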

If your primary concern is testing whether something is in the collection, you might consider a HashSet<T> instead. Adding items to the set is relatively fast, somewhere between insertion into a list and into a linked list, and removing items is also relatively fast. The real gain, though, is lookup time: testing whether a HashSet<T> contains an item uses the item's hash code to jump straight to a small group of candidates instead of scanning every element. On average, lookups take constant time, whereas searching a list or linked list takes time proportional to the number of items.
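The reason a lookup avoids a full scan is hashing. Below is a deliberately simplified sketch of a hash set using separate chaining over eight fixed buckets. The bucket count, the lack of resizing, and the local `Add`/`Contains`/`BucketFor` functions are assumptions for illustration; HashSet<T> itself uses a more elaborate array-based layout that also resizes as it grows.

```csharp
using System;
using System.Collections.Generic;

// Simplified hash set: eight fixed buckets, each a small list (separate chaining).
// Illustration only; this is not how HashSet<T> is laid out internally.
const int BucketCount = 8;
var buckets = new List<string>[BucketCount];
for (int i = 0; i < BucketCount; i++) buckets[i] = new List<string>();

// The hash code picks one bucket; only that bucket is searched.
List<string> BucketFor(string item) =>
    buckets[(item.GetHashCode() & 0x7FFFFFFF) % BucketCount];

bool Add(string item)
{
    var bucket = BucketFor(item);
    if (bucket.Contains(item)) return false; // already present: no duplicates
    bucket.Add(item);
    return true;
}

bool Contains(string item) => BucketFor(item).Contains(item);

foreach (var part in new[] { "bolt", "nut", "washer", "gear", "spring" }) Add(part);

Console.WriteLine(Contains("gear"));  // True
Console.WriteLine(Contains("rivet")); // False
```

Because there is no resizing here, buckets would keep getting longer as items are added; a real hash set grows its bucket array to keep the average bucket short, which is what keeps lookups close to constant time.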

However, a HashSet<T> cannot contain equivalent items. If your requirements include two items that are considered equal (by an Object.Equals(Object) override, or by implementing IEquatable<T>) coexisting independently in the collection, then you simply cannot use a HashSet<T>. Also, HashSet<T> does not guarantee any particular enumeration order, not even insertion order, so you can't use it if maintaining some sort of ordering is important.
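To see the equivalence rule in action, the sketch below uses value tuples as stand-in "parts" (an assumption for brevity: ValueTuple implements IEquatable<T>, so two tuples with the same fields are equal). A List<T> keeps both equal items, while `HashSet<T>.Add` rejects the second one and returns false.

```csharp
using System;
using System.Collections.Generic;

// Two parts that are equal by value, created separately.
var a = (Name: "bolt", Size: 6);
var b = (Name: "bolt", Size: 6);

var list = new List<(string Name, int Size)> { a, b };
var set = new HashSet<(string Name, int Size)>();

bool addedFirst = set.Add(a);  // true
bool addedSecond = set.Add(b); // false: an equal item is already in the set

Console.WriteLine($"list holds {list.Count}, set holds {set.Count}"); // list holds 2, set holds 1
Console.WriteLine($"{addedFirst}, {addedSecond}");                     // True, False
```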

末が日狂欢 2024-10-11 11:28:38

There are two basic ways to implement a simple collection:

contiguous array
linked list

Contiguous arrays have performance disadvantages for the operations you mentioned because all elements of the collection occupy a single block of memory, either preallocated or sized to the collection's contents. Deleting or inserting an element therefore requires moving every element after that position to keep the collection contiguous and in the proper order.
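As an illustration of that cost, this sketch removes elements from a plain int array by shifting the later elements left with `Array.Copy`, and reports how many elements had to move. The local `RemoveAt` function is hypothetical, not a framework API.

```csharp
using System;

// A contiguous array holding `count` live elements.
int[] parts = { 1, 2, 3, 4, 5, 6 };
int count = parts.Length;

// Hypothetical helper: remove one element and close the gap by shifting left.
int RemoveAt(int index)
{
    if (index < 0 || index >= count) throw new ArgumentOutOfRangeException(nameof(index));
    int moved = count - index - 1;                     // elements after the removed one
    Array.Copy(parts, index + 1, parts, index, moved); // shift them one slot left
    count--;
    parts[count] = 0;                                  // clear the now-unused last slot
    return moved;
}

Console.WriteLine(RemoveAt(0));         // 5: every remaining element shifted
Console.WriteLine(RemoveAt(count - 1)); // 0: removing the last element moves nothing
```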

Linked lists remove these issues because the items in the collection do not need to be stored contiguously in memory. Instead, each element contains a reference to one or more of the other elements. When an insertion is made, the new node can be created anywhere in memory, and only the references held by one or two of the existing elements need to be updated.
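The same point can be checked with LinkedList<T>: after inserting between two existing nodes, those node objects are the same instances as before, and only their Next/Previous links now point at the new node. The part names are sample data.

```csharp
using System;
using System.Collections.Generic;

var list = new LinkedList<string>();
LinkedListNode<string> first = list.AddLast("bolt");
LinkedListNode<string> last = list.AddLast("washer");

// Insert between the two existing nodes. Apart from setting the new node's own links,
// only first.Next and last.Previous are updated; the existing nodes are not moved or copied.
LinkedListNode<string> middle = list.AddAfter(first, "nut");

Console.WriteLine(ReferenceEquals(first.Next, middle));    // True
Console.WriteLine(ReferenceEquals(last.Previous, middle)); // True
Console.WriteLine(string.Join(" -> ", list));              // bolt -> nut -> washer
```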

For example:

LinkedList<object> c = new LinkedList<object>(); // a linked list
object[] a = new object[] { }; // a contiguous array

This is simplified, of course. In .NET, LinkedList<T> is a doubly linked list (each LinkedListNode<T> exposes both Next and Previous), and the class keeps some extra bookkeeping such as the node count, but that is the basic structure.
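Because LinkedList<T> is doubly linked, a list can be walked from either end: follow `Next` starting at `First`, or `Previous` starting at `Last`. A short sketch with sample values:

```csharp
using System;
using System.Collections.Generic;

var list = new LinkedList<int>(new[] { 1, 2, 3 });

// Forward: start at First and follow Next until it returns null.
var forward = new List<int>();
for (var node = list.First; node != null; node = node.Next) forward.Add(node.Value);

// Backward: start at Last and follow Previous (not possible with a singly linked list).
var backward = new List<int>();
for (var node = list.Last; node != null; node = node.Previous) backward.Add(node.Value);

Console.WriteLine(string.Join(",", forward));  // 1,2,3
Console.WriteLine(string.Join(",", backward)); // 3,2,1
```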
