.NET collection classes
A group of related data, such as a list of parts, can be handled either with an array (an array of parts) or with a collection. I understand that insertion, deletion, and some other operations are more expensive with arrays than with collections. Does this mean that collections do not use arrays internally? If so, what data structure is used by collections such as List and Collection?
How are collections handled internally?
Comments (3)
List<T> uses an internal array. Removing or inserting items near the beginning of the list is more expensive than doing the same near the end, since every element after that position in the internal array has to be shifted by one slot. Also, when you try to add an item while the internal array is full, a new, bigger array is allocated, the contents are copied over, and the old array is discarded.
The Collection<T> class, when used with the parameterless constructor, uses a List<T> internally. So performance-wise they are identical, except for the overhead caused by wrapping. (Essentially one more level of indirection, which is negligible in most scenarios.)
LinkedList<T> is, as its name implies, a linked list. It sacrifices iteration speed for insertion/removal speed. Since iterating means following a reference from each node to the next, it takes more work overall than walking an array. Aside from the reference traversal, two nodes may not be allocated anywhere near each other in memory, which reduces the effectiveness of CPU caches.
However, the amount of time required to insert or remove a node is constant, since it requires the same number of operations no matter the state of the list. (This does not take into account any work that must be done to actually locate the item to remove, or to traverse the list to find the insertion point!)
If your primary concern with your collection is testing whether something is in the collection, you might consider a HashSet<T> instead. Adding items to the set is relatively fast, somewhere between insertion into a list and insertion into a linked list. Removing items is again relatively fast. But the real gain is in lookup time: testing whether a HashSet<T> contains an item does not require iterating over the whole collection. On average it performs faster than any list or linked list structure.
However, a HashSet<T> cannot contain equivalent items. If part of your requirements is that two items that are considered equal (by an Object.Equals(Object) override, or by implementing IEquatable<T>) coexist independently in the collection, then you simply cannot use a HashSet<T>. Also, HashSet<T> does not guarantee insertion order, so you also can't use a HashSet<T> if maintaining some sort of ordering is important.
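The HashSet<T> behaviour described above can be illustrated with a small sketch. The Part class and its part numbers are hypothetical, invented only for this example; equality is assumed to depend on the part number alone.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetDemo
{
    // Hypothetical part type: two parts are "equal" when their numbers match.
    public sealed class Part : IEquatable<Part>
    {
        public string Number { get; }
        public string Description { get; }

        public Part(string number, string description)
        {
            Number = number;
            Description = description;
        }

        public bool Equals(Part other) => other != null && Number == other.Number;
        public override bool Equals(object obj) => Equals(obj as Part);
        public override int GetHashCode() => Number.GetHashCode();
    }

    public static void Main()
    {
        var set = new HashSet<Part>();
        bool first = set.Add(new Part("A-100", "bolt"));
        // Equal to the first part, so the set refuses to store it.
        bool second = set.Add(new Part("A-100", "bolt, second copy"));

        Console.WriteLine(first);      // True
        Console.WriteLine(second);     // False
        Console.WriteLine(set.Count);  // 1

        // Lookup goes through the hash code instead of scanning every element.
        Console.WriteLine(set.Contains(new Part("A-100", "any description"))); // True

        // A List<T>, by contrast, keeps both equal items.
        var list = new List<Part>
        {
            new Part("A-100", "bolt"),
            new Part("A-100", "bolt, second copy"),
        };
        Console.WriteLine(list.Count); // 2
    }
}
```

If equal items must coexist, or order matters, the List<T> at the end of the sketch is the shape to reach for instead.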
There are two basic ways to implement a simple collection: contiguous arrays and linked lists.
Contiguous arrays have performance disadvantages for the operations you mentioned because the memory space of the collection is either preallocated or allocated based on the contents of the collection. Thus deletion or insertion requires moving many array elements to keep the entire collection contiguous and in the proper order.
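To make the array-backed behaviour concrete, here is a small sketch using List<T>. It records how the Capacity property changes while items are appended; the exact capacity values are an implementation detail of the runtime, so the sketch only prints them and does not assume a particular growth pattern. The helper name ObserveCapacities is made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class ListGrowthDemo
{
    // Returns each distinct Capacity value seen while appending `count` items.
    public static List<int> ObserveCapacities(int count)
    {
        var list = new List<int>();
        var capacities = new List<int> { list.Capacity };
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            // Capacity changes only when the backing array was replaced.
            if (list.Capacity != capacities[capacities.Count - 1])
                capacities.Add(list.Capacity);
        }
        return capacities;
    }

    public static void Main()
    {
        // Far fewer reallocations than Add calls: the backing array grows in steps.
        Console.WriteLine(string.Join(", ", ObserveCapacities(100)));

        var items = new List<int> { 1, 2, 3 };
        items.Insert(0, 0);  // shifts 1, 2, 3 one slot to the right
        items.Add(4);        // appends at the end, nothing is shifted
        items.RemoveAt(0);   // shifts the remaining items one slot to the left
        Console.WriteLine(string.Join(", ", items)); // 1, 2, 3, 4
    }
}
```

When the final size is known in advance, passing it to the List<T>(int capacity) constructor avoids the intermediate reallocations.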
Linked lists remove these issues because the items in the collection do not need to be stored in memory contiguously. Instead each element contains a reference to one or more of the other elements. Thus, when an insertion is made, the item in question is created anywhere in memory and only the references on one or two of the elements already in the collection need to be modified.
For example:
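A minimal sketch of that structure in C#, written for this page as an illustration: the Node type and the helper methods are hypothetical and are not the actual internals of LinkedList<T>. It uses a singly linked list to show that inserting or removing only touches the references of neighbouring nodes.

```csharp
using System;
using System.Collections.Generic;

public static class LinkedSketch
{
    // Each node holds a value and a reference to the next node (null at the end).
    public sealed class Node
    {
        public string Value;
        public Node Next;
        public Node(string value) { Value = value; }
    }

    // Inserts a new node right after `node`; only two references are written.
    public static Node InsertAfter(Node node, string value)
    {
        var added = new Node(value) { Next = node.Next };
        node.Next = added;
        return added;
    }

    // Unlinks the node that follows `node`, if there is one.
    public static void RemoveAfter(Node node)
    {
        if (node.Next != null)
            node.Next = node.Next.Next;
    }

    // Walks the chain from `head` and joins the values for display.
    public static string Describe(Node head)
    {
        var values = new List<string>();
        for (var n = head; n != null; n = n.Next)
            values.Add(n.Value);
        return string.Join(" -> ", values);
    }

    public static void Main()
    {
        var head = new Node("A");
        InsertAfter(head, "C");            // A -> C
        InsertAfter(head, "B");            // A -> B -> C, no existing node was moved
        Console.WriteLine(Describe(head)); // A -> B -> C
        RemoveAfter(head);                 // unlinks B
        Console.WriteLine(Describe(head)); // A -> C
    }
}
```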
This is simplified, of course. The internals of LinkedList<> are doubtless more complex than a simple singly or doubly linked list, but that is the basic structure.
I think that some collection classes might use arrays internally, as well as linked lists or something similar. The benefit of using collections from the System.Collections namespace instead of arrays is that you do not need to spend any extra time writing code to perform update operations.
Arrays will always be more lightweight, and if you know some very good search algorithms, you might even be able to use them more efficiently, but most of the time you can avoid reinventing the wheel by using classes from System.Collections. These classes are meant to help the programmer avoid writing code that has already been written and tuned hundreds of times, so it is unlikely that you'll get a significant performance boost by manipulating arrays yourself.
When you need a static collection that doesn't require much adding, removing, or editing, it is perhaps a good time to use an array, since arrays don't require the extra memory that collections do.
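As a rough illustration of the static case, the sketch below keeps a fixed, pre-sorted array and searches it with the built-in Array.BinarySearch rather than a hand-written algorithm. The part numbers and the IsKnownPart helper are hypothetical, chosen only for this example.

```csharp
using System;

public static class StaticLookupDemo
{
    // Hypothetical fixed catalogue; it must stay sorted (ordinal order)
    // for binary search to give correct answers.
    private static readonly string[] PartNumbers =
    {
        "A-100", "B-200", "C-300", "D-400",
    };

    // Array.BinarySearch returns a non-negative index when the value is found.
    public static bool IsKnownPart(string number)
    {
        return Array.BinarySearch(PartNumbers, number, StringComparer.Ordinal) >= 0;
    }

    public static void Main()
    {
        Console.WriteLine(IsKnownPart("C-300")); // True
        Console.WriteLine(IsKnownPart("Z-999")); // False
    }
}
```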