什么 .NET 字典支持“查找最近的键”?手术?
我正在将一些 C++ 代码转换为 C#,它调用 std::map::lower_bound(k) 来查找映射中键等于或大于 k 的条目。然而,我没有看到任何方法可以用 .NET 的 SortedDictionary 做同样的事情。我怀疑我可以使用 SortedList 实现解决方法,但不幸的是 SortedList 太慢(插入和删除键的 O(n) )。我能做些什么?
注意:我找到了一种解决方法,可以利用我的特定场景...具体来说,我的键是从 0 开始的密集整数,所以我使用了 List
I'm converting some C++ code to C# and it calls std::map::lower_bound(k) to find an entry in the map whose key is equal to or greater than k. However, I don't see any way to do the same thing with .NET's SortedDictionary. I suspect I could implement a workaround using SortedList, but unfortunately SortedList is too slow (O(n) for inserting and deleting keys). What can I do?
Note: I found a workaround using that takes advantage of my particular scenario... Specifically, my keys are a dense population of integers starting at just over 0, so I used a List<TValue> as my dictionary with the list index serving as the key, and searching for a key equal or greater than k can be done in only a few loop iterations. But it would still be nice to see the original question answered.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
我开发了几个支持“查找下一个更高的键”和“查找下一个更低的键”操作的集合类。
首先,我制作了一组 Compact Patricia Trie 集合。这些是旨在最小化内存使用的集合/字典;它们对于大量 URL 和某些其他类型的数据特别有效。它们只能部分解决问题,因为仅支持某些类型的键,即
byte[]
、string
和所有原始整数类型 (Int8..UInt64)。此外,字符串排序区分大小写。 NuGet 包:Loyc.Utilities发布此答案后,我制作了更多排序的数据结构来一般解决此问题:和
BList
,BDictionary
, <代码>BMultiMapSparseAList
。详情请看我的第二个回答。I have developed several collection classes that support "find next higher key" and "find next lower key" operations.
First I made a set of Compact Patricia Trie collections. These are sets/dictionaries designed to minimize memory usage; they are especially efficient for large collections of URLs and certain other kinds of data. They only partly solve the problem, because only certain kinds of keys are supported, namely
byte[]
,string
, and all primitive integer types (Int8..UInt64). Also, string sorting is case-sensitive. NuGet package: Loyc.UtilitiesAfter publishing this answer, I made more sorted data structures that solve this problem generally:
BList<T>
,BDictionary<K,V>
,BMultiMap<K,V>
andSparseAList<T>
. See my second answer for details.问题在于,字典/哈希表被设计为根据输入值到达唯一的内存位置,因此您需要一个数据结构,该数据结构旨在容纳与您存储的每个值相关的范围,同时时间正确更新每个间隔
我认为跳过列表(或平衡二叉树)可以帮助你。尽管它们不能以 O(n) 的速度执行查找,但它们可以以对数方式执行查找,并且仍然比树更快。
我知道这不是一个正确的答案,因为我不能说 .NET BCL 已经包含这样一个类,不幸的是您必须自己实现一个类,或者找到一个为您支持它的第 3 方程序集。不过,CodeProject 这里似乎有一个很好的例子。
The problem is that a dictionary/hash table is designed to arrive at a unique memory location based on an input value, so you'll need a data structure that is designed to accommodate a range related to each value you store, and at the same time update each interval correctly
I think skip lists (or balanced binary trees) can help you. Although they cannot perform lookups in O(n), they can do logarithmically and still faster than trees.
I know this is not a proper answer since I cannot say that the .NET BCL already contains such a class, you'll unfortunately have to implement one yourself, or find a 3rd party assembly that supports it for you. There seems to be a nice example over at The CodeProject here, though.
你可以试试我下面写的代码。它使用二分搜索,因此假设列表/数组是预先排序的。
You can try the code i wrote below. it using binary search, therefore assuming the list/array is pre-sorted.
我创建了几个为任何数据类型提供此功能的数据结构:
BList
(排序列表)、BDictionary
(一个字典,其项目为按键排序)和BMultiMap
(一种字典,其中多个值可以与一个键关联)。有关详细信息,请参阅本文。这些数据结构中的每一个都提供FindLowerBound()
和FindUpperBound()
方法,其工作方式类似于 C++ 的lower_bound
和upper_bound
。在内部,这些集合类似于 B+ 树,因此它们具有良好的性能和较低的内存用法;BDictionary<,>
通常比标准SortedDictionary<,>
使用的内存少约 44%(而标准SortedDictionary<,>
使用的内存平均比略少) Dictionary<,>
),假设 64 位键和 64 位值。我还制作了一个“稀疏”集合
SparseAList
,它与BDictionary
类似,只是您可以在任何地方插入和删除“空白空间”在集合中(空空间不消耗任何内存)。有关详细信息,请参阅本文。所有这些集合都位于 Loyc.Collections NuGet 包中。
I created several data structures that provide this functionality for any data type:
BList<T>
(a sorted list),BDictionary<K,V>
(a dictionary whose items are sorted by key), andBMultiMap<K,V>
(a dictionary in which more than one value can be associated with a key). See this article for details. Each of these data structures provideFindLowerBound()
andFindUpperBound()
methods that work like C++'slower_bound
andupper_bound
. Internally, these collections are similar to B+ trees, so they have good performance and low memory usage;BDictionary<,>
typically uses about 44% less memory than a standardSortedDictionary<,>
(which in turn uses, on average, slightly less memory thanDictionary<,>
), assuming 64-bit keys and 64-bit values.I also made a "sparse" collection,
SparseAList<T>
, which is similar toBDictionary<int,T>
except that you can insert and remove "empty space" anywhere in the collection (empty space does not consume any memory). See this article for details.All of these collections are in the Loyc.Collections NuGet package.
找到最接近 K:
或更快:
find nearest to K:
or much faster:
基本框架中没有二叉搜索树集合实现,因此您必须构建一个或找到一个实现。正如您所指出的,SortedList 在搜索方面最接近,但在插入/删除方面速度较慢(由于其底层数组实现)。
There isn't a binary search tree collection implementation in the base framework, so you'll either have to build one or find an implementation. As you noted, SortedList is closest in terms of searching but is slower (due to its underlying array implementation) for insertion/deletion.
我认为有关 SortedList< 的问题存在错误/em> 复杂性。
SortedList 的插入新项目。如果您事先知道容量,那么在最坏的情况下可以在 O(Log(n)) 内完成。
I think there's a mistake in the question about SortedList complexity.
SortedList has O(log(n)) amortized complexity for inserting new item. If you know in advance the capacity it can be done in O(Log(n)) in the worst case.
您可以使用以下扩展方法对
SortedSet
执行此操作:You can do this for
SortedSet<T>
with following extension methods: