快速字符串查找的最佳集合
我需要一个字符串列表以及一种快速确定该列表中是否包含字符串的方法。
为了提高查找速度,我考虑了 SortedList
和 Dictionary
;然而,当我只需要一个字符串
时,两者都可以使用KeyValuePair
。
我知道我可以使用 KeyValuePair
并简单地忽略 Value
部分。但我确实更喜欢高效,只是想知道是否有更适合我要求的集合。
I need a list of strings and a way to quickly determine if a string is contained within that list.
To enhance lookup speed, I considered SortedList
and Dictionary
; however, both work with KeyValuePair
s when all I need is a single string
.
I know I could use a KeyValuePair
and simply ignore the Value
portion. But I do prefer to be efficient and am just wondering if there is a collection better suited to my requirements.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
如果您使用的是 .NET 3.5 或更高版本,请使用
HashSet
。如果做不到这一点,
Dictionary
(或者您想要的TValue
类型参数的任何类型)将比SortedList
更快,如果你有很多条目 - 后者将使用二分搜索,因此它将是 O(log n) 查找,而不是 O(1)。If you're on .NET 3.5 or higher, use
HashSet<String>
.Failing that, a
Dictionary<string, byte>
(or whatever type you want for theTValue
type parameter) would be faster than aSortedList
if you have a lot of entries - the latter will use a binary search, so it'll be O(log n) lookup, instead of O(1).如果您只想知道字符串是否在集合中,请使用
HashSet
If you just want to know if a string is in the set use
HashSet<string>
这听起来像是 Per MSDN 的工作
:Contains 函数有 O( 1)复杂性。
但您应该注意,添加时不会出现重复错误。
This sounds like a job for
Per MSDN: The Contains function has O(1) complexity.
But you should be aware that it does not give an error for duplicates when adding.
HashSet
就像一个 < code>Dictionary,但只有键。HashSet<string>
is like aDictionary
, but with only keys.如果您想滚动自己的数据结构,请使用 Trie。
http://en.wikipedia.org/wiki/Trie
最坏的情况是字符串是存在:O(字符串长度)
If you feel like rolling your own data structure, use a Trie.
http://en.wikipedia.org/wiki/Trie
worst-case is if the string is present: O(length of string)
我知道这个答案对本次聚会来说有点晚了,但我遇到了我们的系统运行缓慢的问题。经过分析后,我们发现我们的数据结构结构化方式发生了大量字符串查找。
因此我们做了一些研究,遇到了这些基准测试,做了我们自己的测试,现在已经改用 SortedList。
尽管事实证明字典速度更快,但我们需要重构的代码更少,而且性能的提高对我们来说已经足够好了。
无论如何,想分享该网站,以防其他人遇到类似的问题。它们在数据结构之间进行比较,其中您要查找的字符串是“键”(如哈希表、字典等)或“值”(列表、数组或字典等),这就是我们的字符串所在的位置存储。
I know this answer is a bit late to this party, but I was running into an issue where our systems were running slow. After profiling we found out there was a LOT of string lookups happening with the way we had our data structures structured.
So we did some research, came across these benchmarks, did our own tests, and have switched over to using SortedList now.
Even though a Dictionary proved to be faster, it was less code we had to refactor, and the performance increase was good enough for us.
Anyway, wanted to share the website in case other people are running into similar issues. They do comparisons between data structures where the string you're looking for is a "key" (like HashTable, Dictionary, etc) or in a "value" (List, Array, or in a Dictionary, etc) which is where ours are stored.
我知道这个问题已经很老了,但我只需要解决同样的问题,只是针对一小部分字符串(2 到 4 之间)。
就我而言,我实际上对字符串数组进行了手动查找,结果比
HashSet
(我对其进行了基准测试)要快得多。请注意,它比仅适用于小型数组的哈希集更好!
编辑:仅适用于手动
for
循环,不使用 LINQ,注释中详细信息I know the question is old as hell, but I just had to solve the same problem, only for a very small set of strings(between 2 and 4).
In my case, I actually used manual lookup over an array of strings which turned up to be much faster than
HashSet<string>
(I benchmarked it).Note, that it is better than hash set for only for tiny arrays!
EDIT: works only with a manual
for
loop, do not use LINQ, details in comments