What is the most efficient way to check a set of integers for duplicates?
I have a large list of integers that are sent to my web service. Our business rules state that these values must be unique. What is the most performant way to find out whether there are any duplicates? I don't need to know the values; I only need to know whether any two of the values are equal.
At first I was thinking about using a generic List of integers and the list.Exists() method, but each call to it is O(n).
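To make that cost concrete: a linear search such as list.Exists() is O(n), and checking every value that way means running it up to n times, so the whole duplicate check is O(n²). The sketch below is in Java, used only for illustration, and the class and method names are hypothetical.

```java
import java.util.List;

public class NaiveDuplicateCheck {
    // For each element, linearly scan the elements before it.
    // One scan is O(n) and it runs once per element, so the total is O(n^2).
    static boolean hasDuplicates(List<Integer> values) {
        for (int i = 0; i < values.size(); i++) {
            int current = values.get(i);
            for (int j = 0; j < i; j++) {
                // current is an int, so this unboxes and compares by value
                if (values.get(j) == current) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicates(List.of(3, 1, 4, 1, 5))); // true
        System.out.println(hasDuplicates(List.of(2, 7, 1, 8)));    // false
    }
}
```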
Then I was thinking about using a Dictionary and the ContainsKey method. But I only need the keys, not the values. And I think this is a linear search as well.
Is there a better data type for checking uniqueness within a list? Or am I stuck with a linear search?
Use a HashSet<T>. HashSet<T> even exposes a constructor that accepts an IEnumerable<T>. By passing your List<T> to the HashSet<T> constructor, you end up with a reference to a new HashSet<T> that contains the distinct items from your original List<T>. If its Count is smaller than the list's Count, the list contained duplicates.
Sounds like a job for a HashSet...
If you are using .NET Framework 3.5, you can use the HashSet<T> collection. Otherwise, the best option is Dictionary<TKey, TValue>. The value stored for each item will be wasted, but it will still give you the best performance. If you check for duplicates while you add the items to the HashSet/Dictionary, instead of counting them afterwards, you can stop as soon as you find the first duplicate. When duplicates exist, this avoids processing the rest of the list; the worst case (no duplicates) is still O(n).
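Checking while adding relies on the insert call reporting whether the item was new. In .NET, HashSet<T>.Add returns false when the item is already present; Java's Set.add behaves the same way, which the sketch below uses. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EarlyExitDuplicateCheck {
    // Set.add returns false when the element is already in the set,
    // so the loop can stop at the first duplicate.
    static boolean hasDuplicates(List<Integer> values) {
        Set<Integer> seen = new HashSet<>();
        for (int value : values) {
            if (!seen.add(value)) {
                return true; // first repeated value found; skip the rest
            }
        }
        return false; // every value was unique (the O(n) worst case)
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicates(List.of(7, 7, 1, 2, 3))); // true, after two insertions
        System.out.println(hasDuplicates(List.of(1, 2, 3)));       // false
    }
}
```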
If the set of numbers is sparse, then, as others suggest, use a HashSet.
But if the set of numbers is mostly in sequence with occasional gaps, it would be much better to store it as a sorted array or binary tree of (begin, end) pairs, since each run of consecutive numbers then takes a single entry. To test a key, find the pair with the largest begin value that is less than or equal to the key, and compare the key with that pair's end value to see whether it is in the set.
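A minimal sketch of the range-pair idea, assuming a java.util.TreeMap (a red-black tree) in place of the binary tree, keyed by begin with the inclusive end as the value. TreeMap.floorEntry performs the "largest begin that is not greater than the key" lookup. The class IntRangeSet and its methods are hypothetical. Insertion with merging of adjacent ranges goes beyond what the answer describes; it is included so that add can report duplicates the same way HashSet does.

```java
import java.util.Map;
import java.util.TreeMap;

public class IntRangeSet {
    // Maps each range's begin value to its inclusive end value.
    // Ranges are kept disjoint and never directly adjacent.
    private final TreeMap<Integer, Integer> ranges = new TreeMap<>();

    // Find the range with the largest begin <= key, then check its end.
    public boolean contains(int key) {
        Map.Entry<Integer, Integer> floor = ranges.floorEntry(key);
        return floor != null && key <= floor.getValue();
    }

    // Adds key; returns false if it was already present (a duplicate).
    public boolean add(int key) {
        if (contains(key)) {
            return false;
        }
        int begin = key;
        int end = key;
        // Merge with a range that ends immediately before key.
        Map.Entry<Integer, Integer> lower = ranges.floorEntry(key);
        if (lower != null && (long) lower.getValue() + 1 == key) {
            begin = lower.getKey();
            ranges.remove(begin);
        }
        // Merge with a range that starts immediately after key (guarding overflow).
        if (key != Integer.MAX_VALUE) {
            Integer nextEnd = ranges.get(key + 1);
            if (nextEnd != null) {
                end = nextEnd;
                ranges.remove(key + 1);
            }
        }
        ranges.put(begin, end);
        return true;
    }

    public int rangeCount() {
        return ranges.size();
    }

    public static void main(String[] args) {
        IntRangeSet set = new IntRangeSet();
        for (int v : new int[] {1, 2, 3, 5, 6, 10}) {
            set.add(v);
        }
        System.out.println(set.rangeCount()); // 3: [1,3], [5,6], [10,10]
        System.out.println(set.contains(2));  // true
        System.out.println(set.contains(4));  // false
        System.out.println(set.add(5));       // false: 5 is already present
        set.add(4);
        System.out.println(set.rangeCount()); // 2: [1,6], [10,10]
    }
}
```

For mostly sequential data, the map holds one entry per run instead of one per number, and each lookup is O(log r) for r ranges.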
What about doing:
I wonder about the performance of this. I think it would be as good as O(n) but with less code and still easily readable.