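How do I create a HashSet<List<int>> with distinct elements?
I have a HashSet that contains multiple lists of integers, i.e. a HashSet<List<int>>.
To maintain uniqueness I currently have to do two things:
1. Manually loop through the existing lists, looking for duplicates with SequenceEqual.
2. Sort the individual lists so that SequenceEqual works.
Is there a better way? Is there an existing IEqualityComparer that I can provide to the HashSet so that HashSet.Add() handles uniqueness automatically?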
var hashSet = new HashSet<List<int>>();
for (/* some condition */)
{
    List<int> list = new List<int>();
    ...
    /* for eliminating duplicate lists: sort first so SequenceEqual ignores the original order */
    list.Sort();
    bool validPartition = true;
    foreach (var set in hashSet)
    {
        if (list.SequenceEqual(set))
        {
            validPartition = false;
            break;
        }
    }
    if (validPartition)
        hashSet.Add(list);
}
Comments (5)
Here is a possible comparer that compares an IEnumerable<T> by its elements. You still need to sort manually before adding.
One could build the sorting into the comparer, but I don't think that's a wise choice. Adding a canonical form of the list seems wiser.
This code will only work in .NET 4, since it takes advantage of generic variance. If you need earlier versions, you need to either replace IEnumerable<T> with List<T>, or add a second generic parameter for the collection type.
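A minimal sketch of a comparer of this kind, not necessarily the code the answer originally showed. The class name `SequenceComparer<T>` and the hash constants 17 and 31 are assumptions chosen for illustration; it compares elements in order, so lists still have to be sorted before they are added.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical name: treats two sequences as equal when they contain
// the same elements in the same order.
public sealed class SequenceComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    public bool Equals(IEnumerable<T> x, IEnumerable<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(IEnumerable<T> seq)
    {
        if (seq == null) return 0;
        unchecked // let the arithmetic wrap around instead of throwing on overflow
        {
            int hash = 17;
            foreach (T item in seq)
                hash = hash * 31 + (item == null ? 0 : item.GetHashCode());
            return hash;
        }
    }
}
```

Because `IEqualityComparer<T>` is contravariant from .NET 4 on, an instance can be passed straight to `new HashSet<List<int>>(new SequenceComparer<int>())`, after which `Add` returns false for a sorted list whose contents are already in the set.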
This starts off wrong: it has to be a HashSet<ReadOnlyCollection<>>, because you cannot allow the lists to change and invalidate the set predicate. This then allows you to calculate a hash code in O(n) when you add the collection to the set, and to run an O(n) test to check whether it is already in the set, with a very uncommon O(n^2) worst case if all the hashes turn out to be equal. Store the computed hash with the collection.
Is there a reason you aren't just using an array? int[] will perform better. Also, I assume the lists contain duplicates; otherwise you'd just be using sets and not have a problem.
It appears that their contents won't change (much) once they've been added to the HashSet. At the end of the day, you are going to have to use a comparer that falls back on SequenceEqual. But you don't have to do it every single time. Instead of doing a quadratic number of sequence compares (e.g. as the hashset grows, doing a SequenceEqual against each existing member), if you create a good hash code up front, you may have to do very few such compares. While the overhead of generating a good hash code is probably about the same as doing a SequenceEqual, you're only doing it a single time for each list.
So, the first time you operate on a particular List<int>, you should generate a hash based on the ordered sequence of numbers and cache it. Then the next time the list is compared, the cached value can be used. I'm not sure how you might do this with a comparer off the top of my head (maybe a static dictionary?), but you could implement a List wrapper that does this easily.
Here's a basic idea. You'd need to be careful to ensure that it isn't brittle (e.g. make sure you void any cached hash code when members change), but it doesn't look like that's going to be a typical situation for the way you're using this.
If the lists won't change once added, this should be very fast. Even in situations where the lists could change frequently, the time to create a new hash code is not likely to be very different from (if it is greater at all than) the time to do a sequence compare.
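One way the wrapper idea could be sketched is shown here. This is an illustration under assumptions, not the answer's original code: the name `HashedIntList`, its members, and the hash constants are all hypothetical. It keeps the items sorted, computes the hash lazily, and discards the cached value whenever an item is added.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper: stores the items in sorted order and caches the hash code.
public sealed class HashedIntList : IEquatable<HashedIntList>
{
    private readonly List<int> _items;
    private int? _cachedHash; // null means "not computed yet" or "invalidated"

    public HashedIntList(IEnumerable<int> items)
    {
        _items = new List<int>(items);
        _items.Sort(); // canonical form, so order of insertion does not matter
    }

    public IReadOnlyList<int> Items { get { return _items; } } // needs .NET 4.5 or later

    public void Add(int value)
    {
        int index = _items.BinarySearch(value);
        _items.Insert(index < 0 ? ~index : index, value); // keep the list sorted
        _cachedHash = null;                               // void the cached hash
    }

    public bool Equals(HashedIntList other)
    {
        if (ReferenceEquals(other, null)) return false;
        if (ReferenceEquals(this, other)) return true;
        // Cheap hash check first; SequenceEqual only runs when the hashes match.
        return GetHashCode() == other.GetHashCode()
            && _items.SequenceEqual(other._items);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as HashedIntList);
    }

    public override int GetHashCode()
    {
        if (_cachedHash == null)
        {
            unchecked // wrap around on overflow
            {
                int hash = 17;
                foreach (int item in _items) hash = hash * 31 + item;
                _cachedHash = hash;
            }
        }
        return _cachedHash.Value;
    }
}
```

A `HashSet<HashedIntList>` then needs no explicit comparer. Note that calling `Add` on a wrapper that is already stored in a set changes its hash code and leaves the set inconsistent, so mutation should happen before insertion.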
If you don't specify an IEqualityComparer, then the type's default will be used, so I think what you'll need to do is create your own implementation of IEqualityComparer and pass that to the constructor of your HashSet. Here is a good example.
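To make the default behaviour concrete: List<T> does not override Equals, so the default comparer falls back to reference equality and two lists with identical contents both end up in the set. The small helper below (the names `DefaultComparerDemo` and `AddTwoEqualLists` are hypothetical) shows this by passing null, which the HashSet constructor documents as "use the default comparer".

```csharp
using System.Collections.Generic;

public static class DefaultComparerDemo
{
    // Returns how many lists the set holds after adding two lists
    // with identical contents, using the supplied comparer.
    public static int AddTwoEqualLists(IEqualityComparer<List<int>> comparer)
    {
        // A null comparer makes HashSet use EqualityComparer<List<int>>.Default,
        // which compares List<int> instances by reference.
        var set = new HashSet<List<int>>(comparer);
        set.Add(new List<int> { 1, 2, 3 });
        set.Add(new List<int> { 1, 2, 3 });
        return set.Count;
    }
}
```

`DefaultComparerDemo.AddTwoEqualLists(null)` returns 2; passing a comparer that looks at the contents would make it return 1.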
When comparing hashsets of lists, one option you always have is that, instead of comparing each element, you sort the lists, join them using a comma, and compare the generated strings.
So, in this case, when you create a custom comparer, instead of iterating over the elements and calculating a custom hash function, you can apply this logic.
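A rough sketch of that string-based approach follows. The class name `JoinedStringComparer` is hypothetical, and the comparer sorts a copy of each list itself so callers do not need to sort beforehand. It builds a new string on every comparison, so it trades speed for simplicity.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two lists are equal when their sorted,
// comma-joined string forms are identical.
public sealed class JoinedStringComparer : IEqualityComparer<List<int>>
{
    private static string Key(List<int> list)
    {
        // OrderBy sorts a copy, so the caller's list is left untouched.
        return list == null ? null : string.Join(",", list.OrderBy(n => n));
    }

    public bool Equals(List<int> x, List<int> y)
    {
        return Key(x) == Key(y);
    }

    public int GetHashCode(List<int> list)
    {
        string key = Key(list);
        return key == null ? 0 : key.GetHashCode();
    }
}
```

The comma separator matters: without it, the lists {1, 23} and {12, 3} would both collapse to the same digits.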