How do I create a HashSet<List<int>> with distinct elements?

Posted on 2024-10-29 02:16:31

I have a HashSet that contains multiple lists of integers, i.e. a HashSet<List<int>>.

In order to maintain uniqueness I currently have to do two things:
1. Manually loop through the existing lists, looking for duplicates using SequenceEqual.
2. Sort the individual lists so that SequenceEqual works correctly.

Is there a better way to do this? Is there an existing IEqualityComparer that I can provide to the HashSet so that HashSet.Add() can automatically handle uniqueness?

var hashSet = new HashSet<List<int>>();

for(/* some condition */)
{
    List<int> list = new List<int>();
    bool validPartition = true;

    ...

    /* for eliminating duplicate lists */

    list.Sort();

    foreach(var set in hashSet)
    {
        if (list.SequenceEqual(set))
        {
            validPartition = false;
            break;
        }
    }

    if (validPartition)
        hashSet.Add(list);
}

Comments (5)

情感失落者 2024-11-05 02:16:31

Here is a possible comparer that compares an IEnumerable<T> by its elements. You still need to sort manually before adding.

One could build the sorting into the comparer, but I don't think that's a wise choice. Adding a canonical form of the list seems wiser.

This code requires .NET 4 or later, because it relies on generic variance: an IEqualityComparer<IEnumerable<T>> is accepted where an IEqualityComparer<List<T>> is expected. On earlier versions you need to either replace IEnumerable<T> with List<T>, or add a second generic parameter for the collection type.

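// Needs System.Collections.Generic and System.Linq (both imported by default in LINQPad).
// Compares two sequences element by element, in order.
class SequenceComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    public bool Equals(IEnumerable<T> seq1, IEnumerable<T> seq2)
    {
        return seq1.SequenceEqual(seq2);
    }

    public int GetHashCode(IEnumerable<T> seq)
    {
        int hash = 1234567;
        foreach (T elem in seq)
            // A null element contributes 0 instead of throwing.
            hash = unchecked(hash * 37 + (elem == null ? 0 : elem.GetHashCode()));
        return hash;
    }
}

// LINQPad-style entry point; Dump() is LINQPad's output extension method.
void Main()
{
    var hashSet = new HashSet<List<int>>(new SequenceComparer<int>());

    List<int> test = new int[] { 1, 3, 2 }.ToList();
    test.Sort();
    hashSet.Add(test);

    List<int> test2 = new int[] { 3, 2, 1 }.ToList();
    test2.Sort();
    hashSet.Contains(test2).Dump(); // True
}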
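
The suggestion to add a canonical form of each list, rather than sorting inside the comparer, could be sketched roughly as follows. The names ListComparer, CanonicalSets and AddCanonical are hypothetical and not part of the answer; the sketch assumes int elements and that nobody mutates a list after it has been added.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical element-wise comparer, similar in spirit to SequenceComparer<T>.
public sealed class ListComparer : IEqualityComparer<List<int>>
{
    public bool Equals(List<int> a, List<int> b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        return a.SequenceEqual(b);
    }

    public int GetHashCode(List<int> list)
    {
        if (list == null) return 0;
        int hash = 17;
        foreach (int x in list)
            hash = unchecked(hash * 31 + x);
        return hash;
    }
}

public static class CanonicalSets
{
    // Sorts a copy, so the caller's sequence is left untouched, then adds the copy.
    // Returns false when a list with the same elements is already in the set.
    public static bool AddCanonical(HashSet<List<int>> set, IEnumerable<int> items)
    {
        List<int> canonical = items.ToList();
        canonical.Sort();
        return set.Add(canonical);
    }
}
```

With a set created as new HashSet<List<int>>(new ListComparer()), every caller goes through AddCanonical, so the sorting lives in one place and the comparer itself stays a plain element-by-element comparison.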

浮世清欢 2024-11-05 02:16:31

This starts off wrong: it has to be a HashSet<ReadOnlyCollection<int>>, because you cannot allow the lists to change and invalidate the set's invariant. With contents that cannot change, you can compute the hash code once, in O(n), when you add the collection to the set, and store the computed hash with the collection. Checking whether a collection is already in the set is then an O(n) comparison, with a very uncommon O(n^2) worst case if all the hashes turn out to be equal.
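
One way to read this advice is a small immutable wrapper that copies its input and stores the hash alongside it. This is only a sketch: FrozenIntList is a hypothetical name, and sorting the copy is an assumption carried over from the question (order should not matter), not something this answer asks for.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical wrapper: copies the input once and never changes afterwards.
public sealed class FrozenIntList : IEquatable<FrozenIntList>
{
    private readonly ReadOnlyCollection<int> _items;
    private readonly int _hash; // computed once, in O(n), at construction

    public FrozenIntList(IEnumerable<int> items)
    {
        // Copy and sort, so later changes to the source cannot reach this
        // object and element order does not affect equality.
        List<int> copy = items.ToList();
        copy.Sort();
        _items = copy.AsReadOnly();

        int hash = 17;
        foreach (int x in copy)
            hash = unchecked(hash * 31 + x);
        _hash = hash;
    }

    public ReadOnlyCollection<int> Items { get { return _items; } }

    public bool Equals(FrozenIntList other)
    {
        // Cheap rejections first; the O(n) walk only runs when the hashes match.
        return other != null
            && _hash == other._hash
            && _items.Count == other._items.Count
            && _items.SequenceEqual(other._items);
    }

    public override bool Equals(object obj) { return Equals(obj as FrozenIntList); }

    public override int GetHashCode() { return _hash; }
}
```

Because the wrapper implements IEquatable<FrozenIntList>, a plain new HashSet<FrozenIntList>() needs no custom comparer, and changing the original list after wrapping it cannot corrupt the set.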

怎言笑 2024-11-05 02:16:31

Is there a reason you aren't just using an array? int[] will perform better. Also I assume the lists contain duplicates, otherwise you'd just be using sets and not have a problem.
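
For the case without duplicates, where sets are enough, the framework already ships a suitable comparer: HashSet<T>.CreateSetComparer(). A minimal sketch (SetOfSetsDemo and CountDistinct are hypothetical names used only for illustration):

```csharp
using System.Collections.Generic;

public static class SetOfSetsDemo
{
    // Adds two sets with the same members in a different order and returns
    // how many distinct sets the outer set ends up holding.
    public static int CountDistinct()
    {
        var sets = new HashSet<HashSet<int>>(HashSet<int>.CreateSetComparer());
        sets.Add(new HashSet<int> { 1, 3, 2 });
        sets.Add(new HashSet<int> { 3, 2, 1 });
        return sets.Count; // 1: the second set is recognised as a duplicate
    }
}
```

This only applies when each inner collection has no repeated values, since a HashSet<int> silently drops them.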

It appears that their contents won't change (much) once they've been added to the HashSet. At the end of the day, you are going to have to use a comparer that falls back on SequenceEqual, but you don't have to do it every single time. Instead of running a SequenceEqual against every existing member as the hash set grows, which makes the total number of comparisons grow quadratically, you can create a good hash code up front and end up doing very few such comparisons. Generating a good hash code probably costs about the same as one SequenceEqual, but you only do it once per list.

So, the first time you operate on a particular List<int>, generate a hash based on the ordered sequence of numbers and cache it. The next time the list is compared, the cached value can be used. I'm not sure off the top of my head how you might do this with a comparer (maybe a static dictionary?), but you could implement a List wrapper that does this easily.

Here's the basic idea. You'd need to be careful to ensure that it isn't brittle (e.g. make sure you invalidate any cached hash code when members change), but that doesn't look like a typical situation for the way you're using this.

public class FasterComparingList<T>: IList<T>, IList, ... 
    /// whatever you need to implement
{
   // Implement your interfaces against InnerList
   // Any methods that change members of the list need to
   // set _LongHash=null to force it to be regenerated
   public List<T> InnerList { ... lazy load a List }
   private int? _LongHash=null;
   public override int GetHashCode()
   {
       if (_LongHash==null) {
           _LongHash=GetLongHash();
       }
       return _LongHash.Value;
   }
   // HashSet calls Equals(object) unless IEquatable<T> is implemented
   public override bool Equals(object obj)
   {
       return Equals(obj as FasterComparingList<T>);
   }
   public bool Equals(FasterComparingList<T> list)
   {
       // lists of different lengths can never be equal
       if (list==null || InnerList.Count!=list.InnerList.Count) {
           return false;
       }
       // you could also cache the sorted state and skip this if a list hasn't
       // changed since the last sort
       // not sure if native `List` does
       // note: sorting reorders both lists in place, so the hash must not
       // depend on element order (or must be recomputed after sorting)
       list.InnerList.Sort();
       InnerList.Sort();
       return InnerList.SequenceEqual(list.InnerList);
   }
   protected int GetLongHash()
   {
       return .....
       // something to create a reasonably good hash code -- which depends on the 
       // data. Adding all the numbers is probably fine, even if it fails a couple 
       // percent of the time you're still orders of magnitude ahead of sequence
       // compare each time
   } 
}

If the lists won't change once added, this should be very fast. Even in situations where the lists could change frequently, creating a new hash code is unlikely to take much longer (if longer at all) than doing a sequence compare.
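
The "adding all the numbers" hash from the comments above can also be tried without writing a full wrapper. The following is a sketch under stated assumptions: UnorderedListComparer is a hypothetical name, it works on List<int> only, and it does not cache anything, so the sum is recomputed on every lookup.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two lists are equal when they hold the same numbers
// with the same multiplicities, in any order.
public sealed class UnorderedListComparer : IEqualityComparer<List<int>>
{
    public bool Equals(List<int> a, List<int> b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null || a.Count != b.Count) return false;
        // Compare sorted copies, so neither input list is modified.
        return a.OrderBy(x => x).SequenceEqual(b.OrderBy(x => x));
    }

    public int GetHashCode(List<int> list)
    {
        if (list == null) return 0;
        // A sum does not depend on element order. Collisions are possible
        // (e.g. {1,2,3} and {1,1,4}) and are resolved by Equals.
        int sum = 0;
        foreach (int x in list)
            sum = unchecked(sum + x);
        return sum;
    }
}
```

Passing this comparer to the HashSet constructor means Equals only runs for lists whose sums collide, which is the saving the answer describes; caching the sum, as the wrapper does, would remove the remaining O(n) cost per lookup.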

皓月长歌 2024-11-05 02:16:31

If you don't specify an IEqualityComparer, then the type's default will be used, so I think what you'll need to do is create your own implementation of IEqualityComparer and pass that to the constructor of your HashSet. Here is a good example.
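
To make the difference concrete, here is a small sketch that adds two separate lists with identical contents, once with the default comparer and once with a custom one. IntListComparer, ComparerDemo and CountAfterTwoAdds are hypothetical names chosen for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two lists are equal when they hold the same
// elements in the same order.
public sealed class IntListComparer : IEqualityComparer<List<int>>
{
    public bool Equals(List<int> a, List<int> b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        return a.SequenceEqual(b);
    }

    public int GetHashCode(List<int> list)
    {
        if (list == null) return 0;
        int hash = 17;
        foreach (int x in list)
            hash = unchecked(hash * 31 + x);
        return hash;
    }
}

public static class ComparerDemo
{
    // Adds two separate lists with identical contents and reports how many
    // entries the set ends up with.
    public static int CountAfterTwoAdds(IEqualityComparer<List<int>> comparer)
    {
        // Passing null makes HashSet fall back to the default comparer,
        // which for List<int> is reference equality.
        var set = new HashSet<List<int>>(comparer);
        set.Add(new List<int> { 1, 2, 3 });
        set.Add(new List<int> { 1, 2, 3 });
        return set.Count;
    }
}
```

ComparerDemo.CountAfterTwoAdds(null) yields 2, because List<int> does not override Equals, while ComparerDemo.CountAfterTwoAdds(new IntListComparer()) yields 1.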

谁的新欢旧爱 2024-11-05 02:16:31

When comparing lists in a hash set, one option you always have is that, instead of comparing element by element, you sort each list, join its elements with commas, and compare the resulting strings.

So, when you create the custom comparer, you can apply this logic instead of iterating over the elements and computing a custom hash function.
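
A rough sketch of that string-based comparer, assuming List<int> elements (JoinedKeyComparer and Key are hypothetical names). It builds the key on every call, so it trades speed for simplicity.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer that reduces each list to a string key.
public sealed class JoinedKeyComparer : IEqualityComparer<List<int>>
{
    // Builds a key such as "1,2,3" from the sorted elements; the input list
    // itself is not reordered.
    private static string Key(List<int> list)
    {
        return string.Join(",", list.OrderBy(x => x));
    }

    public bool Equals(List<int> a, List<int> b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        return Key(a) == Key(b);
    }

    public int GetHashCode(List<int> list)
    {
        return list == null ? 0 : Key(list).GetHashCode();
    }
}
```

The comma separator matters: without it, lists such as {12, 3} and {1, 23} would collapse into the same key.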
