How to create a HashSet<List<int>> with distinct elements?

Posted on 2024-10-29 02:16:31

I have a HashSet that contains multiple lists of integers - i.e. HashSet<List<int>>

To maintain uniqueness I currently have to do two things:
1. Manually loop through the existing lists, looking for duplicates with SequenceEqual.
2. Sort the individual lists so that SequenceEqual works correctly.

Is there a better way to do this? Is there an existing IEqualityComparer that I can provide to the HashSet so that HashSet.Add() can automatically handle uniqueness?

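using System.Collections.Generic;
using System.Linq;

var hashSet = new HashSet<List<int>>();

for(/* some condition */)
{
    List<int> list = new List<int>();

    ...

    /* for eliminating duplicate lists */

    // sort first so that SequenceEqual also catches permutations
    list.Sort();

    // linear scan over every list already in the set
    bool validPartition = true;
    foreach(var set in hashSet)
    {
        if (list.SequenceEqual(set))
        {
            validPartition = false;
            break;
        }
    }

    if (validPartition)
        hashSet.Add(list);
}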

情感失落者 2024-11-05 02:16:31

Here is a possible comparer that compares an IEnumerable<T> by its elements. You still need to sort manually before adding.

One could build the sorting into the comparer, but I don't think that's a wise choice. Adding a canonical form of the list seems wiser.

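This code only works in .NET 4 and later, because it relies on generic variance (IEqualityComparer<T> is contravariant in T, so a comparer for IEnumerable<int> can be passed where a comparer for List<int> is expected). For earlier versions you need to either replace IEnumerable with List, or add a second generic parameter for the collection type.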

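using System;
using System.Collections.Generic;
using System.Linq;

// Compares two sequences element by element (order matters)
class SequenceComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    public bool Equals(IEnumerable<T> seq1, IEnumerable<T> seq2)
    {
        if (ReferenceEquals(seq1, seq2)) return true;
        if (seq1 == null || seq2 == null) return false;
        return seq1.SequenceEqual(seq2);
    }

    public int GetHashCode(IEnumerable<T> seq)
    {
        // Order-dependent hash, consistent with SequenceEqual;
        // EqualityComparer<T>.Default also copes with null elements
        int hash = 1234567;
        foreach (T elem in seq)
            hash = unchecked(hash * 37 + EqualityComparer<T>.Default.GetHashCode(elem));
        return hash;
    }
}

class Program
{
    static void Main()
    {
        // Accepted thanks to IEqualityComparer<in T> contravariance (.NET 4+)
        var hashSet = new HashSet<List<int>>(new SequenceComparer<int>());

        List<int> test = new int[] { 1, 3, 2 }.ToList();
        test.Sort(); // canonical form: 1, 2, 3
        hashSet.Add(test);

        List<int> test2 = new int[] { 3, 2, 1 }.ToList();
        test2.Sort(); // also 1, 2, 3
        Console.WriteLine(hashSet.Contains(test2)); // prints True
    }
}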
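
The answer recommends adding a canonical form of each list rather than building the sort into the comparer. A minimal sketch of that, assuming a hypothetical extension method `AddCanonical` (not part of .NET): it sorts a copy of the incoming sequence with the default comparer and adds the copy, so the caller's list is left as it was. The comparer is repeated so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same element-wise comparer as above, repeated so this block is self-contained
class SequenceComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    public bool Equals(IEnumerable<T> seq1, IEnumerable<T> seq2)
    {
        if (ReferenceEquals(seq1, seq2)) return true;
        if (seq1 == null || seq2 == null) return false;
        return seq1.SequenceEqual(seq2);
    }

    public int GetHashCode(IEnumerable<T> seq)
    {
        int hash = 1234567;
        foreach (T elem in seq)
            hash = unchecked(hash * 37 + EqualityComparer<T>.Default.GetHashCode(elem));
        return hash;
    }
}

static class CanonicalSetExtensions
{
    // Hypothetical helper: adds a sorted copy of 'items', so sequences holding the
    // same elements in a different order share one canonical form.
    public static bool AddCanonical<T>(this HashSet<List<T>> set, IEnumerable<T> items)
    {
        var canonical = items.ToList(); // copy; the caller's list is not reordered
        canonical.Sort();               // canonical order via Comparer<T>.Default
        return set.Add(canonical);      // false if an equal sequence is already present
    }
}
```

With this helper the caller no longer has to remember to sort before every `Add`, while the comparer itself stays a plain, order-sensitive sequence comparison.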

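For versions before .NET 4, the answer suggests adding a second generic parameter for the collection type. One way that might look, as a sketch (the two-parameter `SequenceComparer<TCollection, T>` is illustrative, not a framework type): the comparer is typed on the concrete collection, so no variance is needed. Since `HashSet<T>` and LINQ both arrived in .NET 3.5, that is the lowest version this targets.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative two-parameter comparer: typed on the concrete collection type,
// so it can be passed to HashSet<TCollection> without relying on generic variance.
class SequenceComparer<TCollection, T> : IEqualityComparer<TCollection>
    where TCollection : IEnumerable<T>
{
    public bool Equals(TCollection seq1, TCollection seq2)
    {
        // Handle identical references and nulls before walking the sequences
        if (ReferenceEquals(seq1, seq2)) return true;
        if (seq1 == null || seq2 == null) return false;
        return seq1.SequenceEqual(seq2);
    }

    public int GetHashCode(TCollection seq)
    {
        // Order-dependent hash, consistent with SequenceEqual
        int hash = 1234567;
        foreach (T elem in seq)
            hash = unchecked(hash * 37 + EqualityComparer<T>.Default.GetHashCode(elem));
        return hash;
    }
}
```

Usage would be `new HashSet<List<int>>(new SequenceComparer<List<int>, int>())`; as with the original, lists still need to be sorted before adding if order should not matter.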
浮世清欢 2024-11-05 02:16:31

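This starts off wrong: it has to be a HashSet<…> of a collection type that cannot change, because you cannot allow a list to change and invalidate the set predicate. That lets you compute the hash code in O(n) when you add the collection to the set, followed by an O(n) test to check whether it is already in the set, with a very uncommon O(n^2) worst case if all the hashes turn out equal. Store the computed hash together with the collection.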
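
A minimal sketch of this suggestion, assuming a hypothetical `FrozenSequence<T>` wrapper: the elements are copied into a `ReadOnlyCollection<T>` (one possible choice of read-only type), and the hash is computed once in the constructor and stored next to the collection. `HashSet<T>` then only falls back to `SequenceEqual` when two hashes match.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical wrapper: snapshots the elements and computes the hash exactly once.
sealed class FrozenSequence<T> : IEquatable<FrozenSequence<T>>
{
    private readonly ReadOnlyCollection<T> _items;
    private readonly int _hash; // stored alongside the collection

    public FrozenSequence(IEnumerable<T> items)
    {
        // Copy first, so later changes to the caller's list cannot invalidate the hash
        _items = new List<T>(items).AsReadOnly();
        int hash = 17;
        foreach (T item in _items)
            hash = unchecked(hash * 31 + EqualityComparer<T>.Default.GetHashCode(item));
        _hash = hash;
    }

    public ReadOnlyCollection<T> Items => _items;

    public override int GetHashCode() => _hash;

    public bool Equals(FrozenSequence<T> other)
    {
        if (ReferenceEquals(this, other)) return true;
        if (other is null) return false;
        // Cheap rejection on cached hash and count before the O(n) comparison
        if (_hash != other._hash || _items.Count != other._items.Count) return false;
        return _items.SequenceEqual(other._items);
    }

    public override bool Equals(object obj) => Equals(obj as FrozenSequence<T>);
}
```

Because the wrapper overrides `Equals` and `GetHashCode` itself, a plain `new HashSet<FrozenSequence<int>>()` needs no custom comparer. The hash here is order-dependent, so sequences should still be put into canonical (sorted) order first if order is not meant to matter.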

怎言笑 2024-11-05 02:16:31

Is there a reason you aren't just using an array? int[] will perform better. Also I assume the lists contain duplicates, otherwise you'd just be using sets and not have a problem.
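
If the inner lists never need to hold duplicate values, the point about sets can be taken literally: `HashSet<T>.CreateSetComparer()` returns an `IEqualityComparer<HashSet<T>>` that compares sets by content, regardless of order, so no sorting is needed. A small sketch; `SetOfSetsDemo.Build` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

static class SetOfSetsDemo
{
    // Hypothetical helper: builds a set of integer sets. Inner sets containing the
    // same elements, in any order, are treated as equal by the set comparer.
    public static HashSet<HashSet<int>> Build(IEnumerable<int[]> groups)
    {
        var outer = new HashSet<HashSet<int>>(HashSet<int>.CreateSetComparer());
        foreach (int[] group in groups)
            outer.Add(new HashSet<int>(group)); // duplicates inside a group are dropped
        return outer;
    }
}
```

With this comparer, {1, 1, 2} and {1, 2} become the same set, which is exactly why the assumption about duplicates matters.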

It appears that their contents won't change (much) once they've been added to the HashSet. At the end of the day, you are going to have to use a comparer that falls back on SequenceEqual, but you don't have to call it every single time. Instead of doing a quadratic number of sequence compares (as the HashSet grows, running SequenceEqual against each existing member), you can compute a good hash code up front, after which you may need very few such compares. Generating a good hash code probably costs about as much as one SequenceEqual, but you only do it once per list.

So, the first time you operate on a particular List<int>, generate a hash based on the ordered sequence of numbers and cache it. The next time the list is compared, the cached value can be used. Off the top of my head I'm not sure how you would do this with a comparer (maybe a static dictionary?), but you could easily implement a List wrapper that does it.

Here's a basic idea. You'd need to be careful that it isn't brittle (e.g. make sure you invalidate any cached hash code when members change), but that doesn't look like a typical situation given the way you're using this.

public class FasterComparingList<T> : IList<T>, IList, ...
    // whatever you need to implement
{
   // Implement your interfaces against InnerList.
   // Any methods that change members of the list need to
   // set _LongHash = null to force it to be regenerated.
   public List<T> InnerList { ... lazy load a List }

   private int? _LongHash = null;

   public override int GetHashCode()
   {
       // Compute once, then reuse until a mutating method clears it
       if (_LongHash == null) {
           _LongHash = GetLongHash();
       }
       return _LongHash.Value;
   }

   public override bool Equals(object obj)
   {
       return Equals(obj as FasterComparingList<T>);
   }

   public bool Equals(FasterComparingList<T> list)
   {
       if (list == null) {
           return false;
       }
       // Lists of different lengths can never be equal
       if (InnerList.Count != list.InnerList.Count) {
           return false;
       }
       // you could also cache the sorted state and skip this if a list hasn't
       // changed since the last sort (List<T> itself does not track this)
       list.InnerList.Sort();
       InnerList.Sort();
       return InnerList.SequenceEqual(list.InnerList);
   }

   protected int GetLongHash()
   {
       return .....
       // something to create a reasonably good hash code -- which depends on the
       // data. Adding all the numbers is probably fine, even if it fails a couple
       // percent of the time you're still orders of magnitude ahead of sequence
       // compare each time. It must not depend on element order, because Equals
       // sorts the lists in place after the hash may already be cached.
   }
}

If the lists won't change once added, this should be very fast. Even where the lists could change frequently, recomputing the hash code probably takes little, if any, more time than a sequence compare.
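
A cut-down, runnable version of the wrapper idea, under a few assumptions: the hypothetical `CachedHashList<T>` exposes only `Add` and `Remove` instead of the full `IList<T>`, it uses the order-independent "add up the element hashes" approach mentioned above, and `Equals` compares sorted copies so comparing never reorders a list. As with any hash-based key, a list must not be mutated while it is stored in a `HashSet`; the invalidation only helps before insertion or after removal.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, cut-down wrapper: only Add/Remove are exposed, and both clear the
// cached hash. A full IList<T> implementation would do this in every mutating member.
sealed class CachedHashList<T> : IEquatable<CachedHashList<T>>
{
    private readonly List<T> _inner = new List<T>();
    private int? _cachedHash; // null means "needs recomputing"

    public int Count => _inner.Count;

    public void Add(T item) { _inner.Add(item); _cachedHash = null; }

    public bool Remove(T item)
    {
        bool removed = _inner.Remove(item);
        if (removed) _cachedHash = null;
        return removed;
    }

    public override int GetHashCode()
    {
        if (_cachedHash == null)
        {
            // Order-independent hash: wrapping sum of the element hashes
            int sum = 0;
            foreach (T item in _inner)
                sum = unchecked(sum + EqualityComparer<T>.Default.GetHashCode(item));
            _cachedHash = sum;
        }
        return _cachedHash.Value;
    }

    public bool Equals(CachedHashList<T> other)
    {
        if (ReferenceEquals(this, other)) return true;
        if (other is null || Count != other.Count) return false;
        if (GetHashCode() != other.GetHashCode()) return false;
        // Compare sorted copies so neither list is reordered as a side effect
        // (requires T to be comparable via Comparer<T>.Default)
        return _inner.OrderBy(x => x).SequenceEqual(other._inner.OrderBy(x => x));
    }

    public override bool Equals(object obj) => Equals(obj as CachedHashList<T>);
}
```

Sorting copies costs O(n log n) per full comparison, but with cached hashes that path should only run for lists that are very likely equal.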

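The answer is unsure how to cache the hash inside a comparer and guesses at a static dictionary. One possible approach, not taken from the answer, is `ConditionalWeakTable<TKey, TValue>` (System.Runtime.CompilerServices, .NET 4+), which attaches a value to an object by reference identity and, unlike a static `Dictionary`, does not keep the list alive. A sketch assuming lists are already in canonical (sorted) order and are not modified after they are first hashed; `CachingSequenceComparer<T>` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;

// Hypothetical comparer that caches each list's hash per instance.
// Assumes a list is never modified after it has been hashed; otherwise the cache goes stale.
sealed class CachingSequenceComparer<T> : IEqualityComparer<List<T>>
{
    // Keys are held weakly and matched by reference, so cached entries
    // go away together with the lists they belong to.
    private readonly ConditionalWeakTable<List<T>, StrongBox<int>> _hashes =
        new ConditionalWeakTable<List<T>, StrongBox<int>>();

    public bool Equals(List<T> x, List<T> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Count != y.Count) return false;
        // Cached hashes reject most unequal pairs before the O(n) compare
        if (GetHashCode(x) != GetHashCode(y)) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(List<T> list)
    {
        if (list == null) return 0; // ConditionalWeakTable does not accept null keys
        return _hashes.GetValue(list, ComputeHash).Value;
    }

    private static StrongBox<int> ComputeHash(List<T> list)
    {
        // Order-dependent hash, consistent with SequenceEqual
        int hash = 1234567;
        foreach (T item in list)
            hash = unchecked(hash * 37 + EqualityComparer<T>.Default.GetHashCode(item));
        return new StrongBox<int>(hash);
    }
}
```

`StrongBox<int>` is used only because `ConditionalWeakTable` requires a reference type for its values.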