LINQ Except and a custom IEqualityComparer

Posted on 2024-08-25 20:45:18

I'm trying to implement a custom comparer on two lists of strings and use the .Except() LINQ method to get the strings that aren't in one of the lists. The reason I'm writing a custom comparer is that I need to do a "fuzzy" compare, i.e. a string in one list could be embedded inside a string in the other list.

I've made the following comparer:

public class ItemFuzzyMatchComparer : IEqualityComparer<string>
{
    bool IEqualityComparer<string>.Equals(string x, string y)
    {
        return (x.Contains(y) || y.Contains(x));
    }

    int IEqualityComparer<string>.GetHashCode(string obj)
    {
        if (Object.ReferenceEquals(obj, null))
            return 0;
        return obj.GetHashCode();
    }
}

When I debug, the only breakpoint that hits is in the GetHashCode() method. The Equals() never gets touched. Any ideas?


Comments (3)

尤怨 2024-09-01 20:45:18

If all the hash codes returned are different, it never needs to compare for equality.

Basically the problem is that your hash and equality concepts are very different. I'm not entirely sure how you'd correct this, but until you've done so it certainly won't work.

You need to make sure that if Equals(a, b) returns true, then GetHashCode(a) == GetHashCode(b). (The reverse doesn't have to be true - hash collisions are acceptable, although obviously you want to have as few of them as possible.)
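The hash-first behaviour described above can be observed with a small sketch. CountingFuzzyComparer and HashFirstDemo are hypothetical names introduced only for this illustration; the comparer uses the same logic as the one in the question and simply counts how often each method is called. With ordinary string hash codes, the four sample strings will almost certainly all hash differently, so Equals should never run and Except finds no matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: same logic as the comparer in the question,
// plus counters that record how often each method is invoked.
public sealed class CountingFuzzyComparer : IEqualityComparer<string>
{
    public int EqualsCalls { get; private set; }
    public int HashCalls { get; private set; }

    public bool Equals(string x, string y)
    {
        EqualsCalls++;
        return x.Contains(y) || y.Contains(x);
    }

    public int GetHashCode(string obj)
    {
        HashCalls++;
        return obj == null ? 0 : obj.GetHashCode();
    }
}

public static class HashFirstDemo
{
    // Runs Except with the counting comparer and reports what happened.
    public static (int EqualsCalls, int HashCalls, List<string> Result) Run()
    {
        var comparer = new CountingFuzzyComparer();
        var first = new List<string> { "hard", "fun" };
        var second = new List<string> { "ard", "fund" };

        // ToList() forces the lazy query to execute before the counters are read.
        var result = first.Except(second, comparer).ToList();
        return (comparer.EqualsCalls, comparer.HashCalls, result);
    }
}
```

Calling HashFirstDemo.Run() should report a non-zero HashCalls and, barring an unlucky hash collision, zero EqualsCalls, with both "hard" and "fun" still present in the result even though each is a fuzzy match for a string in the second list.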

徒留西风 2024-09-01 20:45:18

As Jon pointed out, you need to make sure that two strings that are equal (according to your comparison rule) have the same hash code. Unfortunately, this is quite difficult.

To demonstrate the problem: Equals(str, "") returns true for every string str, which essentially means that all strings are equal to the empty string and, as a result, all strings must have the same hash code as the empty string. Therefore, the only way to implement IEqualityComparer correctly is to always return the same hash code:

public class ItemFuzzyMatchComparer : IEqualityComparer<string>  { 
  bool IEqualityComparer<string>.Equals(string x, string y)  { 
    return (x.Contains(y) || y.Contains(x)); 
  }  
  int IEqualityComparer<string>.GetHashCode(string obj)  { 
    if (Object.ReferenceEquals(obj, null)) return 0; 
    return 1; 
  } 
}

Then you can use the Except method and it will behave correctly. The only problem is that you'll (probably) get a pretty inefficient implementation, so if you need better performance, you may have to implement your own Except. However, I'm not exactly sure how inefficient the LINQ implementation will be, and I'm not sure whether an efficient implementation is actually possible for your comparison rule.
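As a rough sketch of how the constant-hash comparer behaves with Except, the following uses sample lists chosen for illustration. ConstantHashFuzzyComparer and ConstantHashDemo are hypothetical names, and the null check in Equals is an addition that the comparer above does not have.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of the comparer above: every string gets the same
// hash code, so the set inside Except has to fall back on Equals.
public sealed class ConstantHashFuzzyComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        // Assumption: null only matches null (the original would throw here).
        if (x == null || y == null) return ReferenceEquals(x, y);
        return x.Contains(y) || y.Contains(x);
    }

    // A constant satisfies "equal items have equal hash codes" trivially.
    public int GetHashCode(string obj) => 1;
}

public static class ConstantHashDemo
{
    public static List<string> Run()
    {
        var listOne = new List<string> { "hard", "fun", "code", "rocks" };
        var listTwo = new List<string> { "fund", "ode", "ard" };

        // "hard" contains "ard", "fund" contains "fun", "code" contains "ode",
        // so only "rocks" has no fuzzy match in listTwo.
        return listOne.Except(listTwo, new ConstantHashFuzzyComparer()).ToList();
    }
}
```

Two caveats seem worth keeping in mind. Except is a set operation, so it also removes "duplicates" within the first list under the same fuzzy rule; for example, "rock" and "rocks" in the first list would collapse to whichever comes first. And because every item lands in the same bucket, each lookup may compare against every stored item, which is the inefficiency mentioned above.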

无声无音无过去 2024-09-01 20:45:18

Maybe this problem could be solved without the IEqualityComparer interface implementation. Jon and Thomas have good points about implementing that interface, and equality doesn't seem to define your problem. From your description, I think you could do this without using the Except extension during the compare. Instead, get the matches first, then do the Except. See if this does the job for you:

 List<String> listOne = new List<string>(){"hard", "fun", "code", "rocks"};
 List<String> listTwo = new List<string>(){"fund", "ode", "ard"};

 var fuzzyMatchList = from str in listOne
                      from sr2 in listTwo
                      where str.Contains(sr2) || sr2.Contains(str)
                      select str;
 var exceptList = listOne.Except(fuzzyMatchList);
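The same idea can also be written as a single filter, without Except at all. This is only a sketch of one possible alternative, not something the answers above proposed; FuzzyExcept and WithoutFuzzyMatches are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FuzzyExcept
{
    // Hypothetical helper: keeps each item of 'source' that has no fuzzy
    // match (substring in either direction) anywhere in 'others'.
    public static IEnumerable<string> WithoutFuzzyMatches(
        IEnumerable<string> source, IEnumerable<string> others)
    {
        // Materialize once so 'others' is not re-enumerated for every item.
        var otherList = others.ToList();
        return source.Where(s =>
            !otherList.Any(o => s.Contains(o) || o.Contains(s)));
    }
}
```

With the lists above, FuzzyExcept.WithoutFuzzyMatches(listOne, listTwo) should yield only "rocks", the same as exceptList. One difference: Where keeps repeated items from the first list, whereas Except returns distinct items only. Both approaches compare every pair, so the cost is proportional to the product of the two list lengths.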