Fuzzy matching with a threshold filter in C#
I need to implement something like this:
string textToSearch = "Extreme Golf: The Showdown";
string textToSearchFor = "Golf Extreme Showdown";
int fuzzyMatchScoreThreshold = 80; // On a 0 to 100 scale
bool searchSuccessful = IsFuzzyMatch(textToSearch, textToSearchFor, fuzzyMatchScoreThreshold);
if (searchSuccessful == true)
{
// we have a match.
}
Here's the function stub written in C#:
public bool IsFuzzyMatch (string textToSearch, string textToSearchFor, int fuzzyMatchScoreThreshold)
{
bool isMatch = false;
// do fuzzy logic here and set isMatch to true if successful match.
return isMatch;
}
But I have no idea how to implement the logic in the IsFuzzyMatch method.
Any ideas? Perhaps there is a ready-made solution for this purpose?
Comments (2)
I like a combination of the Dice coefficient, Levenshtein distance, longest common subsequence, and at times Double Metaphone. The first three each give you a value you can compare against a threshold. I prefer to combine them in some way. YMMV.
I've just posted a blog post with a C# implementation of each of these, called Four Functions for Finding Fuzzy String Matches in C# Extensions.
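To illustrate one of the measures named above, here is a minimal sketch of the Dice coefficient over character bigrams. It is not the code from the blog post; the class name DiceSimilarity and the helpers Bigrams and ScoreOutOf100 are hypothetical, and it assumes case-insensitive comparison and treats bigrams as a set (duplicates are ignored).

```csharp
using System;
using System.Collections.Generic;

public static class DiceSimilarity
{
    // Collects the distinct two-character substrings of the lowercased text.
    private static HashSet<string> Bigrams(string text)
    {
        var result = new HashSet<string>();
        string normalized = text.ToLowerInvariant();
        for (int i = 0; i < normalized.Length - 1; i++)
        {
            result.Add(normalized.Substring(i, 2));
        }
        return result;
    }

    // Dice coefficient: 2 * |X ∩ Y| / (|X| + |Y|), in the range 0.0 to 1.0.
    public static double DiceCoefficient(string a, string b)
    {
        HashSet<string> first = Bigrams(a);
        HashSet<string> second = Bigrams(b);
        int total = first.Count + second.Count;
        if (total == 0)
        {
            // Assumption: strings too short to have bigrams match only if equal.
            return string.Equals(a, b, StringComparison.OrdinalIgnoreCase) ? 1.0 : 0.0;
        }
        first.IntersectWith(second); // first now holds only the shared bigrams
        return 2.0 * first.Count / total;
    }

    // Maps the coefficient onto the 0 to 100 scale used in the question.
    public static int ScoreOutOf100(string a, string b)
    {
        return (int)Math.Round(100.0 * DiceCoefficient(a, b));
    }
}
```

Because bigrams do not depend on where a word sits in the string, this measure tends to be more tolerant of reordered words than a plain edit distance, which may be one reason to combine the two.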
You need the Levenshtein distance algorithm, which finds how to get from one string to another using insert, delete, and modify operations. In the simplest form, your fuzzyMatchScoreThreshold is a Levenshtein distance divided by the length of the string.
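The idea above could be sketched roughly as follows. This is one possible reading, not a definitive implementation: the class name FuzzyMatcher and the helper SimilarityScore are hypothetical, and normalizing by the longer string's length and ignoring case are assumptions made for the sketch.

```csharp
using System;

public static class FuzzyMatcher
{
    // Dynamic-programming Levenshtein distance, keeping only two rows of the table.
    public static int LevenshteinDistance(string a, string b)
    {
        if (a.Length == 0) return b.Length;
        if (b.Length == 0) return a.Length;

        int[] previous = new int[b.Length + 1];
        int[] current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1), // insert, delete
                    previous[j - 1] + cost);                       // modify or keep
            }
            (previous, current) = (current, previous); // reuse the buffers
        }
        return previous[b.Length];
    }

    // Assumption: score = 100 * (1 - distance / longer length), case-insensitive.
    public static int SimilarityScore(string a, string b)
    {
        int maxLength = Math.Max(a.Length, b.Length);
        if (maxLength == 0) return 100;
        int distance = LevenshteinDistance(a.ToLowerInvariant(), b.ToLowerInvariant());
        return (int)Math.Round(100.0 * (maxLength - distance) / maxLength);
    }

    public static bool IsFuzzyMatch(string textToSearch, string textToSearchFor, int fuzzyMatchScoreThreshold)
    {
        return SimilarityScore(textToSearch, textToSearchFor) >= fuzzyMatchScoreThreshold;
    }
}
```

Plain edit distance is sensitive to word order, so the reordered titles in the question are likely to score low with this sketch; sorting the words of each string before comparing, or combining it with another measure, may work better for that case.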