Comparing string similarity

Posted on 2024-11-27 16:57:41

What is the best way to compare two strings to see how similar they are?

Examples:

My String
My String With Extra Words

Or

My String
My Slightly Different String

What I am looking for is a way to determine how similar the first and second strings in each pair are. I would like to score the comparison, and if the strings are similar enough, I would consider them a matching pair.

Is there a good way to do this in C#?

Comments (3)

温折酒 2024-12-04 16:57:41
using System;

static class LevenshteinDistance
{
    public static int Compute(string s, string t)
    {
        if (string.IsNullOrEmpty(s))
        {
            if (string.IsNullOrEmpty(t))
                return 0;
            return t.Length;
        }

        if (string.IsNullOrEmpty(t))
        {
            return s.Length;
        }

        int n = s.Length;
        int m = t.Length;
        int[,] d = new int[n + 1, m + 1];

        // Initialize the first column and first row of the table to 0, 1, 2, ...
        for (int i = 0; i <= n; i++) d[i, 0] = i;
        for (int j = 1; j <= m; j++) d[0, j] = j;

        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= m; j++)
            {
                int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
                int min1 = d[i - 1, j] + 1;
                int min2 = d[i, j - 1] + 1;
                int min3 = d[i - 1, j - 1] + cost;
                d[i, j] = Math.Min(Math.Min(min1, min2), min3);
            }
        }
        return d[n, m];
    }
}
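
The question asks for a score and a cut-off rather than a raw edit count. One common way to get there, sketched below, is to divide the distance by the length of the longer string. The class name `StringSimilarity`, the method names, and the 0.8 default threshold are all assumptions made for illustration; they are not part of the answer above.

```csharp
using System;

// Hypothetical helper: turns an edit distance into a 0..1 similarity score.
static class StringSimilarity
{
    // Plain Levenshtein distance, keeping only two rows of the table in memory.
    public static int Distance(string s, string t)
    {
        s = s ?? string.Empty;
        t = t ?? string.Empty;

        int[] prev = new int[t.Length + 1];
        int[] curr = new int[t.Length + 1];
        for (int j = 0; j <= t.Length; j++) prev[j] = j;

        for (int i = 1; i <= s.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            // Swap rows: the row just filled becomes the previous row.
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[t.Length];
    }

    // 1.0 means identical; 0.0 means the distance equals the longer length.
    public static double Similarity(string s, string t)
    {
        s = s ?? string.Empty;
        t = t ?? string.Empty;
        int longest = Math.Max(s.Length, t.Length);
        if (longest == 0) return 1.0; // two empty strings are treated as identical
        return 1.0 - (double)Distance(s, t) / longest;
    }

    // The threshold is an assumed default; tune it against real data.
    public static bool IsMatch(string s, string t, double threshold = 0.8)
    {
        return Similarity(s, t) >= threshold;
    }
}
```

With this normalization, "My String" against "My String With Extra Words" has a distance of 17 over a longest length of 26, so it scores about 0.35. A pair where one string is a short prefix of the other will therefore score low, which may or may not be what you want for matching.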
心奴独伤 2024-12-04 16:57:41

If anyone was wondering what the C# equivalent of what @FrankSchwieterman posted is:

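// Note: despite its name, this method computes the plain Levenshtein distance;
// it does not handle the adjacent transpositions that Damerau-Levenshtein adds.
public static int GetDamerauLevenshteinDistance(string s, string t)
{
    if (string.IsNullOrEmpty(s))
    {
        // The first argument is the parameter name, not the parameter value.
        throw new ArgumentNullException(nameof(s), "String Cannot Be Null Or Empty");
    }

    if (string.IsNullOrEmpty(t))
    {
        throw new ArgumentNullException(nameof(t), "String Cannot Be Null Or Empty");
    }

    // Both strings are non-empty past this point, so no zero-length checks are needed.
    int n = s.Length; // length of s
    int m = t.Length; // length of t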

    int[] p = new int[n + 1]; //'previous' cost array, horizontally
    int[] d = new int[n + 1]; // cost array, horizontally

    // indexes into strings s and t
    int i; // iterates through s
    int j; // iterates through t

    for (i = 0; i <= n; i++)
    {
        p[i] = i;
    }

    for (j = 1; j <= m; j++)
    {
        char tJ = t[j - 1]; // jth character of t
        d[0] = j;

        for (i = 1; i <= n; i++)
        {
            int cost = s[i - 1] == tJ ? 0 : 1; // cost
            // minimum of cell to the left+1, to the top+1, diagonally left and up +cost                
            d[i] = Math.Min(Math.Min(d[i - 1] + 1, p[i] + 1), p[i - 1] + cost);
        }

        // swap the two rows so the row just computed becomes the 'previous' row
        int[] dPlaceholder = p; // placeholder to assist in swapping p and d
        p = d;
        d = dPlaceholder;
    }

    // our last action in the above loop was to switch d and p, so p now 
    // actually has the most recent cost counts
    return p[n];
}
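
The method above never looks at swapped neighbouring characters, so it gives "ab" to "ba" a distance of 2. For comparison, here is a minimal sketch of the optimal string alignment variant (sometimes called restricted Damerau-Levenshtein), which counts one adjacent transposition as a single edit. The class name `OptimalStringAlignment` is an assumption for illustration, and this variant is not the full unrestricted Damerau-Levenshtein distance.

```csharp
using System;

// Hypothetical helper: Levenshtein distance plus adjacent transpositions.
static class OptimalStringAlignment
{
    public static int Distance(string s, string t)
    {
        s = s ?? string.Empty;
        t = t ?? string.Empty;
        int n = s.Length, m = t.Length;
        int[,] d = new int[n + 1, m + 1];

        // First column and first row: distance from or to the empty string.
        for (int i = 0; i <= n; i++) d[i, 0] = i;
        for (int j = 0; j <= m; j++) d[0, j] = j;

        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= m; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                int best = Math.Min(
                    Math.Min(d[i - 1, j] + 1,      // deletion
                             d[i, j - 1] + 1),     // insertion
                    d[i - 1, j - 1] + cost);       // substitution or match

                // Adjacent transposition: the last two characters are swapped.
                if (i > 1 && j > 1 && s[i - 1] == t[j - 2] && s[i - 2] == t[j - 1])
                {
                    best = Math.Min(best, d[i - 2, j - 2] + 1);
                }
                d[i, j] = best;
            }
        }
        return d[n, m];
    }
}
```

This needs the full table (or at least three rows) because the transposition step reads two rows back, which is why the two-array trick above is not enough on its own.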
塔塔猫 2024-12-04 16:57:41

I am comparing two sentences like this:

string[] vs = string1.Split(new char[] { ' ', '-', '/', '(', ')' }, StringSplitOptions.RemoveEmptyEntries);
string[] vs1 = string2.Split(new char[] { ' ', '-', '/', '(', ')' }, StringSplitOptions.RemoveEmptyEntries);

// Number of distinct words the two strings share, ignoring case (requires System.Linq)
int commonWords = vs.Intersect(vs1, StringComparer.OrdinalIgnoreCase).Count();

Intersect returns the set of words that appear in both strings. I then look at the count, and if it is more than 1, I treat the two sentences as containing similar words.
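
A raw count of shared words does not account for how long the sentences are. If a score is wanted instead, one option is the Jaccard index: shared words divided by all distinct words across both strings. The sketch below reuses the same separators; the class name `WordOverlap` and the method name `Jaccard` are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: word-level Jaccard similarity, ignoring case.
static class WordOverlap
{
    private static readonly char[] Separators = { ' ', '-', '/', '(', ')' };

    private static HashSet<string> Words(string text)
    {
        return new HashSet<string>(
            (text ?? string.Empty).Split(Separators, StringSplitOptions.RemoveEmptyEntries),
            StringComparer.OrdinalIgnoreCase);
    }

    // Returns shared distinct words / all distinct words, in the range 0..1.
    public static double Jaccard(string a, string b)
    {
        HashSet<string> wordsA = Words(a);
        HashSet<string> wordsB = Words(b);

        // Two strings with no words at all are treated as identical.
        if (wordsA.Count == 0 && wordsB.Count == 0) return 1.0;

        int shared = wordsA.Count(w => wordsB.Contains(w));
        int total = wordsA.Count + wordsB.Count - shared;
        return (double)shared / total;
    }
}
```

For the pairs in the question, "My String" against "My String With Extra Words" shares 2 of 5 distinct words (0.4), and against "My Slightly Different String" it shares 2 of 4 (0.5).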
