用极限(阈值)编辑距离

发布于 2025-02-08 10:07:23 字数 1958 浏览 1 评论 0原文

我在Java 8中的一个项目工作,我想以一种迭代的方式计算2个字符串的编辑距离(因此没有递归);该方法将被执行多次,因此我需要通过给定极限(阈值)进行改进,这意味着,如果来自2个字符串的编辑距离大于给定的数字 x ,则算法将是停了下来。

现在,我应该从哪里找到这样的方法?是否有包含这种方法的Java库(或从Maven提供的)?例如在弦乐中?

对于当前实施,我使用经典的编辑距离算法。

    static int min(int x, int y, int z)
{
    if (x <= y && x <= z)
        return x;
    if (y <= x && y <= z)
        return y;
    else
        return z;
}

static int editDistDP(String str1, String str2, int m,
                      int n)
{
    // Create a table to store results of subproblems
    int dp[][] = new int[m + 1][n + 1];

    // Fill d[][] in bottom up manner
    for (int i = 0; i <= m; i++) {
        for (int j = 0; j <= n; j++) {
            // If first string is empty, only option is
            // to insert all characters of second string
            if (i == 0)
                dp[i][j] = j; // Min. operations = j

            // If second string is empty, only option is
            // to remove all characters of second string
            else if (j == 0)
                dp[i][j] = i; // Min. operations = i

            // If last characters are same, ignore last
            // char and recur for remaining string
            else if (str1.charAt(i - 1)
                     == str2.charAt(j - 1))
                dp[i][j] = dp[i - 1][j - 1];

            // If the last character is different,
            // consider all possibilities and find the
            // minimum
            else
                dp[i][j] = 1
                           + min(dp[i][j - 1], // Insert
                                 dp[i - 1][j], // Remove
                                 dp[i - 1]
                                   [j - 1]); // Replace
        }
    }

    return dp[m][n];
}

另外,我发现 this 使用Levenstein距离并给定限制,但我不想更改如果没有必要的话,请给莱文斯坦

I work for a project in Java 8 and I want to compute the editing distance for 2 strings - in a iterative way (so without recursion); the method will be executed many times, so I need to improve it with a given limit (threshold), meaning that if the editing distance from 2 strings is greater than a given number x, the algorithm will be stopped.

Now, where from should I find a method like this? Is there a java library (or provided from maven) which contain such a method? e.g. in StringUtils?

For current implementation I use classic editing distance algorithm.

    static int min(int x, int y, int z)
{
    if (x <= y && x <= z)
        return x;
    if (y <= x && y <= z)
        return y;
    else
        return z;
}

static int editDistDP(String str1, String str2, int m,
                      int n)
{
    // Create a table to store results of subproblems
    int dp[][] = new int[m + 1][n + 1];

    // Fill d[][] in bottom up manner
    for (int i = 0; i <= m; i++) {
        for (int j = 0; j <= n; j++) {
            // If first string is empty, only option is
            // to insert all characters of second string
            if (i == 0)
                dp[i][j] = j; // Min. operations = j

            // If second string is empty, only option is
            // to remove all characters of second string
            else if (j == 0)
                dp[i][j] = i; // Min. operations = i

            // If last characters are same, ignore last
            // char and recur for remaining string
            else if (str1.charAt(i - 1)
                     == str2.charAt(j - 1))
                dp[i][j] = dp[i - 1][j - 1];

            // If the last character is different,
            // consider all possibilities and find the
            // minimum
            else
                dp[i][j] = 1
                           + min(dp[i][j - 1], // Insert
                                 dp[i - 1][j], // Remove
                                 dp[i - 1]
                                   [j - 1]); // Replace
        }
    }

    return dp[m][n];
}

Also, I find this which use Levenstein distance with a given limit, but I wouldn't want to change to Levenstein if it ins't necessary

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

欢你一世 2025-02-15 10:07:23

我认为 this apache commons正是您要寻找的,它具有具有最大距离阈值的构造函数。

伪代码:

LevenshteinDistance ld = new LevenshteinDistance(10);
Integer editDistanceWithThreshold = ld.apply('String1', 'String2')

I think this by Apache Commons is exactly what you are looking for which has a constructor with threshold for maximum distance.

Pseudocode:

LevenshteinDistance ld = new LevenshteinDistance(10);
Integer editDistanceWithThreshold = ld.apply('String1', 'String2')
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文