用极限(阈值)编辑距离
我在Java 8中的一个项目工作,我想以一种迭代的方式计算2个字符串的编辑距离(因此没有递归);该方法将被执行多次,因此我需要通过给定极限(阈值)进行改进,这意味着,如果来自2个字符串的编辑距离大于给定的数字 x ,则算法将是停了下来。
现在,我应该从哪里找到这样的方法?是否有包含这种方法的Java库(或从Maven提供的)?例如在弦乐中?
对于当前实施,我使用经典的编辑距离算法。
static int min(int x, int y, int z)
{
if (x <= y && x <= z)
return x;
if (y <= x && y <= z)
return y;
else
return z;
}
static int editDistDP(String str1, String str2, int m,
int n)
{
// Create a table to store results of subproblems
int dp[][] = new int[m + 1][n + 1];
// Fill d[][] in bottom up manner
for (int i = 0; i <= m; i++) {
for (int j = 0; j <= n; j++) {
// If first string is empty, only option is
// to insert all characters of second string
if (i == 0)
dp[i][j] = j; // Min. operations = j
// If second string is empty, only option is
// to remove all characters of second string
else if (j == 0)
dp[i][j] = i; // Min. operations = i
// If last characters are same, ignore last
// char and recur for remaining string
else if (str1.charAt(i - 1)
== str2.charAt(j - 1))
dp[i][j] = dp[i - 1][j - 1];
// If the last character is different,
// consider all possibilities and find the
// minimum
else
dp[i][j] = 1
+ min(dp[i][j - 1], // Insert
dp[i - 1][j], // Remove
dp[i - 1]
[j - 1]); // Replace
}
}
return dp[m][n];
}
另外,我发现 this 使用Levenstein距离并给定限制,但我不想更改如果没有必要的话,请给莱文斯坦
I work for a project in Java 8 and I want to compute the editing distance for 2 strings - in a iterative way (so without recursion); the method will be executed many times, so I need to improve it with a given limit (threshold), meaning that if the editing distance from 2 strings is greater than a given number x, the algorithm will be stopped.
Now, where from should I find a method like this? Is there a java library (or provided from maven) which contain such a method? e.g. in StringUtils?
For current implementation I use classic editing distance algorithm.
static int min(int x, int y, int z)
{
if (x <= y && x <= z)
return x;
if (y <= x && y <= z)
return y;
else
return z;
}
static int editDistDP(String str1, String str2, int m,
int n)
{
// Create a table to store results of subproblems
int dp[][] = new int[m + 1][n + 1];
// Fill d[][] in bottom up manner
for (int i = 0; i <= m; i++) {
for (int j = 0; j <= n; j++) {
// If first string is empty, only option is
// to insert all characters of second string
if (i == 0)
dp[i][j] = j; // Min. operations = j
// If second string is empty, only option is
// to remove all characters of second string
else if (j == 0)
dp[i][j] = i; // Min. operations = i
// If last characters are same, ignore last
// char and recur for remaining string
else if (str1.charAt(i - 1)
== str2.charAt(j - 1))
dp[i][j] = dp[i - 1][j - 1];
// If the last character is different,
// consider all possibilities and find the
// minimum
else
dp[i][j] = 1
+ min(dp[i][j - 1], // Insert
dp[i - 1][j], // Remove
dp[i - 1]
[j - 1]); // Replace
}
}
return dp[m][n];
}
Also, I find this which use Levenstein distance with a given limit, but I wouldn't want to change to Levenstein if it ins't necessary
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我认为 this apache commons正是您要寻找的,它具有具有最大距离阈值的构造函数。
伪代码:
I think this by Apache Commons is exactly what you are looking for which has a constructor with threshold for maximum distance.
Pseudocode: