修改编辑距离函数来计算两组 xy 坐标之间的距离?

发布于 2024-08-18 00:35:27 字数 1830 浏览 12 评论 0原文

我一直在尝试修改 Levenshtein Distance 函数,以便它可以找到两条线或 xy 坐标集之间的距离(换句话说,是线的相似或不同,而不是它们的几何距离)。不过我遇到了一些问题。我知道如何使用上面的值来获取删除成本,以及如何使用左侧的值来获取添加成本,但是在替换过程中,我尝试使用欧氏距离,但它对我不起作用。

如果你能指出我做错了什么,那就太好了。

这是 javascript 中的相关代码:

padlock.dtw = {
    _deletionCost: 1,
    _insertionCost: 1,
    levenshtein: function(a,b){
        var l1 = a.length, l2 = b.length;
        if (Math.min(l1, l2) === 0) {
            return Math.max(l1, l2);
        }
        var i = 0, j = 0, d = [];
        for (i = 0 ; i <= l1 ; i++) {
            d[i] = [];
            d[i][0] = i;
        }
        for (j = 0 ; j <= l2 ; j++) {
            d[0][j] = j;
        }
        for (i = 1 ; i <= l1 ; i++) {
            for (j = 1 ; j <= l2 ; j++) {
                d[i][j] = Math.min(
                    d[i - 1][j] + this._deletionCost, /* deletion */
                    d[i][j - 1] + this._insertionCost, /* addition */
                    d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : this.euclideanDistance(a[i-1], b[j-1])) /* substitution, use euchlidean distance as cost */
                );
            }
        }
        this._debugPrintMatrix(d);
        return d[l1][l2];
    },
    euclideanDistance: function(a, b){
        var xd = a[0]-b[0];
        var yd = a[1]-b[1];
        return Math.abs(Math.sqrt(Math.pow(xd, 2) + Math.pow(yd, 2)));
    },
    _debugPrintMatrix: function(m){
        for(var i=0;i<m.length;i++){
            console.log.apply(this, m[i]);
        }
    }
}

示例输出:

>>> padlock.dtw.levenshtein( [ [1,1], [0,9], [3,3], [4,4] ], [ [1,1], [2,2], [3,3], [4,4] ] )

Distance Matrix:
0 1 2                 3 4
1 0 1                 2 3
2 1 2                 3 4
3 2 2.414213562373095 2 3
4 3 3.414213562373095 3 2

Final Distance: 2

I've been trying to work on modifying a Levenshtein Distance function so that it can find the distance between two lines, or sets of x-y coordinates (in other words, how similar or different the lines are, not their geometric distance). I'm running into some problems though. I get how you take the value above to get deletion cost, and the one to the left to get addition, but during substitution I'm trying to use euchlidian distance, and it's not working for me.

If you could point out what I'm doing wrong, that would be awesome.

Here's the relevant code in javascript:

padlock.dtw = {
    _deletionCost: 1,
    _insertionCost: 1,
    levenshtein: function(a,b){
        var l1 = a.length, l2 = b.length;
        if (Math.min(l1, l2) === 0) {
            return Math.max(l1, l2);
        }
        var i = 0, j = 0, d = [];
        for (i = 0 ; i <= l1 ; i++) {
            d[i] = [];
            d[i][0] = i;
        }
        for (j = 0 ; j <= l2 ; j++) {
            d[0][j] = j;
        }
        for (i = 1 ; i <= l1 ; i++) {
            for (j = 1 ; j <= l2 ; j++) {
                d[i][j] = Math.min(
                    d[i - 1][j] + this._deletionCost, /* deletion */
                    d[i][j - 1] + this._insertionCost, /* addition */
                    d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : this.euclideanDistance(a[i-1], b[j-1])) /* substitution, use euchlidean distance as cost */
                );
            }
        }
        this._debugPrintMatrix(d);
        return d[l1][l2];
    },
    euclideanDistance: function(a, b){
        var xd = a[0]-b[0];
        var yd = a[1]-b[1];
        return Math.abs(Math.sqrt(Math.pow(xd, 2) + Math.pow(yd, 2)));
    },
    _debugPrintMatrix: function(m){
        for(var i=0;i<m.length;i++){
            console.log.apply(this, m[i]);
        }
    }
}

Sample output:

>>> padlock.dtw.levenshtein( [ [1,1], [0,9], [3,3], [4,4] ], [ [1,1], [2,2], [3,3], [4,4] ] )

Distance Matrix:
0 1 2                 3 4
1 0 1                 2 3
2 1 2                 3 4
3 2 2.414213562373095 2 3
4 3 3.414213562373095 3 2

Final Distance: 2

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

花开半夏魅人心 2024-08-25 00:35:27

如果我正确理解你的问题,那么你应该完全删除计算两点之间欧几里得距离的代码!

首先,让我重申你的问题:

你有两组点,例如

A = [ [1,1], [0,9], [3,3], [4,4] ]
B = [ [1,1], [2,2], [3,3], [4,4] ]

你尝试计算这两组点之间的编辑距离。你用“点”代替“字母”。

到目前为止,这是有道理的。只需将 levenshtein 算法中的“字母”替换为点即可!

但你犯了一个错误:原始的 Levenshtein 算法不计算两个字母之间的距离,例如 distance(a,b)=1 或 distance(a,d)=3。

您尝试用这样的东西扩展算法(使用 euclideanDistance() 函数)。但编辑算法并不适合此类事情。如果您仔细观察它,您会发现它不起作用(矩阵中的值具有含义,并且每次循环迭代都使用在先前迭代中计算的矩阵中的值)。

编辑距离是编辑距离,不是几何距离。您尝试更改它,以便它计算编辑距离和几何距离的混合。这种混合毫无意义,毫无用处,而且是错误的,恕我直言。

结论

要计算两组 xy 坐标的编辑距离,您应该将 euclidianDistance() 替换为简单的相等比较(a[0]==b [0] && a[1]==b[1])。

然后levenshtein算法会给你一个“编辑距离”。

If I understood your question correctly, then you should completely remove the code for computing euclidian distance between two points!

First, let me restate your question:

You have two sets of points, e.g.

A = [ [1,1], [0,9], [3,3], [4,4] ]
B = [ [1,1], [2,2], [3,3], [4,4] ]

You try to compute a levenshtein distance between those two sets. You substitute "letters" with "points".

Up to this point, it makes sense. Just replace the "letters" in levenshtein algorithm with points and you are done!

But you made a mistake: The original Levenshtein algorithm does not calculate distances between two letters, like e.g. distance(a,b)=1 or distance(a,d)=3.

You tried to extend the algorithm with such a thing (using euclideanDistance() function). But levenshtein algorithm is not meant for such things. And if you have a close look at it, you will see, that it will not work (the values in the matrix have a meaning, and each loop iteration uses values in the matrix that were computed in a previous iteration).

Levenshtein distance is an edit distance, no geometrical distance. You tried to change it, so that it computes a mix of edit and geometrical distance. This mix makes no sense, it is useless and wrong, IMHO.

Conclusion

To compute the levenshtein distance of two sets of x-y coordinates, you should replace your euclidianDistance() with a simple equality comparison (a[0]==b[0] && a[1]==b[1]).

Then the levenshtein algorithm will give you an "edit distance".

胡渣熟男 2024-08-25 00:35:27

使用几何学来计算两条线之间的距离不是更聪明吗?
或者是否有特定原因您不想使用它。

由于两条线总是有一个交点,除非它们平行(编辑,谢谢),所以很容易计算最小距离:即 0 或插入一些数学,可以在谷歌!

Wouldn't it be more clever to use geometrics for calculating the distance between two lines?
Or is there a specific reason you wouldn't want to use that.

Since two lines always have a point of intersection, unless they are parallel (edit, thanks), it's easy to calculate the smallest distance: that's 0 or insert some math, which can be found on google!

平安喜乐 2024-08-25 00:35:27

我不明白你为什么要使用 Levenshtein 来实现这一点,看来你可以通过简单的计算得到更好的结果。

  • 要找出线条角度的差异,您可以简单地找到每条线条的角度 (arctan((x_1-x_2)/(y_1-y_2))) 并减去它们。
  • 要找到线的平均距离,您可以简单地使用每条线的第一个点和每条线的第二个点的距离公式,并将这些距离平均在一起。

除此之外(除非你的线条是 3D 的),没有什么可以真正“比较”它们的。

也许我误解了。您是否想要比较各行的字符串值?

I don't understand why you would use Levenshtein for this, it appears that you would get much better results from simple calculations.

  • To find the difference in angle of the lines, you could simply find the angle for each line (arctan((x_1-x_2)/(y_1-y_2))) and subtract them.
  • To find the average distance of the lines, you could simply use the distance formula with the first point of each line and the second point of each line and average those distances together.

Other than that (unless your lines are in 3D), there's nothing else to really "compare" them with.

Perhaps I've misunderstood. Are you looking to compare the string values for the lines?

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文