Levenshtein Distance:从矩阵推断编辑操作

发布于 2024-11-04 02:26:09 字数 209 浏览 7 评论 0原文

我用 C++ 编写了 Levenshtein 算法

如果我输入:
字符串 s: 民主党
string t: Republican

我得到矩阵 D 已填满,并且可以在 D[10][8] = 8
中读取操作数(编辑距离) 除了填充矩阵之外,我想构建最佳解决方案。该解决方案必须如何看待?我没有想法。
请只写信告诉我这个例子必须如何看待。

I wrote Levenshtein algorithm in in C++

If I input:
string s: democrat
string t: republican

I get the matrix D filled-up and the number of operations (the Levenshtein distance) can be read in D[10][8] = 8
Beyond the filled matrix I want to construct the optimal solution. How must look this solution? I don't have an idea.
Please only write me HOW MUST LOOK for this example.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

扮仙女 2024-11-11 02:26:09

问题是
给定 Levenshtein 算法生成的矩阵,如何找到“最优解”?
即,我们如何找到将“s string”转换为“t string”所需的字符串操作的精确顺序:插入、删除和替换[单个字母]?

首先,应该指出的是,在许多情况下,存在多个最佳解决方案
虽然 Levenshtein 算法提供了最小操作数(在民主党/共和党示例中为 8 )有许多序列(8 个操作)可以产生这种转换。

通过“解码”Levenshtein 矩阵,可以枚举所有此类最佳序列。
总体思路是,最优解都遵循一条“路径”,从左上角到右下角(或沿另一个方向),由此矩阵单元值在此路径上,要么保持不变,要么增加 1(或反向减少 1),从 0 开始,到相关字符串的最佳操作数结束(0 到 8 民主党/共和党情况)。当需要进行运算时,数字会增加;当字符串中相应位置的字母相同时,数字保持不变。

很容易产生一种产生这样一条路径的算法(产生所有可能的路径稍微复杂一些),并从这样的路径推导出操作的顺序。

该路径查找算法应从右下角开始并向后进行。采用这种方法的原因是我们知道一个事实,即要成为最佳解决方案,它必须在这个角结束,而要在这个角结束,它必须来自 3 个单元格之一,要么紧邻其左侧,要么紧邻其上方它或立即对角线。通过在这三个小区中选择一个满足我们“值相同或减一”要求的小区,我们可以有效地在最佳路径之一上选择一个小区。通过重复该操作直到到达左上角(或者实际上直到到达值为 0 的单元格),我们有效地在最佳路径上回溯。

以民主党 - 共和党为例进行说明

还应该指出的是,可以通过以下两种方式之一构建矩阵:水平或垂直使用“民主党”。这不会改变编辑距离的计算,也不会改变所需的操作列表;它只会改变我们解释矩阵的方式,例如在“路径”上水平移动意味着插入一个字符[从t字符串]或删除一个字符[从s字符串]取决于“字符串s”是否是“水平”或矩阵中的“垂直”。

我将使用以下矩阵。因此,约定是(仅沿着从左到右和/或从上到下的方向)

  • 水平移动是“t string”中的字母的插入
  • 垂直移动是从“字符串”中删除一个字母,
  • 对角线移动是:
    • 无操作(相应位置的两个字母相同);数字没有改变
    • a SUBSTITUTION(各个位置的字母不同);数量增加一。

编辑矩阵 s = "democrat", t="republican"

      r  e  p  u  b  l  i  c  a  n
   0  1  2  3  4  5  6  7  8  9  10
d  1  1  2  3  4  5  6  7  8  9  10
e  2  2  1  2  3  4  5  6  7  8  9
m  3  3  2  2  3  4  5  6  7  8  9
o  4  4  3  3  3  4  5  6  7  8  9
c  5  5  4  4  4  4  5  6  6  7  8
r  6  5  5  5  5  5  5  6  7  7  8
a  7  6  6  6  6  6  6  6  7  7  8
t  8  7  7  7  7  7  7  7  7  8  8

我用来从多个可能的最佳路径中选择一条路径的任意方法是下面大致描述:

Starting at the bottom-rightmost cell, and working our way backward toward 
the top left.

For each "backward" step, consider the 3 cells directly adjacent to the current
cell   (in the left, top or left+top directions)

   if the value in the diagonal cell (going up+left) is smaller or equal to the
      values found in the other two cells
   AND 
      if this is same or 1 minus the value of the current cell 
   then  "take the diagonal cell"
         if the value of the diagonal cell is one less than the current cell:
            Add a SUBSTITUTION operation (from the letters corresponding to
            the _current_ cell)
         otherwise: do not add an operation this was a no-operation.

   elseif the value in the cell to the left is smaller of equal to the value of
       the of the cell above current cell
   AND 
       if this value is same or 1 minus the value of the current cell 
   then "take the cell to left", and
        add an INSERTION of the letter corresponding to the cell
   else
       take the cell above, add
       Add a DELETION operation of the letter in 's string'

按照这个非正式的伪代码,我们得到以下内容:

从右下角的“n”、“t”单元格开始。
选择[对角线]“a”、“a”单元格作为下一个目的地,因为它小于其他两个(并且满足相同或 -1 条件)。
请注意,新单元格比当前单元格小 1
因此第 8 步是将“t”替换为“n”:   democra N

继续“a”,“a”单元格,
选择[对角线]“c”、“r”单元格作为下一个目的地...
请注意,新单元格与当前单元格具有相同的值==> 无需任何操作

继续“c”、“r”单元格,
选择[对角线]“i”、“c”单元格作为下一个目的地...
请注意,新单元格比当前单元格小 1
因此第 7 步是将“r”替换为“c”:   democ C

继续使用“i”、“c”单元格,
选择[对角线]“l”、“o”单元格作为下一个目的地...
请注意,新单元格比当前单元格小 1
因此第 6 步是将“c”替换为“i”:  演示可以

继续使用“l”、“o”单元格,
选择[对角线]“b”、“m”单元格作为下一个目的地...
请注意,新单元格比当前单元格小 1
因此第 5 步是将“o”替换为“l”:   dem L ican

继续使用“b”、“m”单元格,
选择[对角线]“u”、“e”单元格作为下一个目的地...
请注意,新单元格比当前单元格小 1
因此,第 4 步是将“m”替换为“b”:   de B lican

继续“u”、“e”单元格,
请注意,“对角线”单元格不合格,因为“左侧”单元格小于它。
选择[左]“p”、“e”单元格作为下一个目的地...
因此,步骤 3 是在“e”之后插入“u”:   de U blcan

继续使用“p”、“e”单元格,
同样,“对角线”单元格不符合条件
选择[左]“e”、“e”单元格作为下一个目的地...
因此,步骤 2 是在“e”之后插入“p”:   de P ublican

继续“e”,“e”单元格,
选择[对角线]“r”、“d”单元格作为下一个目的地...
请注意,新单元格与当前单元格具有相同的值==> 无需任何操作

继续“r”、“d”单元格,
选择[对角线]“开始”单元格作为下一个目的地...
请注意,新单元格比当前单元格小 1
因此,步骤 1 是将“d”替换为“r”:   R epublican

您已到达值为 0 的单元格:您的工作已完成

The question is
Given the matrix produced by the Levenshtein algorithm, how can one find "the optimal solution"?
i.e. how can we find the precise sequence of string operations: inserts, deletes and substitution [of a single letter], necessary to convert the 's string' into the 't string'?

First, it should be noted that in many cases there are SEVERAL optimal solutions.
While the Levenshtein algorithm supplies the minimum number of operations (8 in democrat/republican example) there are many sequences (of 8 operations) which can produce this conversion.

By "decoding" the Levenshtein matrix, one can enumerate ALL such optimal sequences.
The general idea is that the optimal solutions all follow a "path", from top left corner to bottom right corner (or in the other direction), whereby the matrix cell values on this path either remain the same or increase by one (or decrease by one in the reverse direction), starting at 0 and ending at the optimal number of operations for the strings in question (0 thru 8 democrat/republican case). The number increases when an operation is necessary, it stays the same when the letter at corresponding positions in the strings are the same.

It is easy to produce an algorithm which produces such a path (slightly more complicated to produce all possible paths), and from such path deduce the sequence of operations.

This path finding algorithm should start at the lower right corner and work its way backward. The reason for this approach is that we know for a fact that to be an optimal solution it must end in this corner, and to end in this corner, it must have come from one of the 3 cells either immediately to its left, immediately above it or immediately diagonally. By selecting a cell among these three cells, one which satisfies our "same value or decreasing by one" requirement, we effectively pick a cell on one of the optimal paths. By repeating the operation till we get on upper left corner (or indeed until we reach a cell with a 0 value), we effectively backtrack our way on an optimal path.

Illustration with the democrat - republican example

It should also be noted that one can build the matrix in one of two ways: with 'democrat' horizontally or vertically. This doesn't change the computation of the Levenshtein distance nor does it change the list of operations needed; it only changes the way we interpret the matrix, for example moving horizontally on the "path" either means inserting a character [from the t string] or deleting a character [off the s string] depending whether 'string s' is "horizontal" or "vertical" in the matrix.

I'll use the following matrix. The conventions are therefore (only going in the left-to-right and/or top-to-bottom directions)

  • an horizontal move is an INSERTION of a letter from the 't string'
  • an vertical move is a DELETION of a letter from the 's string'
  • a diagonal move is either:
    • a no-operation (both letters at respective positions are the same); the number doesn't change
    • a SUBSTITUTION (letters at respective positions are distinct); the number increase by one.

Levenshtein matrix for s = "democrat", t="republican"

      r  e  p  u  b  l  i  c  a  n
   0  1  2  3  4  5  6  7  8  9  10
d  1  1  2  3  4  5  6  7  8  9  10
e  2  2  1  2  3  4  5  6  7  8  9
m  3  3  2  2  3  4  5  6  7  8  9
o  4  4  3  3  3  4  5  6  7  8  9
c  5  5  4  4  4  4  5  6  6  7  8
r  6  5  5  5  5  5  5  6  7  7  8
a  7  6  6  6  6  6  6  6  7  7  8
t  8  7  7  7  7  7  7  7  7  8  8

The arbitrary approach I use to select one path among several possible optimal paths is loosely described below:

Starting at the bottom-rightmost cell, and working our way backward toward 
the top left.

For each "backward" step, consider the 3 cells directly adjacent to the current
cell   (in the left, top or left+top directions)

   if the value in the diagonal cell (going up+left) is smaller or equal to the
      values found in the other two cells
   AND 
      if this is same or 1 minus the value of the current cell 
   then  "take the diagonal cell"
         if the value of the diagonal cell is one less than the current cell:
            Add a SUBSTITUTION operation (from the letters corresponding to
            the _current_ cell)
         otherwise: do not add an operation this was a no-operation.

   elseif the value in the cell to the left is smaller of equal to the value of
       the of the cell above current cell
   AND 
       if this value is same or 1 minus the value of the current cell 
   then "take the cell to left", and
        add an INSERTION of the letter corresponding to the cell
   else
       take the cell above, add
       Add a DELETION operation of the letter in 's string'

Following this informal pseudo-code, we get the following:

Start on the "n", "t" cell at bottom right.
Pick the [diagonal] "a", "a" cell as next destination since it is less than the other two (and satisfies the same or -1 condition).
Note that the new cell is one less than current cell
therefore the step 8 is substitute "t" with "n":     democra N

Continue with "a", "a" cell,
Pick the [diagonal] "c", "r" cell as next destination...
Note that the new cell is same value as current cell ==> no operation needed.

Continue with "c", "r" cell,
Pick the [diagonal] "i", "c" cell as next destination...
Note that the new cell is one less than current cell
therefore the step 7 is substitute "r" with "c":     democ C an

Continue with "i", "c" cell,
Pick the [diagonal] "l", "o" cell as next destination...
Note that the new cell is one less than current cell
therefore the step 6 is substitute "c" with "i":     demo I can

Continue with "l", "o" cell,
Pick the [diagonal] "b", "m" cell as next destination...
Note that the new cell is one less than current cell
therefore the step 5 is substitute "o" with "l":     dem L ican

Continue with "b", "m" cell,
Pick the [diagonal]"u", "e" cell as next destination...
Note that the new cell is one less than current cell
therefore the step 4 is substitute "m" with "b":     de B lican

Continue with "u", "e" cell,
Note the "diagonal" cell doesn't qualify, because the "left" cell is less than it.
Pick the [left] "p", "e" cell as next destination...
therefore the step 3 is instert "u" after "e":     de U blican

Continue with "p", "e" cell,
again the "diagonal" cell doesn't qualify
Pick the [left] "e", "e" cell as next destination...
therefore the step 2 is instert "p" after "e":     de P ublican

Continue with "e", "e" cell,
Pick the [diagonal] "r", "d" cell as next destination...
Note that the new cell is same value as current cell ==> no operation needed.

Continue with "r", "d" cell,
Pick the [diagonal] "start" cell as next destination...
Note that the new cell is one less than current cell
therefore the step 1 is substitute "d" with "r":     R epublican

You've arrived at a cell which value is 0 : your work is done!

兲鉂ぱ嘚淚 2024-11-11 02:26:09

从 python 中实现的矩阵推断移动的回溯算法:

    def _backtrack_string(matrix, output_word):
    '''
    Iteratively backtrack DP matrix to get optimal set of moves

    Inputs: DP matrix (list:list:int),
            Input word (str),
            Output word (str),
            Start x position in DP matrix (int),
            Start y position in DP matrix (int)
    Output: Optimal path (list)
    '''

    i = len(matrix) - 1
    j = len(matrix[0]) - 1
    optimal_path = []
    while i > 0 and j > 0:
        diagonal = matrix[i-1][j-1]
        vertical = matrix[i-1][j]
        horizontal = matrix[i][j-1]
        current = matrix[i][j]
        if diagonal <= vertical and diagonal <= horizontal and (diagonal <= current):
            i = i - 1
            j = j - 1
            if diagonal == current - 1:
                optimal_path.append("Replace " + str(j) + ", " + str(output_word[j]) )
            elif horizontal <= vertical and horizontal <= current:
                j = j - 1
                optimal_path.append("Insert " + str(j) + ", " + str(output_word[j]))
            elif vertical <= horizontal and vertical <= current:
                i = i - 1
                optimal_path.append("Delete " + str(i))
        elif horizontal <= vertical and horizontal <= current:
            j = j - 1
            optimal_path.append("Insert " + str(j) + ", " + str(output_word[j]))
        else:
            i = i - 1
            optimal_path.append("Delete " + str(i))

    return reversed(optimal_path)

当我使用原始单词“OPERATING”和所需单词“CONSTANTINE”运行算法时得到的输出如下

    Insert 0, C
    Replace 2, N
    Replace 3, S
    Replace 4, T
    Insert 6, N
    Replace 10, E

       ""  C  O  N  S  T  A  N  T  I  N   E

    "" [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 
              <--                              Insert 0, C
    O  [1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9,  10]
                \                              Replace 2, N
    P  [2, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9,  10]
                   \                           Replace 3, S
    E  [3, 3, 3, 3, 3, 4, 5, 6, 7, 8, 9,  9]
                      \                        Replace 4, T
    R  [4, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9,  10]  No move
                         \ <--                 Insert 6, N
    A  [5, 5, 5, 5, 5, 5, 4, 5, 6, 7, 8,  9]
                               \               No move
    T  [6, 6, 6, 6, 6, 5, 5, 5, 5, 6, 7,  8]
                                  \            No move
    I  [7, 7, 7, 7, 7, 6, 6, 6, 6, 5, 6,  7]
                                     \         No move
    N  [8, 8, 8, 7, 8, 7, 7, 6, 7, 6, 5,  6]
                                        \      Replace 10, E
    G  [9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 6,  6]

请注意,如果元素在对角线与当前元素相同。根据垂直(上)和水平(左)位置的值,可能会进行删除或插入。当发生以下情况时,我们只会得到“无操作”或“替换”操作

# assume bottom right of a 2x2 matrix is the reference position 
# and has value v
# the following is the situation where we get a replace operation
    [v + 1 , v<]
    [  v<  , v]
# the following is the situation where we get a "no operation"
    [v , v<]
    [v<, v ] 

我认为这是第一个答案中描述的算法可能会崩溃的地方。当两个操作都不正确时,上面的 2x2 矩阵中可能还有其他排列。除非考虑到这一点,否则显示的输入“OPERATING”和输出“CONSTANTINE”的示例会破坏算法。

The backtracking algorithm to infer the moves from the matrix implemented in python:

    def _backtrack_string(matrix, output_word):
    '''
    Iteratively backtrack DP matrix to get optimal set of moves

    Inputs: DP matrix (list:list:int),
            Input word (str),
            Output word (str),
            Start x position in DP matrix (int),
            Start y position in DP matrix (int)
    Output: Optimal path (list)
    '''

    i = len(matrix) - 1
    j = len(matrix[0]) - 1
    optimal_path = []
    while i > 0 and j > 0:
        diagonal = matrix[i-1][j-1]
        vertical = matrix[i-1][j]
        horizontal = matrix[i][j-1]
        current = matrix[i][j]
        if diagonal <= vertical and diagonal <= horizontal and (diagonal <= current):
            i = i - 1
            j = j - 1
            if diagonal == current - 1:
                optimal_path.append("Replace " + str(j) + ", " + str(output_word[j]) )
            elif horizontal <= vertical and horizontal <= current:
                j = j - 1
                optimal_path.append("Insert " + str(j) + ", " + str(output_word[j]))
            elif vertical <= horizontal and vertical <= current:
                i = i - 1
                optimal_path.append("Delete " + str(i))
        elif horizontal <= vertical and horizontal <= current:
            j = j - 1
            optimal_path.append("Insert " + str(j) + ", " + str(output_word[j]))
        else:
            i = i - 1
            optimal_path.append("Delete " + str(i))

    return reversed(optimal_path)

The output I get when I run the algorithm with original word "OPERATING" and desired word "CONSTANTINE" is the following

    Insert 0, C
    Replace 2, N
    Replace 3, S
    Replace 4, T
    Insert 6, N
    Replace 10, E

       ""  C  O  N  S  T  A  N  T  I  N   E

    "" [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 
              <--                              Insert 0, C
    O  [1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9,  10]
                \                              Replace 2, N
    P  [2, 2, 2, 2, 3, 4, 5, 6, 7, 8, 9,  10]
                   \                           Replace 3, S
    E  [3, 3, 3, 3, 3, 4, 5, 6, 7, 8, 9,  9]
                      \                        Replace 4, T
    R  [4, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9,  10]  No move
                         \ <--                 Insert 6, N
    A  [5, 5, 5, 5, 5, 5, 4, 5, 6, 7, 8,  9]
                               \               No move
    T  [6, 6, 6, 6, 6, 5, 5, 5, 5, 6, 7,  8]
                                  \            No move
    I  [7, 7, 7, 7, 7, 6, 6, 6, 6, 5, 6,  7]
                                     \         No move
    N  [8, 8, 8, 7, 8, 7, 7, 6, 7, 6, 5,  6]
                                        \      Replace 10, E
    G  [9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 6,  6]

Note that I had to add extra conditions if the element in the diagonal is the same as the current element. There could be a deletion or insertion depending on values in the vertical (up) and horizontal (left) positions. We only get a "no operation" or "replace" operation when the following occurs

# assume bottom right of a 2x2 matrix is the reference position 
# and has value v
# the following is the situation where we get a replace operation
    [v + 1 , v<]
    [  v<  , v]
# the following is the situation where we get a "no operation"
    [v , v<]
    [v<, v ] 

I think this is where the algorithm described in the first answer could break. There could be other arrangements in the 2x2 matrix above when neither operations are correct. The example shown with input "OPERATING" and output "CONSTANTINE" breaks the algorithm unless this is taken into account.

一个人练习一个人 2024-11-11 02:26:09

我已经使用它好几次了,但在我看来,矩阵应该看起来像这样:

. . r e p u b l i c a n
. 0 1 2 3 4 5 6 7 8 9 10
d 1 1 2 3 4 5 6 7 8 9 10
e 2 2 1 2 3 4 5 6 7 8 9
m 3 3 2 2 3 4 5 6 7 8 9
o 4 4 3 3 3 4 5 6 7 8 9
c 5 5 4 4 4 4 5 6 7 8 9
r 6 5 5 5 5 5 5 6 7 8 9
a 7 6 6 6 6 6 6 6 7 7 8
t 8 7 7 7 7 7 7 7 7 7 8

不过,不要认为这是理所当然的。

It's been some times since I played with it, but it seems to me the matrix should look something like:

. . r e p u b l i c a n
. 0 1 2 3 4 5 6 7 8 9 10
d 1 1 2 3 4 5 6 7 8 9 10
e 2 2 1 2 3 4 5 6 7 8 9
m 3 3 2 2 3 4 5 6 7 8 9
o 4 4 3 3 3 4 5 6 7 8 9
c 5 5 4 4 4 4 5 6 7 8 9
r 6 5 5 5 5 5 5 6 7 8 9
a 7 6 6 6 6 6 6 6 7 7 8
t 8 7 7 7 7 7 7 7 7 7 8

Don't take it for granted though.

荆棘i 2024-11-11 02:26:09

这是基于 mjv 答案的 VBA 算法。
(解释得很好,但缺少一些案例)。

    Sub TU_Levenshtein()

        Call Levenshtein("democrat", "republican")

        Call Levenshtein("ooo", "u") 

        Call Levenshtein("ceci est un test", "ceci n'est pas un test")

    End Sub

    Sub Levenshtein(ByVal string1 As String, ByVal string2 As String)

        ' Fill Matrix Levenshtein (-> array 'Distance')

        Dim i As Long, j As Long
        Dim string1_length As Long
        Dim string2_length As Long
        Dim distance() As Long

        string1_length = Len(string1)
        string2_length = Len(string2)
        ReDim distance(string1_length, string2_length)

        For i = 0 To string1_length
            distance(i, 0) = i
        Next

        For j = 0 To string2_length
            distance(0, j) = j
        Next

        For i = 1 To string1_length
            For j = 1 To string2_length
                If Asc(Mid$(string1, i, 1)) = Asc(Mid$(string2, j, 1)) Then
                    distance(i, j) = distance(i - 1, j - 1)
                Else
                    distance(i, j) = Application.WorksheetFunction.min _
                    (distance(i - 1, j) + 1, _
                     distance(i, j - 1) + 1, _
                     distance(i - 1, j - 1) + 1)
                End If
            Next
        Next

        LevenshteinDistance = distance(string1_length, string2_length) ' for information only

        ' Write Matrix on VBA sheets (only for visuation, not used in calculus)

        Cells.Clear

        For i = 1 To UBound(distance, 1)
            Cells(i + 2, 1).Value = Mid(string1, i, 1)
        Next i

        For i = 1 To UBound(distance, 2)
            Cells(1, i + 2).Value = Mid(string2, i, 1)
        Next i

        For i = 0 To UBound(distance, 1)

            For j = 0 To UBound(distance, 2)

                Cells(i + 2, j + 2) = distance(i, j)

            Next j

        Next i

        ' One solution

        current_posx = UBound(distance, 1)
        current_posy = UBound(distance, 2)

        Do

            cc = distance(current_posx, current_posy)

            Cells(current_posx + 1, current_posy + 1).Interior.Color = vbYellow ' visualisation again

            ' Manage border case

            If current_posy - 1 < 0 Then

                MsgBox ("deletion. " & Mid(string1, current_posx, 1))

                current_posx = current_posx - 1
                current_posy = current_posy

                GoTo suivant

            End If

            If current_posx - 1 < 0 Then

                MsgBox ("insertion. " & Mid(string2, current_posy, 1))

                current_posx = current_posx
                current_posy = current_posy - 1

                GoTo suivant

            End If

            ' Middle cases

            cc_L = distance(current_posx, current_posy - 1)
            cc_U = distance(current_posx - 1, current_posy)
            cc_D = distance(current_posx - 1, current_posy - 1)

            If (cc_D <= cc_L And cc_D <= cc_U) And (cc_D = cc - 1 Or cc_D = cc) Then

                If (cc_D = cc - 1) Then

                    MsgBox "substitution. " & Mid(string1, current_posx, 1) & " by " & Mid(string2, current_posy, 1)

                    current_posx = current_posx - 1
                    current_posy = current_posy - 1

                    GoTo suivant

                Else

                    MsgBox "no operation"

                    current_posx = current_posx - 1
                    current_posy = current_posy - 1

                    GoTo suivant

                End If

            ElseIf cc_L <= cc_D And cc_L = cc - 1 Then

                MsgBox ("insertion. " & Mid(string2, current_posy, 1))

                current_posx = current_posx
                current_posy = current_posy - 1

                GoTo suivant

            Else

                MsgBox ("deletion." & Mid(string1, current_posy, 1))

                current_posx = current_posx
                current_posy = current_posy - 1

                GoTo suivant

            End If

    suivant:

        Loop While Not (current_posx = 0 And current_posy = 0)

    End Sub

Here is a VBA algorithm based on mjv's answer.
(very well explained, but some case were missing).

    Sub TU_Levenshtein()

        Call Levenshtein("democrat", "republican")

        Call Levenshtein("ooo", "u") 

        Call Levenshtein("ceci est un test", "ceci n'est pas un test")

    End Sub

    Sub Levenshtein(ByVal string1 As String, ByVal string2 As String)

        ' Fill Matrix Levenshtein (-> array 'Distance')

        Dim i As Long, j As Long
        Dim string1_length As Long
        Dim string2_length As Long
        Dim distance() As Long

        string1_length = Len(string1)
        string2_length = Len(string2)
        ReDim distance(string1_length, string2_length)

        For i = 0 To string1_length
            distance(i, 0) = i
        Next

        For j = 0 To string2_length
            distance(0, j) = j
        Next

        For i = 1 To string1_length
            For j = 1 To string2_length
                If Asc(Mid$(string1, i, 1)) = Asc(Mid$(string2, j, 1)) Then
                    distance(i, j) = distance(i - 1, j - 1)
                Else
                    distance(i, j) = Application.WorksheetFunction.min _
                    (distance(i - 1, j) + 1, _
                     distance(i, j - 1) + 1, _
                     distance(i - 1, j - 1) + 1)
                End If
            Next
        Next

        LevenshteinDistance = distance(string1_length, string2_length) ' for information only

        ' Write Matrix on VBA sheets (only for visuation, not used in calculus)

        Cells.Clear

        For i = 1 To UBound(distance, 1)
            Cells(i + 2, 1).Value = Mid(string1, i, 1)
        Next i

        For i = 1 To UBound(distance, 2)
            Cells(1, i + 2).Value = Mid(string2, i, 1)
        Next i

        For i = 0 To UBound(distance, 1)

            For j = 0 To UBound(distance, 2)

                Cells(i + 2, j + 2) = distance(i, j)

            Next j

        Next i

        ' One solution

        current_posx = UBound(distance, 1)
        current_posy = UBound(distance, 2)

        Do

            cc = distance(current_posx, current_posy)

            Cells(current_posx + 1, current_posy + 1).Interior.Color = vbYellow ' visualisation again

            ' Manage border case

            If current_posy - 1 < 0 Then

                MsgBox ("deletion. " & Mid(string1, current_posx, 1))

                current_posx = current_posx - 1
                current_posy = current_posy

                GoTo suivant

            End If

            If current_posx - 1 < 0 Then

                MsgBox ("insertion. " & Mid(string2, current_posy, 1))

                current_posx = current_posx
                current_posy = current_posy - 1

                GoTo suivant

            End If

            ' Middle cases

            cc_L = distance(current_posx, current_posy - 1)
            cc_U = distance(current_posx - 1, current_posy)
            cc_D = distance(current_posx - 1, current_posy - 1)

            If (cc_D <= cc_L And cc_D <= cc_U) And (cc_D = cc - 1 Or cc_D = cc) Then

                If (cc_D = cc - 1) Then

                    MsgBox "substitution. " & Mid(string1, current_posx, 1) & " by " & Mid(string2, current_posy, 1)

                    current_posx = current_posx - 1
                    current_posy = current_posy - 1

                    GoTo suivant

                Else

                    MsgBox "no operation"

                    current_posx = current_posx - 1
                    current_posy = current_posy - 1

                    GoTo suivant

                End If

            ElseIf cc_L <= cc_D And cc_L = cc - 1 Then

                MsgBox ("insertion. " & Mid(string2, current_posy, 1))

                current_posx = current_posx
                current_posy = current_posy - 1

                GoTo suivant

            Else

                MsgBox ("deletion." & Mid(string1, current_posy, 1))

                current_posx = current_posx
                current_posy = current_posy - 1

                GoTo suivant

            End If

    suivant:

        Loop While Not (current_posx = 0 And current_posy = 0)

    End Sub
坏尐絯℡ 2024-11-11 02:26:09

JackIsJack 答案的 C# 实现有一些更改:

  • 操作按“正向”顺序输出(JackIsJack 按反向顺序输出);
  • 原始答案中的最后一个“else”子句工作不正确(看起来像复制粘贴错误)。

控制台应用程序代码:

class Program
{
    static void Main(string[] args)
    {
        Levenshtein("1", "1234567890");
        Levenshtein( "1234567890", "1");

        Levenshtein("kitten", "mittens");
        Levenshtein("mittens", "kitten");
        Levenshtein("kitten", "sitting");
        Levenshtein("sitting", "kitten");
        Levenshtein("1234567890", "12356790");
        Levenshtein("12356790", "1234567890");
        Levenshtein("ceci est un test", "ceci n'est pas un test");
        Levenshtein("ceci n'est pas un test", "ceci est un test");
    }

    static void Levenshtein(string string1, string string2)
    {
        Console.WriteLine("Levenstein '" + string1 + "' => '" + string2 + "'");

        var string1_length = string1.Length;
        var string2_length = string2.Length;

        int[,] distance = new int[string1_length + 1, string2_length + 1];

        for (int i = 0; i <= string1_length; i++)
        {
            distance[i, 0] = i;
        }


        for (int j = 0; j <= string2_length; j++)
        {
            distance[0, j] = j;
        }


        for (int i = 1; i <= string1_length; i++)
        {
            for (int j = 1; j <= string2_length; j++)
            {
                if (string1[i - 1] == string2[j - 1])
                {
                    distance[i, j] = distance[i - 1, j - 1];
                }
                else
                {
                    distance[i, j] = Math.Min(distance[i - 1, j] + 1, Math.Min(
                       distance[i, j - 1] + 1,
                       distance[i - 1, j - 1] + 1));
                }

            }
        }


        var LevenshteinDistance = distance[string1_length, string2_length];// for information only
        Console.WriteLine($"Levernstein distance: {LevenshteinDistance}");

        // List of operations
        var current_posx = string1_length;
        var current_posy = string2_length;

        var stack = new Stack<string>(); // for outputting messages in forward direction

        while (current_posx != 0 || current_posy != 0)
        {
            var cc = distance[current_posx, current_posy];
            // edge cases
            if (current_posy - 1 < 0)
            {
                stack.Push("Delete '" + string1[current_posx - 1] + "'");
                current_posx--;
                continue;
            }

            if (current_posx - 1 < 0)
            {
                stack.Push("Insert '" + string2[current_posy - 1] + "'");
                current_posy--;
                continue;
            }

            // Middle cases
            var cc_L = distance[current_posx, current_posy - 1];
            var cc_U = distance[current_posx - 1, current_posy];
            var cc_D = distance[current_posx - 1, current_posy - 1];

            if ((cc_D <= cc_L && cc_D <= cc_U) && (cc_D == cc - 1 || cc_D == cc))
            {
                if (cc_D == cc - 1)
                {
                    stack.Push("Substitute '" + string1[current_posx - 1] + "' by '" + string2[current_posy - 1] + "'");
                    current_posx--;
                    current_posy--;
                }
                else
                {
                    stack.Push("Keep '" + string1[current_posx - 1] + "'");
                    current_posx--;
                    current_posy--;
                }
            }
            else if (cc_L <= cc_D && cc_L == cc - 1)
            {
                stack.Push("Insert '" + string2[current_posy - 1] + "'");
                current_posy--;                   
            }
            else
            {
                stack.Push("Delete '" + string1[current_posx - 1]+"'");
                current_posx--;                   
            }
        }

        while(stack.Count > 0)
        {
            Console.WriteLine(stack.Pop());
        }
    }
}

C# implementation of JackIsJack answer with some changes:

  • Operations are output in 'forward' order (JackIsJack outputs in reverse order);
  • Last 'else' clause in original answer worked incorrectly (looks like copy-paste error).

Console application code:

class Program
{
    static void Main(string[] args)
    {
        Levenshtein("1", "1234567890");
        Levenshtein( "1234567890", "1");

        Levenshtein("kitten", "mittens");
        Levenshtein("mittens", "kitten");
        Levenshtein("kitten", "sitting");
        Levenshtein("sitting", "kitten");
        Levenshtein("1234567890", "12356790");
        Levenshtein("12356790", "1234567890");
        Levenshtein("ceci est un test", "ceci n'est pas un test");
        Levenshtein("ceci n'est pas un test", "ceci est un test");
    }

    static void Levenshtein(string string1, string string2)
    {
        Console.WriteLine("Levenstein '" + string1 + "' => '" + string2 + "'");

        var string1_length = string1.Length;
        var string2_length = string2.Length;

        int[,] distance = new int[string1_length + 1, string2_length + 1];

        for (int i = 0; i <= string1_length; i++)
        {
            distance[i, 0] = i;
        }


        for (int j = 0; j <= string2_length; j++)
        {
            distance[0, j] = j;
        }


        for (int i = 1; i <= string1_length; i++)
        {
            for (int j = 1; j <= string2_length; j++)
            {
                if (string1[i - 1] == string2[j - 1])
                {
                    distance[i, j] = distance[i - 1, j - 1];
                }
                else
                {
                    distance[i, j] = Math.Min(distance[i - 1, j] + 1, Math.Min(
                       distance[i, j - 1] + 1,
                       distance[i - 1, j - 1] + 1));
                }

            }
        }


        var LevenshteinDistance = distance[string1_length, string2_length];// for information only
        Console.WriteLine($"Levernstein distance: {LevenshteinDistance}");

        // List of operations
        var current_posx = string1_length;
        var current_posy = string2_length;

        var stack = new Stack<string>(); // for outputting messages in forward direction

        while (current_posx != 0 || current_posy != 0)
        {
            var cc = distance[current_posx, current_posy];
            // edge cases
            if (current_posy - 1 < 0)
            {
                stack.Push("Delete '" + string1[current_posx - 1] + "'");
                current_posx--;
                continue;
            }

            if (current_posx - 1 < 0)
            {
                stack.Push("Insert '" + string2[current_posy - 1] + "'");
                current_posy--;
                continue;
            }

            // Middle cases
            var cc_L = distance[current_posx, current_posy - 1];
            var cc_U = distance[current_posx - 1, current_posy];
            var cc_D = distance[current_posx - 1, current_posy - 1];

            if ((cc_D <= cc_L && cc_D <= cc_U) && (cc_D == cc - 1 || cc_D == cc))
            {
                if (cc_D == cc - 1)
                {
                    stack.Push("Substitute '" + string1[current_posx - 1] + "' by '" + string2[current_posy - 1] + "'");
                    current_posx--;
                    current_posy--;
                }
                else
                {
                    stack.Push("Keep '" + string1[current_posx - 1] + "'");
                    current_posx--;
                    current_posy--;
                }
            }
            else if (cc_L <= cc_D && cc_L == cc - 1)
            {
                stack.Push("Insert '" + string2[current_posy - 1] + "'");
                current_posy--;                   
            }
            else
            {
                stack.Push("Delete '" + string1[current_posx - 1]+"'");
                current_posx--;                   
            }
        }

        while(stack.Count > 0)
        {
            Console.WriteLine(stack.Pop());
        }
    }
}
臻嫒无言 2024-11-11 02:26:09

我最近对 ​​Levenshtein 距离算法的矩阵做了一些工作。我需要生成将一个列表转换为另一个列表的操作。 (这也适用于字符串。)

以下(誓言)测试是否显示了您正在寻找的功能类型?

  , "lev - complex 2"
  : { topic
    : lev.diff([13, 6, 5, 1, 8, 9, 2, 15, 12, 7, 11], [9, 13, 6, 5, 1, 8, 2, 15, 12, 11])
    , "check actions"
    : function(topic) { assert.deepEqual(topic, [{ op: 'delete', pos: 9, val: 7 },
                                                 { op: 'delete', pos: 5, val: 9 },
                                                 { op: 'insert', pos: 0, val: 9 },
                                                ]); }
    }
  , "lev - complex 3"
  : { topic
    : lev.diff([9, 13, 6, 5, 1, 8, 2, 15, 12, 11], [13, 6, 5, 1, 8, 9, 2, 15, 12, 7, 11])
    , "check actions"
    : function(topic) { assert.deepEqual(topic, [{ op: 'delete', pos: 0, val: 9 },
                                                 { op: 'insert', pos: 5, val: 9 },
                                                 { op: 'insert', pos: 9, val: 7 }
                                                ]); }
    }
  , "lev - complex 4"
  : { topic
    : lev.diff([9, 13, 6, 5, 1, 8, 2, 15, 12, 11, 16], [13, 6, 5, 1, 8, 9, 2, 15, 12, 7, 11, 17])
    , "check actions"
    : function(topic) { assert.deepEqual(topic, [{ op: 'delete', pos: 0, val: 9 },
                                                 { op: 'insert', pos: 5, val: 9 },
                                                 { op: 'insert', pos: 9, val: 7 },
                                                 { op: 'replace', pos: 11, val: 17 }
                                                ]); }
    }

I've done some work with the Levenshtein distance algorithm's matrix recently. I needed to produce the operations which would transform one list into another. (This will work for strings too.)

Do the following (vows) tests show the sort of functionality that you're looking for?

  , "lev - complex 2"
  : { topic
    : lev.diff([13, 6, 5, 1, 8, 9, 2, 15, 12, 7, 11], [9, 13, 6, 5, 1, 8, 2, 15, 12, 11])
    , "check actions"
    : function(topic) { assert.deepEqual(topic, [{ op: 'delete', pos: 9, val: 7 },
                                                 { op: 'delete', pos: 5, val: 9 },
                                                 { op: 'insert', pos: 0, val: 9 },
                                                ]); }
    }
  , "lev - complex 3"
  : { topic
    : lev.diff([9, 13, 6, 5, 1, 8, 2, 15, 12, 11], [13, 6, 5, 1, 8, 9, 2, 15, 12, 7, 11])
    , "check actions"
    : function(topic) { assert.deepEqual(topic, [{ op: 'delete', pos: 0, val: 9 },
                                                 { op: 'insert', pos: 5, val: 9 },
                                                 { op: 'insert', pos: 9, val: 7 }
                                                ]); }
    }
  , "lev - complex 4"
  : { topic
    : lev.diff([9, 13, 6, 5, 1, 8, 2, 15, 12, 11, 16], [13, 6, 5, 1, 8, 9, 2, 15, 12, 7, 11, 17])
    , "check actions"
    : function(topic) { assert.deepEqual(topic, [{ op: 'delete', pos: 0, val: 9 },
                                                 { op: 'insert', pos: 5, val: 9 },
                                                 { op: 'insert', pos: 9, val: 7 },
                                                 { op: 'replace', pos: 11, val: 17 }
                                                ]); }
    }
陌生 2024-11-11 02:26:09

这是一些 Matlab 代码,您认为这是正确的吗?似乎给出了正确的结果:)

clear all

s = char('democrat');
t = char('republican');

% Edit Matrix
m=length(s);
n=length(t);
mat=zeros(m+1,n+1);
for i=1:1:m
    mat(i+1,1)=i;
end
for j=1:1:n
    mat(1,j+1)=j;
end
for i=1:m
    for j=1:n
        if (s(i) == t(j))
            mat(i+1,j+1)=mat(i,j);
        else
            mat(i+1,j+1)=1+min(min(mat(i+1,j),mat(i,j+1)),mat(i,j));
        end
    end
end

% Edit Sequence
s = char('democrat');
t = char('republican');
i = m+1;
j = n+1;
display([s ' --> ' t])
while(i ~= 1 && j ~= 1)
    temp = min(min(mat(i-1,j-1), mat(i,j-1)), mat(i-1,j));
    if(mat(i-1,j) == temp)
        i = i - 1;
        t = [t(1:j-1) s(i) t(j:end)];
        disp(strcat(['iinsertion: i=' int2str(i) ' , j=' int2str(j) ' ; ' s ' --> ' t]))
    elseif(mat(i-1,j-1) == temp)
        if(mat(i-1,j-1) == mat(i,j))
            i = i - 1;
            j = j - 1;
            disp(strcat(['uunchanged: i=' int2str(i) ' , j=' int2str(j) ' ; ' s ' --> ' t]))
        else
            i = i - 1;
            j = j - 1;
            t(j) = s(i);
            disp(strcat(['substition: i=' int2str(i) ' , j=' int2str(j) ' ; ' s ' --> ' t]))
        end
    elseif(mat(i,j-1) == temp)
        j = j - 1;
        t(j) = [];
        disp(strcat(['dddeletion: i=' int2str(i) ' , j=' int2str(j) ' ; ' s ' --> ' t]))
    end
end

Here is some Matlab code, is this correct by your opinion? Seems to give the right results :)

clear all

s = char('democrat');
t = char('republican');

% Edit Matrix
m=length(s);
n=length(t);
mat=zeros(m+1,n+1);
for i=1:1:m
    mat(i+1,1)=i;
end
for j=1:1:n
    mat(1,j+1)=j;
end
for i=1:m
    for j=1:n
        if (s(i) == t(j))
            mat(i+1,j+1)=mat(i,j);
        else
            mat(i+1,j+1)=1+min(min(mat(i+1,j),mat(i,j+1)),mat(i,j));
        end
    end
end

% Edit Sequence
s = char('democrat');
t = char('republican');
i = m+1;
j = n+1;
display([s ' --> ' t])
while(i ~= 1 && j ~= 1)
    temp = min(min(mat(i-1,j-1), mat(i,j-1)), mat(i-1,j));
    if(mat(i-1,j) == temp)
        i = i - 1;
        t = [t(1:j-1) s(i) t(j:end)];
        disp(strcat(['iinsertion: i=' int2str(i) ' , j=' int2str(j) ' ; ' s ' --> ' t]))
    elseif(mat(i-1,j-1) == temp)
        if(mat(i-1,j-1) == mat(i,j))
            i = i - 1;
            j = j - 1;
            disp(strcat(['uunchanged: i=' int2str(i) ' , j=' int2str(j) ' ; ' s ' --> ' t]))
        else
            i = i - 1;
            j = j - 1;
            t(j) = s(i);
            disp(strcat(['substition: i=' int2str(i) ' , j=' int2str(j) ' ; ' s ' --> ' t]))
        end
    elseif(mat(i,j-1) == temp)
        j = j - 1;
        t(j) = [];
        disp(strcat(['dddeletion: i=' int2str(i) ' , j=' int2str(j) ' ; ' s ' --> ' t]))
    end
end
笑看君怀她人 2024-11-11 02:26:09

根据编辑矩阵、源和目标获取所有编辑路径的代码。如果有任何错误请发表评论。多谢!

import copy
from typing import List, Union


def edit_distance(source: Union[List[str], str],
                  target: Union[List[str], str],
                  return_distance: bool = False):
    """get the edit matrix
    """
    edit_matrix = [[i + j for j in range(len(target) + 1)] for i in range(len(source) + 1)]
    
    for i in range(1, len(source) + 1):
        for j in range(1, len(target) + 1):
            if source[i - 1] == target[j - 1]:
                d = 0
            else:
                d = 1
            edit_matrix[i][j] = min(edit_matrix[i - 1][j] + 1,
                                    edit_matrix[i][j - 1] + 1,
                                    edit_matrix[i - 1][j - 1] + d)
    if return_distance:
        return edit_matrix[len(source)][len(target)]
    return edit_matrix


def get_edit_paths(matrix: List[List[int]],
                   source: Union[List[str], str],
                   target: Union[List[str], str]):
    """get all the valid edit paths
    """
    all_paths = []
    
    def _edit_path(i, j, optimal_path):
        if i > 0 and j > 0:
            diagonal = matrix[i - 1][j - 1]  # the diagonal value
            vertical = matrix[i - 1][j]      # the above value
            horizontal = matrix[i][j - 1]    # the left value
            current = matrix[i][j]           # current value
            
            # whether the source and target token are the same
            flag = False
            # compute the minimal value of the diagonal, vertical and horizontal
            minimal = min(diagonal, min(vertical, horizontal))
            
            # if the diagonal is the minimal
            if diagonal == minimal:
                new_i = i - 1
                new_j = j - 1
                path_ = copy.deepcopy(optimal_path)
                # if the diagnoal value equals to current - 1
                # it means `replace`` operation
                if diagonal == current - 1:
                    path_.append(f"Replace | {new_j} | {target[new_j]}")
                    _edit_path(new_i, new_j, path_)
                # if the diagonal value equals to current value
                # and corresponding positional value of source and target equal
                # it means this is current best path
                elif source[new_i] == target[new_j]:
                    flag = True
                    # path_.append(f"Keep | {new_i}")
                    _edit_path(new_i, new_j, path_)
            # if the position doesn't have best path
            # we need to consider other situations
            if not flag:
                # if vertical value equals to minimal
                # it means delete source corresponding value
                if vertical == minimal:
                    new_i = i - 1
                    new_j = j
                    path_ = copy.deepcopy(optimal_path)
                    path_.append(f"Delete | {new_i}")
                    _edit_path(new_i, new_j, path_)
                
                # if horizontal value equals to minimal
                # if mean insert target corresponding value to source
                if horizontal == minimal:
                    new_i = i
                    new_j = j - 1
                    path_ = copy.deepcopy(optimal_path)
                    path_.append(f"Insert | {new_j} | {target[new_j]}")
                    _edit_path(new_i, new_j, path_)
        else:
            all_paths.append(list(reversed(optimal_path)))
    # get the rows and columns of the edit matrix
    row_len = len(matrix) - 1
    col_len = len(matrix[0]) - 1
    _edit_path(row_len, col_len, optimal_path=[])
    return all_paths


if __name__ == "__main__":
    source = "BBDEF"
    target = "ABCDF"
    matrix = edit_distance(source, target)
    print("print paths")
    paths = get_edit_paths(matrix, source=list(source), target=list(target))
    for path in paths:
        print(path)

The code to get all the edit paths according to edit matrix, source and target. Make a comment if there are any bugs. Thanks a lot!

import copy
from typing import List, Union


def edit_distance(source: Union[List[str], str],
                  target: Union[List[str], str],
                  return_distance: bool = False):
    """get the edit matrix
    """
    edit_matrix = [[i + j for j in range(len(target) + 1)] for i in range(len(source) + 1)]
    
    for i in range(1, len(source) + 1):
        for j in range(1, len(target) + 1):
            if source[i - 1] == target[j - 1]:
                d = 0
            else:
                d = 1
            edit_matrix[i][j] = min(edit_matrix[i - 1][j] + 1,
                                    edit_matrix[i][j - 1] + 1,
                                    edit_matrix[i - 1][j - 1] + d)
    if return_distance:
        return edit_matrix[len(source)][len(target)]
    return edit_matrix


def get_edit_paths(matrix: List[List[int]],
                   source: Union[List[str], str],
                   target: Union[List[str], str]):
    """get all the valid edit paths
    """
    all_paths = []
    
    def _edit_path(i, j, optimal_path):
        if i > 0 and j > 0:
            diagonal = matrix[i - 1][j - 1]  # the diagonal value
            vertical = matrix[i - 1][j]      # the above value
            horizontal = matrix[i][j - 1]    # the left value
            current = matrix[i][j]           # current value
            
            # whether the source and target token are the same
            flag = False
            # compute the minimal value of the diagonal, vertical and horizontal
            minimal = min(diagonal, min(vertical, horizontal))
            
            # if the diagonal is the minimal
            if diagonal == minimal:
                new_i = i - 1
                new_j = j - 1
                path_ = copy.deepcopy(optimal_path)
                # if the diagnoal value equals to current - 1
                # it means `replace`` operation
                if diagonal == current - 1:
                    path_.append(f"Replace | {new_j} | {target[new_j]}")
                    _edit_path(new_i, new_j, path_)
                # if the diagonal value equals to current value
                # and corresponding positional value of source and target equal
                # it means this is current best path
                elif source[new_i] == target[new_j]:
                    flag = True
                    # path_.append(f"Keep | {new_i}")
                    _edit_path(new_i, new_j, path_)
            # if the position doesn't have best path
            # we need to consider other situations
            if not flag:
                # if vertical value equals to minimal
                # it means delete source corresponding value
                if vertical == minimal:
                    new_i = i - 1
                    new_j = j
                    path_ = copy.deepcopy(optimal_path)
                    path_.append(f"Delete | {new_i}")
                    _edit_path(new_i, new_j, path_)
                
                # if horizontal value equals to minimal
                # if mean insert target corresponding value to source
                if horizontal == minimal:
                    new_i = i
                    new_j = j - 1
                    path_ = copy.deepcopy(optimal_path)
                    path_.append(f"Insert | {new_j} | {target[new_j]}")
                    _edit_path(new_i, new_j, path_)
        else:
            all_paths.append(list(reversed(optimal_path)))
    # get the rows and columns of the edit matrix
    row_len = len(matrix) - 1
    col_len = len(matrix[0]) - 1
    _edit_path(row_len, col_len, optimal_path=[])
    return all_paths


if __name__ == "__main__":
    source = "BBDEF"
    target = "ABCDF"
    matrix = edit_distance(source, target)
    print("print paths")
    paths = get_edit_paths(matrix, source=list(source), target=list(target))
    for path in paths:
        print(path)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文