这是旅行推销员问题的变体吗?
我对两个单词列表的函数感兴趣,它将返回它们之间的顺序无关的编辑距离。
也就是说,参数将是两个单词列表(假设以空格分隔),返回值将是列表中单词的编辑(或编辑)距离的最小总和。
"catratbat"
和 "ratbatcat"
之间的距离将为 0。"catratbat"
和 " 之间的距离fat had bad”
与 “rat bat cat”
和 “had fat bad”
之间的距离相同,4. 在这种情况下单词数列表中的内容不相同,较短的列表将用 0 长度的单词填充。
我的直觉(没有通过计算机科学课程培养)除了使用暴力之外没有找到任何其他解决方案:
|had|fat|bad| a solution
---+---+---+---+ +---+---+---+
cat| 2 | 1 | 2 | | | 1 | |
---+---+---+---+ +---+---+---+
rat| 2 | 1 | 2 | | 3 | | |
---+---+---+---+ +---+---+---+
bat| 2 | 1 | 1 | | | | 4 |
---+---+---+---+ +---+---+---+
从第一行开始,选择一列并转到下一行,而无需重新访问您已经访问过的列。一遍又一遍地这样做,直到尝试完所有组合为止。
对我来说,这听起来有点像旅行推销员问题。是吗?您将如何解决我的具体问题?
I'm interested in a function of two word lists, which would return an order agnostic edit distance between them.
That is, the arguments would be two lists of (let's say space delimited) words and return value would be the minimum sum of the edit (or Levenshtein) distances of the words in the lists.
Distance between "cat rat bat"
and "rat bat cat"
would be 0. Distance between "cat rat bat"
and "fat had bad"
would be the same as distance between "rat bat cat"
and "had fat bad"
, 4. In the case the number of words in the lists are not the same, the shorter list would be padded with 0-length words.
My intuition (which hasn't been nurtured with computer science classes) does not find any other solution than to use brute force:
|had|fat|bad| a solution
---+---+---+---+ +---+---+---+
cat| 2 | 1 | 2 | | | 1 | |
---+---+---+---+ +---+---+---+
rat| 2 | 1 | 2 | | 3 | | |
---+---+---+---+ +---+---+---+
bat| 2 | 1 | 1 | | | | 4 |
---+---+---+---+ +---+---+---+
Starting from the first row, pick a column and go to the next rows without ever revisiting a column you have already visited. Do this over and over again until you've tried all combinations.
To me this sounds a bit like the traveling salesman problem. Is it, and how would you solve my particular problem?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
正如您在左侧网格中所示的那样,您可以从计算每对单词的编辑距离开始。这可以在多项式时间内轻松完成(n^2 编辑距离计算)。
那么您的问题可以描述为“最小加权二分匹配”,或者等效地,“最大加权二分匹配”。这也可以在多项式时间内完成(比旅行推销员更快)。请参阅http://en.wikipedia.org/wiki/Matching_%28graph_theory%29 #Maximum_matchings_in_bipartite_graphs
As you already showed in the grid on the left, you can start by calculating the edit distances for every pair of words. This is easily done in polynomial time (n^2 edit distance calculations).
Then your problem can be described as a "minimum weighted bipartite matching", or equivalently, a "maximum weighted bipartite matching". This can also be done in polynomial time (faster than traveling salesman). See http://en.wikipedia.org/wiki/Matching_%28graph_theory%29#Maximum_matchings_in_bipartite_graphs
这看起来是突破 Levenshtein 距离算法 的绝佳机会。该算法将完全满足您的要求(两个字符串之间的最小距离),并且它也非常高效。至于它是旅行推销员的变体,那就不行,因为由于问题的结构,这可以在多项式时间内解决。希望这有帮助。
This looks like a perfect oportunity to break out the Levenshtein distance algorithm. This algorithm will do exactly what you are looking for (the minimal distance between two strings) and it is pretty efficient too. As far as it being a variation of traveling salesman that would be a no because this can be solved in polynomial time due to the structure of the problem. Hope this helps.