这是旅行推销员问题的变体吗？

发布于 2024-08-29 23:50:53 字数 771 浏览 18 评论 0原文

我对两个单词列表的函数感兴趣，它将返回它们之间的顺序无关的编辑距离。

也就是说，参数将是两个单词列表（假设以空格分隔），返回值将是列表中单词的编辑（或编辑）距离的最小总和。

"catratbat" 和 "ratbatcat" 之间的距离将为 0。"catratbat" 和 " 之间的距离fat had bad” 与 “rat bat cat” 和 “had fat bad” 之间的距离相同，4. 在这种情况下单词数列表中的内容不相同，较短的列表将用 0 长度的单词填充。

我的直觉（没有通过计算机科学课程培养）除了使用暴力之外没有找到任何其他解决方案：

   |had|fat|bad|   a solution
---+---+---+---+ +---+---+---+
cat| 2 | 1 | 2 | |   | 1 |   |
---+---+---+---+ +---+---+---+
rat| 2 | 1 | 2 | | 3 |   |   |
---+---+---+---+ +---+---+---+
bat| 2 | 1 | 1 | |   |   | 4 |
---+---+---+---+ +---+---+---+

从第一行开始，选择一列并转到下一行，而无需重新访问您已经访问过的列。一遍又一遍地这样做，直到尝试完所有组合为止。

对我来说，这听起来有点像旅行推销员问题。是吗？您将如何解决我的具体问题？

原文

I'm interested in a function of two word lists, which would return an order agnostic edit distance between them.

That is, the arguments would be two lists of (let's say space delimited) words and return value would be the minimum sum of the edit (or Levenshtein) distances of the words in the lists.

Distance between "cat rat bat" and "rat bat cat" would be 0. Distance between "cat rat bat" and "fat had bad" would be the same as distance between "rat bat cat" and "had fat bad", 4. In the case the number of words in the lists are not the same, the shorter list would be padded with 0-length words.

My intuition (which hasn't been nurtured with computer science classes) does not find any other solution than to use brute force:

   |had|fat|bad|   a solution
---+---+---+---+ +---+---+---+
cat| 2 | 1 | 2 | |   | 1 |   |
---+---+---+---+ +---+---+---+
rat| 2 | 1 | 2 | | 3 |   |   |
---+---+---+---+ +---+---+---+
bat| 2 | 1 | 1 | |   |   | 4 |
---+---+---+---+ +---+---+---+

Starting from the first row, pick a column and go to the next rows without ever revisiting a column you have already visited. Do this over and over again until you've tried all combinations.

To me this sounds a bit like the traveling salesman problem. Is it, and how would you solve my particular problem?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

萝莉病 2024-09-05 23:50:53

正如您在左侧网格中所示的那样，您可以从计算每对单词的编辑距离开始。这可以在多项式时间内轻松完成（n^2 编辑距离计算）。

那么您的问题可以描述为“最小加权二分匹配”，或者等效地，“最大加权二分匹配”。这也可以在多项式时间内完成（比旅行推销员更快）。请参阅http://en.wikipedia.org/wiki/Matching_%28graph_theory%29 #Maximum_matchings_in_bipartite_graphs

回复收藏 0 原文