如何找到最大的词梯?
我想找到给定字典的最大长度的词梯。 词梯是一个词序列,其中每个词仅在一个位置与前一个词不同。
我将实现以下算法:
- 从字典中读取单词,并按
- 每个组的长度对它们进行分组,创建一个
map
,它将每个单词映射到其他单词,这些单词仅在以下方面有所不同 每个组的一个位置(此地图
是一个以邻接列表实现的图形) - 找到每对“节点”之间的最短路径 -
地图中的单词
(使用bfs
) - 找到最大最短路径。
我猜算法是有效的。现在我想知道从性能的角度来看它是否是最佳的。你会如何优化它?
I would like to find the word ladder of maximal length for a given dictionary. A word ladder is a sequence of words such that each word differs from the previous word in only one position.
I am going to implement the following algorithm:
- read the words from the dictionary and group them by their length
- for each group create a
map
, which maps the each word to other words, which differ from it in only one position (thismap
is a graph implemented as adjacency lists) - for each group find the shortest paths between every pair of "nodes" -- words in the
map
(usingbfs
) - find the maximal shortest path.
I guess the algorithm works. Now I wonder if it is optimal from the performance point of view. How would you optimize it?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
根据您的要求,我们问题的答案会有很大差异。
您在此处发布的算法将为您提供其之间具有最长最短词梯的一对单词。这称为字图的直径。如果你想找到这个,你这里的方法应该没问题。
如果你想真正找到字典中最长的词梯(也就是说,你想找到一次仅相差一个字母的最长单词链),那么如果你排除其中有循环的词梯,问题是NP-hard(通过哈密顿路径问题的简化),意味着推测没有有效的算法来解决该问题。您可能必须通过列出所有可能的单词梯来强制搜索单词列表。不幸的是,这在任何合理大小的字典中都是完全不可行的,因为要尝试的梯子数量显然是荒谬的。
简而言之,如果您正在寻找图形直径,那么您的解决方案非常好。如果您正在寻找实际最长的词梯,那么您很可能永远无法得到答案,因为搜索空间太大,并且已知该问题在理论上很困难。
希望这有帮助!
The answer to us question varies greatly depending on what you're asking for.
The algorithm you have posted here will give you the pair of words that have the longest shortest word ladder between them. This is called the diameter of the word graph. If you want to find this, the approach you have here should be fine.
If you want to actually find the longest word ladder in the dictionary (that is, you want to find the longest chain of words that differ only by one letter at a time), then if you exclude word ladders with cycles in them the problem is NP-hard (by a reduction from the Hamiltonian path problem), meaning that it is conjectured that there is no efficient algorithm for solving the problem. You would probably have to brute-force search he word list by listing off all possible word ladders. This, unfortunately, is completely infeasible in any reasonably-sized dictionary there would be a patently absurd number of ladders to try.
In short, if you are looking for the graph diameter, your solution is pretty good. If you are looking to find the actual longest word ladder, then chances are that you will never be able to get the answer, since the search space is too huge and the problem is known to be theoretically difficult.
Hope this helps!
只是为了好玩,这里有一种计算图直径的方法,对于有一些非常长的最短路径的稀疏图来说,该方法可能会更快。
1) 计算图的最小生成树。
2) 最小生成树的直径可以在两个BFS中找到。从树上的任意点开始第一个。从树中距第一个点最远的点开始第二个点。这是可行的,因为如果您为树分配任意根,则直径是距根的两个最长距离的总和,并且您的第一个 BFS 会找到其中一个。
3) 沿最小生成树直径的一半分配一个相当任意的根。从点 X 开始的直径的上限是 X 到该根的距离与任何节点到该根的最大距离之和。这只是一个上限,因为两个节点之间的最短距离不一定遵循最小生成树。
4) 按照最小生成树中距根的距离的非递增顺序从节点进行 BFS 搜索。在每次搜索开始时,您都会从该节点开始的图直径树中得到一个上限。当上限不大于迄今为止找到的最大直径时,您可以停止进行 BFS 搜索。
Just for fun, here is a way to compute the diameter of a graph that might be faster for sparse graphs where there are a few very long shortest paths.
1) Compute the minimum spanning tree for the graph.
2) The diameter of the minimum spanning tree can be found in two BFSs. Start the first from an arbitrary point on the tree. Start the second from a point which is a furthest point in the tree from the first point. This works because if you assign an arbitrary root to the tree, the diameter is the sum of two longest distances from the root, and your first BFS finds one of them.
3) Assign an fairly arbitrary root halfway along the diameter of the minimum spanning tree. An upper bound to a diameter starting from point X is the sum of the distance of X from that root and maximum distance of any node from that root. This is only an upper bound, because the shortest distance between two nodes does not necessarily follow the minimum spanning tree.
4) Do BFS searchs from nodes in non-increasing order of their distance from root in the minimum spanning tree. At the beginning of each search you have an upper bound from the tree of a graph diameter starting at that node. You can stop doing BFS searches when that upper bound is no greater than the largest diameter found so far.
Mathematica 中有此任务的完整实现,背景信息位于
http://blog.wolfram.com/2012/01/11/the-longest-word-ladder-puzzle-ever/
There is a full implementation of this task in Mathematica, with background information at
http://blog.wolfram.com/2012/01/11/the-longest-word-ladder-puzzle-ever/