使用Data.Memocombinators实现编辑距离算法

发布于 2024-10-26 20:27:09 字数 915 浏览 16 评论 0原文

假设我想为 Levensthein 距离（编辑距离）实现常用的动态规划算法。提出递归非常容易：

editDistance [] ys = length ys
editDistance xs [] = length xs
editDistance (x:xs) (y:ys) 
  | x == y = editDistance xs ys
  | otherwise = minimum [
      1 + editDistance xs (y:ys),
      1 + editDistance (x:xs) ys,
      1 + editDistance xs ys]

这会导致运行时间呈指数级增长，因此有必要记住该函数。我想通过使用 Data.Memocombinators 来做到这一点，并且我尝试了多种方法。这是我当前的尝试：

import qualified Data.MemoCombinators as M

memLength str = 
  M.wrap 
    (\i -> drop i str) 
    (\xs -> length str - length xs)
    (M.arrayRange (0,length str))

elm xs ys = (M.memo2 memListx memListy editDistance) xs ys
  where
    memListx = memLength xs
    memListy = memLength ys

然而，记忆化似乎没有任何效果，尽管我希望任何记忆化都会对运行时间有显着的改进，因为它至少是多项式的。我的实施有什么问题吗？如何获得良好的运行时间，同时尽可能保留编辑距离的高级定义？

原文

Lets say I wanted to implement the usual dynamic programming algorithm for Levensthein distance (edit distance). It is quite easy to come up with the recursion:

editDistance [] ys = length ys
editDistance xs [] = length xs
editDistance (x:xs) (y:ys) 
  | x == y = editDistance xs ys
  | otherwise = minimum [
      1 + editDistance xs (y:ys),
      1 + editDistance (x:xs) ys,
      1 + editDistance xs ys]

This suffers from exponential running time, so it is necessary to memoize the function. I want to do so by using Data.Memocombinators, and I have tried several ways. Here is my current attempt:

import qualified Data.MemoCombinators as M

memLength str = 
  M.wrap 
    (\i -> drop i str) 
    (\xs -> length str - length xs)
    (M.arrayRange (0,length str))

elm xs ys = (M.memo2 memListx memListy editDistance) xs ys
  where
    memListx = memLength xs
    memListy = memLength ys

However, the memoization seems not to have any effect, although I would expect any memoization to have a noticeable improvement to running time since it would at least be polynomial. What is wrong with my implementation? How can I get an ok running time while preserving as much as possible of the high level definition of edit distance?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

樱花落人离去 2024-11-02 20:27:09

如果您发布的代码实际上就是您正在做的事情，那么您就做错了！:-)。如果要记忆递归函数，则需要将对递归版本的调用回调到记忆版本中。例如：

editDistance (x:xs) (y:ys)
  | x == y = elm xs ys
  | ...

否则，您将执行完整的递归计算并仅存储最终结果。您需要存储中间结果。

这里还有另一个问题。 elm 的备忘录表不应依赖于它的参数（理想情况下，您甚至不应该提及这些参数，因此您不依赖于足够智能的编译器来找出依赖项）。如果备忘录表取决于参数，那么您必须为每个不同的参数构建一个新表，并且我们需要为所有参数共享一个表。您可以尝试一些愚蠢的事情，例如记住参数的整个结构：

elm = M.memo2 (M.list M.char) (M.list M.char)

看看这是否会加快速度（结合前一个技巧）。然后，您可以继续尝试仅使用长度而不是整个列表结构来获得额外的提升。

希望有帮助。

If the code you posted is actually what you're doing, then you're doing it wrong! :-). If you are going to memoize a recursive function, you need to have the calls to the recursive version call back into the memoized version. So eg.:

editDistance (x:xs) (y:ys)
  | x == y = elm xs ys
  | ...

Otherwise you do the full recursive computation and only store the final result. You need to store intermediate results.

There is another issue here. The memo table for elm should not depend on its arguments (ideally you shouldn't even mention the arguments, so you don't depend on the compiler being smart enough to figure out the dependencies). If the memo table depends on the arguments, then you have to build a new table for each different argument, and we need to share a table for all arguments. You could try something dumb like memoizing on the whole structure of the arguments:

elm = M.memo2 (M.list M.char) (M.list M.char)

See if that speeds it up (incorporating the former trick). Then you can move on to trying to use just the lengths instead of the entire list structure for an extra boost.

Hope that helped.

回复收藏 0 原文

~没有更多了~