Haskell：常见的核心递归谬误

发布于 2024-11-14 12:33:02 字数 645 浏览 3 评论 0原文

所以今晚我正在考虑图距离算法，并想出了这个当我开车的时候：

module GraphDistance where
import Data.Map

distance :: (Ord a) => a -> Map a [a] -> Map a Int
distance a m = m' 
  where m' = mapWithKey d m
        d a' as = if a == a' then 0 else 1 + minimum (Prelude.map (m'!) as)

一开始我还挺自豪的，因为它看起来很优雅。但后来我意识到这是行不通的 - corecursion 可能会卡住循环中。

我必须编写代码来说服自己：

ghci> distance 1 $ fromList [(0,[1]),(1,[0,2]),(2,[1,3]),(3,[2])]
fromList [(0,1),(1,0),(2,^CInterrupted.

但现在我想我已经想清楚了。

是否有常见错误和反模式的列表当处理我可以阅读的核心数据时，这样我就可以训练我的大脑进行核心递归思考？经验很好地训练了我通过非核心递归进行思考数据，但如果可以的话，我想从别人的错误中吸取教训。

原文

So I was thinking about a graph distance algorithm tonight, and came up with this
while I was driving in the car:

module GraphDistance where
import Data.Map

distance :: (Ord a) => a -> Map a [a] -> Map a Int
distance a m = m' 
  where m' = mapWithKey d m
        d a' as = if a == a' then 0 else 1 + minimum (Prelude.map (m'!) as)

At first, I was fairly proud of myself, since it seemed so elegant.
But then I realized it wouldn't work - the corecursion might get stuck
in a loop.

I had to code it up to convince myself:

ghci> distance 1 $ fromList [(0,[1]),(1,[0,2]),(2,[1,3]),(3,[2])]
fromList [(0,1),(1,0),(2,^CInterrupted.

But now I think I've pretty much thought it through.

Is there a list of common errors and anti-patterns
when dealing with corecursive data that I can go read up on,
so I can train my brain to think corecursively? Experience
has trained me fairly well for thinking through non-corecursive
data, but I'd like to learn from others mistakes if I can.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

你在我安 2024-11-21 12:33:02

好吧，在处理核心递归数据时，实际上只有一个根本性错误，那就是不小心使用了递归！

核心递归意味着数据在某种意义上是增量生成的。您的图形距离函数在这里很有启发性，因为它似乎应该起作用，因此请考虑增量部分应该在哪里：起点是从节点到自身的距离 0，否则大于 1比它自己的近邻之间的最小距离。因此，我们期望每个距离值都是增量的，这意味着我们需要它们本身具有适当的核心递归性。

那么，所讨论的递归是由于 (+) 和 minimum 的组合而发生的：当找到最小值时，1 将始终是小于 1 + n，无需担心 n 是什么...但是如果不计算整个 Int，就无法比较 Int价值。

简而言之，该算法期望能够仅根据需要比较 (1 +) 应用于 0 的次数；也就是说，它想要使用“零”和“后继”定义的惰性自然数。

看哪：

data Nat = Z | S Nat

natToInt :: Nat -> Int
natToInt Z = 0
natToInt (S n) = 1 + natToInt n

instance Show Nat where show = show . natToInt

instance Eq Nat where
    Z == Z = True
    (S n1) == (S n2) = n1 == n2
    _ == _ = False

    Z /= Z = False
    (S n1) /= (S n2) = n1 /= n2
    _ /= _ = True


instance Ord Nat where
    compare Z Z = EQ
    compare Z (S _) = LT
    compare (S _) Z = GT
    compare (S n1) (S n2) = compare n1 n2

然后在 GHCi 中：

> distance 1 $ fromList [(0,[1]),(1,[0,2]),(2,[1,3]),(3,[2])]
fromList [(0,1),(1,0),(2,1),(3,2)]

证明你的算法有效^[0]；你对它的实施是不正确的。

现在，作为一个细微的变化，让我们将您的算法应用于不同的图表：

> distance 1 $ fromList [(0,[1]),(1,[0]),(2,[3]),(3,[2])]

...我们期望它做什么？节点 2 或 3 距节点 1 的距离是多少？

在 GHCi 中运行它会得到明显的结果：

fromList [(0,1),(1,0),(2,^CInterrupted.

尽管如此，算法在此图上运行正常。你能看出问题所在吗？为什么它挂在 GHCi 中？

总之，您需要清楚地区分两种不能自由混合的形式：

惰性数据，可能是无限数据，以核心
递归方式生成有限数据，以递归方式使用

两种形式都可以以结构无关的方式进行转换（例如，map 适用于有限和无限列表）。 Codata 可以在 corecursive 算法的驱动下增量消耗；数据可以递归地生成，受递归算法的限制。

你不能做的是递归地使用codata（例如，向左折叠无限列表）或以核心方式生成数据（由于懒惰，在Haskell中很少见）。

[0]：实际上，它会在某些输入上失败（例如，一些断开连接的图表），但这是一个不同的问题。

Well, there's really only one fundamental mistake when dealing with corecursive data, and that's carelessly using recursion on it!

Corecursion implies that data is being generated incrementally in some sense. Your graph distance function is instructive here, because it seems like it should work, so think about where the incremental part should be: The starting point is a distance of 0 from a node to itself, otherwise one greater than the minimum distance among its own immediate neighbors. Thus, we would expect each distance value to be incremental, which means we need them to be suitably corecursive themselves.

The recursion at issue, then, is occurring due to the combination of (+) and minimum: when finding the minimum, 1 will always be less than 1 + n, without needing to worry about what n is... but there's no way to compare Ints without computing the entire value.

In short, the algorithm expects to be able to compare how many times (1 +) was applied to 0 only as far as needed; that is to say, it wants lazy natural numbers defined using "zero" and "successor".

Behold:

data Nat = Z | S Nat

natToInt :: Nat -> Int
natToInt Z = 0
natToInt (S n) = 1 + natToInt n

instance Show Nat where show = show . natToInt

instance Eq Nat where
    Z == Z = True
    (S n1) == (S n2) = n1 == n2
    _ == _ = False

    Z /= Z = False
    (S n1) /= (S n2) = n1 /= n2
    _ /= _ = True


instance Ord Nat where
    compare Z Z = EQ
    compare Z (S _) = LT
    compare (S _) Z = GT
    compare (S n1) (S n2) = compare n1 n2

And then in GHCi:

> distance 1 $ fromList [(0,[1]),(1,[0,2]),(2,[1,3]),(3,[2])]
fromList [(0,1),(1,0),(2,1),(3,2)]

Proof that your algorithm works^[0]; your implementation of it was just incorrect.

Now, as a slight variation, let's apply your algorithm to a different graph:

> distance 1 $ fromList [(0,[1]),(1,[0]),(2,[3]),(3,[2])]

...what do we expect this to do? What is the distance from node 1 for nodes 2 or 3?

Running it in GHCi has the obvious result:

fromList [(0,1),(1,0),(2,^CInterrupted.

Nevertheless, the algorithm works correctly on this graph. Can you see the problem? Why did it hang in GHCi?

In summary, you need to clearly distinguish between two forms that can't be mixed freely:

Lazy, possibly infinite data, generated corecursively
Finite data, consumed recursively

Both forms can be transformed in a structure-agnostic way (e.g., map works on finite and infinite lists both). Codata can be consumed incrementally, driven by a corecursive algorithm; data can be generated recursively, limited by a recursive algorithm.

What you can't do is consume codata recursively (e.g., left folding an infinite list) or generate data corecursively (rare in Haskell, due to laziness).

[0]: Actually, it will fail on some inputs (e.g., some disconnected graphs), but that's a different issue.

回复收藏 0 原文

~没有更多了~