推断类型似乎检测到无限循环，但到底发生了什么？

发布于 2024-08-13 03:31:34 字数 1242 浏览 15 评论 0原文

在 Andrew Koenig 的关于 ML 类型推断的轶事中，作者使用合并排序的实现作为机器学习的学习练习，并很高兴发现一个“不正确”的结果类型推断。

令我惊讶的是，编译器报告了一种类型
'列表->整数列表
换句话说，此排序函数接受任何类型的列表并返回整数列表。
那是不可能的。输出必须是输入的排列；它怎么可能有不同的类型？读者肯定会发现我的第一冲动很熟悉：我想知道我是否发现了编译器中的错误！
经过更多思考，我意识到函数还有另一种方式可以忽略它的参数：也许它有时根本不返回。事实上，当我尝试时，确实发生了这样的情况：sort(nil) 确实返回了 nil，但是对任何非空列表进行排序将进入无限递归循环。

当翻译成 Haskell

split [] = ([], [])
split [x] = ([x], [])
split (x:y:xs) = (x:s1, y:s2)
  where (s1,s2) = split xs

merge xs [] = xs
merge [] ys = ys
merge xx@(x:xs) yy@(y:ys)
  | x < y     = x : merge xs yy
  | otherwise = y : merge xx ys

mergesort [] = []
mergesort xs = merge (mergesort p) (mergesort q)
  where (p,q) = split xs

GHC 时，推断出类似的类型：

*Main> :t mergesort
mergesort :: (Ord a) => [t] -> [a]

How does the Damas–Hindley–Milner 算法推断出这个类型？

原文

In Andrew Koenig’s An anecdote about ML type inference, the author uses implementation of merge sort as a learning exercise for ML and is pleased to find an “incorrect” type inference.

Much to my surprise, the compiler reported a type of
'a list -> int list
In other words, this sort function accepts a list of any type at all and returns a list of integers.
That is impossible. The output must be a permutation of the input; how can it possibly have a different type? The reader will surely find my first impulse familiar: I wondered if I had uncovered a bug in the compiler!
After thinking about it some more, I realized that there was another way in which the function could ignore its argument: perhaps it sometimes didn't return at all. Indeed, when I tried it, that is exactly what happened: sort(nil) did return nil, but sorting any non-empty list would go into an infinite recursion loop.

When translated to Haskell

split [] = ([], [])
split [x] = ([x], [])
split (x:y:xs) = (x:s1, y:s2)
  where (s1,s2) = split xs

merge xs [] = xs
merge [] ys = ys
merge xx@(x:xs) yy@(y:ys)
  | x < y     = x : merge xs yy
  | otherwise = y : merge xx ys

mergesort [] = []
mergesort xs = merge (mergesort p) (mergesort q)
  where (p,q) = split xs

GHC infers a similar type:

*Main> :t mergesort
mergesort :: (Ord a) => [t] -> [a]

How does the Damas–Hindley–Milner algorithm infer this type?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

昨迟人 2024-08-20 03:31:34

这确实是一个了不起的例子；本质上，在编译时检测到无限循环！在这个例子中，Hindley-Milner 推论没有什么特别之处；它就像往常一样进行。

请注意，ghc 正确获取 split 和 merge 的类型：

*Main> :t split
split :: [a] -> ([a], [a])
*Main> :t merge
merge :: (Ord t) => [t] -> [t] -> [t]

现在，当谈到 mergesort 时，它通常是一个函数 t< sub>1→t₂ 对于某些类型 t₁ 和 t₂。然后它看到第一行：

mergesort [] = []

并意识到 t₁ 和 t₂ 必须是列表类型，比如 t₁=[t₃] 和 t₂=[t₄]。所以归并排序必须是一个函数[t₃]→[t₄]。下一行

mergesort xs = merge (mergesort p) (mergesort q) 
  where (p,q) = split xs

告诉它：

xs 必须是 split 的输入，即对于某个 a 来说，其类型为 [a]（对于 a=t₃ 来说，它已经是这样了）。
所以 p 和 q 也是 [t₃] 类型，因为 split 是 [a]→([ a],[a])
mergesort p，因此，（回想一下，mergesort 被认为是 [t₃]→[t₄ 类型>]) 的类型为 [t₄]。
出于完全相同的原因，mergesort q 的类型为 [t₄]。
由于 merge 的类型为 (Ord t) =>; [t]-> [t]-> [t]，并且表达式 merge (mergesort p) (mergesort q) 中的输入均为 [t₄] 类型，即类型 t< sub>4 必须位于 Ord 中。
最后，merge (mergesort p) (mergesort q) 的类型与其两个输入相同，即 [t₄]。这符合之前已知的 mergesort 类型 [t₃]→[t₄]，因此无需进行更多推断Hindley-Milner 算法的“统一”部分已经完成。 mergesort 的类型为 [t₃]→[t₄]，其中 t₄ 在 Ord< /代码>。

这就是为什么你会得到：（

*Main> :t mergesort 
mergesort :: (Ord a) => [t] -> [a]

上面的逻辑推理描述等同于算法所做的事情，但是算法遵循的具体步骤顺序只是维基百科页面上给出的，例如。）

This is indeed a remarkable example; an infinite loop being detected, essentially, at compile time! There is nothing special about the Hindley–Milner inference in this example; it just proceeds as usual.

Note that ghc gets the types of split and merge correctly:

*Main> :t split
split :: [a] -> ([a], [a])
*Main> :t merge
merge :: (Ord t) => [t] -> [t] -> [t]

Now when it comes to mergesort, it is, in general, a function t₁→t₂ for some types t₁ and t₂. Then it sees the first line:

mergesort [] = []

and realizes that t₁ and t₂ must be list types, say t₁=[t₃] and t₂=[t₄]. So mergesort must be a function [t₃]→[t₄]. The next line

mergesort xs = merge (mergesort p) (mergesort q) 
  where (p,q) = split xs

tells it that:

xs must be an input to split, i.e., of type [a] for some a (which it already is, for a=t₃).
So p and q are also of type [t₃], since split is [a]→([a],[a])
mergesort p, therefore, (recall that mergesort is believed to be of type [t₃]→[t₄]) is of type [t₄].
mergesort q is of type [t₄] for exactly the same reason.
As merge has type (Ord t) => [t] -> [t] -> [t], and the inputs in the expression merge (mergesort p) (mergesort q) are both of type [t₄], the type t₄ must be in Ord.
Finally, the type of merge (mergesort p) (mergesort q) is the same as both its inputs, namely [t₄]. This fits with the previously known type [t₃]→[t₄] for mergesort, so there are no more inferences to be done and the "unification" part of the Hindley–Milner algorithm is complete. mergesort is of type [t₃]→[t₄] with t₄ in Ord.

That's why you get:

*Main> :t mergesort 
mergesort :: (Ord a) => [t] -> [a]

(The description above in terms of logical inference is equivalent to what the algorithm does, but the specific sequence of steps the algorithm follows is simply that given on the Wikipedia page, for example.)

回复收藏 0 原文