为什么第一个 Haskell 函数无法处理无限列表，而第二个片段却成功处理无限列表？

发布于 2024-07-21 06:41:11 字数 3512 浏览 5 评论 0原文

我有两个 Haskell 函数，这两个函数对我来说都非常相似。但是第一个在无限列表上失败了，第二个在无限列表上成功了。我花了好几个小时试图弄清楚到底为什么会这样，但无济于事。

两个片段都是 Prelude 中“words”功能的重新实现。两者都可以很好地处理有限列表。

这是不处理无限列表的版本：

myWords_FailsOnInfiniteList :: String -> [String]
myWords_FailsOnInfiniteList string = foldr step [] (dropWhile charIsSpace string)
   where 
      step space ([]:xs)      | charIsSpace space = []:xs    
      step space (x:xs)       | charIsSpace space = []:x:xs
      step space []           | charIsSpace space = []
      step char (x:xs)                            = (char : x) : xs
      step char []                                = [[char]]

这是处理无限列表的版本：

myWords_anotherReader :: String -> [String]
myWords_anotherReader xs = foldr step [""] xs
   where 
      step x result | not . charIsSpace $ x = [x:(head result)]++tail result
                    | otherwise             = []:result

注意：“charIsSpace”仅仅是 Char.isSpace 的重命名。

以下解释器会话说明第一个解释器会话针对无限列表失败，而第二个解释器会话成功。

*Main> take 5 (myWords_FailsOnInfiniteList  (cycle "why "))
*** Exception: stack overflow

*Main> take 5 (myWords_anotherReader (cycle "why "))
["why","why","why","why","why"]

编辑：感谢下面的回复，我相信我现在明白了。以下是我的结论和修改后的代码：

结论：

我第一次尝试中最大的罪魁祸首是以“step space []”和“step char []”开头的两个方程。 将阶跃函数的第二个参数与 [] 相匹配是不允许的，因为它会强制评估整个第二个参数（但有一个警告将在下面解释）。
在某一时刻，我曾认为 (++) 可能会以某种方式晚于 cons 评估其右侧参数。所以，我想我可以通过将“ = (char:x):xs”更改为“= [char : x] ++ xs”来解决问题。但这是不正确的。
在某一时刻，我认为将第二个参数与 (x:xs) 匹配的模式会导致函数针对无限列表失败。我关于这一点几乎是正确的，但不完全正确！正如我在上面的模式匹配中所做的那样，根据 (x:xs) 评估第二个参数，将导致一些递归。它将“转动曲柄”直到碰到“：”（又名“cons”）。如果这种情况从未发生过，那么我的函数将无法针对无限列表成功。然而，在这种特殊情况下，一切都很好，因为我的函数最终会遇到空格，此时会出现“cons”。并且由匹配 (x:xs) 触发的评估将立即停止，避免无限递归。此时，“x”将被匹配，但 xs 仍将是一个 thunk，所以没有问题。（感谢 Ganesh 真正帮助我理解了这一点）。
一般来说，您可以提及您想要的第二个参数，只要您不强制评估它即可。如果您已与 x:xs 进行匹配，那么您可以随意提及 xs，只要您不强制对其进行评估即可。

所以，这是修改后的代码。我通常会尽量避免使用 head 和 tail，仅仅因为它们是部分函数，而且还因为我需要练习编写模式匹配等效项。

myWords :: String -> [String]
myWords string = foldr step [""] (dropWhile charIsSpace string)
   where 
      step space acc | charIsSpace space = "":acc
      step char (x:xs)                   = (char:x):xs
      step _ []                          = error "this should be impossible"

这可以正确地应对无限列表。请注意，看不到头、尾或 (++) 运算符。

现在，有一个重要的警告： 当我第一次编写更正的代码时，我没有与“step _ []”匹配的第三个方程。结果，我收到了有关非详尽模式匹配的警告。显然，避免该警告是个好主意。

但我以为我会遇到问题。我上面已经提到不能将第二个参数与 [] 进行模式匹配。但我必须这样做才能消除警告。

但是，当我添加“step_[]”方程时，一切都很好！无限列表仍然没有问题！。为什么？

因为修正后的代码中的第三个方程从未达到！

事实上，请考虑以下损坏的版本。它与正确的代码完全相同，只是我已将空列表的模式移至其他模式之上：

myWords_brokenAgain :: String -> [String]
myWords_brokenAgain string = foldr step [""] (dropWhile charIsSpace string)
   where 
      step _ []                              = error "this should be impossible"
      step space acc | charIsSpace space     = "":acc
      step char (x:xs)                       = (char:x):xs

我们回到堆栈溢出，因为调用 step 时发生的第一件事是解释器检查看看等式一是否匹配。为此，它必须查看第二个参数是否为 []。为此，它必须评估第二个参数。

将方程移到其他方程下方可确保永远不会尝试第三个方程，因为第一个或第二个模式总是匹配。第三个方程只是为了免除非详尽的模式警告。

这是一次很棒的学习经历。感谢大家的帮助。

原文

I have two Haskell functions, both of which seem very similar to me. But the first one FAILS against infinite lists, and the second one SUCCEEDS against infinite lists. I have been trying for hours to nail down exactly why that is, but to no avail.

Both snippets are a re-implementation of the "words" function in Prelude. Both work fine against finite lists.

Here's the version that does NOT handle infinite lists:

myWords_FailsOnInfiniteList :: String -> [String]
myWords_FailsOnInfiniteList string = foldr step [] (dropWhile charIsSpace string)
   where 
      step space ([]:xs)      | charIsSpace space = []:xs    
      step space (x:xs)       | charIsSpace space = []:x:xs
      step space []           | charIsSpace space = []
      step char (x:xs)                            = (char : x) : xs
      step char []                                = [[char]]

Here's the version that DOES handle infinite lists:

myWords_anotherReader :: String -> [String]
myWords_anotherReader xs = foldr step [""] xs
   where 
      step x result | not . charIsSpace $ x = [x:(head result)]++tail result
                    | otherwise             = []:result

Note: "charIsSpace" is merely a renaming of Char.isSpace.

The following interpreter session illustrates that the first one fails against an infinite list while the second one succeeds.

*Main> take 5 (myWords_FailsOnInfiniteList  (cycle "why "))
*** Exception: stack overflow

*Main> take 5 (myWords_anotherReader (cycle "why "))
["why","why","why","why","why"]

EDIT: Thanks to the responses below, I believe I understand now. Here are my conclusions and the revised code:

Conclusions:

The biggest culprit in my first attempt were the 2 equations that started with "step space []" and "step char []". Matching the second parameter of the step function against [] is a no-no, because it forces the whole 2nd arg to be evaluated (but with a caveat to be explained below).
At one point, I had thought (++) might evaluate its right-hand argument later than cons would, somehow. So, I thought I might fix the problem by changing " = (char:x):xs" to "= [char : x] ++ xs". But that was incorrect.
At one point, I thought that pattern matching the second arg against (x:xs) would cause the function to fail against infinite lists. I was almost right about this, but not quite! Evaluating the second arg against (x:xs), as I do in a pattern match above, WILL cause some recursion. It will "turn the crank" until it hits a ":" (aka, "cons"). If that never happened, then my function would not succeed against an infinite list. However, in this particular case, everything is OK because my function will eventually encounter a space, at which point a "cons" will occur. And the evaluation triggered by matching against (x:xs) will stop right there, avoiding the infinite recursion. At that point, the "x" will be matched, but the xs will remain a thunk, so there's no problem. (Thanks to Ganesh for really helping me grasp that).
In general, you can mention the second arg all you want, as long as you don't force evaluation of it. If you've matched against x:xs, then you can mention xs all you want, as long as you don't force evaluation of it.

So, here's the revised code. I usually try to avoid head and tail, merely because they are partial functions, and also because I need practice writing the pattern matching equivalent.

myWords :: String -> [String]
myWords string = foldr step [""] (dropWhile charIsSpace string)
   where 
      step space acc | charIsSpace space = "":acc
      step char (x:xs)                   = (char:x):xs
      step _ []                          = error "this should be impossible"

This correctly works against infinite lists. Note there's no head, tail or (++) operator in sight.

Now, for an important caveat:
When I first wrote the corrected code, I did not have the 3rd equation, which matches against "step _ []". As a result, I received the warning about non-exhaustive pattern matches. Obviously, it is a good idea to avoid that warning.

But I thought I was going to have a problem. I already mentioned above that it is not OK to pattern match the second arg against []. But I would have to do so in order to get rid of the warning.

However, when I added the "step _ []" equation, everything was fine! There was still no problem with infinite lists!. Why?

Because the 3rd equation in the corrected code IS NEVER REACHED!

In fact, consider the following BROKEN version. It is EXACTLY the SAME as the correct code, except that I have moved the pattern for empty list up above the other patterns:

myWords_brokenAgain :: String -> [String]
myWords_brokenAgain string = foldr step [""] (dropWhile charIsSpace string)
   where 
      step _ []                              = error "this should be impossible"
      step space acc | charIsSpace space     = "":acc
      step char (x:xs)                       = (char:x):xs

We're back to stack overflow, because the first thing that happens when step is called is that the interpreter checks to see if equation number one is a match. To do so, it must see if the second arg is []. To do that, it must evaluate the second arg.

Moving the equation down BELOW the other equations ensures that the 3rd equation is never attempted, because either the first or the second pattern always matches. The 3rd equation is merely there to dispense with the non-exhaustive pattern warning.

This has been a great learning experience. Thanks to everyone for your help.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

输什么也不输骨气 2024-07-28 06:41:11

其他人指出了这个问题，即该步骤总是在产生任何输出之前评估其第二个参数，但当foldr应用于无限列表时，其第二个参数最终将取决于步骤的另一次调用的结果。

它不必以这种方式编写，但您的第二个版本有点丑陋，因为它依赖于具有特定格式的步骤的初始参数，并且很难看出头/尾永远不会出错。（我什至不能 100% 确定它们不会！）

您应该做的是重构第一个版本，以便它至少在某些情况下生成输出而不依赖于输入列表。特别是，我们可以看到，当字符不是空格时，输出列表中始终至少有一个元素。因此，将第二个参数的模式匹配延迟到生成第一个元素之后。字符是空格的情况仍然取决于列表，但这很好，因为这种情况可以无限递归的唯一方法是如果您传入无限的空格列表，在这种情况下不会产生任何输出并进入循环是单词的预期行为（它还能做什么？）

回复收藏 0 原文

滥情哥ㄟ 2024-07-28 06:41:11

尝试手动扩展表达式：

 take 5 (myWords_FailsOnInfiniteList  (cycle "why "))
 take 5 (foldr step [] (dropWhile charIsSpace (cycle "why ")))
 take 5 (foldr step [] (dropWhile charIsSpace ("why " ++ cycle "why ")))
 take 5 (foldr step [] ("why " ++ cycle "why "))
 take 5 (step 'w' (foldr step [] ("hy " ++ cycle "why ")))
 take 5 (step 'w' (step 'h' (foldr step [] ("y " ++ cycle "why "))))

下一个扩展是什么？您应该看到，为了对 step 进行模式匹配，您需要知道它是否为空列表。为了找到答案，你必须对其进行评估，至少进行一点评估。但第二项恰好是您要进行模式匹配的函数的 foldr 缩减。换句话说，step 函数在不调用自身的情况下无法查看其参数，因此您有无限递归。

将其与第二个函数的扩展进行对比：

myWords_anotherReader (cycle "why ")
foldr step [""] (cycle "why ")
foldr step [""] ("why " ++ cycle "why ")
step 'w' (foldr step [""] ("hy " ++ cycle "why ")
let result = foldr step [""] ("hy " ++ cycle "why ") in
    ['w':(head result)] ++ tail result
let result = step 'h' (foldr step [""] ("y " ++ cycle "why ") in
    ['w':(head result)] ++ tail result

您可能会看到此扩展将继续，直到达到一个空间。一旦到达一个空格，“head result”将获得一个值，并且您将产生答案的第一个元素。

我怀疑第二个函数对于不包含任何空格的无限字符串会溢出。你能明白为什么吗？

Try expanding the expression by hand:

 take 5 (myWords_FailsOnInfiniteList  (cycle "why "))
 take 5 (foldr step [] (dropWhile charIsSpace (cycle "why ")))
 take 5 (foldr step [] (dropWhile charIsSpace ("why " ++ cycle "why ")))
 take 5 (foldr step [] ("why " ++ cycle "why "))
 take 5 (step 'w' (foldr step [] ("hy " ++ cycle "why ")))
 take 5 (step 'w' (step 'h' (foldr step [] ("y " ++ cycle "why "))))

What's the next expansion? You should see that in order to pattern match for step, you need to know whether it's the empty list or not. In order to find that out, you have to evaluate it, at least a little bit. But that second term happens to be a foldr reduction by the very function you're pattern matching for. In other words, the step function cannot look at its arguments without calling itself, and so you have an infinite recursion.

Contrast that with an expansion of your second function:

myWords_anotherReader (cycle "why ")
foldr step [""] (cycle "why ")
foldr step [""] ("why " ++ cycle "why ")
step 'w' (foldr step [""] ("hy " ++ cycle "why ")
let result = foldr step [""] ("hy " ++ cycle "why ") in
    ['w':(head result)] ++ tail result
let result = step 'h' (foldr step [""] ("y " ++ cycle "why ") in
    ['w':(head result)] ++ tail result

You can probably see that this expansion will continue until a space is reached. Once a space is reached, "head result" will obtain a value, and you will have produced the first element of the answer.

I suspect that this second function will overflow for infinite strings that don't contain any spaces. Can you see why?

回复收藏 0 原文

初相遇 2024-07-28 06:41:11

第二个版本实际上不会评估结果，直到之后它开始生成自己的部分答案。第一个版本通过模式匹配立即评估结果。

这些无限列表的关键是，在开始要求列表元素之前，您必须生成一些东西，以便输出始终“领先于”输入。

（我觉得这个解释不是很清楚，但这是我能做的最好的。）

回复收藏 0 原文

揽清风入怀 2024-07-28 06:41:11

库函数 foldr 具有以下实现（或类似实现）：

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f k (x:xs) = f x (foldr f k xs)
foldr _ k _ = k

myWords_FailsOnInfiniteList 的结果取决于 foldr 的结果，而 foldr 的结果又取决于 < code>step 取决于内部 foldr 的结果，而内部 foldr 取决于...等等无限列表，myWords_FailsOnInfiniteList 将用完无限量在产生第一个词之前先考虑空间和时间。

myWords_anotherReader 中的 step 函数在生成第一个单词的第一个字母之前不需要内部 foldr 的结果。不幸的是，正如 Apocalisp 所说，它在生成下一个单词之前使用 O(第一个单词的长度) 空间，因为在生成第一个单词时，尾部 thunk 不断增长 tail ([...] ++ tail ([...] ++ tail (...)))。

相反，与

myWords :: String -> [String]
myWords = myWords' . dropWhile isSpace where
    myWords' [] = []
    myWords' string =
        let (part1, part2) = break isSpace string
        in part1 : myWords part2

使用可以定义为的库函数相比，

break :: (a -> Bool) -> [a] -> ([a], [a])
break p = span $ not . p

span :: (a -> Bool) -> [a] -> ([a], [a])
span p xs = (takeWhile p xs, dropWhile p xs)

takeWhile :: (a -> Bool) -> [a] -> [a]
takeWhile p (x:xs) | p x = x : takeWhile p xs
takeWhile _ _ = []

dropWhile :: (a -> Bool) -> [a] -> [a]
dropWhile p (x:xs) | p x = dropWhile p xs
dropWhile _ xs = xs

请注意，生成中间结果永远不会被未来的计算所阻碍，并且只需要 O(1) 空间，因为结果的每个元素都可供使用。

附录

所以，这是修改后的代码。我通常会尽量避免使用 head 和 tail，仅仅因为它们是部分函数，而且还因为我需要练习编写模式匹配等效项。
myWords :: 字符串 ->   [细绳] 
  myWords 字符串=foldr 步骤[""] (dropWhile charIsSpace 字符串) 
     在哪里  
        步距 acc |   charIsSpace 空格 = "":acc 
        步骤 char (x:xs) = (char:x):xs 
        步骤_[]=错误“这应该是不可能的” 
  

（旁白：您可能不关心，但是库中的 words "" == []，但是您的 myWords "" = [""]。尾随的类似问题空格。）

看起来比 myWords_anotherReader 有很大改进，对于基于 foldr 的解决方案来说非常好。

\n -> tail $ myWords $ replicate n 'a' ++ " b"

不可能比 O(n) 时间做得更好，但 myWords_anotherReader 和 myWords 在这里都占用 O(n) 空间。考虑到使用foldr，这可能是不可避免的。

更糟糕的是，

\n -> head $ head $ myWords $ replicate n 'a' ++ " b"

myWords_anotherReader 的复杂度为 O(1)，但新的 myWords 的复杂度为 O(n)，因为模式匹配 (x:xs) 需要进一步的复杂度。结果。

来解决此问题

myWords :: String -> [String]
myWords = foldr step [""] . dropWhile isSpace
   where 
      step space acc | isSpace space = "":acc
      step char ~(x:xs)              = (char:x):xs

您可以通过~ 引入“无可辩驳的模式” 。无可辩驳的模式永远不会失败，也不会强制立即评估。

The library function foldr has this implementation (or similar):

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f k (x:xs) = f x (foldr f k xs)
foldr _ k _ = k

The result of myWords_FailsOnInfiniteList depends on the result of foldr which depends on the result of step which depends on the result of the inner foldr which depends on ... and so on an infinite list, myWords_FailsOnInfiniteList will use up an infinite amount of space and time before producing its first word.

The step function in myWords_anotherReader does not require the result of the inner foldr until after it has produced the first letter of the first word. Unfortunately, as Apocalisp says, it uses O(length of first word) space before it produces the next word, because as the first word is being produced, the tail thunk keeps growing tail ([...] ++ tail ([...] ++ tail (...))).

In contrast, compare to

myWords :: String -> [String]
myWords = myWords' . dropWhile isSpace where
    myWords' [] = []
    myWords' string =
        let (part1, part2) = break isSpace string
        in part1 : myWords part2

using library functions which may be defined as

break :: (a -> Bool) -> [a] -> ([a], [a])
break p = span $ not . p

span :: (a -> Bool) -> [a] -> ([a], [a])
span p xs = (takeWhile p xs, dropWhile p xs)

takeWhile :: (a -> Bool) -> [a] -> [a]
takeWhile p (x:xs) | p x = x : takeWhile p xs
takeWhile _ _ = []

dropWhile :: (a -> Bool) -> [a] -> [a]
dropWhile p (x:xs) | p x = dropWhile p xs
dropWhile _ xs = xs

Notice that producing the intermediate results is never held up by future computation, and only O(1) space is needed as each element of the result is made available for consumption.

Addendum

So, here's the revised code. I usually try to avoid head and tail, merely because they are partial functions, and also because I need practice writing the pattern matching equivalent.
myWords :: String -> [String]
myWords string = foldr step [""] (dropWhile charIsSpace string)
   where 
      step space acc | charIsSpace space = "":acc
      step char (x:xs)                   = (char:x):xs
      step _ []                          = error "this should be impossible"

(Aside: You may not care, but the words "" == [] from the library, but your myWords "" = [""]. Similar issue with trailing spaces.)

Looks much-improved over myWords_anotherReader, and is pretty good for a foldr-based solution.

\n -> tail $ myWords $ replicate n 'a' ++ " b"

It's not possible to do better than O(n) time, but both myWords_anotherReader and myWords take O(n) space here. This may be inevitable given the use of foldr.

Worse,

\n -> head $ head $ myWords $ replicate n 'a' ++ " b"

myWords_anotherReader was O(1) but the new myWords is O(n), because pattern matching (x:xs) requires the further result.

You can work around this with

myWords :: String -> [String]
myWords = foldr step [""] . dropWhile isSpace
   where 
      step space acc | isSpace space = "":acc
      step char ~(x:xs)              = (char:x):xs

The ~ introduces an "irrefutable pattern". Irrefutable patterns never fail and do not force immediate evaluation.

回复收藏 0 原文

~没有更多了~