What's so bad about Lazy I/O?
I've generally heard that production code should avoid using Lazy I/O. My question is, why? Is it ever OK to use Lazy I/O outside of just toying around? And what makes the alternatives (e.g. enumerators) better?
6 Answers
Lazy IO has the problem that releasing whatever resource you have acquired is somewhat unpredictable, as it depends on how your program consumes the data -- its "demand pattern". Once your program drops the last reference to the resource, the GC will eventually run and release that resource.
Lazy streams are a very convenient style to program in. This is why shell pipes are so fun and popular.
However, if resources are constrained (as in high-performance scenarios, or production environments that expect to scale to the limits of the machine) relying on the GC to clean up can be an insufficient guarantee.
Sometimes you have to release resources eagerly, in order to improve scalability.
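As a minimal sketch of that demand-pattern problem (the file names are hypothetical, and assumed non-empty): with lazy readFile, how long each handle stays open is decided by how much of each result string is later forced, not by anything visible at the call site.

main :: IO ()
main = do
  let files = ["a.txt", "b.txt", "c.txt"]   -- hypothetical inputs
  texts <- mapM readFile files              -- lazy: no file is read here
  -- only the first character of each file is ever demanded, so every
  -- handle stays open until the GC eventually finalizes it
  putStrLn (map head texts)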
So what are the alternatives to lazy IO that don't mean giving up on incremental processing (which in turn would consume too many resources)? Well, we have foldl-based processing, aka iteratees or enumerators, introduced by Oleg Kiselyov in the late 2000s, and since popularized by a number of networking-based projects. Instead of processing data as lazy streams, or in one huge batch, we instead abstract over chunk-based strict processing, with guaranteed finalization of the resource once the last chunk is read. That's the essence of iteratee-based programming, and one that offers very nice resource constraints.
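The following is only a hand-rolled sketch of that chunk-based, strict shape (foldChunks and countBytes are made-up names, not an iteratee library's API): process one chunk at a time inside a left fold, with the handle closed deterministically once the last chunk has been read.

{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import System.IO

-- feed fixed-size chunks to a strict accumulator; withFile guarantees
-- the handle is closed as soon as the fold returns
foldChunks :: (a -> B.ByteString -> a) -> a -> FilePath -> IO a
foldChunks step z path = withFile path ReadMode (go z)
  where
    go acc h = do
      chunk <- B.hGetSome h 32768
      if B.null chunk
        then return acc                          -- last chunk read: finalize
        else let !acc' = step acc chunk in go acc' h

-- e.g. count bytes while keeping only one chunk in memory at a time
countBytes :: FilePath -> IO Int
countBytes = foldChunks (\n c -> n + B.length c) 0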
The downside of iteratee-based IO is that it has a somewhat awkward programming model (roughly analogous to event-based programming, versus nice thread-based control). It is definitely an advanced technique, in any programming language. And for the vast majority of programming problems, lazy IO is entirely satisfactory. However, if you will be opening many files, or talking on many sockets, or otherwise using many simultaneous resources, an iteratee (or enumerator) approach might make sense.
Dons has provided a very good answer, but he's left out what is (for me) one of the most compelling features of iteratees: they make it easier to reason about space management because old data must be explicitly retained. Consider a naive mean (a sketch of the kind of consumer in question):
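-- sum and length are two separate traversals, so all of xs
-- stays live between them
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)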
This is a well-known space leak, because the entire list xs must be retained in memory to calculate both sum and length.
It's possible to make an efficient consumer by creating a fold (a single strict traversal, sketched below). But it's somewhat inconvenient to have to do this for every stream processor. There are some generalizations (Conal Elliott - Beautiful Fold Zipping), but they don't seem to have caught on. However, iteratees can get you a similar level of expression.
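A sketch of both, hand-rolled rather than using the iteratee package's actual combinators (meanFold and meanChunks are illustrative names):

{-# LANGUAGE BangPatterns #-}

-- the fold version: one strict pass, constant space
meanFold :: [Double] -> Double
meanFold = go 0 0
  where
    go !s !n []     = s / fromIntegral (n :: Int)
    go !s !n (x:xs) = go (s + x) (n + 1) xs

-- the iteratee shape: the consumers see the input chunk by chunk, so
-- a chunk may be traversed twice (once for sum, once for length) but
-- is then dead and can be garbage collected
meanChunks :: [[Double]] -> Double
meanChunks = go 0 0
  where
    go !s !n []     = s / fromIntegral (n :: Int)
    go !s !n (c:cs) = go (s + sum c) (n + length c) cs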
This isn't as efficient as a fold because the list is still iterated over multiple times; however, it's collected in chunks, so old data can be efficiently garbage collected. In order to break that property, it's necessary to explicitly retain the entire input, such as with stream2list.
The state of iteratees as a programming model is a work in progress, however it's much better than even a year ago. We're learning what combinators are useful (e.g. zip, breakE, enumWith) and which are less so, with the result that built-in iteratees and combinators provide continually more expressivity. That said, Dons is correct that they're an advanced technique; I certainly wouldn't use them for every I/O problem.
I use lazy I/O in production code all the time. It's only a problem in certain circumstances, like Don mentioned. But for just reading a few files it works fine.
Update: Recently on haskell-cafe, Oleg Kiselyov showed that unsafeInterleaveST (which is used for implementing lazy IO within the ST monad) is very unsafe: it breaks equational reasoning. He shows that it allows one to construct bad_ctx :: ((Bool,Bool) -> Bool) -> Bool such that bad_ctx (\(x,y) -> x == y) and bad_ctx (\(x,y) -> y == x) give different results, even though == is commutative (a reconstructed sketch appears at the end of this answer).
Another problem with lazy IO: the actual IO operation can be deferred until it's too late, for example after the file is closed; see Haskell Wiki - Problems with lazy IO.
This is often unexpected and an easy-to-make error.
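The classic instance of this, sketched here with a hypothetical input.txt: hGetContents defers the read, withFile closes the handle on the way out, and the contents are only forced afterwards.

import System.IO

main :: IO ()
main = do
  s <- withFile "input.txt" ReadMode hGetContents
  -- s is forced only here, after withFile has already closed the
  -- handle, so this prints an empty (truncated) string
  putStrLn s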
See also: Three examples of problems with Lazy I/O.
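A reconstruction of the bad_ctx construction mentioned above, from memory rather than Oleg's exact post, so treat it as a sketch: the unsafeInterleaveST action escapes the lazy ST sequencing, so whether body forces x or y first decides whether the write is observed by the read.

import Control.Monad.ST.Lazy (runST)
import Control.Monad.ST.Lazy.Unsafe (unsafeInterleaveST)
import Data.STRef.Lazy

bad_ctx :: ((Bool, Bool) -> Bool) -> Bool
bad_ctx body = runST (do
  r <- newSTRef False
  x <- unsafeInterleaveST (writeSTRef r True >> return True)
  y <- readSTRef r
  return (body (x, y)))

-- bad_ctx (\(x, y) -> x == y)  ~>  True   (x forced first: write, then read)
-- bad_ctx (\(x, y) -> y == x)  ~>  False  (y forced first: read sees False)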
Another problem with lazy IO that hasn't been mentioned so far is that it has surprising behaviour. In a normal Haskell program, it can sometimes be difficult to predict when each part of your program is evaluated, but fortunately due to purity it really doesn't matter unless you have performance problems. When lazy IO is introduced, the evaluation order of your code actually has an effect on its meaning, so changes that you're used to thinking of as harmless can cause you genuine problems.
As an example, here's a question about code that looks reasonable but is made more confusing by deferred IO: withFile vs. openFile
These problems aren't invariably fatal, but it's another thing to think about, and a sufficiently severe headache that I personally avoid lazy IO unless there's a real problem with doing all the work upfront.
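For instance, here is a sketch of a seemingly harmless reorder (with a hypothetical input.txt) that changes the result under lazy IO:

import Control.Exception (evaluate)
import System.IO

main :: IO ()
main = do
  h <- openFile "input.txt" ReadMode
  s <- hGetContents h
  n <- evaluate (length s)   -- forces the read while h is still open
  hClose h
  print n
  -- moving the evaluate line below hClose looks like an innocent
  -- change, but under lazy IO it makes n come out as 0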
What's so bad about lazy I/O is that you, the programmer, have to micro-manage certain resources instead of the implementation. For example, which of the following is "different"?
freeSTRef :: STRef s a -> ST s ()
closeIORef :: IORef a -> IO ()
endMVar :: MVar a -> IO ()
discardTVar :: TVar -> STM ()
hClose :: Handle -> IO ()
finalizeForeignPtr :: ForeignPtr a -> IO ()
...out of all these dismissive definitions, the last two - hClose and finalizeForeignPtr - actually do exist. As for the rest, what service they could provide in the language is much more reliably performed by the implementation! So if the dismissing of resources like file handles and foreign references was also left to the implementation, lazy I/O would probably be no worse than lazy evaluation.