What's so bad about Lazy I/O?

I've generally heard that production code should avoid using Lazy I/O. My question is, why? Is it ever OK to use Lazy I/O outside of just toying around? And what makes the alternatives (e.g. enumerators) better?

6 Answers

爱你不解释 2024-11-12 04:33:26

Lazy IO has the problem that releasing whatever resource you have acquired is somewhat unpredictable, as it depends on how your program consumes the data -- its "demand pattern". Once your program drops the last reference to the resource, the GC will eventually run and release that resource.
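
To make that "demand pattern" concrete, here is a minimal sketch (the file name is hypothetical): how long the handle stays open depends entirely on how much of the string the rest of the program forces.

main :: IO ()
main = do
    s <- readFile "big.log"    -- handle opened; nothing has been read yet
    putStrLn (take 80 s)       -- demand forces only the first chunk
    -- the handle stays open until the rest of s is consumed, or until
    -- the GC notices s is dead and runs the finalizer; neither point is
    -- predictable from the code alone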

Lazy streams are a very convenient style to program in. This is why shell pipes are so fun and popular.

However, if resources are constrained (as in high-performance scenarios, or production environments that expect to scale to the limits of the machine) relying on the GC to clean up can be an insufficient guarantee.

Sometimes you have to release resources eagerly, in order to improve scalability.

So what are the alternatives to lazy IO that don't mean giving up on incremental processing (which in turn would consume too many resources)? Well, we have foldl-based processing, aka iteratees or enumerators, introduced by Oleg Kiselyov in the late 2000s and since popularized by a number of networking-based projects.

Instead of processing data as lazy streams, or in one huge batch, we instead abstract over chunk-based strict processing, with guaranteed finalization of the resource once the last chunk is read. That's the essence of iteratee-based programming, and one that offers very nice resource constraints.
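
As a hand-rolled illustration of that idea (a minimal sketch, not any of the actual iteratee/enumerator libraries; foldFile and the 4 KiB chunk size are made up for this example): process the file strictly, one chunk at a time, with closure of the handle guaranteed by bracket.

import Control.Exception (bracket)
import System.IO
import qualified Data.ByteString as B

-- strict, chunk-wise left fold over a file's contents; the handle is
-- closed by bracket as soon as the last chunk is read (or on exception)
foldFile :: (a -> B.ByteString -> a) -> a -> FilePath -> IO a
foldFile step z path =
    bracket (openFile path ReadMode) hClose $ \h ->
        let go acc = do
                chunk <- B.hGetSome h 4096     -- at most 4 KiB at a time
                if B.null chunk
                    then return acc
                    else go $! step acc chunk  -- force the accumulator
        in go z

-- example: count bytes without ever holding the whole file in memory
countBytes :: FilePath -> IO Int
countBytes = foldFile (\n c -> n + B.length c) 0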

The downside of iteratee-based IO is that it has a somewhat awkward programming model (roughly analogous to event-based programming, versus nice thread-based control). It is definitely an advanced technique, in any programming language. And for the vast majority of programming problems, lazy IO is entirely satisfactory. However, if you will be opening many files, or talking on many sockets, or otherwise using many simultaneous resources, an iteratee (or enumerator) approach might make sense.

静待花开 2024-11-12 04:33:26

Dons has provided a very good answer, but he's left out what is (for me) one of the most compelling features of iteratees: they make it easier to reason about space management because old data must be explicitly retained. Consider:

average :: [Float] -> Float
average xs = sum xs / length xs

This is a well-known space leak, because the entire list xs must be retained in memory to calculate both sum and length. It's possible to make an efficient consumer by creating a fold:

average2 :: [Float] -> Float
average2 xs = uncurry (/) $ foldl (\(sumT, n) x -> (sumT+x, n+1)) (0,0) xs
-- N.B. this will build up thunks as written, use a strict pair and foldl'
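
For what that note is worth, here is a sketch of the strict variant it suggests (average3 is a made-up name), using foldl' and bang patterns so that neither running total accumulates thunks:

{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

average3 :: [Float] -> Float
average3 xs = s / n
  where
    (s, n) = foldl' step (0, 0) xs
    step (!sumT, !len) x = (sumT + x, len + 1)  -- bangs force both components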

But it's somewhat inconvenient to have to do this for every stream processor. There are some generalizations (Conal Elliott - Beautiful Fold Zipping), but they don't seem to have caught on. However, iteratees can get you a similar level of expression.

aveIter = uncurry (/) <$> I.zip I.sum I.length

This isn't as efficient as a fold because the list is still iterated over multiple times; however, it's collected in chunks, so old data can be efficiently garbage collected. In order to break that property, it's necessary to explicitly retain the entire input, such as with stream2list:

badAveIter = (\xs -> sum xs / length xs) <$> I.stream2list

The state of iteratees as a programming model is a work in progress, however it's much better than even a year ago. We're learning what combinators are useful (e.g. zip, breakE, enumWith) and which are less so, with the result that built-in iteratees and combinators provide continually more expressivity.

That said, Dons is correct that they're an advanced technique; I certainly wouldn't use them for every I/O problem.

打小就很酷 2024-11-12 04:33:26

I use lazy I/O in production code all the time. It's only a problem in certain circumstances, like Don mentioned. But for just reading a few files it works fine.
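
For instance, something in this vein (file names hypothetical) is unproblematic, because everything is fully consumed before the program moves on:

main :: IO ()
main = do
    a <- readFile "notes.txt"
    b <- readFile "todo.txt"
    putStr (a ++ b)    -- both files read to the end, so both handles close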

审判长 2024-11-12 04:33:26

Update: Recently on haskell-cafe, Oleg Kiselyov showed that unsafeInterleaveST (which is used for implementing lazy IO within the ST monad) is very unsafe: it breaks equational reasoning. He showed that it allows one to construct bad_ctx :: ((Bool,Bool) -> Bool) -> Bool
such that

> bad_ctx (\(x,y) -> x == y)
True
> bad_ctx (\(x,y) -> y == x)
False

even though == is commutative.


Another problem with lazy IO: The actual IO operation can be deferred until it's too late, for example after the file is closed. Quoting from Haskell Wiki - Problems with lazy IO:

For example, a common beginner mistake is to close a file before one has finished reading it:

wrong = do
    fileData <- withFile "test.txt" ReadMode hGetContents
    putStr fileData

The problem is withFile closes the handle before fileData is forced. The correct way is to pass all the code to withFile:

right = withFile "test.txt" ReadMode $ \handle -> do
    fileData <- hGetContents handle
    putStr fileData

Here, the data is consumed before withFile finishes.

This is often unexpected and an easy-to-make error.


See also: Three examples of problems with Lazy I/O.

情绪失控 2024-11-12 04:33:26

Another problem with lazy IO that hasn't been mentioned so far is that it has surprising behaviour. In a normal Haskell program, it can sometimes be difficult to predict when each part of your program is evaluated, but fortunately due to purity it really doesn't matter unless you have performance problems. When lazy IO is introduced, the evaluation order of your code actually has an effect on its meaning, so changes that you're used to thinking of as harmless can cause you genuine problems.

As an example, here's a question about code that looks reasonable but is made more confusing by deferred IO: withFile vs. openFile
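
The gist of that question, as a sketch (hypothetical file name): two definitions that look interchangeable mean different things under lazy IO.

import System.IO

-- reads lazily; the handle is only closed when the contents are fully
-- forced (or by the GC), so this "works" but with an unpredictable
-- handle lifetime
viaOpenFile :: IO String
viaOpenFile = do
    h <- openFile "input.txt" ReadMode
    hGetContents h

-- looks equivalent, but withFile closes the handle on exit, so a caller
-- that forces the string later typically sees it truncated to ""
viaWithFile :: IO String
viaWithFile = withFile "input.txt" ReadMode hGetContents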

These problems aren't invariably fatal, but it's another thing to think about, and a sufficiently severe headache that I personally avoid lazy IO unless there's a real problem with doing all the work upfront.

少女情怀诗 2024-11-12 04:33:26

What's so bad about lazy I/O is that you, the programmer, have to micro-manage certain resources instead of the implementation. For example, which of the following is "different"?

  • freeSTRef :: STRef s a -> ST s ()
  • closeIORef :: IORef a -> IO ()
  • endMVar :: MVar a -> IO ()
  • discardTVar :: TVar -> STM ()
  • hClose :: Handle -> IO ()
  • finalizeForeignPtr :: ForeignPtr a -> IO ()

...out of all these dismissive definitions, the last two - hClose and finalizeForeignPtr - actually do exist. As for the rest, what service they could provide in the language is much more reliably performed by the implementation!

So if the dismissing of resources like file handles and foreign references was also left to the implementation, lazy I/O would probably be no worse than lazy evaluation.
