How to force Haskell not to store the whole bytestring?
I'm writing a relatively small application in Haskell for academic purposes. I'm implementing Huffman compression, based on this code: http://www.haskell.org/haskellwiki/Toy_compression_implementations
My variant of this code is here: https://github.com/kravitz/har/blob/a5d221f227c27fd1c5587217a29a169a377521a6/huffman.hs and it uses lazy bytestrings. When I implemented RLE compression everything was smooth, because it processes the input stream in a single pass. But Huffman processes it twice, and as a result I end up with a fully evaluated bytestring stored in memory, which is bad for big files (and even for relatively small files it allocates too much heap space). This is not just my suspicion: profiling shows that most of the heap is eaten by bytestring allocations.
I also serialize the stream length into the file, which may likewise force the full bytestring into memory. Is there any simple way to ask GHC to kindly re-evaluate the stream several times?
2 Answers
Instead of passing a bytestring to the encoder, you can pass something that computes a bytestring, then explicitly recompute the value each time you need it.
The parameter to compress should actually compute the value. Simply wrapping a value with return won't work. Also, each call to makeInput must actually have its result evaluated, else there will remain a lazy, un-evaluated copy of the input in memory when the input is recomputed. The usual approach, as barsoap said, is to just compress one block at a time.
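A minimal sketch of that idea, assuming a hypothetical encodeWith as a stand-in for the real Huffman encoder: compress takes an action that produces the input (here L.readFile, with placeholder file names) and runs it once per pass, forcing the frequency table before the second pass begins.

    import Control.Exception (evaluate)
    import qualified Data.ByteString.Lazy as L
    import qualified Data.Map.Strict as M
    import Data.Word (Word8)

    -- Stand-in for the real Huffman encoder; only the two-pass
    -- structure matters for this sketch.
    encodeWith :: M.Map Word8 Int -> L.ByteString -> L.ByteString
    encodeWith _ = id

    -- Frequency pass as a strict left fold, so input chunks can be
    -- garbage collected as soon as they have been counted.
    countFrequencies :: L.ByteString -> M.Map Word8 Int
    countFrequencies = L.foldl' (\m b -> M.insertWith (+) b 1 m) M.empty

    -- Take an action that produces the input and run it once per pass,
    -- so the second pass reads a fresh stream instead of a copy
    -- retained from the first.
    compress :: IO L.ByteString -> IO L.ByteString
    compress makeInput = do
        input1 <- makeInput
        freqs  <- evaluate (countFrequencies input1)  -- force pass 1 so
                                                      -- input1 can be dropped
        input2 <- makeInput                           -- pass 2: a fresh stream
        return (encodeWith freqs input2)

    main :: IO ()
    main = compress (L.readFile "input.dat") >>= L.writeFile "output.har"

Note the evaluate: if freqs were left as an unevaluated thunk, it would keep input1 alive into the second pass, which is exactly the pitfall described above.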
The usual approach when (Huffman-)compressing, since one can't get around processing the input twice (once to collect the probability distribution and once to do the actual compressing), is to chunk the input into blocks and compress each block separately. While that still eats memory, it eats at most a constant amount.
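A minimal sketch of the block-wise approach, with a hypothetical compressBlock standing in for the two per-block Huffman passes:

    import qualified Data.ByteString.Lazy as L

    -- Stand-in for a real per-block round trip (count frequencies,
    -- then encode); the chunking is the point of the sketch.
    compressBlock :: L.ByteString -> L.ByteString
    compressBlock = id

    -- Split a lazy bytestring into blocks of at most n bytes.
    chunksOf :: Int -> L.ByteString -> [L.ByteString]
    chunksOf n bs
        | L.null bs = []
        | otherwise = let (block, rest) = L.splitAt (fromIntegral n) bs
                      in block : chunksOf n rest

    -- Compress block by block: only the block currently being processed
    -- needs to be resident, so memory use is bounded by the block size
    -- (here 1 MiB) rather than the file size.
    compressFile :: FilePath -> FilePath -> IO ()
    compressFile inPath outPath = do
        contents <- L.readFile inPath
        let blocks = chunksOf (1024 * 1024) contents
        L.writeFile outPath (L.concat (map compressBlock blocks))

With a real per-block encoder you would also emit each block's frequency table (or the Huffman tree itself) alongside the encoded data, so the decompressor can rebuild the tree per block.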
That said, you might want to have a look at bytestring-mmap, though that won't work with standard input, sockets, and other file descriptors that aren't backed by a file system which supports mmap.
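For illustration, here is what reading via bytestring-mmap looks like, assuming the package's System.IO.Posix.MMap.unsafeMMapFile and a placeholder file name:

    import qualified Data.ByteString as B
    import System.IO.Posix.MMap (unsafeMMapFile)  -- from bytestring-mmap

    main :: IO ()
    main = do
        -- The file's pages are mapped, not copied: traversing the
        -- resulting strict ByteString does not pull the whole file onto
        -- the Haskell heap; the OS pages it in and out as needed.
        bs <- unsafeMMapFile "input.dat"  -- placeholder file name
        print (B.length bs)               -- e.g. a first pass over the data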
You can also re-read the bytestring from the file (again, provided you're not receiving it from anything pipe-like) after collecting the probability distribution, but that will still make your code bail out on, say, 1 TB files.