Binary serialization for lists of undefined length in Haskell

Posted on 2024-11-13 11:23:10

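I've been using Data.Binary to serialize data to files. In my application I incrementally add items to these files. The two most popular serialization packages, binary and cereal, both serialize lists as a count followed by the list items. Because of this, I can't append to my serialized files. I currently read in the whole file, deserialize the list, append to the list, re-serialize the list, and write it back out to the file. However, my data set is getting large and I'm starting to run out of memory. I could probably gain some space by unboxing my data structures, but that approach doesn't scale.

One solution would be to get down and dirty with the file format to change the initial count, then just append my elements. But that's not very satisfying, not to mention being sensitive to future changes in the file format as a result of breaking the abstraction. Iteratees/enumerators come to mind as an attractive option here. I looked for a library combining them with binary serialization, but didn't find anything. Does anyone know if this has been done already? If not, would a library for this be useful? Or am I missing something?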

め可乐爱微笑 2024-11-20 11:23:10

So I say stick with Data.Binary but write a new instance for growable lists. Here's the current (strict) instance:

instance Binary a => Binary [a] where
    put l  = put (length l) >> mapM_ put l
    get    = do n <- get :: Get Int
                getMany n

-- | 'getMany n' get 'n' elements in order, without blowing the stack.
getMany :: Binary a => Int -> Get [a]
getMany n = go [] n
 where
    go xs 0 = return $! reverse xs
    go xs i = do x <- get
                 x `seq` go (x:xs) (i-1)
{-# INLINE getMany #-}
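
To make the problem with this layout concrete, here is a minimal sketch that dumps the bytes the stock list instance produces. The helper name `listBytes` is made up for illustration; the only library behavior it relies on is that `put (length l)` writes the count as an `Int`, which binary encodes as a big-endian 64-bit value.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Binary (encode)
import Data.Word (Word8)

-- | Hypothetical helper: the raw bytes of a list encoded with the
-- stock 'Binary [a]' instance.
listBytes :: [Word8] -> [Word8]
listBytes = BL.unpack . encode

-- The first 8 bytes are the element count, the elements follow:
--
--   listBytes [1,2,3]   == [0,0,0,0,0,0,0,3, 1,2,3]
--   listBytes [1,2,3,4] == [0,0,0,0,0,0,0,4, 1,2,3,4]
--
-- Appending the byte 4 to the first file is not enough: the count at
-- the very start of the file would still say 3.
```

Since the count lives at the front of the file, every append has to touch the header as well as the tail, which is exactly the abstraction break the question wants to avoid.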

Now, a version that lets you stream (in binary) and append to a file could be either eager or lazy. The lazy version is the simplest. Something like:

import Data.Binary

newtype Stream a = Stream { unstream :: [a] }

instance Binary a => Binary (Stream a) where

    put (Stream [])     = putWord8 0
    put (Stream (x:xs)) = putWord8 1 >> put x >> put (Stream xs)

    get = do
        t <- getWord8
        case t of
            0 -> return (Stream [])
            1 -> do x         <- get
                    Stream xs <- get
                    return (Stream (x:xs))
            -- any other tag means the input is not a valid Stream
            _ -> fail "Stream: unexpected tag"

Massaged appropriately, this works for streaming. Now, to append in place, we need to be able to seek to the end of the file and overwrite the final 0 tag before adding more elements.
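
The seek-and-overwrite step could look roughly like the sketch below. It is not part of the answer: `appendStream` is a hypothetical helper, and it assumes the file is either empty or already ends with the single 0 terminator written by the `Stream` instance (it does not verify this).

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Binary (Binary, put)
import Data.Binary.Put (putWord8, runPut)
import System.IO

-- | Hypothetical helper: append elements to a file that holds a
-- tag-encoded stream (a 1 tag before each element, a final 0 tag).
-- Assumes the last byte of a non-empty file is the 0 terminator.
appendStream :: Binary a => FilePath -> [a] -> IO ()
appendStream path xs =
  -- ReadWriteMode keeps the existing contents and allows seeking;
  -- AppendMode would not let us step back over the terminator.
  withBinaryFile path ReadWriteMode $ \h -> do
    size <- hFileSize h
    -- Position on the old terminator (or at 0 for an empty file) so
    -- the first new tag overwrites it.
    hSeek h AbsoluteSeek (max 0 (size - 1))
    BL.hPut h $ runPut $ do
      mapM_ (\x -> putWord8 1 >> put x) xs
      putWord8 0 -- write a fresh terminator
```

For example, `appendStream "items.bin" "ab"` followed by `appendStream "items.bin" "c"` leaves the same bytes on disk as encoding `Stream "abc"` in one go, without ever reading the existing elements back into memory.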

三生殊途 2024-11-20 11:23:10

It's been four years since this question was answered, but I ran into the same problem as gatoatigrado in the comment on Don Stewart's answer. The put method works as advertised, but get reads the whole input. I believe the problem lies in the pattern match Stream xs <- get inside the case alternative, which must finish decoding the rest of the stream before the first element can be returned.
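
A small experiment makes the difference visible. This is only a sketch, assuming a version of binary whose Get monad is strict (0.6 or later, as far as I know); the names `truncated`, `headViaInstance` and `headIncremental` are made up for the illustration. The input holds two complete elements but lacks the final 0 tag.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Data.Binary (Binary (..), decodeOrFail, getWord8, putWord8)
import Data.Binary.Get (Decoder (..), pushChunk, runGetIncremental)
import Data.Word (Word8)

newtype Stream a = Stream { unstream :: [a] }

instance Binary a => Binary (Stream a) where
  put (Stream [])       = putWord8 0
  put (Stream (x : xs)) = putWord8 1 >> put x >> put (Stream xs)
  get = do
    t <- getWord8
    case t of
      0 -> return (Stream [])
      1 -> do
        x <- get
        Stream xs <- get
        return (Stream (x : xs))
      _ -> fail "Stream: unexpected tag"

-- | Two complete elements (7 and 8) with the final 0 tag cut off.
truncated :: [Word8]
truncated = [1, 7, 1, 8]

-- | Via the Stream instance: the head is only available once the whole
-- input has parsed, so the truncated input gives Nothing.
headViaInstance :: Maybe Word8
headViaInstance =
  case decodeOrFail (BL.pack truncated) of
    Right (_, _, Stream (x : _)) -> Just x
    _                            -> Nothing

-- | One element at a time: the head is available as soon as its own
-- bytes have been seen, so this gives Just 7.
headIncremental :: Maybe Word8
headIncremental =
  case runGetIncremental (getWord8 >> get) `pushChunk` BS.pack truncated of
    Done _ _ x -> Just x
    _          -> Nothing
```

The instance-based decoder cannot hand out the first element until it knows the whole parse succeeds, which is why it ends up reading the entire file.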

My solution used the example in Data.Binary.Get as a starting point:

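import qualified Data.ByteString as BS
import Data.ByteString.Lazy(toChunks,ByteString)
import Data.Binary(Binary(..),Get,getWord8)
import Data.Binary.Get(pushChunk,Decoder(..),runGetIncremental)
import Data.List(unfoldr)

-- Decode every element of a tag-encoded stream, skipping the tag byte before each one.
decodes :: Binary a => ByteString -> [a]
decodes = runGets (getWord8 >> get)

-- Run the same getter over and over, producing the results lazily.
runGets :: Get a -> ByteString -> [a]
runGets g = unfoldr (decode1 d) . toChunks
  where d = runGetIncremental g

-- Feed strict chunks to the decoder until it yields one value;
-- return that value together with the input that is still unconsumed.
decode1 :: Decoder a -> [BS.ByteString] -> Maybe (a, [BS.ByteString])
decode1 _ [] = Nothing
decode1 d (x:xs) = case d `pushChunk` x of
                     Fail _ _ str  -> error str
                     Done x' _ a   -> Just (a,x':xs)
                     k@(Partial _) -> decode1 k xs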

Note the use of getWord8: it consumes the tag byte that the Stream instance's put writes for each [] and (:) constructor. Also note that, because getWord8 discards the tag's value, this implementation will not detect the end of the list. My encoded file was just a single list, so it works for that, but otherwise you'll need to modify it.

In any case, decodes ran in constant memory both when accessing the head and when accessing the last element.
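
One possible modification for files that hold more than a single list is to inspect the tag instead of discarding it. The following is a sketch along those lines, not the answer's code: `getTagged` and `decodeStream` are hypothetical names, and input that ends before a 0 tag is treated as an error once the list is forced that far.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Binary (Binary, get)
import Data.Binary.Get
  (Decoder (..), Get, getWord8, pushChunk, pushEndOfInput, runGetIncremental)

-- | Read one tagged element: Nothing for the 0 terminator,
-- Just x for an element that follows a 1 tag.
getTagged :: Binary a => Get (Maybe a)
getTagged = do
  t <- getWord8
  case t of
    0 -> return Nothing
    1 -> Just <$> get
    _ -> fail ("unexpected stream tag " ++ show t)

-- | Lazily decode a tag-encoded stream, stopping at the 0 terminator
-- and ignoring whatever follows it.
decodeStream :: Binary a => BL.ByteString -> [a]
decodeStream = go (runGetIncremental getTagged) . BL.toChunks
  where
    -- One element decoded: emit it and restart on the leftover input.
    go (Done rest _ (Just x)) cs = x : go (runGetIncremental getTagged) (rest : cs)
    -- Terminator reached: the list ends here.
    go (Done _ _ Nothing) _ = []
    go (Fail _ _ msg) _ = error msg
    -- The decoder wants more input: feed the next chunk, or signal
    -- end of input when no chunks are left.
    go d@(Partial _) (c : cs) = go (d `pushChunk` c) cs
    go d@(Partial _) [] = go (pushEndOfInput d) []
```

As with decodes, elements are produced one at a time, so `take 1 (decodeStream bytes)` only needs the bytes of the first element.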
