How do I free the memory used by a specific data structure in Haskell?
Let's say I have several very large vectors stored on disk. I need to access them individually by reading each one from its respective file, which places it into memory. I would perform some function on a single vector and then move on to the next one I need. Every time I access a different vector, I need to be able to have the vector currently in memory garbage collected. I'm not sure whether performMajorGC would ensure that the vector is garbage collected if my program later accesses that same vector again by referencing the same function name that read it in from disk. In that case I would read it into memory again, use it, and then garbage collect it. How can I ensure it is garbage collected while using the same function name for the vector that is read from the same file?

Would appreciate any advice, thanks.
In response to Daniel Wagner:
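```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Lazy as BL
import Data.Csv (HasHeader (NoHeader), decode)
import Data.Vector (Vector, generate, (!))
import System.Mem (performGC)

-- Read and parse "data.csv<x>"; fail in IO if the file is not valid CSV.
myvec :: Int -> IO (Vector (Vector ByteString))
myvec x = do
  y <- BL.readFile ("data.csv" ++ show x)
  either fail return (decode NoHeader y)

-- A vector of IO actions, not of loaded data: running an element re-reads its file.
myvecvec :: Vector (IO (Vector (Vector ByteString)))
myvecvec = generate 100 myvec

somefunc1x1 :: Vector (Vector ByteString) -> IO ()
somefunc1x1 vv = print (length vv) -- placeholder for the real per-vector work
somefunc1 :: IO (Vector (Vector ByteString)) -> IO ()
somefunc1 iovv = do
  vv <- iovv
  somefunc1x1 vv
-- same thing for somefunc2 and 3 (placeholders that reuse somefunc1 here)
somefunc2, somefunc3 :: IO (Vector (Vector ByteString)) -> IO ()
somefunc2 = somefunc1
somefunc3 = somefunc1

oponvec :: IO ()
oponvec = do
  somefunc1 (myvecvec ! 0)
  performGC
  somefunc2 (myvecvec ! 1)
  performGC
  somefunc3 (myvecvec ! 0)
```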
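Because myvecvec stores IO actions rather than loaded vectors, using the same index twice simply runs the same action twice: the file is read again, and myvecvec itself never holds on to the result of an earlier read. A minimal sketch of that behaviour, where loadCounted and actions are hypothetical stand-ins for myvec and myvecvec that bump a counter instead of reading a CSV file:

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.Vector (Vector, generate, (!))

-- Hypothetical stand-in for myvec: records each run in a counter and
-- builds a small list instead of reading and parsing a CSV file.
loadCounted :: IORef Int -> Int -> IO [Int]
loadCounted counter x = do
  modifyIORef' counter (+ 1)
  return [x .. x + 9]

-- Shaped like myvecvec: 100 IO actions, none of which has run yet.
actions :: IORef Int -> Vector (IO [Int])
actions counter = generate 100 (loadCounted counter)

-- Use element 0 twice, as oponvec does, and report how many loads happened.
loadsForTwoUses :: IO Int
loadsForTwoUses = do
  counter <- newIORef 0
  let acts = actions counter
  _ <- acts ! 0
  _ <- acts ! 0
  readIORef counter

main :: IO ()
main = loadsForTwoUses >>= print
```

Running main prints 2: each use of acts ! 0 performs a fresh load, so a previously loaded value becomes garbage as soon as nothing else refers to it.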
You can test this by using a weak pointer as follows:
On my system this prints Nothing, which means that the vector has been deallocated. However, I don't know whether GHC guarantees that this happens.

Here's a program which checks @amalloy's suggestion:
This uses exactly 5GB on my machine, which shows that there are never two large vectors allocated at the same time.
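A minimal sketch of this kind of weak-pointer check (not the answer's own program), assuming the vector package's unboxed Int vectors instead of the question's boxed vectors of ByteStrings. allocWeak and survivorLength are hypothetical names. The System.Mem.Weak documentation warns that weak pointers to ordinary Haskell values are fragile under optimisation, so a Nothing result is evidence rather than proof.

```haskell
import Control.Exception (evaluate)
import qualified Data.Vector.Unboxed as U
import System.Mem (performMajorGC)
import System.Mem.Weak (Weak, deRefWeak, mkWeakPtr)

-- Allocate a fully evaluated unboxed vector of n Ints and hand back only a
-- weak pointer to it, so no strong reference survives this call.
-- NOINLINE is meant to stop GHC from inlining this into a call site with a
-- constant n, where the vector could be floated out to a top-level CAF
-- and stay reachable.
allocWeak :: Int -> IO (Weak (U.Vector Int))
allocWeak n = do
  v <- evaluate (U.enumFromN 0 n)
  mkWeakPtr v Nothing
{-# NOINLINE allocWeak #-}

-- Force a major GC, then ask the weak pointer whether the vector survived.
-- Nothing means it was collected; Just len means it is still alive.
survivorLength :: Int -> IO (Maybe Int)
survivorLength n = do
  w <- allocWeak n
  performMajorGC
  fmap U.length <$> deRefWeak w

main :: IO ()
main = survivorLength 10000000 >>= print
```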
Do you? Why? If it's simply because they're large and you're worried about fitting the vector in memory, then don't worry about it. If memory space is needed, and the object is unreachable, then garbage collection will pick it up. If memory space is not needed, you don't need to do anything. And if the object is reachable, running the GC won't help. So there are no cases where manual intervention in GC will do any good.
And if you want to GC it for some other reason than freeing up memory, you need to explain that in the question, because that goal will surely affect answers.
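A minimal sketch of the load, use, and drop pattern this answer relies on, assuming unboxed Int vectors and a hypothetical loadVector in place of reading from disk. Each vector is only reachable inside one loop iteration, so the collector can reclaim it whenever it needs the space, with no performGC call. When run with +RTS -T, the program also reports peak live data through GHC.Stats, which should stay roughly at the size of one vector rather than grow with the number of vectors processed.

```haskell
import Control.Exception (evaluate)
import Control.Monad (forM)
import qualified Data.Vector.Unboxed as U
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Hypothetical stand-in for loading vector i from disk: a fully
-- materialised vector of one million Ints.
loadVector :: Int -> IO (U.Vector Int)
loadVector i = evaluate (U.enumFromN i 1000000)

-- Load, use, and drop one vector per iteration. Only the forced sum escapes,
-- so once an iteration ends its vector is unreachable.
processAll :: [Int] -> IO [Int]
processAll ixs = forM ixs $ \i -> do
  v <- loadVector i
  let s = U.sum v
  -- Without this seq, each result would be an unevaluated `U.sum v` thunk,
  -- which keeps its whole vector reachable until the sum is demanded.
  s `seq` return s

main :: IO ()
main = do
  sums <- processAll [0 .. 49]
  print (sum sums)
  enabled <- getRTSStatsEnabled
  if enabled
    then do
      stats <- getRTSStats
      putStrLn ("max live bytes: " ++ show (max_live_bytes stats))
    else putStrLn "run with +RTS -T to see GC statistics"
```

The seq matters for the "if the object is reachable, running the GC won't help" point: a lazy result that still refers to a vector keeps that vector alive no matter how often the GC runs.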