How do I free the memory of a specific data structure in Haskell?

Posted 2025-01-16 02:44:47

Let’s say I have several very large vectors stored on disk. I need to access them one at a time, reading each from its own file, which loads it into memory. I would run some function on a single vector and then move on to the next one I need. Every time I move on to a different vector, I need to be able to have the vector currently in memory garbage collected. I’m not sure whether performMajorGC would ensure that a vector is garbage collected if my program later accesses that same vector again by referencing the same function name that read it in from disk. In that case I would read it into memory again, use it, and then have it garbage collected again. How can I ensure it is garbage collected while still using the same function name for the vector that is read from the same file?

I would appreciate any advice, thanks.

In response to Daniel Wagner:

    myvec :: Int -> IO (Vector (Vector ByteString))
    myvec x = do y <- Data.ByteString.Lazy.readFile ("data.csv" ++ show x)
                 -- decode once; fail in IO if the CSV is malformed
                 either fail return (Data.Csv.decode NoHeader y)

    myvecvec :: Vector (IO (Vector (Vector ByteString)))
    myvecvec = generate 100 myvec

    somefunc1 :: IO (Vector (Vector ByteString)) -> IO ()
    somefunc1 iovv = do vv <- iovv
                        somefunc1x1 vv  -- somefunc1x1 :: Vector (Vector ByteString) -> IO ()

    -- same thing for somefunc2 and 3

    oponvec :: IO ()
    oponvec = do somefunc1 (myvecvec ! 0)
                 performGC
                 somefunc2 (myvecvec ! 1)
                 performGC
                 somefunc3 (myvecvec ! 0)

Comments (2)

晨与橙与城 2025-01-23 02:44:47

You can test this by using a weak pointer as follows:

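    import qualified Data.Vector.Unboxed as V
    import System.Mem.Weak
    import System.Mem

    main :: IO ()
    main = do
      let xs = V.fromList [1..1000000 :: Int]
      wkp <- mkWeakPtr xs Nothing  -- weak reference: does not keep xs alive
      performGC                    -- force a major collection
      xs' <- deRefWeak wkp         -- Nothing if xs has been collected
      print xs'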

On my system this prints Nothing, which means that the vector has been deallocated. However, I don't know whether GHC guarantees that this happens.

Here's a program which checks @amalloy's suggestion:

    import qualified Data.Vector.Unboxed as V
    import Control.Monad
    import Data.Word

    -- NOINLINE keeps each call as a separate allocation of a fresh vector
    {-# NOINLINE newLarge #-}
    newLarge :: Word8 -> V.Vector Word8
    newLarge n = V.replicate 5000000000 n -- 5 billion bytes, about 5GB

    main :: IO ()
    main = forM_ [1..10] $ \i -> print (V.sum (newLarge i))

This uses exactly 5GB on my machine, which shows that there are never two large vectors allocated at the same time.
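
The worry in the question about reusing "the same function name" can be probed in a similar spirit. The following is only a minimal sketch: loadVec, loaders and countLoads are hypothetical names, and the loader fabricates its data with V.replicate instead of reading a file. It is meant to show that a vector of IO actions, shaped like myvecvec, stores recipes for loading rather than loaded data, so running the same element twice performs the load twice.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.Vector as B            -- boxed: holds the IO actions
import qualified Data.Vector.Unboxed as V    -- unboxed: holds the data

-- Hypothetical stand-in for reading file number i from disk.
-- It bumps a counter so we can see how often a "read" actually runs.
loadVec :: IORef Int -> Int -> IO (V.Vector Int)
loadVec counter i = do
  modifyIORef' counter (+ 1)
  pure (V.replicate 1000000 i)

-- A vector of actions, shaped like myvecvec in the question.
-- Building it loads nothing; each element is only a recipe for a load.
loaders :: IORef Int -> B.Vector (IO (V.Vector Int))
loaders counter = B.generate 100 (loadVec counter)

-- Run action 0, then action 1, then action 0 again,
-- and return how many loads were performed in total.
countLoads :: IO Int
countLoads = do
  counter <- newIORef 0
  let actions = loaders counter
  s0  <- V.sum <$> (actions B.! 0)   -- first load of vector 0
  s1  <- V.sum <$> (actions B.! 1)   -- load of vector 1
  s0' <- V.sum <$> (actions B.! 0)   -- vector 0 is loaded again from scratch
  print (s0, s1, s0')
  readIORef counter

main :: IO ()
main = countLoads >>= print
```

The counter ends at 3 because each run of an action is a separate load. Nothing in the vector of actions holds on to an earlier result, so an earlier copy is free to be collected once the code using it has finished.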

乱世争霸 2025-01-23 02:44:47

> I need to be able to instruct each vector in memory to be garbage collected every time I need to access a different vector.

Do you? Why? If it's simply because they're large and you're worried about fitting the vector in memory, then don't worry about it. If memory space is needed, and the object is unreachable, then garbage collection will pick it up. If memory space is not needed, you don't need to do anything. And if the object is reachable, running the GC won't help. So there are no cases where manual intervention in GC will do any good.

And if you want to GC it for some other reason than freeing up memory, you need to explain that in the question, because that goal will surely affect answers.
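
Since reachability is what decides whether a vector can be collected, one way to act on this is to keep each vector's lifetime short. The sketch below assumes a hypothetical loader (loadVec, which fabricates data rather than reading a file) and a hypothetical helper summarise. Each vector is loaded, reduced to a small result, and that result is forced with evaluate before returning, so that no unevaluated thunk still refers to the vector.

```haskell
import Control.Exception (evaluate)
import Control.Monad (forM)
import qualified Data.Vector.Unboxed as V

-- Hypothetical loader standing in for reading file i from disk.
loadVec :: Int -> IO (V.Vector Int)
loadVec i = pure (V.replicate 1000000 i)

-- Load one vector, reduce it to a small result, and force that result
-- before returning. Without the evaluate, the returned value could be
-- a lazy thunk that still references the whole vector.
summarise :: Int -> IO Int
summarise i = do
  v <- loadVec i
  evaluate (V.sum v)

main :: IO ()
main = do
  -- Vector 0 is used twice; it is simply loaded twice.
  totals <- forM [0, 1, 0] summarise
  print totals
```

After summarise returns, nothing refers to the vector any more, so no performGC call is needed for it to become eligible for collection. When the runtime actually reclaims the memory is still its decision.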
