Memory overflow when using numpy load in a loop
Looping over np.load of npz files causes a memory overflow (depending on the length of the file list).
None of the following seems to help:
Deleting the variable which stores the data in the file.
Using mmap.
Calling gc.collect() (garbage collection).
The following code should reproduce the phenomenon:
import numpy as np

# generate a file for the demo
X = np.random.randn(1000, 1000)
np.savez('tmp.npz', X=X)

# here comes the overflow:
for i in xrange(1000000):   # xrange: the original report was on Python 2; use range on Python 3
    data = np.load('tmp.npz')
    data.close()            # avoid the "too many files are open" error
In my real application the loop is over a list of files, and the overflow exceeds 24GB of RAM! Please note that this was tried on Ubuntu 11.10, with both numpy v1.5.1 and v1.6.0.
I have filed a report as numpy ticket 2048, but this may be of wider interest, so I am posting it here as well (moreover, I am not sure that this is a bug; it may be the result of my bad programming).
SOLUTION (by HYRY):
The command
del data.f
should precede the command
data.close()
For more information and the method used to find the solution, please read HYRY's kind answer below.
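Applied to the reproduction script above, the fix looks like the following sketch (xrange assumes Python 2, as in the original report; the only change is the del data.f line, which breaks the reference cycle before the archive is closed):

import numpy as np

X = np.random.randn(1000, 1000)
np.savez('tmp.npz', X=X)

for i in xrange(1000000):
    data = np.load('tmp.npz')
    del data.f        # break the NpzFile <-> BagObj reference cycle
    data.close()      # then close the underlying file as before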
Comments (2)
I think this is a bug, and maybe I found the solution: call "del data.f".
To find this kind of memory leak, you can use the following code:
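The snippet itself is not reproduced on this page; the following is a minimal sketch of the same idea, counting live objects by type with gc.get_objects() before and after the loop (names such as count_object_types are illustrative, not HYRY's):

import gc
import numpy as np

def count_object_types():
    # map type name -> number of live objects of that type
    counts = {}
    for obj in gc.get_objects():
        name = type(obj).__name__
        counts[name] = counts.get(name, 0) + 1
    return counts

X = np.random.randn(1000, 1000)
np.savez('tmp.npz', X=X)

before = count_object_types()
for i in range(100):
    data = np.load('tmp.npz')
    data.close()
after = count_object_types()

# report the types whose live-object count grew during the loop
for name in sorted(after):
    grown = after[name] - before.get(name, 0)
    if grown > 0:
        print(name, grown)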
After running the test program, I created a dict and counted the objects returned by gc.get_objects(). From the result we know that something is wrong with BagObj and NpzFile. Looking at the relevant numpy code:
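The quoted source is missing from this page; the structure it pointed at looks roughly like the following paraphrased sketch of numpy's npyio module from that era (not an exact copy of the source):

class BagObj(object):
    def __init__(self, obj):
        self._obj = obj        # back-reference to the owning NpzFile

class NpzFile(object):
    def __init__(self, fid):
        self.fid = fid
        self.f = BagObj(self)  # NpzFile.f -> BagObj and BagObj._obj -> NpzFile: a cycle

    def close(self):
        if self.fid is not None:
            self.fid.close()
            self.fid = None

    def __del__(self):
        self.close()           # a finalizer defined on a member of the cycle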
NpzFile has __del__(), NpzFile.f is a BagObj, and BagObj._obj is the NpzFile; this is a reference cycle, and it causes both NpzFile and BagObj to be uncollectable. Here is some explanation in the Python documentation: http://docs.python.org/library/gc.html#gc.garbage
So, to break the reference cycle, we need to call "del data.f".
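On Python 2, where the original report was made, unreachable cycles whose members define __del__ are not freed but parked in gc.garbage, so the leaked objects can be inspected directly (a sketch, assuming tmp.npz exists and a pre-3.4 Python):

import gc
import numpy as np

data = np.load('tmp.npz')
data.close()
del data           # drop the last external reference to the NpzFile

gc.collect()       # the collector finds the NpzFile/BagObj cycle...
print(gc.garbage)  # ...and, because of __del__, leaves its members here instead of freeing them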
What I found as the solution: (python==3.8 and numpy==1.18.5)
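The body of this answer is not included above. Independent of what it actually said, on numpy of that vintage NpzFile also supports the context-manager protocol, which is a common way to make sure each archive is closed per iteration (a sketch, not necessarily this answer's approach):

import numpy as np

X = np.random.randn(1000, 1000)
np.savez('tmp.npz', X=X)

for i in range(1000000):
    # the with-block closes the archive even if an exception is raised
    with np.load('tmp.npz') as data:
        _ = data['X']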