Python monotonically increasing memory usage (leak?)
I'm using this simple code and observing monotonically increasing memory usage. I use this little module to dump stuff to disk. The growth happens with unicode strings but not with integers; is there something I'm doing wrong?
When I do:
>>> from utils.diskfifo import DiskFifo
>>> df=DiskFifo()
>>> for i in xrange(1000000000):
...     df.append(i)
Memory consumption is stable.
But when I do:
>>> while True:
...     a = {'key': u'value', 'key2': u'value2'}
...     df.append(a)
It goes through the roof. Any hints? The module is below...
import tempfile
import cPickle

class DiskFifo:
    def __init__(self):
        self.fd = tempfile.TemporaryFile()
        self.wpos = 0
        self.rpos = 0
        self.pickler = cPickle.Pickler(self.fd)
        self.unpickler = cPickle.Unpickler(self.fd)
        self.size = 0

    def __len__(self):
        return self.size

    def extend(self, sequence):
        map(self.append, sequence)

    def append(self, x):
        self.fd.seek(self.wpos)
        self.pickler.dump(x)
        self.wpos = self.fd.tell()
        self.size = self.size + 1

    def next(self):
        try:
            self.fd.seek(self.rpos)
            x = self.unpickler.load()
            self.rpos = self.fd.tell()
            return x
        except EOFError:
            raise StopIteration

    def __iter__(self):
        self.rpos = 0
        return self
2 Answers
The pickler stores every object it has seen in its memo so that it doesn't have to pickle the same thing twice. You want to skip this (so references to your objects aren't kept alive inside the pickler object) and clear the memo before each dump:
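A minimal sketch of that change, applied to DiskFifo.append from the question (the exact original snippet may have differed):

    def append(self, x):
        self.fd.seek(self.wpos)
        self.pickler.clear_memo()  # drop references to everything pickled so far
        self.pickler.dump(x)
        self.wpos = self.fd.tell()
        self.size = self.size + 1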
Source: http://docs.python.org/library/pickle.html#pickle.Pickler.clear_memo
Edit:
You can actually watch the size of the memo go up as you pickle your objects by using the following append function:
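A sketch of such an instrumented append, assuming the pickler exposes its memo as the memo attribute (as CPython 2's pickle and cPickle picklers do):

    def append(self, x):
        self.fd.seek(self.wpos)
        print len(self.pickler.memo)  # grows with every distinct object pickled
        self.pickler.dump(x)
        self.wpos = self.fd.tell()
        self.size = self.size + 1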
To add to the answer by combatdave@:
I just bypassed the terrible memo caching in pickle, since clearing the memo on the reader side seems impossible and was an apparently unavoidable memory leak. Pickle streaming seems to be designed for reading and writing moderately sized files, not for reading and writing unbounded streams of data.
Instead I just used the following simple utility functions:
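A minimal sketch of what such helpers can look like (the names fifo_write and fifo_read are placeholders, not necessarily the original ones): the module-level cPickle.dump and cPickle.load create a fresh Pickler/Unpickler on every call, so no memo accumulates on either the writer or the reader side.

    import cPickle

    def fifo_write(fd, obj):
        # A new Pickler is created for each call, so nothing is memoized
        # across objects and no references are kept alive by the writer.
        cPickle.dump(obj, fd)

    def fifo_read(fd):
        # Likewise a new Unpickler per call; this sidesteps the reader-side
        # memo, which has no clear_memo() equivalent.
        return cPickle.load(fd)

With the question's class, append and next would still seek to wpos/rpos as before, but call fifo_write(self.fd, x) and fifo_read(self.fd) instead of going through the long-lived pickler and unpickler objects.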