Python 搁置 OutOfMemory 错误

发布于 2024-08-20 08:33:19 字数 762 浏览 12 评论 0原文

我有一些数据存储在我想要处理的数据库中。数据库访问速度非常慢,因此我决定在进行任何处理之前将所有数据加载到字典中。然而,由于存储的数据量巨大,我收到内存不足错误(我看到使用了超过 2 GB)。所以我决定使用磁盘数据结构,并发现使用 shelve 是一种选择。这就是我所做的(伪 python 代码)

def loadData():
    if (#dict exists on disk):
        d = shelve.open(name)
        return d
    else:
        d = shelve.open(name, writeback=True)
        #access DB and write data to dict
        # d[key] = value 
        # or for mutable values
        # oldValue = d[key]
        # newValue = f(oldValue)
        # d[key] = newValue 
        d.close()
        d = shelve.open(name, writeback=True)
        return d

我有几个问题,

1)我真的需要 writeBack=True 吗?它有什么作用?

2)我仍然遇到 OutofMemory 异常,因为我没有对数据写入磁盘的时间进行任何控制。我该怎么做?我尝试每隔几次迭代执行一次sync(),但这也没有帮助。

谢谢!

I have some data stored in a DB that I want to process. DB access is painfully slow, so I decided to load all data in a dictionary before any processing. However, due to the huge size of the data stored, I get an out of memory error (I see more than 2 gigs being used). So I decided to use a disk data structure, and found out that using shelve is an option. Here's what I do (pseudo python code)

def loadData():
    if (#dict exists on disk):
        d = shelve.open(name)
        return d
    else:
        d = shelve.open(name, writeback=True)
        #access DB and write data to dict
        # d[key] = value 
        # or for mutable values
        # oldValue = d[key]
        # newValue = f(oldValue)
        # d[key] = newValue 
        d.close()
        d = shelve.open(name, writeback=True)
        return d

I have a couple of questions,

1) Do I really need the writeBack=True? What does it do?

2) I still get an OutofMemory exception, since I do not exercise any control over when the data is being written to disk. How do I do that? I tried doing a sync() every few iterations but that didn't help either.

Thanks!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

流星番茄 2024-08-27 08:33:19

writeback=True 强制架子将曾经获取的任何项目保留在内存中,并在架子关闭时将它们写回。因此,它会消耗更多内存,并减慢关闭速度。

该参数的优点是,使用它,您不需要需要在注释中为可变项(其变异器是一个方法)显示的扭曲代码 - 就可以

shelf['foobar'].append(23)

工作(如果 shelf< /code> 是在启用回写的情况下打开的),假设键 'foobar' 处的项目当然是一个列表,而如果 < code>shelf 在没有写回的情况下打开 - 在后一种情况下,您实际上确实需要

thelist = shelf['foobar']
thelist.append(23)
shekf['foobar'] = thelist

按照评论的精神进行编码 - 这在风格上有点令人失望。

然而,由于您内存问题,我绝对建议不要使用这个可疑的写回选项。我想我可以称其为“可疑”,因为我是提出并第一个实施它的人,但那是很多年前的事了,而且我大多后悔这样做了——它比它更容易造成混乱(正如你的 Q 所证明的那样)允许优雅和方便地移动最初编写的与字典一起使用的代码(这将使用第一个习惯用法,而不是第二个习惯用法,因此需要重写以便可以在没有回溯的情况下与架子一起使用)。啊,好吧,抱歉,这在当时看来确实是个好主意。

writeback=True forces the shelf to keep in-memory any item ever fetched, and write them back when the shelf is closed. So, it consumes much more memory, and slows down closing.

The advantage of the parameter is that, with it, you don't need the contorted code you show in your comment for mutable items whose mutator is a method -- just

shelf['foobar'].append(23)

works (if shelf was opened with writeback enabled), assuming the item at key 'foobar' is a list of course, while it would silently be a no-operation (leaving the item on disk unchanged) if shelf was opened without writeback -- in the latter case you actually do need to code

thelist = shelf['foobar']
thelist.append(23)
shekf['foobar'] = thelist

in your comment's spirit -- which is stylistically somewhat of a bummer.

However, since you are having memory problems, I definitely recommend not using this dubious writeback option. I think I can call it "dubious" since I was the one proposing and first implementing it, but that was many years ago, and I've mostly repented of doing it -- it generales more confusion (as your Q evidences) than it allows elegance and handiness in moving code originally written to work with dicts (which would use the first idiom, not the second, and thus need rewriting in order to be usable with shelves without traceback). Ah well, sorry, it did seem a good idea at the time.

ま昔日黯然 2024-08-27 08:33:19

使用 sqlite3 模块可能是您的最佳选择。无论如何,您可能可以完全在内存中使用 sqlite,因为它的内存占用可能比使用 python 对象要小一些。无论如何,这通常是比使用 shelve 更好的选择; shelve 在下面使用 pickle,这很少是您想要的。

天哪,您可以将整个现有数据库转换为 sqlite 数据库。 sqlite 又好又快。

Using the sqlite3 module is probably your best choice here. You might be able to use sqlite entirely in memory anyway since its memory footprint might be a bit smaller than using python objects anyway. It's generally a better choice than using shelve anyway; shelve uses pickle underneath, which is rarely what you want.

Hell, you could just convert your entire existing database to a sqlite database. sqlite is nice and fast.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文