What database is useful for replacing a dict of numpy arrays?

Posted on 2025-01-07 18:47:18

My code is creating a dict (with strings as keys and numpy arrays as values) which is too big to fit into RAM, so the program is crashing ('Cannot allocate memory', 'killed', 'aborted').

Having read some SO articles, I got the impression that I need a database to handle this case. But which one should I use? The bsddb interface to the Berkeley DB library, recommended in "Python Disk-Based Dictionary", only accepts strings as values, which makes it seem very cumbersome to use with numpy arrays. I also looked briefly at sqlite3, recommended in "How to handle Out of memory with Python", but I would really like to avoid using SQL to access my data.

What would you recommend?

Comments (2)

昔梦 2025-01-14 18:47:18

sqlite would seem perfect, given that your query pattern will be very simple.

Another option which I frequently mention is redis ( http://redis.io ), a key-value server.

Memcached ( http://memcached.org/ ) and MongoDB ( http://www.mongodb.org/ ) are other popular NoSQL databases.

If none of these take your fancy, google NoSQL to see what other projects are out there.
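
Since the question asks to avoid writing SQL when accessing the data, the sqlite suggestion above could be hidden behind a dict-like wrapper. What follows is only a minimal sketch under stated assumptions: the class name `SqliteArrayDict`, the table name `arrays`, and its two columns are hypothetical, and arrays are serialized with `numpy.save` into a BLOB. It is not tuned for very large arrays or concurrent access.

```python
import io
import sqlite3
from collections.abc import MutableMapping

import numpy as np


class SqliteArrayDict(MutableMapping):
    """Hypothetical dict-like store: string keys, numpy arrays kept in an sqlite file."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        # Assumed schema: one row per key, array bytes in .npy format.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS arrays (key TEXT PRIMARY KEY, data BLOB)"
        )

    def __setitem__(self, key, value):
        buf = io.BytesIO()
        np.save(buf, value)  # the .npy format records dtype and shape
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO arrays (key, data) VALUES (?, ?)",
                (key, buf.getvalue()),
            )

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT data FROM arrays WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return np.load(io.BytesIO(row[0]))

    def __delitem__(self, key):
        with self._conn:
            cur = self._conn.execute("DELETE FROM arrays WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        for (key,) in self._conn.execute("SELECT key FROM arrays"):
            yield key

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM arrays").fetchone()[0]


# ":memory:" is used here only for demonstration; pass a file path to keep data on disk.
store = SqliteArrayDict(":memory:")
store["a"] = np.zeros((2, 2))
store["b"] = np.arange(6).reshape(2, 3)
print(store["b"].shape, sorted(store))
```

Only the array being read or written is held in memory at any time; everything else stays in the database file.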

〃温暖了心ぐ 2025-01-14 18:47:18

Here's a simple solution which might work for you. Instead of storing the arrays in the dict (so they're in memory), write them to files and store only the file paths. As long as you don't keep other references to the arrays, the in-memory copies are freed by reference counting and are only loaded again when you access them.

EDIT: You could tweak this by using npz files to store a few keys at a time, especially if you don't need random access.
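
A rough sketch of the npz variant mentioned in the edit, under assumptions of my own: the helper names `save_batch` and `load_one` and the plain `index` dict mapping each key to its batch file are hypothetical. Because the keys are passed to `numpy.savez` as keyword arguments, a key named `file` would clash with that function's first parameter.

```python
import os
import tempfile

import numpy as np


def save_batch(arrays, directory):
    """Write a dict of {key: array} to a single .npz file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".npz", dir=directory)
    os.close(fd)  # numpy reopens the file by name
    np.savez(path, **arrays)
    return path


def load_one(path, key):
    """Read a single array back from a batch file."""
    with np.load(path) as batch:
        return batch[key]  # only the requested member is read


batch_dir = tempfile.mkdtemp()
index = {}  # key -> path of the .npz file that holds it

path = save_batch({"a": np.zeros((2, 2)), "b": np.ones((2, 2))}, batch_dir)
for key in ("a", "b"):
    index[key] = path

print(load_one(index["b"], "b"))
```

Batching reduces the number of files on disk, at the cost of having to rewrite a whole batch when one of its arrays changes.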

Code

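import tempfile
import numpy

class numpy_dict(dict):
    """dict that keeps each array in a temporary file and stores only its path."""

    def __setitem__(self, key, value):
        # delete=False keeps the file on disk after the with block closes it.
        with tempfile.NamedTemporaryFile(delete=False) as f:
            numpy.save(f, value)
            super(numpy_dict, self).__setitem__(key, f.name)

    def __getitem__(self, key):
        # Look up the stored path and read the array back from disk.
        path = super(numpy_dict, self).__getitem__(key)
        return numpy.load(path)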

Example usage

>>> import so
>>> import numpy as np
>>> x = so.numpy_dict()
>>> x["a"] = np.zeros((2,2))
>>> x["b"] = np.ones((2,2))
>>> x["a"]
array([[ 0.,  0.],
       [ 0.,  0.]])
>>> x["b"]
array([[ 1.,  1.],
       [ 1.,  1.]])
>>> dict.__getitem__(x, "a")
'/tmp/tmpxIxt0O'
>>> dict.__getitem__(x, "b")
'/tmp/tmpIviN4M'
>>> from sys import getrefcount as refs
>>> x = np.zeros((2,2))
>>> refs(x)
2
>>> x = so.numpy_dict()
>>> y = np.zeros((2,2))
>>> refs(y)
2
>>> x["c"] = y
>>> refs(y)
2