Which database is useful for replacing a dict of numpy arrays?
My code is creating a dict (with strings as keys and numpy arrays as values) which is too big to fit into RAM, so the program crashes ('Cannot allocate memory', 'killed', 'aborted').
Having read some SO articles, I got the impression that I need to use a database to handle this case. But which one should I use? The bsddb module (Interface to Berkeley DB library), recommended in "Python Disk-Based Dictionary", only accepts strings as values, which makes it seem very cumbersome to use with numpy arrays. I also looked briefly at sqlite3, recommended in "How to handle Out of memory with Python", but I would really like to avoid using SQL to access my data.
What would you recommend?
Comments (2)
sqlite would seem perfect, given that your query pattern will be very simple.
Another option which I frequently mention is redis ( http://redis.io ), a key-value server.
Memcached ( http://memcached.org/ ) and MongoDB ( http://www.mongodb.org/ ) are other popular NoSQL databases.
If none of these take your fancy, google NoSQL to see what other projects are out there.
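Since the question asks to avoid writing SQL by hand, here is a minimal sketch of how sqlite3 could be hidden behind a dict-like interface. The class name SqliteArrayStore and the table layout are hypothetical, chosen only for illustration; arrays are serialized with np.save into an in-memory buffer so that dtype and shape survive the round trip.

```python
import io
import sqlite3

import numpy as np


class SqliteArrayStore:
    """Hypothetical dict-like store: string keys, numpy arrays kept as BLOBs."""

    def __init__(self, path):
        # `path` is a database file on disk (or ":memory:" for quick tests).
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS arrays (key TEXT PRIMARY KEY, data BLOB)"
        )

    def __setitem__(self, key, array):
        buf = io.BytesIO()
        # The .npy format records dtype and shape alongside the raw data.
        np.save(buf, array, allow_pickle=False)
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO arrays (key, data) VALUES (?, ?)",
                (key, sqlite3.Binary(buf.getvalue())),
            )

    def __getitem__(self, key):
        row = self.conn.execute(
            "SELECT data FROM arrays WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return np.load(io.BytesIO(row[0]), allow_pickle=False)

    def __contains__(self, key):
        row = self.conn.execute(
            "SELECT 1 FROM arrays WHERE key = ?", (key,)
        ).fetchone()
        return row is not None

    def close(self):
        self.conn.close()
```

With this wrapper, calling code only does store["name"] = array and array = store["name"]; only the array currently being read is held in memory.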
Here's a simple solution which might work for you. Instead of storing the arrays in the dict (so they're in memory), write them to a file. As long as you're careful with your references, the reference counter will free each array once you stop using it, and it is read from disk again the next time you access it.
EDIT: You could tweak this by using npz files to store a few keys at a time, especially if you don't need random access.
Code
Example usage
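A minimal sketch of the file-backed approach described in this answer, not the answerer's original code. The class name DiskArrayDict, the one-file-per-key layout, and the SHA-1 file naming are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile

import numpy as np


class DiskArrayDict:
    """Hypothetical dict-like object keeping each array in its own .npy file."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hash the key so that any string maps to a safe file name.
        name = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, name + ".npy")

    def __setitem__(self, key, array):
        np.save(self._path(key), array, allow_pickle=False)

    def __getitem__(self, key):
        path = self._path(key)
        if not os.path.exists(path):
            raise KeyError(key)
        # The array is read from disk on every access; nothing is cached here.
        return np.load(path, allow_pickle=False)

    def __contains__(self, key):
        return os.path.exists(self._path(key))

    def __delitem__(self, key):
        path = self._path(key)
        if not os.path.exists(path):
            raise KeyError(key)
        os.remove(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        arrays = DiskArrayDict(tmp)
        arrays["weights"] = np.zeros((1000, 1000))
        w = arrays["weights"]   # loaded from disk on access
        print(w.shape)
        del w                   # drop the reference so the memory can be freed
```

Because hashing discards the original key strings, this sketch cannot list its keys; if iteration is needed, a small index file mapping keys to file names would have to be added.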