NumPy arrays and SQLite
The most common SQLite interface I've seen in Python is sqlite3, but is there anything that works well with NumPy arrays or recarrays? By that I mean one that recognizes data types and does not require inserting row by row, and extracts into a NumPy (rec)array...? Kind of like R's SQL functions in the RDB or sqldf libraries, if anyone is familiar with those (they import/export/append whole tables or subsets of tables to or from R data tables).
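For context, a minimal sketch (not from the question itself) of the plain sqlite3 pattern being described, using a hypothetical two-column table `data`: rows are inserted one by one and the query result is rebuilt into an array by hand.

```python
import sqlite3

import numpy as np

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (a REAL, b REAL)")

# Row-by-row insertion from a NumPy array: the pattern the question hopes to avoid.
arr = np.random.rand(5, 2)
for row in arr:
    con.execute("INSERT INTO data VALUES (?, ?)", tuple(map(float, row)))
con.commit()

# Pulling the table back out and rebuilding the array by hand.
out = np.array(con.execute("SELECT a, b FROM data").fetchall())
```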
Comments (4)
I found at least three Python packages that interface SQLite and NumPy.
Each of these packages has to deal with the problem that SQLite (by default) only understands standard Python types and not the NumPy data types such as numpy.int64.
RecSQL 0.7.8+ works for me (most of the time), but I consider it a pretty bad hack; glancing over the code, esutil.sqlite_util appears to be more mature.
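For reference, the type problem itself can also be worked around with the standard library alone, by registering adapters so sqlite3 knows how to coerce NumPy scalars; a minimal sketch (not taken from any of the packages above), using a throwaway in-memory table:

```python
import sqlite3

import numpy as np

# Teach sqlite3 to coerce NumPy scalar types into plain Python types on insert.
sqlite3.register_adapter(np.int64, int)
sqlite3.register_adapter(np.int32, int)
sqlite3.register_adapter(np.float64, float)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER, y REAL)")

# Without an adapter, binding a NumPy integer scalar such as np.int64
# raises sqlite3.InterfaceError ("probably unsupported type").
con.execute("INSERT INTO t VALUES (?, ?)", (np.int64(1), np.float64(2.5)))
con.commit()
```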
Why not give redis a try?
Drivers for your two platforms of interest are available: Python (redis, via the package index) and R (rredis, on CRAN).
The genius of redis is not that it will magically recognize NumPy data types and allow you to insert and extract multi-dimensional NumPy arrays as if they were native redis data types; rather, its genius is the remarkable ease with which you can create such an interface yourself in just a few lines of code.
There are (at least) several tutorials on redis in Python; the one on the DeGizmo blog is particularly good.
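As a hedged illustration of the kind of few-line interface meant here (this is not the answerer's code nor the tutorial's), one could pickle arrays into redis values with the redis-py client; `put_array` and `get_array` are hypothetical helper names, and a locally running server is assumed:

```python
import pickle

import numpy as np
import redis

r = redis.Redis(host="localhost", port=6379)  # assumes a local redis server

def put_array(key, arr):
    """Store a NumPy array under `key` by pickling it to bytes."""
    r.set(key, pickle.dumps(arr, protocol=pickle.HIGHEST_PROTOCOL))

def get_array(key):
    """Load the array back; pickle restores dtype and shape automatically."""
    return pickle.loads(r.get(key))

a = np.arange(12, dtype=np.float64).reshape(3, 4)
put_array("example:a", a)
assert np.array_equal(get_array("example:a"), a)
```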
Doug's suggestion with redis is quite good, but I think his code is a bit complicated and, as a result, rather slow. For my purposes, I had to serialize+write and then grab+deserialize a square matrix of about a million floats in less than a tenth of a second, so I did this:
For writing:
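A minimal sketch along the lines described (tobytes on the write side), assuming the redis-py client, a locally running server, and a hypothetical key name "mat" for a C-contiguous float64 matrix:

```python
import numpy as np
import redis

r = redis.Redis(host="localhost", port=6379)  # assumes a local redis server

mat = np.random.rand(1000, 1000)   # roughly a million float64 values
r.set("mat", mat.tobytes())        # store raw bytes; dtype and shape are tracked by the caller
```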
Then for reads:
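And a matching sketch of the read side, rebuilding the matrix with frombuffer (the dtype and shape are assumed to be known to the reader):

```python
import numpy as np
import redis

r = redis.Redis(host="localhost", port=6379)

raw = r.get("mat")
mat = np.frombuffer(raw, dtype=np.float64).reshape(1000, 1000)
# frombuffer returns a read-only view over the bytes; call .copy() if you need to mutate it
```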
You can do some basic performance testing in IPython using %time, but neither tobytes nor frombuffer takes more than a few milliseconds.
This looks a bit old, but is there any reason you cannot just do a fetchall() instead of iterating, and then initialize the numpy array on declaration?
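A hedged reading of that suggestion, using a small example table `t`: fetch all rows at once and hand them to NumPy with a structured dtype declared up front.

```python
import sqlite3

import numpy as np

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER, y REAL)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(1, 2.0), (3, 4.5)])

# Fetch everything at once and build a record-style array with the dtype declared up front.
rows = con.execute("SELECT x, y FROM t").fetchall()
rec = np.array(rows, dtype=[("x", np.int64), ("y", np.float64)])
```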