NumPy 数组与 SQLite

发布于 2024-12-12 03:56:23 字数 208 浏览 0 评论 0原文

我在 Python 中见过的最常见的 SQLite 接口是 sqlite3，但是有什么可以与 NumPy 数组或 rearray 配合良好的吗？我的意思是，它可以识别数据类型，不需要逐行插入，并提取到 NumPy (rec) 数组中......？有点像 RDB 或 sqldf 库中的 R SQL 函数，如果有人熟悉这些函数的话（它们将整个表或表的子集导入/导出/附加到 R 或从 R 导出）数据表）。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

握住你手 2024-12-19 03:56:24

我发现至少三个Python包到接口SQLite 和 NumPy：

esutil.sqlite_util，它是 esutil 的一部分
< a href="http://svn.scipy.org/svn/scikits/branches/pierregm/Hydroclimpy/scikits/Hydroclimpy/io/sqlite.py" rel="nofollow">Hydroclimpy.io.sqlite
我自己的 RecSQL （可从 < a href="https://github.com/orbeckst/RecSQL" rel="nofollow">github.com/orbeckst/RecSQL)

这些包中的每一个都必须处理 SQLite 的问题（通过默认）仅理解标准Python类型而不是 NumPy 数据类型，例如numpy.int64。

RecSQL 0.7.8+ 对我有用（大多数时候），但我认为这是一个非常糟糕的黑客并浏览了代码，esutil.sqlite_util 似乎更成熟。

回复收藏 0 原文

海未深 2024-12-19 03:56:23

为什么不尝试一下 redis 呢？

您感兴趣的两个平台的驱动程序可用 - python（redis，通过包索引]2 ）和 R（rredis、CRAN）。

Redis 的天才之处不在于它能够神奇地识别 NumPy 数据类型并允许您插入和提取多维 NumPy 数组，就好像它们是本机 Redis 数据类型一样，而是在于它的非凡易用性您只需几行代码即可创建这样的界面。

（至少）有几个关于 python 中的 redis 的教程； DeGizmo 博客上的那个特别好。

import numpy as NP

# create some data
A = NP.random.randint(0, 10, 40).reshape(8, 5)

# a couple of utility functions to (i) manipulate NumPy arrays prior to insertion 
# into redis db for more compact storage & 
# (ii) to restore the original NumPy data types upon retrieval from redis db
fnx2 = lambda v : map(int, list(v))
fnx = lambda v : ''.join(map(str, v))

# start the redis server (e.g. from a bash prompt)
gt; cd /usr/local/bin      # default install directory for 'nix
gt; redis-server           # starts the redis server

# start the redis client:
from redis import Redis
r0 = Redis(db=0, port=6379, host='localhost')       # same as: r0 = Redis()

# to insert items using redis 'string' datatype, call 'set' on the database, r0, and
# just pass in a key, and the item to insert
r0.set('k1', A[0,:])

# row-wise insertion the 2D array into redis, iterate over the array:
for c in range(A.shape[0]):
    r0.set( "k{0}".format(c), fnx(A[c,:]) )

# or to insert all rows at once
# use 'mset' ('multi set') and pass in a key-value mapping: 
x = dict([sublist for sublist in enumerate(A.tolist())])
r0.mset(x1)

# to retrieve a row, pass its key to 'get'
>>> r0.get('k0')
  '63295'

# retrieve the entire array from redis:
kx = r0.keys('*')           # returns all keys in redis database, r0

for key in kx :
    r0.get(key)

# to retrieve it in original form:
A = []
for key in kx:
    A.append(fnx2(r0.get("{0}".format(key))))

>>> A = NP.array(A)
>>> A
  array([[ 6.,  2.,  3.,  3.,  9.],
         [ 4.,  9.,  6.,  2.,  3.],
         [ 3.,  7.,  9.,  5.,  0.],
         [ 5.,  2.,  6.,  3.,  4.],
         [ 7.,  1.,  5.,  0.,  2.],
         [ 8.,  6.,  1.,  5.,  8.],
         [ 1.,  7.,  6.,  4.,  9.],
         [ 6.,  4.,  1.,  3.,  6.]])

why not give redis a try?

Drivers for your two platforms of interest are available--python (redis, via package index]2), and R (rredis, CRAN).

The genius of redis is not that it will magically recognize the NumPy data type and allow you to insert and extract multi-dimensional NumPy arrays as if they were native redis datatypes, rather its genius is in the remarkable ease with which you can create such an interface with just a few lines of code.

There are (at least) several tutorials on redis in python; the one on the DeGizmo blog is particularly good.

import numpy as NP

# create some data
A = NP.random.randint(0, 10, 40).reshape(8, 5)

# a couple of utility functions to (i) manipulate NumPy arrays prior to insertion 
# into redis db for more compact storage & 
# (ii) to restore the original NumPy data types upon retrieval from redis db
fnx2 = lambda v : map(int, list(v))
fnx = lambda v : ''.join(map(str, v))

# start the redis server (e.g. from a bash prompt)
gt; cd /usr/local/bin      # default install directory for 'nix
gt; redis-server           # starts the redis server

# start the redis client:
from redis import Redis
r0 = Redis(db=0, port=6379, host='localhost')       # same as: r0 = Redis()

# to insert items using redis 'string' datatype, call 'set' on the database, r0, and
# just pass in a key, and the item to insert
r0.set('k1', A[0,:])

# row-wise insertion the 2D array into redis, iterate over the array:
for c in range(A.shape[0]):
    r0.set( "k{0}".format(c), fnx(A[c,:]) )

# or to insert all rows at once
# use 'mset' ('multi set') and pass in a key-value mapping: 
x = dict([sublist for sublist in enumerate(A.tolist())])
r0.mset(x1)

# to retrieve a row, pass its key to 'get'
>>> r0.get('k0')
  '63295'

# retrieve the entire array from redis:
kx = r0.keys('*')           # returns all keys in redis database, r0

for key in kx :
    r0.get(key)

# to retrieve it in original form:
A = []
for key in kx:
    A.append(fnx2(r0.get("{0}".format(key))))

>>> A = NP.array(A)
>>> A
  array([[ 6.,  2.,  3.,  3.,  9.],
         [ 4.,  9.,  6.,  2.,  3.],
         [ 3.,  7.,  9.,  5.,  0.],
         [ 5.,  2.,  6.,  3.,  4.],
         [ 7.,  1.,  5.,  0.,  2.],
         [ 8.,  6.,  1.,  5.,  8.],
         [ 1.,  7.,  6.,  4.,  9.],
         [ 6.,  4.,  1.,  3.,  6.]])

回复收藏 0 原文

装迷糊 2024-12-19 03:56:23

Doug 对 redis 的建议非常好，但我认为他的代码有点复杂，因此速度相当慢。出于我的目的，我必须在不到十分之一秒的时间内序列化+写入，然后抓取+反序列化大约一百万个浮点数的方阵，所以我这样做了：

对于写入：

snapshot = np.random.randn(1024,1024)
serialized = snapshot.tobytes()
rs.set('snapshot_key', serialized)

然后对于读取：

s = rs.get('snapshot_key')
deserialized = np.frombuffer(s).astype(np.float32)
rank = np.sqrt(deserialized.size).astype(int)
snap = deserialized(rank, rank)

您可以执行一些基本性能使用 %time 使用 ipython 进行测试，但 tobytes 或 frombuffer 都不需要超过几毫秒。

Doug's suggestion with redis is quite good, but I think his code is a bit complicated and, as a result, rather slow. For my purposes, I had to serialize+write and then grab+deserialize a square matrix of about a million floats in less than a tenth of a second, so I did this:

For writing:

snapshot = np.random.randn(1024,1024)
serialized = snapshot.tobytes()
rs.set('snapshot_key', serialized)

Then for reads:

s = rs.get('snapshot_key')
deserialized = np.frombuffer(s).astype(np.float32)
rank = np.sqrt(deserialized.size).astype(int)
snap = deserialized(rank, rank)

You can do some basic performance testing with ipython using %time, but neither the tobytes or frombuffer take more than a few milliseconds.

回复收藏 0 原文