h5py: read_direct into a multi-dimensional numpy array returns ValueError
I am using `read_direct` to copy large vectors from an h5 file into a single 2D numpy array. Large means millions of points. `read_direct` is apparently faster than slice notation because it avoids an intermediate copy.
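To make the two approaches concrete, here is a minimal sketch. The in-memory file, the dataset name `X0` and the sizes are assumptions for illustration only, and no timings are measured here:

```python
import io

import h5py
import numpy as np

# Small in-memory HDF5 file so the sketch runs without touching disk.
with h5py.File(io.BytesIO(), "w") as f:
    ds = f.create_dataset("X0", data=np.arange(10, dtype=np.float64))
    data = np.zeros((2, ds.shape[0]))

    # (1) Slice notation: ds[:] allocates a new array, which is then copied into row 0.
    data[0, :] = ds[:]

    # (2) read_direct: HDF5 writes straight into the existing buffer of row 1.
    #     data[1] is a 1-D view whose shape matches the 1-D dataset.
    ds.read_direct(data[1])

print(np.array_equal(data[0], data[1]))  # True
```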
My first attempt was:
```python
import h5py
import numpy as np

def _harvest_data(grp: h5py.Group) -> np.ndarray:
    data = np.zeros((64, grp['times'].shape[0]))
    index = 0
    for key, value in grp.items():
        if 'X' in key:
            # Raises: ValueError: 2 indexing arguments for 1 dimensions
            value.read_direct(data, source_sel=None, dest_sel=np.s_[index, :])
            index += 1
    return data.mean(axis=0)
```
This raises an error, though:

`ValueError: 2 indexing arguments for 1 dimensions`

The traceback points at the `value.read_direct` line. What I don't understand is why it gives me this error. The `data` array is 2D, so giving it a 2D index seems perfectly sensible. If I change to `dest_sel=np.s_[:]`, every dataset is copied into the first row of `data`, which is obviously not what I want.
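A numpy-only check (the row `index=3` and the 1000 columns are arbitrary) shows that these selections are ordinary, valid indices for a 2D array: `np.s_[index, :]` is a 2-tuple that picks one row, while `np.s_[:]` is a bare slice rather than a tuple (`np.index_exp[:]` would wrap it in a 1-tuple). Since NumPy accepts them, the "1 dimensions" in the error suggests that `read_direct` checks `dest_sel` against some one-dimensional shape, possibly that of the 1-D source dataset. That is an inference from the message only, not something verified in h5py's code:

```python
import numpy as np

data = np.zeros((64, 1000))
index = 3

sel_row = np.s_[index, :]              # two indexing arguments
sel_block = np.s_[index:index + 1, :]  # a 1-row block instead of a row
sel_all = np.s_[:]                     # a single bare slice

print(sel_row)                # (3, slice(None, None, None))
print(data[sel_row].shape)    # (1000,)
print(data[sel_block].shape)  # (1, 1000)
print(sel_all)                # slice(None, None, None)
print(data[sel_all].shape)    # (64, 1000)
print(np.index_exp[:])        # (slice(None, None, None),)
```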
A workaround is the following:
```python
import h5py
import numpy as np

def _harvest_data(grp: h5py.Group) -> np.ndarray:
    data = np.zeros((64, grp['times'].shape[0]))
    index = 0
    for key, value in grp.items():
        if 'X' in key:
            # Destination is a 1-D view of one row, so no dest_sel is needed.
            value.read_direct(data[index, :], source_sel=None, dest_sel=None)
            index += 1
    return data.mean(axis=0)
```
This works, but I do not understand why the first attempt doesn't.
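The workaround can be written as a self-contained sketch. Assumptions for illustration: an in-memory file, a group `run` holding a `times` dataset and two 1-D datasets `X0` and `X1`, and one output row per matching dataset instead of the fixed 64 rows (`harvest_rows` is a hypothetical name). Passing `data[index]` makes the destination a 1-D view with the same shape as the 1-D source:

```python
import io

import h5py
import numpy as np


def harvest_rows(grp: h5py.Group) -> np.ndarray:
    """Copy each 1-D dataset whose name contains 'X' into its own row."""
    keys = [key for key in grp if "X" in key]
    # Assumption: one row per matching dataset rather than a hard-coded 64.
    data = np.zeros((len(keys), grp["times"].shape[0]))
    for index, key in enumerate(keys):
        # data[index] is a view into row `index`; read_direct fills it in place.
        grp[key].read_direct(data[index])
    return data


with h5py.File(io.BytesIO(), "w") as f:
    grp = f.create_group("run")
    grp["times"] = np.linspace(0.0, 1.0, 5)
    grp["X0"] = np.full(5, 1.0)
    grp["X1"] = np.full(5, 3.0)
    rows = harvest_rows(grp)

print(rows.shape)         # (2, 5)
print(rows.mean(axis=0))  # [2. 2. 2. 2. 2.]
```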
Following kcw78's answer, I tried this:
```python
import h5py
import numpy as np

def _harvest_data(grp: h5py.Group) -> np.ndarray:
    data = np.zeros((64, grp['times'].shape[0]))
    index = 0
    for key, value in grp.items():
        if 'X' in key:
            # Select a 1-row block of the 2-D destination
            value.read_direct(data,
                              source_sel=None,
                              dest_sel=np.s_[index:index+1, :])
            index += 1
    return data
```
Unfortunately, it gives the same ValueError as my first attempt.
Note: Answer updated May-27-2022 to include an example that reads from multiple 1-d datasets to a 2-d array (this more closely mimics the OP's workflow).
For reference, I am using h5py 3.3.0 (`h5py.__version__` returns `'3.3.0'`).
A note in the numpy documentation for `np.s_` caught my eye. It points out that `np.s_` does not convert its result to a tuple, while `np.index_exp` always returns a tuple. So, initially I thought you need to use `np.index_exp` instead of `np.s_` to always get slice tuples for h5py. However, a little testing shows it's slightly more complicated. I can't explain the underlying numpy code... but this will lead to a solution.

This is a new example to demonstrate reading from a 1-d dataset (`shape=(10,)`) to a 2-d array (`shape=(5,10)`). It reads values from 2 different datasets into the first and last rows of the array. It uses `None` for the source slice. Three different ways of specifying the destination slice were tested. All 3 work.

This is the previous example that reads from a 2-d dataset to a 2-d array (both `shape=(100,100)`).
Output of `arr_out` from both methods: (1) reading the dataset with slice notation, and (2) using `read_direct()` with slices from `np.s_`.
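A minimal sketch of that 2-d comparison, assuming an in-memory file, a 100x100 float64 dataset and arbitrary 10x10 source and destination blocks. The dataset name `ds2d` and all bounds are illustrative; this is not the answer's original code:

```python
import io

import h5py
import numpy as np

with h5py.File(io.BytesIO(), "w") as f:
    ds = f.create_dataset(
        "ds2d", data=np.arange(100 * 100, dtype=np.float64).reshape(100, 100)
    )

    src = np.s_[10:20, 30:40]  # block to read from the dataset
    dst = np.s_[0:10, 0:10]    # where that block goes in the output array

    # (1) Slice notation: read into a temporary array, then assign.
    arr_slice = np.zeros((100, 100))
    arr_slice[dst] = ds[src]

    # (2) read_direct with the same np.s_ selections, writing in place.
    arr_out = np.zeros((100, 100))
    ds.read_direct(arr_out, source_sel=src, dest_sel=dst)

print(np.array_equal(arr_slice, arr_out))  # True
```

Here both the dataset and the destination array are 2-d, so each `np.s_` selection has one index per dimension on both sides.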