3D array reshape? HDF5 dataset type?

Posted on 2025-01-17 10:27:17

I have data in the following shape: (127260, 2, 1250)

The type of this data is <HDF5 dataset "data": shape (127260, 2, 1250), type "<f8">

The first dimension (127260) is the number of signals, the second dimension (2) is the type of signal, and the third dimension (1250) is the number of points in each signal.

What I want to do is reduce the number of points in each signal: cut each one in half, leaving 625 points per signal, and thereby double the number of signals.

How do I convert the HDF5 dataset to something like a NumPy array, and how do I do this reshape?

Comments (1)

陈甜 2025-01-24 10:27:17

If I understand, you want a new dataset with shape (2*127260, 2, 625). If so, it's fairly simple to read two slices of the dataset into two NumPy arrays, create a new array from the slices, then write it to a new dataset. Note: reading slices is simple and fast. I would leave the data as-is and slice it on the fly unless you have a compelling reason to create a new dataset.
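The on-the-fly approach mentioned above can be sketched as follows. This is an illustrative sketch, not from the original answer: the file name `demo.h5`, the dataset name `data`, and the scaled-down sizes are all placeholders, and the in-memory `core` driver is used only so the example is self-contained.

```python
import numpy as np
import h5py

def get_signal(dset, i, half=625):
    """Return signal i of the doubled set as a (2, half) array,
    reading only the slice that is needed from the HDF5 dataset."""
    n = dset.shape[0]                # number of original signals
    if i < n:
        return dset[i, :, :half]     # first half of the points
    return dset[i - n, :, half:]     # second half of the points

# Small synthetic demo (4 signals, 2 channels, 10 points), scaled down
# from the question's (127260, 2, 1250).
with h5py.File('demo.h5', 'w', driver='core', backing_store=False) as h5f:
    src = np.arange(4 * 2 * 10, dtype='f8').reshape(4, 2, 10)
    h5f.create_dataset('data', data=src)
    sig = get_signal(h5f['data'], 5, half=5)   # second half of signal 1
```

Each call reads only one signal's slice from disk, so nothing close to the full dataset is ever held in memory.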

Code to do this (where h5f is the h5py file object):

import numpy as np  # h5f is an open h5py.File object

new_arr = np.empty((2*127260, 2, 625))
arr1 = h5f['dataset_name'][:, :, :625]   # first 625 points of each signal
arr2 = h5f['dataset_name'][:, :, 625:]   # last 625 points of each signal
new_arr[:127260, :, :] = arr1
new_arr[127260:, :, :] = arr2
h5f.create_dataset('new_dataset_name', data=new_arr)

Alternately, you can combine the two steps:

new_arr = np.empty((2*127260, 2, 625))
new_arr[:127260, :, :] = h5f['dataset_name'][:, :, :625]
new_arr[127260:, :, :] = h5f['dataset_name'][:, :, 625:]
h5f.create_dataset('new_dataset_name', data=new_arr)

Here is a third method. It is the most direct way, and it reduces the memory overhead: only half the source data is in memory at a time. This matters when you have very large datasets that won't fit in memory.

h5f.create_dataset('new_dataset_name', shape=(2*127260, 2, 625), dtype=float)
h5f['new_dataset_name'][:127260, :, :] = h5f['dataset_name'][:, :, :625]
h5f['new_dataset_name'][127260:, :, :] = h5f['dataset_name'][:, :, 625:]
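If even half the dataset is too large for memory, the same idea extends to a block-wise copy. This sketch is my own addition, not from the original answer; the sizes are scaled down for the demo, the names `demo.h5`, `data`, and `new_data` are illustrative, and the in-memory `core` driver just makes it self-contained.

```python
import numpy as np
import h5py

# Block-wise variant: copy a batch of signals at a time, so peak memory
# is bounded by the batch size instead of the dataset size.
n, half, block = 8, 5, 3
with h5py.File('demo.h5', 'w', driver='core', backing_store=False) as h5f:
    src = np.arange(n * 2 * 2 * half, dtype='f8').reshape(n, 2, 2 * half)
    h5f.create_dataset('data', data=src)
    out = h5f.create_dataset('new_data', shape=(2 * n, 2, half), dtype='f8')
    for start in range(0, n, block):
        stop = min(start + block, n)
        # first halves go into the first n rows, second halves into the rest
        out[start:stop] = h5f['data'][start:stop, :, :half]
        out[n + start:n + stop] = h5f['data'][start:stop, :, half:]
    result = out[...]   # read back only for the demo check
```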

Whichever method you choose, I suggest adding an attribute to note the data source for future reference:

h5f['new_dataset_name'].attrs['Data Source'] = 'data sliced from dataset_name'
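One caveat worth spelling out, since the question title says "reshape": a plain reshape() call would not produce the layout above, because NumPy's C-order reshape splits the points in a different order than the slice-and-stack approach. A small-scale check (my own illustration, scaled down from the question's shapes):

```python
import numpy as np

# 2 signals, 2 channels, 4 points, scaled down from (127260, 2, 1250).
a = np.arange(2 * 2 * 4, dtype='f8').reshape(2, 2, 4)

# Slice-and-stack (the answer's method): all first halves, then all second halves.
stacked = np.concatenate([a[:, :, :2], a[:, :, 2:]], axis=0)

# A naive reshape keeps C order and produces a different, interleaved ordering.
naive = a.reshape(4, 2, 2)
```

Both results have shape (4, 2, 2), but their contents differ, so the slicing approach (or an explicit transpose-based reshape) is required to get the intended signal ordering.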