What is a fast way to output an h5py dataset to text?
I am using the h5py Python package to read files in HDF5 format (e.g. somefile.h5).
I would like to write the contents of a dataset to a text file.
For example, I would like to create a text file with the following contents:
1,20,31,75,142,324,78,12,3,90,8,21,1
I am able to access the dataset in python using this code:
import h5py
f = h5py.File('/Users/Me/Desktop/thefile.h5', 'r')
group = f['/level1/level2/level3']
dset = group['dsetname']
My naive approach is too slow, because my dataset has over 20000 entries:
# excerpt from inside a function; txtfile is a file object already opened for writing
# write all values to file
for index in range(len(dset)):
    # do not add comma after last value
    if index == len(dset)-1: txtfile.write(repr(dset[index]))
    else: txtfile.write(repr(dset[index])+',')
txtfile.close()
return None
Is there a faster way to write this to a file? Perhaps I could convert the dataset into a NumPy array or even a Python list, and then use some file-writing tool?
(I could experiment with concatenating the values into a larger string before writing to file, but I'm hoping there's something entirely more elegant)
Comments (4)
Building one large string has the huge advantage of removing the need for the goofy "last-time switch", thanks to the excellent join method of strings, which can replace your whole loop with a single write. I'm not sure how much more elegant you demand your code to be... ;-)
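The snippet that accompanied this answer is missing from the page. A minimal sketch of the join idea might look like the following; the helper name write_comma_separated and the output file name are made up for illustration, and a plain list stands in for the dataset. It uses str() rather than repr(), on the assumption that the values may be NumPy scalars, whose repr can include the type name in recent NumPy versions.

```python
def write_comma_separated(values, path):
    """Write the values to `path` as a single comma-separated line."""
    # str.join places a comma only between items, so the last value
    # needs no special handling.
    with open(path, "w") as txtfile:
        txtfile.write(",".join(str(value) for value in values))


# Stand-in data; with h5py one would pass dset[:] (the dataset read
# into memory) instead of this list.
write_comma_separated([1, 20, 31, 75, 142, 324], "dset_join.txt")
```

The with block closes the file automatically, so no explicit close() call is needed.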
Your original suspicion was correct: first convert the dataset to a NumPy array, and then dump that array to ASCII.
This will be dramatically faster than iterating over the h5py dataset object itself.
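The code from this answer did not survive on the page either. One way to sketch the idea uses numpy.savetxt; the helper name dump_to_text and the file name are hypothetical, the data is assumed to be one-dimensional integers as in the question's example, and a small NumPy array stands in for the dataset read from the file.

```python
import numpy as np


def dump_to_text(data, path):
    """Write 1-D integer data to `path` as one comma-separated line."""
    # With h5py, dset[:] reads the whole dataset into a NumPy array
    # in one operation; np.asarray accepts that array unchanged.
    arr = np.asarray(data)
    # savetxt writes one array row per line, so a 1-D array is
    # reshaped into a single row to get all values on one line.
    np.savetxt(path, arr.reshape(1, -1), fmt="%d", delimiter=",")


# Stand-in for dset[:]
dump_to_text(np.array([1, 20, 31, 75, 142, 324]), "dset_savetxt.txt")
```

Note that savetxt ends the line with a newline character, unlike the join approach, and that fmt="%d" would need to change for floating-point data.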
Maybe use h5dump on the HDF5 file?
I run it from bash.
I did the same thing and found a way.
Indexing the h5py dataset one element at a time inside a loop is very, very slow. Load the dataset into memory first and index that in-memory copy instead.
Just this slight change does the trick.
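The two snippets this answer compared are missing from the page, so the following is only an illustration of the difference it seems to describe. FakeDataset is a hypothetical stand-in that counts how often it is indexed; it is not part of h5py. With a real h5py dataset, each dset[i] is a separate read from the file, while dset[:] reads everything into a NumPy array once.

```python
class FakeDataset:
    """Hypothetical stand-in for an h5py dataset that counts reads."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def __len__(self):
        return len(self._values)

    def __getitem__(self, key):
        # Every indexing operation counts as one separate read.
        self.reads += 1
        return self._values[key]


dset = FakeDataset(range(20000))

# Slow pattern: one read per element.
slow = [dset[i] for i in range(len(dset))]
slow_reads = dset.reads

# Faster pattern: a single slice read, then work on the in-memory copy.
dset.reads = 0
data = dset[:]
fast = [data[i] for i in range(len(data))]
fast_reads = dset.reads
```

Both patterns produce the same values; only the number of reads against the dataset differs.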