What is a fast way to output h5py dataset to text?

Posted on 2024-09-04 19:45:26

I am using the h5py Python package to read files in HDF5 format (e.g. somefile.h5).
I would like to write the contents of a dataset to a text file.

For example, I would like to create a text file with the following contents:
1,20,31,75,142,324,78,12,3,90,8,21,1

I am able to access the dataset in Python using this code:

import h5py
f     = h5py.File('/Users/Me/Desktop/thefile.h5', 'r')
group = f['/level1/level2/level3']
dset  = group['dsetname']

My naive approach is too slow, because my dataset has over 20000 entries:

# write all values to file (txtfile is an already opened text file)
for index in range(len(dset)):
    # do not add comma after last value
    if index == len(dset) - 1:
        txtfile.write(repr(dset[index]))
    else:
        txtfile.write(repr(dset[index]) + ',')
txtfile.close()

Is there a faster way to write this to a file? Perhaps I could convert the dataset into a NumPy array or even a Python list, and then use some file-writing tool?

(I could experiment with concatenating the values into a larger string before writing to file, but I'm hoping there's something entirely more elegant)

无声静候 2024-09-11 19:45:26

Building a large string has the huge advantage of saving the need for the goofy "last-time switch" thanks to the excellent join method of strings: to replace your whole loop,

txtfile.write(','.join(repr(item) for item in dset))

I'm not sure how much more elegant you demand your code to be...;-)
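
As a minimal sketch of the join approach, the helper below writes any sequence of values as one comma-separated line. The name `write_comma_separated` is hypothetical and not part of h5py. It also uses `str` instead of `repr`, which is an assumption on my part: with recent NumPy versions, `repr` of a NumPy scalar can include the type name (for example `np.int64(1)`), which is probably not what you want in the output file.

```python
import io


def write_comma_separated(values, out):
    """Write values to the open text file `out` as one comma-separated line."""
    # join() places the separator only between items,
    # so no special case for the last value is needed.
    out.write(','.join(str(item) for item in values))


# Usage with an in-memory buffer standing in for an open text file.
buf = io.StringIO()
write_comma_separated([1, 20, 31, 75], buf)
text = buf.getvalue()
```

With h5py, it is likely worth reading the dataset into memory first (for example with `dset[()]`) and passing that to the helper, so the loop does not go back to the file for every element.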

彼岸花ソ最美的依靠 2024-09-11 19:45:26

Your original suspicion was correct, first convert it to a Numpy array, and then dump that array to ASCII.

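my_data = my_h5_group['dsetname'][()] # is now a NumPy array (.value was removed in h5py 3.0)
my_data.tofile("my_data.txt", sep=",") # a non-empty sep writes text; without it tofile writes raw binary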

This will be dramatically faster than iterating over the group object itself.
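
A minimal sketch of the NumPy route, assuming the values are already in memory. The helper name `dump_as_text` and the sample values are made up for illustration; a plain NumPy array stands in for the data read from the HDF5 file. One detail worth noting: `ndarray.tofile` only produces text when a non-empty `sep` is passed, otherwise it writes raw binary.

```python
import os
import tempfile

import numpy as np


def dump_as_text(data, path, sep=","):
    """Write a 1-D array-like to `path` as separator-delimited text."""
    arr = np.asarray(data)
    # A non-empty `sep` switches tofile() to text output;
    # the default (sep="") would write raw binary instead.
    arr.tofile(path, sep=sep)


# Stand-in for values read from an HDF5 dataset, e.g. dset[()] with h5py.
values = np.array([1, 20, 31, 75, 142])
out_path = os.path.join(tempfile.mkdtemp(), "my_data.txt")
dump_as_text(values, out_path)

# Read the file back to check what was written.
with open(out_path) as fh:
    contents = fh.read()
```

If you need control over number formatting or want one row per line for 2-D data, `numpy.savetxt` is probably the more convenient tool.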

长途伴 2024-09-11 19:45:26

Maybe use h5dump on the HDF5 file?

I use (bash)

(h5dump -y -o /dev/stderr -d $dataset $infile >$errorout) 2>&1 | sed -e 's/, /\n/g' -e 's/,$//' | sed 's/ //g' > $outfile 2> $errorout

你列表最软的妹 2024-09-11 19:45:26

Oh, I did the same thing and found a way.
If you access elements like this, for example:

print( hdf5['a'][i][j][k] )

it is very, very slow. Do this instead:

arr = hdf5['a'][:] # outside the loop
print( arr[i][j][k] ) # inside the loop

This small change alone makes a big difference.
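
To illustrate why this helps, here is a toy model rather than a real benchmark. `CountingDataset` is a hypothetical stand-in that wraps a NumPy array and counts how often it is indexed, on the assumption that each indexing call on a real on-disk dataset triggers a separate read from the file.

```python
import numpy as np


class CountingDataset:
    """Hypothetical stand-in for an on-disk dataset that counts read requests."""

    def __init__(self, data):
        self._data = np.asarray(data)
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1  # in this model, every indexing call is one file read
        return self._data[key]


dset = CountingDataset(np.arange(24).reshape(2, 3, 4))

# Slow pattern: index the dataset inside the loop (one read per element).
slow_total = 0
for i in range(2):
    for j in range(3):
        for k in range(4):
            slow_total += dset[i, j, k]
reads_in_loop = dset.reads

# Fast pattern: read everything once, then index the in-memory array.
dset.reads = 0
arr = dset[:]
fast_total = 0
for i in range(2):
    for j in range(3):
        for k in range(4):
            fast_total += arr[i, j, k]
reads_bulk = dset.reads
```

In this model the loop over the dataset issues one read per element, while the bulk version issues a single read and gives the same result.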
