Reading files in .h5 format and using them in a dataset

Posted on 2025-02-03 15:20:03, 521 characters, 4 views, 0 comments

I have two folders (one for training and one for testing), each containing around 10 files in HDF5 (.h5) format. I want to read them and use them in a dataset. I have a function that reads a single file, but I don't know how to use it inside my class.

import h5py

def read_h5(path):
    # The context manager closes the file once the arrays have been read.
    with h5py.File(path, 'r') as data:
        image = data['image'][:]
        label = data['label'][:]
    return image, label

class Myclass(Dataset):
    def __init__(self, split='train', transform=None):
        raise NotImplementedError

    def __len__(self):
        raise NotImplementedError

    def __getitem__(self, index):
        raise NotImplementedError

Do you have any suggestions?
Thank you in advance.

Comments (1)

红尘作伴 2025-02-10 15:20:03

This might be a start for what you want to do. I implemented __init__(), but not __len__() or __getitem__(). The user provides the path, and __init__() calls the class method read_h5() to get the image and label arrays. A short main section then creates class objects from 2 different H5 files. Modify the paths list with the folder and file names for all of your training and testing data.

import h5py

class H5_data():
    def __init__(self, path):  # split='train', transform=None
        self.path = path
        # Both arrays are loaded fully into memory when the object is created.
        self.image, self.label = self.read_h5(path)

    @classmethod
    def read_h5(cls, path):
        with h5py.File(path, 'r') as data:
            # [()] reads the whole dataset into a NumPy array.
            image = data['image'][()]
            label = data['label'][()]
        return image, label

paths = ['train_0.h5', 'test_0.h5']
for path in paths:
    h5_test = H5_data(path)
    print(f'For HDF5 file: {path}')
    print(f'image data, shape: {h5_test.image.shape}; dtype: {h5_test.image.dtype}')
    print(f'label data, shape: {h5_test.label.shape}; dtype: {h5_test.label.dtype}')
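
For the paths list, one option is to collect the files from each folder instead of typing the names by hand. This is only a minimal sketch: the folder layout (data/train and data/test) and the helper name list_h5_files are assumptions made for illustration, not something taken from the question.

```python
from pathlib import Path


def list_h5_files(root, split):
    """Return the sorted .h5 file paths found in <root>/<split>."""
    folder = Path(root) / split
    # Sorting is lexicographic, so 'train_10.h5' comes before 'train_2.h5'.
    return sorted(str(p) for p in folder.glob('*.h5'))


# Hypothetical layout: data/train/*.h5 and data/test/*.h5
train_paths = list_h5_files('data', 'train')
test_paths = list_h5_files('data', 'test')
print(f'{len(train_paths)} training files, {len(test_paths)} test files')
```

Either list can then be used in place of the hard-coded paths list in the loop.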

IMHO, creating a class that holds the array data is overkill (and could lead to memory problems if you have really large datasets). It is more memory efficient to create h5py dataset objects and read the data only when you need it. The example below prints the same information as the H5_data code, without creating a class object that holds NumPy arrays.

paths = ['train_0.h5', 'test_0.h5']
for path in paths:
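    with h5py.File(path, 'r') as data:
        # These are h5py dataset objects: no array data is read yet,
        # and they are only valid while the file is open.
        image = data['image']
        label = data['label']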
        print(f'For HDF5 file: {path}')
        print(f'image data, shape: {image.shape}; dtype: {image.dtype}')
        print(f'label data, shape: {label.shape}; dtype: {label.dtype}')
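
The same lazy-access idea can be carried into the Dataset class from the question. The following is a rough sketch rather than a definitive implementation, and it rests on several assumptions: that Dataset means torch.utils.data.Dataset, that every file has 'image' and 'label' datasets whose first axis indexes the samples, and that reopening a file on each access is acceptable. The class name H5Dataset and its paths argument are made up for this example.

```python
import bisect
from itertools import accumulate

import h5py

try:
    from torch.utils.data import Dataset  # assumed to be the Dataset in the question
except ImportError:
    Dataset = object  # fallback so the indexing logic can be tried without PyTorch


class H5Dataset(Dataset):
    """Map-style dataset over several HDF5 files, read one sample at a time."""

    def __init__(self, paths, transform=None):
        self.paths = list(paths)
        self.transform = transform
        # Read only the sample counts here, not the arrays themselves.
        counts = []
        for path in self.paths:
            with h5py.File(path, 'r') as f:
                counts.append(f['image'].shape[0])
        # ends[i] is the total number of samples in files 0..i.
        self.ends = list(accumulate(counts))

    def __len__(self):
        return self.ends[-1] if self.ends else 0

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find the file holding this global index, then the offset inside it.
        file_idx = bisect.bisect_right(self.ends, index)
        start = self.ends[file_idx - 1] if file_idx > 0 else 0
        local = index - start
        # The file is opened per access: simple, at the cost of extra I/O.
        with h5py.File(self.paths[file_idx], 'r') as f:
            image = f['image'][local]
            label = f['label'][local]
        if self.transform is not None:
            image = self.transform(image)
        return image, label
```

With this sketch, H5Dataset(['train_0.h5']) would give a dataset whose length is the number of samples in that file, and one object can be built per split from the corresponding list of paths. If each file holds a single sample instead of a stack of samples, the count and offset logic would need to change accordingly.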