
PyTorch: How to get the first N items from a DataLoader

Posted on 2025-01-10 05:18:51

I have 3000 images in my list, but I only want the first N of them (e.g., 1000) for training. How can I do this by changing the loop code?

for i, (image, label) in enumerate(train_loader):



慢慢从新开始 2025-01-17 05:18:51

for i, (image, label) in list(enumerate(train_loader))[:1000]:

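This is not a good way to partition training and validation data, though.
First, DataLoader supports lazy loading (examples are not loaded into memory until needed), whereas casting it to a list forces every batch to be loaded into memory at once, which can easily trigger an out-of-memory error. Second, if the DataLoader shuffles, this will not always return the same 1000 elements. Third, each element of a DataLoader is a batch, so the slice yields 1000 samples only when batch_size=1. In general, DataLoader does not support indexing, so it is not really suited to selecting a specific subset of the dataset; casting to a list works around this, but at the expense of DataLoader's useful properties.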
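If the goal is only to stop after the first N batches without building a list of the whole loader, one option (not part of the original answer) is itertools.islice from the Python standard library, which pulls batches lazily and stops once it has the requested count. A minimal sketch, assuming a hypothetical TensorDataset of random tensors in place of the real images. Like the slice above, islice counts batches, so with batch_size=B you would take N // B batches, and shuffling still changes which samples you get:

```python
import itertools

import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the real data: 3000 "images" (3x8x8) with integer labels.
images = torch.randn(3000, 3, 8, 8)
labels = torch.randint(0, 10, (3000,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=1, shuffle=False)

N = 1000
seen = 0
# islice stops after N batches and never materializes the whole loader as a list.
for i, (image, label) in enumerate(itertools.islice(train_loader, N)):
    seen += image.size(0)  # number of samples in this batch

print(seen)  # 1000, because batch_size=1
```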

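Best practice is to use separate torch.utils.data.Dataset objects for the training and validation partitions, or at least to partition the data at the Dataset level rather than relying on stopping training after the first 1000 examples. Then create a separate DataLoader for each partition.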
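Partitioning at the Dataset level can be sketched with torch.utils.data.Subset (to keep exactly the first N samples) and torch.utils.data.random_split (to carve out training and validation partitions), followed by one DataLoader per partition. The random tensors, the 2400/600 split sizes, the seed, and the batch size are illustrative assumptions, not values from the original answer:

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset, random_split

# Hypothetical dataset standing in for the 3000 images.
images = torch.randn(3000, 3, 8, 8)
labels = torch.randint(0, 10, (3000,))
full_dataset = TensorDataset(images, labels)

# Option 1: keep only the first N samples (samples, not batches).
N = 1000
first_n = Subset(full_dataset, range(N))

# Option 2: split into training and validation partitions with a fixed seed,
# so the same samples land in each partition on every run.
generator = torch.Generator().manual_seed(0)
train_set, val_set = random_split(full_dataset, [2400, 600], generator=generator)

# One loader per partition; shuffling now only reorders samples within a partition.
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32, shuffle=False)

print(len(first_n), len(train_set), len(val_set))  # 1000 2400 600
```

Because the subset is fixed before any DataLoader is created, shuffling and batching no longer affect which samples are used, and each loader keeps its lazy-loading behavior.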



东京女 2025-01-17 05:18:51

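To get the first N items from train_loader, you can call the DataLoader's __iter__() method (via the built-in iter()), step through the items one at a time with __next__() (via next()), and wrap that in a for loop: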

N = 1000
dataiter = iter(train_loader)

image_list = []
label_list = []
# Assumes batch_size=1; otherwise divide N by the batch size.
for i in range(N):
    image, label = next(dataiter)
    image_list.append(image)
    label_list.append(label)
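The loop above assumes batch_size=1 and raises StopIteration if the loader runs out before N steps. A sketch that counts samples rather than batches, works for any batch size, and simply returns everything if the loader holds fewer than N samples is below. It uses a plain for loop (which calls iter() and next() implicitly) and stops early; take_first_n and the random toy data are hypothetical, and the loader is assumed to yield non-empty (image, label) batches:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data: 3000 samples, loaded in batches of 64.
images = torch.randn(3000, 3, 8, 8)
labels = torch.randint(0, 10, (3000,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=False)


def take_first_n(loader, n):
    """Collect the first n samples (not batches) from a loader yielding (image, label)."""
    image_list, label_list = [], []
    collected = 0
    for image, label in loader:  # the for loop calls iter()/next() for us
        image_list.append(image)
        label_list.append(label)
        collected += image.size(0)
        if collected >= n:
            break  # stop pulling batches as soon as we have enough samples
    # Concatenate along the batch dimension and trim the overshoot from the last batch.
    return torch.cat(image_list)[:n], torch.cat(label_list)[:n]


first_images, first_labels = take_first_n(train_loader, 1000)
print(first_images.shape, first_labels.shape)
```

With batch_size=64 this reads 16 batches (1024 samples) and trims the result to 1000.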

