PyTorch: How to get the first N items from a dataloader
There are 3000 pictures in my list, but I only want to use the first N of them (for example, 1000) for training.
How can I achieve this by changing the loop code?
for (image, label) in enumerate(train_loader):
Comments (2)
Casting the dataloader to a list and taking the first 1000 items works, but it is not a good way to partition training and validation data.

First, the `dataloader` class supports lazy loading (examples are not loaded into memory until needed), whereas casting it to a list requires loading all of the data into memory, which may trigger an out-of-memory error on a large dataset. Second, if the `dataloader` shuffles, this may not return the same 1000 elements each time. In general, the `dataloader` class does not support indexing, so it is not really suitable for selecting a specific subset of a dataset. Casting to a list works around this, but at the expense of the useful properties of the `dataloader` class.

Best practice is to use separate `data.dataset` objects for the training and validation partitions, or at least to partition the data in the dataset rather than relying on stopping training after the first 1000 examples. Then create a separate dataloader for the training partition and for the validation partition.
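
One way to partition at the dataset level is `torch.utils.data.Subset`, sketched below. This is only an illustration under stated assumptions: the random tensors stand in for the 3000 images, the batch size of 32 is arbitrary, and using the remaining 2000 examples for validation is a choice made here, not something the question specifies.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Hypothetical stand-in for the 3000-image dataset: small random tensors.
images = torch.randn(3000, 3, 8, 8)
labels = torch.randint(0, 10, (3000,))
full_dataset = TensorDataset(images, labels)

N = 1000
# Partition by index at the dataset level, so membership of each split is
# fixed no matter how the loaders shuffle.
train_set = Subset(full_dataset, range(N))
val_set = Subset(full_dataset, range(N, len(full_dataset)))  # assumed split

# One loader per partition; only the training loader shuffles.
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32, shuffle=False)

for image, label in train_loader:
    pass  # the training step would go here
```

Because the subset is chosen by index before any loader is built, shuffling changes only the order within the first 1000 examples, not which examples are used, and samples are still loaded lazily batch by batch.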

To get the first `N` items from `train_loader`, one can call the `__iter__()` method of the dataloader, step through the items one at a time with `__next__()`, and wrap this in a for loop.
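
A minimal sketch of the iterator approach just described. The helper name `first_n_batches` and the list used as a stand-in loader are hypothetical, chosen so the example runs without any data on disk; a real `train_loader` is an ordinary Python iterable and would be passed in the same way. Note that each item a dataloader yields is a batch, so the first `N` items correspond to up to `N * batch_size` samples.

```python
from itertools import islice


def first_n_batches(loader, n):
    """Yield at most the first n items produced by iterating over loader."""
    iterator = iter(loader)          # equivalent to loader.__iter__()
    for _ in range(n):
        try:
            yield next(iterator)     # equivalent to iterator.__next__()
        except StopIteration:
            return                   # the loader had fewer than n items


# Hypothetical stand-in for train_loader: 5 "batches" of (images, labels).
fake_loader = [([f"img{i}a", f"img{i}b"], [i, i]) for i in range(5)]

taken = list(first_n_batches(fake_loader, 3))
labels_seen = [labels for _images, labels in taken]
print(labels_seen)

# itertools.islice expresses the same idea more compactly.
same = list(islice(fake_loader, 3))
```

In a training loop this would read `for image, label in first_n_batches(train_loader, N):`. Unlike casting to a list, only the batches actually consumed are loaded, although a shuffling loader will still return a different selection on each pass.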