Does using image augmentation in PyTorch also increase the dataset size on my local machine?
I was training a custom model in PyTorch and the dataset was very imbalanced: there are 10 classes, some of which have only 800 images while others have 4000. I found that image augmentation was a solution to my problem and a way to avoid overfitting. But I got confused while implementing it. The code below was used to alter the features of the images:
from torchvision import transforms

loader_transform = transforms.Compose([
    transforms.RandomRotation(30),
    transforms.RandomResizedCrop(140),
    transforms.RandomHorizontalFlip(),
])
But while training it shows the same original number of images. Where did the newly created augmented dataset go? And if I want to save it on my local machine, and make all classes even, what can be done?
It looks like you are using online augmentations. If you would like to use offline augmentations instead, do a pre-processing step that saves the augmented images to disk, then use them in the training step. Please make sure you understand the difference between online augmentations and offline augmentations:

Offline or pre-processing augmentation
Augmentation is applied as a pre-processing step to increase the size of the dataset. Usually we do this when we want to expand a small training dataset. When applying it to larger datasets, we have to consider disk space.

Online or real-time augmentation
The augmentation is applied in real time through random transforms. Since the augmented images do not need to be saved on disk, this method is usually applied to large datasets. At each epoch, an online-augmentation model will see different images.
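A minimal sketch of the offline route, assuming a flat folder of images (here using Pillow's `rotate` as a stand-in for whatever transforms you pick; `augment_offline` and the file layout are illustrative, not a fixed API):

```python
import os
import tempfile

from PIL import Image

def augment_offline(src_dir, dst_dir, copies=3):
    """Save `copies` augmented variants of every image in src_dir to dst_dir,
    so the enlarged dataset exists on disk before training starts."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        img = Image.open(os.path.join(src_dir, name))
        base, ext = os.path.splitext(name)
        for i in range(copies):
            aug = img.rotate(30 * (i + 1))  # stand-in for any random transform
            aug.save(os.path.join(dst_dir, f"{base}_aug{i}{ext}"))

# Demo with a tiny generated image instead of a real dataset
src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
Image.new("RGB", (32, 32), "red").save(os.path.join(src, "cat0.png"))
augment_offline(src, dst, copies=3)
print(sorted(os.listdir(dst)))  # ['cat0_aug0.png', 'cat0_aug1.png', 'cat0_aug2.png']
```

After this step you point your training `Dataset` at `dst_dir` (plus the originals), and the larger image count is what the training loop reports.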
Hard to tell without seeing your dataset/dataloader, but I suspect you're simply applying transformations to your dataset. This won't change the dataset size; it just augments the existing images on the fly. If you wish to balance the classes, adding a sampler seems the easiest solution.
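To see why the count never changes: a torchvision transform runs inside `__getitem__`, each time a sample is fetched. A toy stand-in (plain Python, no torch needed; the class and numbers are illustrative) shows the behaviour:

```python
import random

class AugmentedDataset:
    """Toy stand-in for a torch Dataset: the transform runs every time an
    item is fetched, so the stored data never grows."""
    def __init__(self, samples, transform=None):
        self.samples = samples
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        x = self.samples[idx]
        return self.transform(x) if self.transform else x

data = list(range(800))  # e.g. the 800 images of one class
ds = AugmentedDataset(data, transform=lambda x: x + random.random())
print(len(ds))  # still 800: augmentation happens per access, not up front
```

Fetching index 0 twice returns two differently augmented samples, yet `len(ds)` stays at 800; that is exactly what online augmentation does with images.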
Here's the (somewhat simplified) code I use for this purpose, utilizing pandas, collections, and torch.utils.data.WeightedRandomSampler. Likely not the best out there, but it gets the job done. The final train size will be 2x the original in this case, but you may experiment with smaller sizes too; the class representation will be balanced regardless of the size chosen.
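The core of that approach can be sketched as follows (the 800/4000 label counts mirror the question, and the 2x `num_samples` matches the "2x the original" train size; wiring the sampler into your `DataLoader` is shown as a comment, since the rest of the pipeline depends on your dataset):

```python
from collections import Counter

from torch.utils.data import WeightedRandomSampler

# Hypothetical labels mirroring the question: one class with 800 images,
# another with 4000.
labels = [0] * 800 + [1] * 4000

# Weight every sample by the inverse frequency of its class, so rare
# classes are drawn as often as common ones.
counts = Counter(labels)
weights = [1.0 / counts[y] for y in labels]

# Draw 2x the original dataset size, with replacement. Pass this sampler
# to your DataLoader instead of shuffle=True, e.g.:
#   DataLoader(dataset, batch_size=64, sampler=sampler)
sampler = WeightedRandomSampler(weights, num_samples=2 * len(labels),
                                replacement=True)

drawn = Counter(labels[i] for i in sampler)
print(drawn)  # roughly 4800 draws of each class
```

Because rare-class images are drawn with replacement, each epoch re-samples them several times (with different random augmentations applied each time), which is what evens out the classes without writing anything to disk.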