Object detection from synthetic to real-life data using YOLOv5

Published 2025-01-18 18:49:41


I'm currently trying YOLOv5 with custom synthetic data. The dataset we've created consists of 8 different objects. Each object has a minimum of 1500 pictures/labels, where the pictures are split 500/500/500 between normal/fog/distractors around the object. Sample images from the dataset are in the first imgur link. The model is not trained from scratch, but from a standard YOLOv5 .pt checkpoint.

So far we've tried:

  • Adding more data (from 300 images per object to 4500)
  • Creating more complex data (distractors on/around objects)
  • Running multiple training runs
  • Training with network sizes small, medium, large, and xlarge
  • Different batch sizes between 4 and 32 (depending on model size)

Everything so far has resulted in good/great detection on synthetic data, but is completely off when used on real-life data.
Examples: the model thinks that a whole picture of unrelated objects is a paper box, that walls are pallets, etc. Quick sample images in the last imgur link.

Anyone got clues for how to improve the training or data to be better suited for real-life detection? Or how to better interpret the results? I don't understand how the model concludes that a whole picture of unrelated objects is a box/pallet.

Results from training uploaded to imgur:
https://i.sstatic.net/z4GbR.jpg

Example on real life data:
https://i.sstatic.net/VfHNc.jpg


2 Comments

笔芯 2025-01-25 18:49:41


There are a couple of things you can do to improve results.

  1. After training your model with synthetic data, fine-tune it on real training data with a smaller learning rate (maybe 1/10th). This will reduce the gap between synthetic and real-life images. In some cases, training on mixed (synthetic + real) data produces better results than fine-tuning.
  2. Generate images structurally similar to real-life examples. For example, put humans inside forklifts, or pallets or barrels on forks, etc. Models learn from that.
  3. Randomize the texture of the items you want to detect. Models tend to focus on textures for detection. By randomizing textures with lots of variability, including non-natural occurrences, you force the model to learn to identify objects not based on their textures. Although an object's texture is sometimes a good identifier, synthetic data suffers from not replicating that feature well enough, hence the domain gap, so you reduce its impact on the model's decisions.
  4. I'm not sure whether the screenshot accurately represents your data-generation distribution, but if so, you have to randomize object angles, sizes, and occlusion amounts more.
  5. As distractors, use objects that you don't want to detect but that will appear in the images you'll run inference on, rather than simple shapes like spheres.
  6. Randomize lighting more: intensity, color, angles, etc.
  7. Increase background and ground randomization. Use HDRIs; there are lots of free ones.
  8. Balance your dataset.

https://i.sstatic.net/jQ9yj.jpg
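Points 1 and 8 can be sketched together in plain Python. This is a minimal illustration, not YOLOv5 code: the function names and the 30% real-data ratio are my own assumptions. It builds a mixed synthetic+real training list and counts labels per class to check balance; the resulting file list would then be fed to your training pipeline.

```python
import random
from collections import Counter

def mix_datasets(synthetic, real, real_fraction=0.3, seed=0):
    """Sample synthetic examples so that real images make up roughly
    `real_fraction` of the mixed training list (point 1).
    `synthetic` and `real` are lists of (image_path, class_id) pairs."""
    rng = random.Random(seed)
    # Number of synthetic images needed for the target real/synthetic ratio.
    n_synth = round(len(real) * (1 - real_fraction) / real_fraction)
    mixed = rng.sample(synthetic, min(n_synth, len(synthetic))) + list(real)
    rng.shuffle(mixed)
    return mixed

def class_counts(dataset):
    """Count labels per class to spot imbalance (point 8)."""
    return Counter(cls for _, cls in dataset)

# Toy example: 8 classes, as in the question.
synthetic = [("synth_%d.png" % i, i % 8) for i in range(1000)]
real = [("real_%d.png" % i, i % 8) for i in range(120)]
mixed = mix_datasets(synthetic, real)
print(len(mixed), class_counts(mixed))
```

For the fine-tuning step itself, YOLOv5's `train.py` lets you start from your synthetic-trained checkpoint via `--weights` and lower `lr0` through a hyperparameter YAML; check the repo's documentation for the exact flags of your version.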

无敌元气妹 2025-01-25 18:49:41


Checking your results, the answer is that your synthetic data is way too dissimilar to the real-life data you want it to work on. Generating synthetic scenes that are closer to their real-life counterparts and training again would clearly improve your results. That includes more realistic backgrounds and scene compositions. I don't know if your training set resembles the validation images you shared here, but if it does, try to have more objects per image, closer to the camera, and add variation to their relative positions. Having just one random 3D object in the middle of an image is not going to give good results. By the way, you are already overfitting your models, so more training images won't help at this point.
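On the overfitting point: a rough way to see it in per-epoch metrics (YOLOv5 writes these to a results file during training) is to find the epoch where validation loss starts climbing while training loss keeps falling. A minimal sketch, where the function name and the `patience` threshold are my own choices:

```python
def overfit_epoch(train_loss, val_loss, patience=3):
    """Return the first epoch index after which validation loss rises for
    `patience` consecutive epochs while training loss keeps falling -
    a rough overfitting signal. Returns None if no such point exists."""
    for e in range(1, len(val_loss) - patience + 1):
        val_rising = all(val_loss[e + k] > val_loss[e + k - 1]
                         for k in range(patience))
        train_falling = train_loss[e + patience - 1] < train_loss[e - 1]
        if val_rising and train_falling:
            return e
    return None

# Toy curves: val loss bottoms out at epoch 2, then rises while train falls.
train = [1.00, 0.80, 0.60, 0.50, 0.40, 0.30]
val   = [1.00, 0.90, 0.85, 0.90, 0.95, 1.00]
print(overfit_epoch(train, val))
```

Past that epoch, extra epochs (or more synthetic images of the same distribution) mostly just sharpen the fit to the synthetic domain.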
