Object detection with Yolov5, from synthetic data to real-life data
Currently trying yolov5 with custom synthetic data. The dataset we've created consists of 8 different objects. Each object has a minimum of 1500 pictures/labels, where the pictures are split 500/500/500 between normal/fog/distractors around the object. Sample images from the dataset are in the first imgur link. The model is not trained from scratch, but from the standard yolov5 .pt weights.
So far we've tried:
- Adding more data (from 300 images per object to 4500)
- Creating more complex data (distractors on/around objects)
- Running multiple rounds of training
- Training with network sizes small, medium, large, and xlarge
- Different batch sizes between 4 and 32 (depending on model size)
Everything so far has resulted in good/great detection on synthetic data, but it is completely off when used on real-life data.
Examples: it thinks that whole pictures of unrelated objects are a paper box, that walls are pallets, etc. Quick sample images are in the last imgur link.
Anyone got clues for how to improve the training or the data to be better suited for real-life detection? Or how to better interpret the results? I don't understand how the model draws the conclusion that a whole picture, with unrelated objects, is a box/pallet.
Results from training uploaded to imgur:
https://i.sstatic.net/z4GbR.jpg
Example on real life data:
https://i.sstatic.net/VfHNc.jpg
There are a couple of things you can do to improve the results.
https://i.sstatic.net/jQ9yj.jpg
Checking your results, the answer is that your synthetic data is far too dissimilar to the real-life data you want it to work on. Try to generate synthetic scenes that are closer to their real-life counterparts; training again on those would clearly improve your results. That includes more realistic backgrounds and scene compositions. I don't know if your training set resembles the validation images you shared here, but in case it does, try to have more objects per image, place them closer to the camera, and add variation to their relative positions. Having just one random 3D object in the middle of an image is not going to provide good results. By the way, you are already overfitting your models, so more training images wouldn't help at this point.
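One common way to act on this advice is "cut-and-paste" augmentation: composite your rendered object crops onto a pool of real background photos at varied scales and positions, and emit YOLO-format labels for each paste. The sketch below is a minimal illustration using Pillow; the scale range, the single-object paste, and the exact label tuple are my assumptions, not your pipeline:

```python
# Hypothetical sketch: paste a synthetic object crop (with alpha) onto a real
# background photo at a random scale/position, and produce a YOLO-style label
# (class, normalized center x, center y, width, height). The 20-60% scale
# range is an assumed heuristic, not a recommendation from the thread.
import random
from PIL import Image

def composite(background: Image.Image, obj: Image.Image, cls: int):
    """Return (composited image, YOLO label tuple) for one pasted object."""
    bg = background.convert("RGB").copy()
    # Random scale: object occupies roughly 20-60% of the background width.
    scale = random.uniform(0.2, 0.6) * bg.width / obj.width
    ow = max(1, int(obj.width * scale))
    oh = max(1, int(obj.height * scale))
    obj = obj.convert("RGBA").resize((ow, oh))
    # Random top-left position that keeps the object inside the frame.
    x = random.randint(0, max(0, bg.width - ow))
    y = random.randint(0, max(0, bg.height - oh))
    bg.paste(obj, (x, y), obj)  # the alpha channel acts as the paste mask
    label = (cls,
             (x + ow / 2) / bg.width,   # normalized center x
             (y + oh / 2) / bg.height,  # normalized center y
             ow / bg.width,             # normalized width
             oh / bg.height)            # normalized height
    return bg, label
```

Generating a few such composites per background, with several objects per image, directly addresses the "more objects per image, closer to the camera, varied positions" point above while keeping labels exact for free.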