Difference between Conv1D, Conv2D and Conv3D, and where each is used in a Convolutional Neural Network (CNN)
I am a newbie in deep learning and am doing my Final Year Project in deep learning. I know that we use Conv2D in image-related tasks, but my professor asked me why we don't use Conv1D or Conv3D. Why do we specifically use Conv2D here? I've searched the whole internet for a proper answer to this question, but I can't seem to find a solid one. Please help me with this question, because I am very confused.
Thank you!
1 Answer
In a 1-dimensional CNN, the kernel moves in 1 direction. The input and output data of a 1-dimensional CNN are 2-dimensional. It is mostly used on time-series data, since you can only move left or right along a single axis (x).
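A minimal shape sketch, assuming the Keras API (tf.keras.layers.Conv1D); the toy input of 100 timesteps with 1 feature, the filter count and the kernel size are all illustrative choices, not anything from the question:

```python
import numpy as np
import tensorflow as tf

# 8 series, 100 timesteps, 1 feature -> each sample is 2-dimensional (steps, channels)
x = np.random.rand(8, 100, 1).astype("float32")

# The kernel slides only along the time axis.
conv1d = tf.keras.layers.Conv1D(filters=16, kernel_size=3, padding="same")
y = conv1d(x)
print(y.shape)  # (8, 100, 16): still (steps, channels) per sample
```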
In a 2-dimensional CNN, the kernel moves in 2 directions. The input and output data of a 2-dimensional CNN are 3-dimensional. As you mentioned, it is widely used in image-related tasks, since apart from left and right you can also move up and down (x, y).
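The same kind of sketch for the 2-D case, again assuming tf.keras and an arbitrary batch of 64x64 RGB images:

```python
import numpy as np
import tensorflow as tf

# 8 RGB images of 64x64 -> each sample is 3-dimensional (height, width, channels)
x = np.random.rand(8, 64, 64, 3).astype("float32")

# The kernel slides along both spatial axes (height and width).
conv2d = tf.keras.layers.Conv2D(filters=16, kernel_size=(3, 3), padding="same")
y = conv2d(x)
print(y.shape)  # (8, 64, 64, 16): still (height, width, channels) per sample
```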
In a 3-dimensional CNN, the kernel moves in 3 directions. The input and output data of a 3-dimensional CNN are 4-dimensional. Since the kernel slides in 3 dimensions, you have (x, y, z) possible movements. One example use case is medical imaging, since such scans are 3-dimensional images acquired as slices and then reconstructed. All the slices taken together must be analysed as a whole, so it makes no sense to take single slices and apply a 2-dimensional convolution, because the relationships between slices would be lost; you need to stack all the images to obtain a "3D" representation and analyse it with 3-dimensional convolutions.
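And a final sketch for the volumetric case, assuming tf.keras and a made-up volume of 32 slices of 64x64, standing in for a stack of medical-image slices:

```python
import numpy as np
import tensorflow as tf

# 2 volumes of 32 slices, 64x64 pixels, 1 channel
# -> each sample is 4-dimensional (depth, height, width, channels)
x = np.random.rand(2, 32, 64, 64, 1).astype("float32")

# The kernel slides along depth, height and width, so relationships
# across slices are captured instead of being lost.
conv3d = tf.keras.layers.Conv3D(filters=16, kernel_size=(3, 3, 3), padding="same")
y = conv3d(x)
print(y.shape)  # (2, 32, 64, 64, 16)
```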