Histogram of Oriented Gradients
I have been reading the theory behind HOG descriptors for object (human) detection, but I have some questions about the implementation, which might sound like an insignificant detail.
Regarding the window that contains the blocks; should the window be moved over the image pixel by pixel where the windows overlap at each step, as illustrated here:
or should the window be moved without causing any overlapping, as here:
The illustrations that I have seen so far used the second approach. But with a detection window of size 64x128, it is highly probable that sliding the window over the image without overlap will not cover the whole image. For an image of size 64x255, for example, the last 127 rows of pixels would never be checked for objects. So the first approach seems more reasonable, although it costs more time and CPU.
Any ideas?
Thank you in advance.
EDIT: I am trying to stick to the original paper by Dalal and Triggs. One paper that implements the algorithm and uses the second approach can be found here: http://www.cs.bilkent.edu.tr/~cansin/projects/cs554-vision/pedestrian-detection/pedestrian-detection-paper.pdf
Comments (1)
EDIT:
Sorry, I misunderstood your question. (Also, the answer I gave to the wrong question contained an error; I have since corrected it below for context.)
You're asking about using the HOG descriptor for detection, not generating the HOG descriptor.
In the implementation paper you reference above, it looks like they do overlap the detection windows. The window size is 64x128, and they use a horizontal stride of 32 pixels and a vertical stride of 64 pixels, so neighbouring windows overlap by half in each direction. They also mention that they tried smaller stride values, but that this led to a higher false positive rate (in the context of their implementation).
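To make the stride trade-off concrete, here is a minimal sketch in plain Python that enumerates window positions for a given stride and counts how many rows at the bottom of the image no window reaches. The function names `window_origins` and `uncovered_rows` are hypothetical and not taken from either paper; the sketch also assumes windows must fit entirely inside the image, with no padding.

```python
def window_origins(img_w, img_h, win_w=64, win_h=128, stride_x=32, stride_y=64):
    """Top-left corners of every detection window that fits fully inside the image."""
    xs = range(0, img_w - win_w + 1, stride_x)
    ys = range(0, img_h - win_h + 1, stride_y)
    return [(x, y) for y in ys for x in xs]


def uncovered_rows(img_h, win_h=128, stride_y=64):
    """Number of rows at the bottom of the image that no window covers."""
    ys = range(0, img_h - win_h + 1, stride_y)
    if len(ys) == 0:
        return img_h  # the window does not fit at all
    return img_h - (ys[-1] + win_h)


if __name__ == "__main__":
    # The 64x255 image from the question, with three vertical strides.
    for stride in (128, 64, 1):
        n = len(window_origins(64, 255, stride_x=stride, stride_y=stride))
        print(f"stride {stride:3d}: {n:3d} windows, "
              f"{uncovered_rows(255, stride_y=stride)} rows uncovered")
```

For the 64x255 example, a non-overlapping stride of 128 leaves 127 rows unchecked, a stride of 64 leaves 63, and a stride of 1 covers every row at the cost of 128 window evaluations instead of 1 or 2.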
On top of that, they use 3 scales of the input image: 1, 1/2, and 1/4. They don't mention any corresponding scaling of the detection window, and I'm not sure what effect that would have from a detection standpoint. It seems that this would implicitly create overlap as well.
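One way to reason about the scales: if the window stays at 64x128 while the image is shrunk, each window covers a proportionally larger region of the original image. The sketch below only does that arithmetic; `footprint_in_original` is a hypothetical helper, and it assumes the window and stride are kept fixed at every scale, which the paper does not state explicitly.

```python
def footprint_in_original(scale, win_w=64, win_h=128, stride_x=32, stride_y=64):
    """Window size and stride expressed in original-image pixels.

    `scale` is the factor applied to the image (1, 0.5, 0.25, ...).
    Assumes the window and stride are NOT rescaled along with the image.
    """
    return {
        "window": (win_w / scale, win_h / scale),
        "stride": (stride_x / scale, stride_y / scale),
    }


if __name__ == "__main__":
    for s in (1, 0.5, 0.25):
        print(s, footprint_in_original(s))
```

Under that assumption, the 1/2 scale behaves like a 128x256 window and the 1/4 scale like a 256x512 window on the original image, so regions examined at different scales overlap one another.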
Original answer (corrected):
Looking at the Dalal and Triggs paper (section 6.4), it looks like they consider both i) no block overlap and ii) half- and quarter-block overlap when generating the HOG descriptor. Based on their results, it sounds like greater overlap produces better detection performance (albeit at a greater resource/processing cost).
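The cost side of block overlap can be estimated by counting blocks per window. This is a rough sketch, assuming the commonly cited configuration of 8x8-pixel cells, 16x16-pixel blocks (2x2 cells), and 9 orientation bins; those values and the function name `hog_descriptor_length` are assumptions for illustration, not something stated in this thread.

```python
def hog_descriptor_length(win_w=64, win_h=128, block=16, block_stride=8,
                          cell=8, bins=9):
    """Number of feature values in one window's HOG descriptor."""
    blocks_x = (win_w - block) // block_stride + 1
    blocks_y = (win_h - block) // block_stride + 1
    cells_per_block = (block // cell) ** 2
    return blocks_x * blocks_y * cells_per_block * bins


if __name__ == "__main__":
    # Block strides of 16, 8 and 4 pixels for a 16x16 block:
    # no overlap, half-block overlap, and a quarter-block stride.
    for stride in (16, 8, 4):
        print(f"block stride {stride:2d}: "
              f"{hog_descriptor_length(block_stride=stride)} values")
```

With these assumed parameters, the descriptor grows from 1152 values with no overlap to 3780 with half-block overlap and 13572 with a 4-pixel block stride, which is where the extra processing cost comes from.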