Image processing: What are occlusions?
I'm developing an image processing project and keep coming across the word "occlusion" in scientific papers. What does occlusion mean in the context of image processing? The dictionary only gives a general definition. Can anyone describe it using an image as context?
Comments (5)
Occlusion means that there is something you want to see but cannot, because of some property of your sensor setup or some event.
Exactly how it manifests itself, and how you deal with it, depends on the problem at hand.
Some examples:
If you are developing a system that tracks objects (people, cars, ...), then occlusion occurs when an object you are tracking is hidden (occluded) by another object, for example two people walking past each other, or a car driving under a bridge.
The problem in this case is what to do when an object disappears and then reappears.
If you are using a range camera, then occlusions are areas where you do not have any information. Some laser range cameras work by projecting a laser beam onto the surface you are examining and using a camera to identify the point of impact of that laser in the resulting image. That gives the 3D coordinates of that point. However, since the camera and the laser are not necessarily aligned, there can be points on the examined surface that the camera can see but the laser cannot hit (occlusion).
The problem here is more a matter of sensor setup.
The same can occur in stereo imaging if there are parts of the scene that are seen by only one of the two cameras. Obviously, no range data can be collected for these points.
There are probably more examples.
If you describe your specific problem, then maybe we can define what occlusion means in that case and what problems it entails.
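For the tracking case above, one common way to handle an object that disappears and reappears is to let its track "coast" on the last estimated velocity for a limited number of frames. The following is only a minimal sketch under stated assumptions, not a complete tracker: the `Track` class, the `update` function, and the `max_missed` limit are hypothetical names introduced for illustration, and detections are assumed to arrive as `(x, y)` tuples or `None` when the object is not seen.

```python
from dataclasses import dataclass


@dataclass
class Track:
    """State of one tracked object, in pixel coordinates (hypothetical)."""
    x: float
    y: float
    vx: float = 0.0   # estimated motion per frame
    vy: float = 0.0
    missed: int = 0   # consecutive frames without a detection


def update(track, detection, max_missed=10):
    """Advance the track by one frame.

    detection is an (x, y) tuple, or None when the detector found nothing
    (for example because the object is occluded).
    Returns False once the track has been missing for more than
    max_missed frames and should be dropped.
    """
    if detection is None:
        # Occluded: coast along the last estimated velocity.
        track.x += track.vx
        track.y += track.vy
        track.missed += 1
        return track.missed <= max_missed

    if track.missed == 0:
        # Seen in consecutive frames: refresh the velocity estimate.
        track.vx = detection[0] - track.x
        track.vy = detection[1] - track.y
    # Snap to the measurement; right after an occlusion the old velocity is kept.
    track.x, track.y = detection
    track.missed = 0
    return True


# The object is visible for two frames, hidden for two, then visible again.
track = Track(0.0, 0.0)
for det in [(2.0, 1.0), (4.0, 2.0), None, None, (10.0, 5.0)]:
    alive = update(track, det)
    print(track.x, track.y, track.missed, alive)
```

Practical trackers usually replace the constant-velocity guess with a proper motion filter (a Kalman filter, for instance) and also compare appearance when the object reappears, but the idea of predicting through the gap is the same.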
Occlusion is one of the main reasons why computer vision is hard in general. It is especially problematic in object tracking. See the figures below:
Notice how the lady's face is not completely visible in frames 0519 and 0835, as opposed to the face in frame 0005. And here's one more picture, where the man's face is partially hidden in all three frames.
Notice in the image below how tracking of the couple in the red and green bounding boxes is lost in the middle frame due to occlusion (i.e. they are partially hidden by another person in front of them), but they are correctly tracked again in the last frame, when they become (almost) completely visible.
Picture courtesy: Stanford, USC
An occlusion is something that blocks our view. In the image shown here, we can easily see the people in the front row. But the second row is only partly visible, and the third row is much less visible. Here, we say that the second row is partly occluded by the first row, and the third row is occluded by the first and second rows.
We can see such occlusions wherever there are a lot of objects: classrooms (students sitting in rows), traffic junctions (vehicles waiting at a signal), forests (trees and plants), etc.
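To put a number on "partly occluded", one simple option is to measure how much of an object's bounding box is covered by the boxes of objects in front of it. This is a rough sketch only: the `occluded_fraction` helper is a hypothetical name, boxes are assumed to be integer `(x0, y0, x1, y1)` pixel rectangles, and the caller is assumed to already know which objects are nearer to the camera.

```python
def occluded_fraction(box, occluders):
    """Return the fraction of `box` covered by any box in `occluders`.

    Boxes are (x0, y0, x1, y1) with integer pixel coordinates, where
    x1 and y1 are exclusive. Counting pixel by pixel keeps overlapping
    occluders from being counted twice.
    """
    x0, y0, x1, y1 = box
    total = (x1 - x0) * (y1 - y0)
    hidden = 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            if any(ox0 <= x < ox1 and oy0 <= y < oy1
                   for ox0, oy0, ox1, oy1 in occluders):
                hidden += 1
    return hidden / total


# A person in the back row, with the lower half hidden by someone in front.
back_row = (0, 0, 10, 10)
front_row = [(0, 5, 10, 10)]
print(occluded_fraction(back_row, front_row))
```

Bounding boxes overestimate the hidden area compared with true object silhouettes, so this is best read as an upper bound.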
In addition to what has already been said, I want to add the following:
In dense stereo vision reconstruction, occlusion happens when a region is seen by the left camera but not by the right (or vice versa). In the disparity map, this occluded region appears black, because the pixels in that region have no counterpart in the other image. Some techniques use so-called background filling algorithms, which fill the occluded black region with pixels taken from the background. Other reconstruction methods simply leave those pixels without values in the disparity map, because the pixels produced by background filling may be incorrect in those regions. Below are the 3D projected points obtained using a dense stereo method. The points have been rotated slightly to the right (in 3D space). In this scenario, the occluded values in the disparity map are left unreconstructed (shown in black), which is why we see a black "shadow" behind the person in the 3D image.
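As an illustration of the two ideas in this answer, the sketch below marks occluded pixels on a single scanline with a left-right consistency check and then optionally fills them from the background. It is a simplified sketch, not the method of any particular library: the function names, the `INVALID` marker, and the `tol` parameter are assumptions made for the example, and it assumes the usual rectified-stereo convention that a left pixel at column `x` with disparity `d` matches the right pixel at column `x - d`.

```python
INVALID = -1  # marker for "no reliable disparity" (shown as black in a map)


def mark_occlusions(disp_left, disp_right, tol=1):
    """Left-right consistency check on one scanline.

    A left pixel is kept only if the right image, at the matched column,
    reports (almost) the same disparity. Otherwise it is marked INVALID.
    """
    out = []
    for x, d in enumerate(disp_left):
        xr = x - d  # column of the matching pixel in the right image
        if d < 0 or xr < 0 or xr >= len(disp_right):
            out.append(INVALID)
        elif abs(disp_right[xr] - d) > tol:
            out.append(INVALID)
        else:
            out.append(d)
    return out


def fill_from_background(row):
    """Fill INVALID pixels with the smaller of the nearest valid disparities.

    A smaller disparity means farther away, so this copies the background
    value, which is one simple form of background filling.
    """
    filled = list(row)
    for x, d in enumerate(row):
        if d != INVALID:
            continue
        left = next((row[i] for i in range(x - 1, -1, -1)
                     if row[i] != INVALID), None)
        right = next((row[i] for i in range(x + 1, len(row))
                      if row[i] != INVALID), None)
        candidates = [v for v in (left, right) if v is not None]
        if candidates:
            filled[x] = min(candidates)
    return filled


# Background at disparity 1, with a foreground object at disparity 3.
disp_left = [1, 1, 1, 1, 3, 3, 1, 1]
disp_right = [1, 3, 3, 1, 1, 1, 1, 1]
checked = mark_occlusions(disp_left, disp_right)
print(checked)
print(fill_from_background(checked))
```

Leaving the `INVALID` pixels untouched corresponds to the second approach described above, where occluded regions stay unreconstructed rather than risk a wrong guess.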
As the other answers have explained occlusion well, I will only add to them. Basically, there is a semantic gap between us and computers.
A computer actually sees an image as a sequence of values, typically in the range 0-255, one for each color channel of an RGB image. These values are indexed by (row, col) for every point in the image. So if an object changes its position relative to the camera such that part of it is hidden (say, a person's hands are not visible), the computer will see different numbers (or edges, or any other features), and this changes the input that an algorithm has to work with to detect, recognize, or track the object.
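A tiny example makes the "different numbers" point concrete. The sketch below treats a grayscale patch as a grid of 0-255 values indexed by (row, col), hides part of it, and compares the result with the original using a sum of absolute differences. The helper names `occlude` and `sad` are made up for this illustration, and a single gray channel is used instead of RGB to keep it short.

```python
def occlude(image, top, left, height, width, value=0):
    """Return a copy of `image` with a rectangle overwritten by `value`,
    as if another object had moved in front of that region."""
    out = [row[:] for row in image]
    for r in range(top, top + height):
        for c in range(left, left + width):
            out[r][c] = value
    return out


def sad(a, b):
    """Sum of absolute differences between two equally sized patches.
    0 means identical; larger values mean the patches look less alike."""
    return sum(abs(pa - pb)
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


# A bright 2x2 object as the computer stores it: rows of pixel values.
template = [[200, 200],
            [200, 200]]

# The same object with its top row hidden by something dark.
partly_hidden = occlude(template, top=0, left=0, height=1, width=2)

print(sad(template, template))       # identical patches
print(sad(template, partly_hidden))  # occlusion changes the numbers
```

To a matching algorithm that only compares numbers, the partly hidden object looks like a different pattern, even though a person would still recognize it as the same object.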