I want to use a sub-sampled depth buffer to increase the performance of a program. In my case, it does not matter if artifacts or geometry popping occur.
I have set up my framebuffer like this:
// Color attachment
glBindTexture(GL_TEXTURE_2D, colorAttachment);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 640, 360, 0, GL_RGBA, GL_UNSIGNED_BYTE, nil);
// Depth attachment
glBindRenderbuffer(GL_RENDERBUFFER, depthAttachment);
glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT16, 160, 90);
// Framebuffer
glBindFramebuffer(GL_FRAMEBUFFER, framebuffer);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, colorAttachment, 0);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, depthAttachment);
However, glCheckFramebufferStatus(GL_FRAMEBUFFER) now returns GL_FRAMEBUFFER_INCOMPLETE_DIMENSIONS, which according to the documentation means "Not all attached images have the same width and height".
The research paper "Full-3D Edge Tracking with a Particle Filter" describes in section 3.5 how the authors used a sub-sampled depth buffer to increase the performance of their application:
Sub-sampled depth buffer: Adjacent pixels along an image edge are so closely correlated
that testing each individual edge pixel is redundant. For single-hypothesis trackers,
it is common to spread sample points a distance of 10-20 pixels apart along an edge.
Sampling only every nth edge pixel also reduces the graphics bandwidth required and
so only every 4th pixel is sampled. Instead of explicitly drawing stippled lines, this is here achieved by using a sub-sampled depth buffer (160 x 120) since this further achieves
a bandwidth reduction for clearing and populating the depth buffer. However, this also
means that hidden line removal can be inaccurate to approximately four pixels. Apart
from this, the accuracy of the system is unaffected.
The only obvious workarounds are
- Using a fragment shader to look up the previously rendered depth buffer and apply the depth test manually (see the sketch below).
- Rendering the depth buffer at the lower resolution, then resampling it to the larger resolution and using it as before.
Neither approach sounds particularly performant. What is the cleanest way to achieve a sub-sampled depth buffer?
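For reference, here is roughly what the first workaround would look like on my end. This is only a sketch: it assumes the scene was already rendered in a pre-pass into a 160 x 90 FBO with a GL_DEPTH_COMPONENT texture attached, and the names lowResDepthTex, program and the uniform names are placeholders.

// Fragment shader source for the manual depth test (GLSL embedded as a C string).
static const char *manualDepthTestFragSrc =
    "#version 330 core\n"
    "uniform sampler2D uLowResDepth;\n"   // the 160x90 depth texture from the pre-pass
    "uniform vec2 uViewport;\n"           // full-resolution viewport, e.g. 640x360
    "out vec4 fragColor;\n"
    "void main() {\n"
    "    vec2 uv = gl_FragCoord.xy / uViewport;\n"
    "    float stored = texture(uLowResDepth, uv).r;\n"
    "    if (gl_FragCoord.z > stored + 0.001) discard;\n"   // manual depth test with a small bias
    "    fragColor = vec4(1.0);\n"                          // real shading would go here
    "}\n";
// Host side: make the low-res depth texture available to the shader before drawing.
glUseProgram(program);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, lowResDepthTex);
glUniform1i(glGetUniformLocation(program, "uLowResDepth"), 0);
glUniform2f(glGetUniformLocation(program, "uViewport"), 640.0f, 360.0f);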
Comments (1)
The doc page you referenced covers OpenGL ES 1.0 and 2.0. The OpenGL wiki has more information on the differences between 2.0 and 3.0, namely that starting with 3.0 (and ARB_framebuffer_object) framebuffer attachments may have different sizes. However, if I recall correctly, when attachments of different sizes are used, the area actually rendered to is the intersection of all attached images. I don't think this is what you want.
In order to reduce the size of your depth texture, I suggest using glBlitFramebuffer to transfer your large depth attachment into a smaller one. This operation is done entirely on the GPU, so it is very fast. The resulting smaller texture can then be used as input for further rendering operations in your shaders, which will definitely save bandwidth: instead of averaging multiple depth values on every fragment shader invocation, the reduction is done once per texel of the smaller texture. A smaller texture is also inherently faster to sample since it fits better in the cache.
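A rough sketch of that blit, assuming the scene was rendered into srcFbo at 640 x 360 and dstFbo has a 160 x 90 GL_DEPTH_COMPONENT texture attached (both FBO names are placeholders):

// Down-sample the full-resolution depth into the small FBO.
glBindFramebuffer(GL_READ_FRAMEBUFFER, srcFbo);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, dstFbo);
// Depth blits must use GL_NEAREST (GL_LINEAR is only allowed for color),
// so the down-sampling simply picks one source sample per destination texel.
glBlitFramebuffer(0, 0, 640, 360,   // source rectangle
                  0, 0, 160, 90,    // destination rectangle
                  GL_DEPTH_BUFFER_BIT, GL_NEAREST);
glBindFramebuffer(GL_FRAMEBUFFER, 0);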
Keep in mind, however, that averaging depth samples can produce wild inaccuracies, because depth values are not linearly distributed.
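If you do end up averaging or comparing depth values, converting them to a linear view-space distance first avoids most of that error. A minimal helper for a standard perspective projection (zNear and zFar are whatever your projection uses; this assumes the default glDepthRange of [0, 1]):

// Converts a [0,1] depth-buffer value to a positive view-space distance.
float linearizeDepth(float d, float zNear, float zFar)
{
    float zNdc = d * 2.0f - 1.0f;  // window-space depth back to NDC [-1, 1]
    return (2.0f * zNear * zFar) / (zFar + zNear - zNdc * (zFar - zNear));
}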