Custom glBlendFunc is a lot slower than the native one
I'm trying to implement my own custom glBlendFunc in a fragment shader. However, my solution is a lot slower than the native glBlendFunc, even when both compute exactly the same blending function.
I was wondering if anyone has a suggestion for how to do this more efficiently.
My solution works something like this:
void draw(fbo fbos[2], render_item item)
{
    // fbos[0] is the render target.
    // fbos[1] is the previous render target, read as the "background" to blend against in the shader.
    // Both FBOs hold exactly the same content, but they must be separate because we can't
    // read from and write to the same texture. The texture we render to needs the entire
    // content, since we might not draw geometry everywhere.
    fbos[0]->attach(); // Attach the FBO as the render target
    fbos[1]->bind(1);  // Bind as texture unit 1
    render(item);
    glCopyTexSubImage2D(...); // Copy from fbos[0] to fbos[1], so that afterwards fbos[1] == fbos[0]
}
fragment.glsl
uniform sampler2D background; // previous render target, bound to texture unit 1

vec4 blend_color(vec4 fore)
{
    vec4 back = texture2D(background, gl_TexCoord[1].st); // read the background from texture unit 1
    return vec4(mix(back.rgb, fore.rgb, fore.a), back.a + fore.a);
}
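For reference, the blend_color() formula looks equivalent to fixed-function blending configured with glBlendFuncSeparate(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA, GL_ONE, GL_ONE) and the default GL_FUNC_ADD equation. That mapping is an assumption read off the formula, not something stated in the question. A minimal CPU-side sketch in C++ that checks the two formulas agree (the Color struct and all function names here are hypothetical):

```cpp
#include <cassert>
#include <cmath>

// RGBA color with components expected in [0, 1]. Hypothetical helper type.
struct Color { float r, g, b, a; };

// GLSL-style mix(): linear interpolation from x to y by t.
inline float mix(float x, float y, float t) { return x * (1.0f - t) + y * t; }

// What blend_color() from the question computes, before the framebuffer
// clamps the result on write.
inline Color shader_blend(Color back, Color fore) {
    assert(fore.a >= 0.0f && fore.a <= 1.0f);
    return { mix(back.r, fore.r, fore.a),
             mix(back.g, fore.g, fore.a),
             mix(back.b, fore.b, fore.a),
             back.a + fore.a };
}

// Assumed fixed-function equivalent with GL_FUNC_ADD:
//   RGB:   src * GL_SRC_ALPHA + dst * GL_ONE_MINUS_SRC_ALPHA
//   Alpha: src * GL_ONE       + dst * GL_ONE
inline Color fixed_function_blend(Color dst, Color src) {
    const float sf = src.a;          // source factor for RGB
    const float df = 1.0f - src.a;   // destination factor for RGB
    return { src.r * sf + dst.r * df,
             src.g * sf + dst.g * df,
             src.b * sf + dst.b * df,
             src.a + dst.a };
}

// Component-wise comparison with a small tolerance.
inline bool nearly_equal(Color x, Color y, float eps = 1e-5f) {
    return std::fabs(x.r - y.r) < eps && std::fabs(x.g - y.g) < eps &&
           std::fabs(x.b - y.b) < eps && std::fabs(x.a - y.a) < eps;
}
```

If the two functions agree for your inputs, the custom shader is doing work the blend unit can already do, and the native path is the one to prefer.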
Comments (2)
Your best bet for improving the performance of FBO-based blending is NV_texture_barrier. Despite the name, AMD has implemented it as well, so if you stick to Radeon HD-class cards, it should be available to you.
Basically, it allows you to ping-pong without heavyweight operations like FBO binding or texture attachment operations. The specification has a section towards the bottom that shows the general algorithm.
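The loop this enables can be sketched roughly as follows. The BlendBackend struct and its callback names are hypothetical stand-ins so the sketch compiles without a GL context. In a real renderer, texture_barrier would call glTextureBarrierNV(), and draw_item would issue the draw call while the render target's texture is also bound as the background sampler. The sketch assumes the geometry within one item does not overlap itself, so each texel is read and written at most once between barriers.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical wrapper around the GL calls a renderer would make.
struct BlendBackend {
    // Attach the FBO and bind its color texture for sampling, once up front.
    std::function<void()> bind_once;
    // Draw one item; the shader reads the texel it is about to overwrite.
    std::function<void(int)> draw_item;
    // Make earlier writes visible to later texture reads,
    // e.g. by calling glTextureBarrierNV().
    std::function<void()> texture_barrier;
};

// Draw every item against a single texture, with no per-item FBO switch
// and no per-item copy of the render target.
inline void draw_all(const BlendBackend& gl, const std::vector<int>& items) {
    assert(gl.bind_once && gl.draw_item && gl.texture_barrier);
    gl.bind_once();
    for (int item : items) {
        gl.draw_item(item);
        // Without this barrier, the next item could sample stale texels.
        gl.texture_barrier();
    }
}
```

With this structure, the glCopyTexSubImage2D copy after every item should no longer be needed, because the shader reads the background directly from the texture it renders into.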
Another alternative is EXT_shader_image_load_store. This will require DX11/GL 4.x class hardware. OpenGL 4.2 recently promoted this to core with ARB_shader_image_load_store.
Even with this, as Darcy said, you're never going to beat regular blending. It uses special hardware structures that shaders can't access, since blending happens after the fragment shader has run. You should only do programmatic blending if there is some effect that you absolutely cannot accomplish any other way.
Native blending is a lot more efficient because blending operations are built directly into the GPU hardware, so you probably aren't going to be able to beat it for speed. Having said that, make sure you have depth testing, back-face culling, hardware blending, and any other unneeded operations turned off. I can't say it will make a huge difference, but it may make some.