GLSL fragment processing order for a full-screen quad
I'm using GLSL for some image processing, so I'm drawing a full-screen quad and doing the processing in the fragment shader. Can we expect fragments to be processed in any particular order?
I know the fragments are processed in parallel and we can't make any guarantees about the finish time of any particular fragment, so how is this handled? Is it just a big queue? And what would the pattern look like, i.e., scanlines, blocks, etc.?
Will this be driver dependent?
There is no documentation on it because it is handled arbitrarily. Hardware is afforded the ability to process fragments in a completely arbitrary order; you are not allowed to know about the order of fragment processing in any way, shape, or form. There are no controls to change the order of fragment processing, affect that order, or even detect it at all.
Well, until 4.2 and ARB_shader_image_load_store. But even that has controls built into it to allow the hardware as much freedom as possible.
In short, if you're doing something where the order of processing matters, you're doing something wrong.
It sounds to me like you're trying to do a feedback loop, where you read from and write to the framebuffer simultaneously (by binding a texture and attaching that same image to an FBO render target). That is not allowed.
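To make the point concrete, here is a minimal CPU-side sketch in Python (not GLSL, and not how any GPU actually schedules work). The names `blur3`, `run_separate`, and `run_in_place` are made up for illustration. It treats each loop iteration as one "fragment" and runs the fragments in two different orders: when reads come from one buffer and writes go to another, the order doesn't matter; when the same buffer is read and written, the result depends on the order.

```python
def blur3(src, i):
    """Integer average of a pixel and its two neighbours, clamped at the edges."""
    left = src[max(i - 1, 0)]
    right = src[min(i + 1, len(src) - 1)]
    return (left + src[i] + right) // 3

def run_separate(src, order):
    """Read from src, write to a separate dst (like sampling one texture
    while rendering into a different one)."""
    dst = [0] * len(src)
    for i in order:
        dst[i] = blur3(src, i)
    return dst

def run_in_place(src, order):
    """Read and write the same buffer: a stand-in for the feedback loop."""
    buf = list(src)
    for i in order:
        buf[i] = blur3(buf, i)  # may read neighbours that were already overwritten
    return buf

pixels = [0, 0, 90, 0, 0, 30, 0, 0]
scanline = list(range(len(pixels)))   # one possible processing order
reverse = scanline[::-1]              # another possible processing order

same = run_separate(pixels, scanline) == run_separate(pixels, reverse)
differs = run_in_place(pixels, scanline) != run_in_place(pixels, reverse)
print(same, differs)
```

On real hardware the order is unspecified rather than "forward or reverse", so the in-place variant has no defined result at all; the sketch only shows why.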
OK, so this is about performance, not functionality.
You can assume what you normally would for CPUs: that memory accesses will be cached. The order that the fragment shaders go in doesn't matter; one of them will hit the memory first, and the one that hits it later benefits from the cache.
Remember: GPUs are optimized for doing this stuff. GPUs sell based on how fast textured, shader-processed triangles are rendered. Implementations know how textures are going to be used, and you can expect them not to be stupid about how they order the fragments they output.
If you have to random access, then you have to random access; there's not much you can do about it. But otherwise, this isn't something you should be wasting any time worrying about or trying to optimize around. Let the hardware and driver writers do their jobs.
Replying to @nicol-bolas's answer here, as I don't have enough reputation to comment on it.
Certain algorithms depend on this order.
The sliding box filter (though only reliable with integer color storage), which is equally fast for any window size, is a very notable example. Two of them in a row turn the box into a triangle (bilinear interpolation), and three make a nice smooth quadratic curve.
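For readers who haven't seen it, here is a rough 1D sketch of a sliding box filter on the CPU, in Python rather than GLSL. The function name `sliding_box` and the zero padding at the borders are my own assumptions. Each output reuses the running total from the previous output, which is why the cost doesn't depend on the window size, and also why the algorithm is inherently sequential. With integers the running total stays exact; with floats, rounding error could build up along the row.

```python
def sliding_box(values, radius):
    """Box-filter sums using a running total: one add and one subtract per
    output, whatever the window size. Samples outside the array count as zero."""
    n = len(values)
    out = [0] * n
    # Prime the running total with the window centred on index 0.
    total = sum(values[0:radius + 1])
    for i in range(n):
        out[i] = total
        # Slide the window one step right: add the entering sample,
        # drop the leaving one. This depends on the previous iteration.
        enter = i + radius + 1
        leave = i - radius
        if enter < n:
            total += values[enter]
        if leave >= 0:
            total -= values[leave]
    return out

impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
once = sliding_box(impulse, 1)    # box
twice = sliding_box(once, 1)      # triangle
thrice = sliding_box(twice, 1)    # piecewise quadratic
print(once, twice, thrice, sep="\n")
```

The outputs are unnormalized sums; dividing by the window size at each pass would give averages.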
It may be possible to mimic the sequential nature of algorithms like this using a condition in the shader: