I am reading an awesome OpenGL tutorial. It's really great, trust me. The topic I am currently on is the Z-buffer. Aside from explaining what it's all about, the author mentions that we can perform custom depth tests, such as GL_LESS, GL_ALWAYS, etc. He also explains that the actual meaning of depth values (which is on top and which isn't) can also be customized. I understand so far. And then the author says something unbelievable:
The range zNear can be greater than the range zFar; if it is, then the
window-space values will be reversed, in terms of what constitutes
closest or farthest from the viewer.
Earlier, it was said that the window-space Z value of 0 is closest and
1 is farthest. However, if our clip-space Z values were negated, the
depth of 1 would be closest to the view and the depth of 0 would be
farthest. Yet, if we flip the direction of the depth test (GL_LESS to
GL_GREATER, etc), we get the exact same result. So it's really just a
convention. Indeed, flipping the sign of Z and the depth test was once
a vital performance optimization for many games.
If I understand correctly, performance-wise, flipping the sign of Z and the depth test is nothing but changing a < comparison to a > comparison. So, if I understand correctly and the author isn't lying or making things up, then changing < to > used to be a vital optimization for many games.
Is the author making things up, am I misunderstanding something, or is it indeed the case that < was once slower (vitally, as the author says) than >?
Thanks for clarifying this quite curious matter!
Disclaimer: I am fully aware that algorithm complexity is the primary source for optimizations. Furthermore, I suspect that nowadays it definitely wouldn't make any difference and I am not asking this to optimize anything. I am just extremely, painfully, maybe prohibitively curious.
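The "just a convention" claim in the quoted passage can be checked with a few lines of arithmetic. The sketch below assumes the standard mapping OpenGL applies for glDepthRange(n, f), which takes a normalized device depth zd in [-1, 1] to window depth (f - n)/2 * zd + (n + f)/2. The function names window_depth and in_front are made up for this illustration and are not part of OpenGL.

```cpp
#include <cassert>

// Window-space depth for a normalized device depth zd in [-1, 1],
// using the mapping that glDepthRange(n, f) sets up.
// (window_depth is a hypothetical helper, not an OpenGL call.)
double window_depth(double zd, double n, double f) {
    return (f - n) / 2.0 * zd + (n + f) / 2.0;
}

// Returns true if fragment a is treated as nearer than fragment b.
// reversed == false: depth range [0, 1] with a GL_LESS-style test.
// reversed == true:  depth range [1, 0] with a GL_GREATER-style test.
bool in_front(double zd_a, double zd_b, bool reversed) {
    if (!reversed) {
        return window_depth(zd_a, 0.0, 1.0) < window_depth(zd_b, 0.0, 1.0);
    }
    return window_depth(zd_a, 1.0, 0.0) > window_depth(zd_b, 1.0, 0.0);
}
```

For any pair of depths, in_front(a, b, false) and in_front(a, b, true) agree, so flipping both the range and the test changes nothing about which fragment wins.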
Comments (3)
I didn't explain that particularly well, because it wasn't important. I just felt it was an interesting bit of trivia to add. I didn't intend to go over the algorithm specifically.
However, context is key. I never said that a < comparison was faster than a > comparison. Remember: we're talking about graphics hardware depth tests, not your CPU. Not operator<.
What I was referring to was a specific old optimization where one frame you would use GL_LESS with a range of [0, 0.5]. Next frame, you would render with GL_GREATER with a range of [1.0, 0.5]. You go back and forth, literally "flipping the sign of Z and the depth test" every frame.
This loses one bit of depth precision, but you didn't have to clear the depth buffer, which once upon a time was a rather slow operation. Since depth clearing is not only free these days but actually faster than this technique, people don't do it anymore.
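To make the alternating trick concrete, here is a minimal CPU-side simulation of it. It is only a sketch: the DepthTrick struct and its members are hypothetical names invented for this illustration, there are no real OpenGL calls in it, and it assumes that every pixel is overdrawn at least once per frame (otherwise a stale value from two frames back could survive).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simulated depth buffer that is never cleared after construction.
// (DepthTrick is a hypothetical type for illustration only.)
struct DepthTrick {
    std::vector<double> depth;
    unsigned frame = 0;
    double near_value = 0.0;  // window depth assigned to the nearest fragment
    double far_value = 0.5;   // window depth assigned to the farthest fragment
    bool use_less = true;     // true: GL_LESS-style test, false: GL_GREATER-style

    explicit DepthTrick(std::size_t pixels) : depth(pixels, 1.0) {}

    // Called once per frame in place of a depth clear.
    // Even frames behave like glDepthRange(0.0, 0.5) with GL_LESS,
    // odd frames like glDepthRange(1.0, 0.5) with GL_GREATER.
    void begin_frame() {
        use_less = (frame % 2 == 0);
        near_value = use_less ? 0.0 : 1.0;
        far_value = 0.5;
        ++frame;
    }

    // d is the fragment's distance in [0, 1), with 0 being nearest.
    // Returns true if the fragment passes the depth test and is written.
    bool draw(std::size_t pixel, double d) {
        const double z = near_value + (far_value - near_value) * d;
        const bool pass = use_less ? (z < depth[pixel]) : (z > depth[pixel]);
        if (pass) {
            depth[pixel] = z;
        }
        return pass;
    }
};
```

Every value written in one frame lands in the half of [0, 1] that the next frame's test treats as "behind everything", so the first fragment drawn to a pixel in the new frame always passes, which is the job a clear would otherwise do. Each frame only uses half of the depth range, which is where the lost bit of precision comes from.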
The answer is almost certainly that, for whatever incarnation of chip and driver was used, Hierarchical Z only worked in one direction; this was a fairly common issue back in the day. Low-level assembly and branching have nothing to do with it: Z-buffering is done in fixed-function hardware and is pipelined, so there is no speculation and hence no branch prediction.
It has to do with flag bits in highly tuned assembly.
x86 has both jl and jg instructions, but most RISC processors only have jl and jz (no jg).