Cost of texture changes (and other state changes) on modern GPUs
I'm writing a scene-graph-based graphics engine for modeling purposes. I'm using XNA 4.
In many places I have read that texture changes (and other state changes) should be minimized during rendering, so I would have to sort my primitives by material and so on.
I created a small test application in XNA 4 that rendered hundreds of Stanford bunny models with a single texture, then did the same while toggling between 2 different textures. There was no difference in rendering time (however, I used small textures of about 100x100).
So my questions are:
- Should I really care about sorting my primitives by texture/color/other material parameters? Or is it less important on modern GPUs?
- What percentage of performance loss should I expect if I don't?
- Are there any other state changes that can affect performance?
- Where can I find up-to-date literature or a best-practice guide on this?
Thank you for any help or links!
State changes haven't really been expensive for a long time. Batches are expensive. (And a state change necessitates a new batch.) A batch is basically a call to a Draw*Primitives function. This PDF from nVidia explains it in detail. It also gives ideas for reducing your batch count.
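To make the link between state changes and batches concrete, here is a minimal sketch in Python (not XNA code) that counts how often the bound texture changes when draws are submitted in scene order versus sorted by texture. The `draws` list, the texture names, and both helper functions are hypothetical and exist only for illustration.

```python
from itertools import groupby


def count_texture_switches(draws):
    """Count texture binds needed when draws are submitted in the given order.

    The first bind counts as a switch, since nothing is bound beforehand.
    """
    switches = 0
    current = None
    for texture, _mesh in draws:
        if texture != current:
            switches += 1
            current = texture
    return switches


def group_by_texture(draws):
    """Sort draws by texture and collect the meshes that share each texture."""
    ordered = sorted(draws, key=lambda d: d[0])  # stable sort keeps mesh order
    return [
        (texture, [mesh for _t, mesh in items])
        for texture, items in groupby(ordered, key=lambda d: d[0])
    ]


# Hypothetical scene: six bunnies alternating between two textures.
draws = [("fur_a" if i % 2 == 0 else "fur_b", f"bunny_{i}") for i in range(6)]

print(count_texture_switches(draws))  # 6 binds in scene order
groups = group_by_texture(draws)
print(len(groups))                    # 2 binds after sorting
```

Note that sorting alone only reduces the number of state changes. Each mesh in a group is still its own draw call, so the batch count drops only if the geometry that shares a state is also submitted together in fewer calls.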
Batches are a CPU-side limit, not a GPU one. That PDF lists "< 130 tris/batch" as the point where submitting batches dominates performance and the GPU sits idle waiting for more batches (details). It also says you get about 400 batches per frame, at 60 FPS, per 1 GHz of CPU power dedicated to submitting batches. (The PDF is a bit old, though, so those figures are somewhat out of date.)
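As a rough back-of-the-envelope aid, the figures quoted above can be turned into a small calculation. This is only a sketch: the function names are made up, and it assumes the budget scales linearly with clock speed and inversely with frame rate, which is an extrapolation from the single data point in the PDF (400 batches per frame at 60 FPS per 1 GHz, i.e. 24,000 batches per second per GHz).

```python
# Figures quoted from the PDF; they are dated, so treat them as rough guides.
BATCHES_PER_FRAME_AT_60FPS_PER_GHZ = 400
TRIS_PER_BATCH_THRESHOLD = 130


def batch_budget_per_frame(cpu_ghz, fps, cpu_fraction=1.0):
    """Estimate how many batches fit in one frame.

    cpu_fraction is the share of CPU time spent submitting batches
    (assumption: linear scaling with clock speed and CPU share).
    """
    batches_per_second_per_ghz = BATCHES_PER_FRAME_AT_60FPS_PER_GHZ * 60
    return batches_per_second_per_ghz * cpu_ghz * cpu_fraction / fps


def is_submission_bound(triangles, batches):
    """True if the average batch is below the quoted tris/batch threshold."""
    return triangles / batches < TRIS_PER_BATCH_THRESHOLD


print(batch_budget_per_frame(1.0, 60))       # 400.0
print(batch_budget_per_frame(2.0, 30, 0.5))  # 800.0
print(is_submission_bound(10_000, 100))      # True: 100 tris per batch
```

If the estimated budget is far above the number of draw calls in a frame, batch submission is unlikely to be the bottleneck, which would be consistent with a small test scene showing no measurable difference.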
My answer on the gamedev site to a similar question should provide some more details. This one too.