What does the Tiler Utilization statistic mean in the iPhone OpenGL ES instrument?
I have been trying to perform some OpenGL ES performance optimizations in an attempt to boost the number of triangles per second that I'm able to render in my iPhone application, but I've hit a brick wall. I've tried converting my OpenGL ES data types from fixed to floating point (per Apple's recommendation), interleaving my vertex buffer objects, and minimizing changes in drawing state, but none of these changes have made a difference in rendering speed. No matter what, I can't seem to push my application above 320,000 triangles / s on an iPhone 3G running the 3.0 OS. According to this benchmark, I should be able to hit 687,000 triangles/s on this hardware with the smooth shading I'm using.
In my testing, when I run the OpenGL ES performance tool in Instruments against the running device, I'm seeing the statistic "Tiler Utilization" reaching nearly 100% when rendering my benchmark, yet the "Renderer Utilization" is only getting to about 30%. This may be providing a clue as to what the bottleneck is in the display process, but I don't know what these values mean, and I've not found any documentation on them. Does someone have a good description of what this and the other statistics in the iPhone OpenGL ES instrument stand for? I know that the PowerVR MBX Lite in the iPhone 3G is a tile-based deferred renderer, but I'm not sure what the difference would be between the Renderer and Tiler in that architecture.
If it helps in any way, the (BSD-licensed) source code to this application is available if you want to download and test it yourself. In the current configuration, it starts a little benchmark every time you load a new molecular structure and outputs the triangles / s to the console.
Comments (2)
The Tiler Utilization and Renderer Utilization percentages measure the duty cycle of the vertex and fragment processing hardware, respectively. On the MBX, Tiler Utilization typically scales with the amount of vertex data being sent to the GPU (in terms of both the number of vertices and the size of the attributes sent per-vertex), and Renderer Utilization generally increases with overdraw and texture sampling.
In your case, the best thing would be to reduce the size of each vertex you’re sending. For starters, I’d try binning your atoms and bonds by color, and sending each of these bins using a constant color instead of an array. I’d also suggest investigating if shorts are suitable for your positions and normals, given appropriate scaling. You might also have to bin by position in this case, if shorts scaled to provide sufficient precision aren’t covering the range you need. These sorts of techniques might require additional draw calls, but I suspect the improvement in vertex throughput will outweigh the extra per-draw call CPU overhead.
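As a rough sketch of the short-conversion idea above: the helper names `quantize_to_short` and `dequantize_scale` are hypothetical, and I'm assuming every coordinate in a bin fits inside a symmetric range `[-range, +range]`. Neither assumption comes from the application's source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map a float coordinate in [-range, +range]
   onto a signed 16-bit integer. Values outside the range are clamped,
   which is the case where binning by position would be needed. */
static int16_t quantize_to_short(float value, float range)
{
    assert(range > 0.0f);

    float scaled = value / range * 32767.0f;
    if (scaled > 32767.0f)  scaled = 32767.0f;
    if (scaled < -32767.0f) scaled = -32767.0f;

    /* Round to nearest instead of truncating toward zero. */
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Hypothetical helper: the factor that converts quantized units back
   to model units. It is also the smallest position step that survives
   quantization, so it shows whether the precision is sufficient. */
static float dequantize_scale(float range)
{
    assert(range > 0.0f);
    return range / 32767.0f;
}
```

Short positions are not normalized by OpenGL ES 1.1, so the scale has to be undone somewhere, most likely by folding `dequantize_scale(range)` into the modelview matrix (for example with `glScalef`) before drawing each bin. Treat that as an assumption to verify against your own matrix setup.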
Note that it’s generally beneficial (on MBX and elsewhere) to ensure that each vertex attribute begins on a 32-bit boundary, which implies that you should pad your positions and normals out to 4 components if you switch them to shorts. The peculiarities of the MBX platform also make it such that you want to actually include the W component of the position in the call to glVertexPointer in this case.
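To make the padding advice concrete, here is a minimal sketch of one possible interleaved layout. The struct name `PackedVertex` and the helper `make_packed_vertex` are made up for illustration, and the inputs are assumed to be already scaled to shorts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interleaved vertex: both attributes are padded to four
   shorts (8 bytes) so that each one starts on a 32-bit boundary. */
typedef struct {
    int16_t position[4]; /* x, y, z, w */
    int16_t normal[4];   /* nx, ny, nz, unused padding */
} PackedVertex;

/* Compile-time checks that the layout is what the comments claim. */
_Static_assert(sizeof(PackedVertex) == 16, "vertex should be 16 bytes");
_Static_assert(offsetof(PackedVertex, position) % 4 == 0, "position alignment");
_Static_assert(offsetof(PackedVertex, normal) % 4 == 0, "normal alignment");

/* Hypothetical helper: builds one vertex from pre-quantized values.
   W is set to 1 so the position can be submitted with four components. */
static PackedVertex make_packed_vertex(int16_t x, int16_t y, int16_t z,
                                       int16_t nx, int16_t ny, int16_t nz)
{
    PackedVertex v;
    v.position[0] = x;
    v.position[1] = y;
    v.position[2] = z;
    v.position[3] = 1;
    v.normal[0] = nx;
    v.normal[1] = ny;
    v.normal[2] = nz;
    v.normal[3] = 0;
    return v;
}

/* With a VBO of PackedVertex bound, the pointer setup would look
   roughly like this (left as a comment so the sketch compiles
   without the OpenGL ES headers):

   glVertexPointer(4, GL_SHORT, sizeof(PackedVertex),
                   (const GLvoid *)offsetof(PackedVertex, position));
   glNormalPointer(GL_SHORT, sizeof(PackedVertex),
                   (const GLvoid *)offsetof(PackedVertex, normal));
*/
```

For comparison, three-float positions and normals take 24 bytes per vertex before any color data, so this layout sends 16 bytes instead, even with the padding. `glNormalPointer` always reads three components, so the fourth short in `normal` is only there to keep the stride aligned.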
You might also consider pursuing alternate lighting methods like DOT3 for your polygon data, particularly the spheres, but this requires special care to make sure that you aren’t making your rendering fragment-bound, or inadvertently sending more vertex data than before.
Great answer, @Pivot! For reference, this Apple doc defines these terms: