How does a 1 GB GPU smoothly render 1.2 GB of texture data?
My goal is to see what happens when I use more texture data than fits in physical GPU memory. My first attempt was to load up to 40 DDS textures, resulting in a memory footprint far higher than the available GPU memory. However, my scene would still render at 200+ fps on a 9500 GT.

My conclusion: the GPU/OpenGL is being smart and only keeps certain parts of the mipmaps in memory. I thought that shouldn't be possible on a standard config, but whatever.

Second attempt: disable mipmapping, so that the GPU always has to sample from the high-res textures. Once again, I loaded about 40 DDS textures into memory. I verified the texture memory usage with gDEBugger: 1.2 GB. Still, my scene rendered at 200+ fps.

The only thing I noticed was that when I look away with the camera and then center it on the scene again, a serious lag occurs, as if only then does it transfer textures from main memory to the GPU. (I have some basic frustum culling enabled.)
My question: what is going on? How does this 1 GB GPU manage to sample from 1.2 GB of texture data at 200+ fps?
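For reference, a minimal sketch of what the "second attempt" texture setup might look like. None of this is the asker's actual code: the DXT5 format, the sizes, and the presence of an S3TC-capable driver (plus an extension loader such as GLEW on platforms that need one) are all illustrative assumptions.

```cpp
// Hypothetical sketch: upload a pre-compressed DDS (S3TC/DXT5) texture with
// mipmapping disabled, so the GPU must always sample the full-res base level.
#include <GL/gl.h>
#include <GL/glext.h>  // for GL_COMPRESSED_RGBA_S3TC_DXT5_EXT

GLuint createTextureNoMips(const void* dxtData, GLsizei width,
                           GLsizei height, GLsizei byteSize) {
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);

    // No mipmaps: restrict sampling to level 0 and use a non-mipmapped
    // minification filter, so every fetch hits the full-resolution image.
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, 0);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

    // Upload the compressed payload straight from the DDS file; no level 1+.
    glCompressedTexImage2D(GL_TEXTURE_2D, 0,
                           GL_COMPRESSED_RGBA_S3TC_DXT5_EXT,
                           width, height, 0, byteSize, dxtData);
    return tex;
}
```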
Comments (2)
OpenGL can page complete textures in and out of texture memory between draw calls (not just between frames). Only the textures needed for the current draw call actually have to be resident in graphics memory; the others can simply reside in system RAM. The driver likely only does this for a small subset of your texture data. It's pretty much the same as any cache: how can you run algorithms on gigabytes of data when you only have megabytes of cache on your CPU?

Also, PCI-E buses have a very high throughput, so you don't really notice the driver doing the paging.

If you want to verify this, glAreTexturesResident may or may not help, depending on how well the driver is implemented.
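A minimal sketch of such a residency check (the texture-ID list is assumed to come from wherever the app created its textures; note glAreTexturesResident is legacy/compatibility-profile GL, and drivers may answer conservatively):

```cpp
// Sketch: ask the driver which texture objects are currently resident
// in video memory.
#include <GL/gl.h>
#include <cstdio>
#include <vector>

void reportResidency(const std::vector<GLuint>& textures) {
    std::vector<GLboolean> residences(textures.size());
    GLboolean all = glAreTexturesResident(
        static_cast<GLsizei>(textures.size()),
        textures.data(), residences.data());
    if (all == GL_TRUE) {
        // When everything is resident, the residences array is left untouched.
        std::printf("all %zu textures resident\n", textures.size());
        return;
    }
    for (std::size_t i = 0; i < textures.size(); ++i)
        std::printf("texture %u: %s\n", textures[i],
                    residences[i] ? "resident" : "paged out");
}
```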
Even if you were forcing texture thrashing in your test (discarding some textures and re-uploading them from system memory to GPU memory every frame), which I'm not sure you are, modern GPUs and PCI-E have such a huge bandwidth that some thrashing doesn't impact performance that much. One of the 9500 GT models is quoted at 25.6 GB/s of memory bandwidth, and x16 PCI-E slots (500 MB/s × 16 = 8 GB/s) are the norm.
As for the lag, I would assume the GPU and CPU throttle down their power usage when you aren't drawing visible textures, and when you suddenly load them again they need a brief instant to ramp back up. In real-life apps and games these sudden 0%-to-100% workload changes never happen, so a slight lag is entirely understandable and expected, I guess.
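As a back-of-the-envelope check on the paging hypothesis: at the 8 GB/s PCI-E figure quoted above, re-uploading the full 1.2 GB working set would take roughly 150 ms, i.e. a visible hitch of several frames. One rough way to see which explanation fits (paging versus power ramp-up) is to time individual frames around a forced GPU sync; glFinish() is real GL, but drawScene() here is a placeholder, not the asker's code.

```cpp
// Sketch: time a single frame around a forced GPU sync, to see whether the
// one-off lag is on the order of a big paging transfer (~150 ms for 1.2 GB
// at 8 GB/s) or just a brief power-up blip.
#include <GL/gl.h>
#include <chrono>
#include <cstdio>

void drawScene(); // assumed: the app's existing draw code

void timeOneFrame() {
    auto t0 = std::chrono::steady_clock::now();
    drawScene();
    glFinish(); // block until the GPU (and any deferred uploads) are done
    auto t1 = std::chrono::steady_clock::now();
    std::printf("frame took %.1f ms\n",
                std::chrono::duration<double, std::milli>(t1 - t0).count());
}
```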