Asynchronous glTexSubImage2D and OGL thread blocking
I'm working on a GPGPU application that uses PBOs to transfer data between the CPU and the GPU. One requirement is that the OpenGL rendering thread should block as little as possible, and the processing should have the lowest possible latency.
My question is whether I have to add a delay between the call to glTexSubImage2D (which starts the transfer from host to device) and actually rendering with the texture. How long should such a delay be for a texture of, e.g., 1024x1024?
for(auto texture: textures)
{
    glBindTexture(GL_TEXTURE_2D, texture.id());
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, ...);
    // Passing NULL re-specifies the buffer storage without supplying data.
    glBufferData(GL_PIXEL_UNPACK_BUFFER_ARB, ..., NULL, GL_STREAM_DRAW);
    void* mem = glMapBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY);
    copy(mem, data);
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER_ARB);
    // The target must match the one used in glBindTexture above. With a PBO
    // bound, the final NULL is an offset into that buffer, not a pointer.
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, ..., NULL);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, 0);
    glBindTexture(GL_TEXTURE_2D, 0);
}

// Is this needed to avoid blocking calls while rendering? If so, what
// strategy can I use to estimate the minimum amount of time needed to
// transfer the data?
do_other_cpu_stuff_while_data_is_transferring();

for(auto texture: textures)
{
    render(texture);
}
Comments (2)
I would say that most of the latency will be in the call to copy() and/or glUnmapBuffer(), but it depends on so many things (mainly your hardware) that your best bet is to do one transfer at the beginning of the program and measure it.
For the timing, you should use the glFinish() function together with a high-resolution timer (such as QueryPerformanceCounter).
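To make the timing suggestion concrete, here is a minimal sketch of a measurement helper. It is only an illustration: the name time_with_sync_ms is made up, std::chrono::steady_clock is swapped in as a portable stand-in for QueryPerformanceCounter, and the synchronization call is passed in as a parameter so the helper compiles without a GL context. In a real program you would pass glFinish as the second argument.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not from the original post): returns the wall-clock
// time in milliseconds spent in `work`, calling `sync` once before the clock
// starts and once after `work` so that queued GPU commands are included.
// In an OpenGL program, pass a callable that invokes glFinish() as `sync`.
double time_with_sync_ms(const std::function<void()>& work,
                         const std::function<void()>& sync)
{
    assert(work && sync);  // both callables must be set

    using clock = std::chrono::steady_clock;

    sync();                              // drain work queued before the measurement
    const auto start = clock::now();
    work();                              // e.g. map, copy, unmap, glTexSubImage2D
    sync();                              // wait until the submitted commands are done
    const auto stop = clock::now();

    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Usage would look something like `double ms = time_with_sync_ms([&]{ upload(texture); }, []{ glFinish(); });`, where upload is a hypothetical function wrapping the body of the loop from the question. Since glFinish stalls the pipeline, this belongs in a one-off calibration pass at startup, not in the render loop.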
As this is structured, it will likely block in glTexSubImage (though that ultimately depends on the implementation; in theory an implementation could defer this). You would likely have far fewer stalls if you uploaded a couple of buffers first and then called glTexSubImage on each one in the order they were defined/uploaded.
The do_other_cpu_stuff call will likely not help much, because the code has already blocked earlier.
If you have ARB_copy_buffer functionality available, you can further avoid stalling by first defining some buffer data in a temporary buffer and then telling OpenGL to do a buffer-to-buffer copy on the GPU.
Intuitively, this should be no faster (if anything, slower), but for some reason that's beyond me, it is actually faster.
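The reordering suggested above (fill several buffers first, then issue the texture calls in the same order) could be sketched roughly as follows. Everything named here is an assumption for illustration: Upload, Backend, fill_buffer, start_texture_transfer and RecordingBackend are not OpenGL names. A real backend would wrap the GL calls listed in the comments; the recording stand-in exists only so the call order can be checked without a GL context. Whether this actually reduces stalls depends on the driver.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record describing one pending upload.
struct Upload {
    unsigned pbo;        // pixel unpack buffer to fill
    unsigned texture;    // texture that will read from that buffer
    const void* pixels;  // CPU-side source data
    std::size_t bytes;   // size of the source data in bytes
};

// `Backend` is an assumed wrapper type, not part of OpenGL. In a GL program:
//   fill_buffer            -> glBindBuffer, glBufferData(..., NULL, ...),
//                             glMapBuffer, copy, glUnmapBuffer
//   start_texture_transfer -> glBindTexture, glBindBuffer,
//                             glTexSubImage2D(..., NULL)
template <class Backend>
void upload_in_two_passes(Backend& backend, const std::vector<Upload>& uploads)
{
    // Pass 1: fill every buffer before any texture call is issued.
    for (const Upload& u : uploads) {
        assert(u.pixels != nullptr);
        backend.fill_buffer(u.pbo, u.pixels, u.bytes);
    }
    // Pass 2: start the texture transfers in the same order as the fills.
    for (const Upload& u : uploads) {
        backend.start_texture_transfer(u.texture, u.pbo);
    }
}

// Stand-in backend that only records the call order.
struct RecordingBackend {
    std::vector<int> log;  // +pbo for a fill, -texture for a transfer

    void fill_buffer(unsigned pbo, const void*, std::size_t) {
        log.push_back(static_cast<int>(pbo));
    }
    void start_texture_transfer(unsigned texture, unsigned) {
        log.push_back(-static_cast<int>(texture));
    }
};
```

Compared with the loop in the question, the only change is that the per-texture work is split into two loops, so each buffer has had some time to settle before the matching glTexSubImage call is made.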