GPU programming: the transfer bottleneck
As I would like my GPU to do some of the calculation for me, I am interested in measuring the speed of 'texture' upload and download - because my 'textures' are the data the GPU should crunch.
I know that transfer from main memory to GPU memory is the preferred way to go, so I expect such an application to be efficient only if there is a lot of data to be processed and few results read back.
Anyway, is there any such benchmark application? I mean, one for measuring main memory <> GPU transfer throughput...
EDIT (question clarification):
There was once an application which, when you started it, gave out two numbers:
MB/s transfer rate between main memory and graphics card memory, from main TO graphics (texture upload)
MB/s transfer rate between main memory and graphics card memory, from graphics TO main (texture download)
I would just like to get my hands on that again.
YET ANOTHER EDIT (found something):
Here http://www.benchmarkhq.ru/english.html?/be_mm.html (search for TexBench) is an app that measures the throughput ONE WAY...
Comments (3)
To measure host-to-device memory bandwidth, you can use the bandwidthTest sample from the CUDA SDK (download it from the CUDA site).
First: the difference between global (GPU) memory and texture memory is the cache. Textures have one; global memory does not.
Second: the transfer rate from the host to the (GPU) device is the same for textures and for global memory.
Third: the transfer rate from the host to the (GPU) device varies with GPU generation and is determined by the PCI Express bus and the size of your data.
See, for example: http://www.accelereyes.com/wiki/index.php?title=GPU_Memory_Transfer
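For completeness, here is a minimal sketch in CUDA C of the kind of measurement bandwidthTest performs: it times repeated cudaMemcpy calls in both directions with CUDA events and prints the two MB/s figures asked about in the question. The 64 MB buffer size and 20 iterations are arbitrary choices, and pinned host memory is used because it generally transfers faster than pageable memory.

// Minimal host<->device bandwidth probe (a sketch, not the SDK sample itself).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 * 1024 * 1024;   // arbitrary 64 MB test buffer
    const int    iters = 20;                 // arbitrary repeat count

    // Pinned host memory usually transfers faster than pageable memory.
    void *h_buf, *d_buf;
    cudaMallocHost(&h_buf, bytes);
    cudaMalloc(&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float ms;

    // Host -> device ("texture upload" direction)
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("H2D: %.1f MB/s\n", (bytes / (1024.0 * 1024.0)) * iters / (ms / 1000.0));

    // Device -> host ("texture download" direction)
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("D2H: %.1f MB/s\n", (bytes / (1024.0 * 1024.0)) * iters / (ms / 1000.0));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFreeHost(h_buf);
    cudaFree(d_buf);
    return 0;
}

Compile with nvcc and run; the two numbers you get back are the upload and download rates for that buffer size on your particular PCI Express link.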
You can use the CUDA profiler to tell you the time spent in CUDA functions, including memory transfer time. You can write a very simple transfer test case and measure that. In my opinion this is better, because you measure your particular use case.
Look up CUDA_PROFILE and how to use it: http://www.drdobbs.com/cpp/209601096?pgno=2
Your question is a bit difficult to understand: do you want to measure transfers between the host and the GPU (the texture cache is not really relevant there), or texture reads from within a kernel?
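To illustrate the "very simple transfer test case" idea, here is a hedged sketch: the program only performs one upload and one download, and the timing is left entirely to the legacy command-line profiler (on older toolkits, setting CUDA_PROFILE=1 before running makes the runtime write per-call timings, including the memcpy calls, to a log file, typically cuda_profile_0.log; see the Dr. Dobb's article linked above for details).

// Trivial transfer test case intended to be run with the legacy
// command-line profiler enabled, e.g. CUDA_PROFILE=1 ./transfer_test
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 16 * 1024 * 1024;    // arbitrary 16 MB payload

    void *h_buf, *d_buf;
    cudaMallocHost(&h_buf, bytes);            // pinned host buffer
    cudaMalloc(&d_buf, bytes);

    // With the profiler enabled, each of these calls should appear in the
    // log with its GPU-side time; bytes divided by that time gives MB/s.
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);   // upload
    cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost);   // download

    cudaFreeHost(h_buf);
    cudaFree(d_buf);
    return 0;
}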