Prefetching in Nvidia CUDA

I'm working on data prefetching in nVidia CUDA. I have read some documents on prefetching on the device itself, i.e. prefetching from shared memory to cache.

But I'm interested in data prefetching between the CPU and GPU. Can anyone point me to some documents or other material on this? Any help would be appreciated.

3 Answers

谈场末日恋爱 2024-12-17 17:34:49

Answer based on your comment:

When we want to perform a computation on large data, ideally we send as much data as possible to the GPU, perform the computation, and send it back to the CPU, i.e. SEND, COMPUTE, SEND (back to CPU). While the results are being sent back to the CPU, the GPU has to stall. My plan is: given a CUDA program that would normally run in the entire global memory, I'll compel it to run in half of the global memory, so that I can use the remaining half for data prefetching. While the computation is being performed in one half, I simultaneously prefetch data into the other half, so there will be no stalls. Now tell me: is this feasible to do? Will performance be degraded or improved? It should be enhanced.

CUDA streams were introduced to enable exactly this approach.

If your computation is rather intensive, then yes, it can greatly speed up your performance. On the other hand, if data transfers take, say, 90% of your time, you will save only on the computation time, that is, 10% at most...

The details of how to use streams, including examples, are provided in the CUDA Programming Guide.
For version 4.0, that is section "3.2.5.5 Streams", and in particular "3.2.5.5.5 Overlapping Behavior", where another, asynchronous memory copy is launched while a kernel is still running.
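
A minimal sketch of that double-buffered overlap, assuming a placeholder kernel `process` and illustrative chunk sizes (none of these names come from the original question):

```cuda
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real computation.
__global__ void process(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int chunk = 1 << 20;   // elements per chunk (illustrative)
    const int nChunks = 8;

    // Async copies require page-locked (pinned) host memory.
    float *hData;
    cudaHostAlloc((void **)&hData, (size_t)nChunks * chunk * sizeof(float),
                  cudaHostAllocDefault);

    // Two halves of device memory, as in the plan above.
    float *dBuf[2];
    cudaMalloc((void **)&dBuf[0], chunk * sizeof(float));
    cudaMalloc((void **)&dBuf[1], chunk * sizeof(float));

    cudaStream_t stream[2];
    cudaStreamCreate(&stream[0]);
    cudaStreamCreate(&stream[1]);

    for (int c = 0; c < nChunks; ++c) {
        int s = c % 2;  // alternate halves/streams per chunk
        // This copy can overlap the kernel of the previous chunk,
        // which is still running in the other stream.
        cudaMemcpyAsync(dBuf[s], hData + (size_t)c * chunk,
                        chunk * sizeof(float),
                        cudaMemcpyHostToDevice, stream[s]);
        process<<<(chunk + 255) / 256, 256, 0, stream[s]>>>(dBuf[s], chunk);
        cudaMemcpyAsync(hData + (size_t)c * chunk, dBuf[s],
                        chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, stream[s]);
    }
    cudaDeviceSynchronize();

    cudaStreamDestroy(stream[0]);
    cudaStreamDestroy(stream[1]);
    cudaFree(dBuf[0]);
    cudaFree(dBuf[1]);
    cudaFreeHost(hData);
    return 0;
}
```

Because the copies and kernels of consecutive chunks go to alternating streams, the copy engine can transfer chunk c while the SMs are still computing on chunk c-1, which is exactly the "no stalls" behavior the comment is after.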

離殇 2024-12-17 17:34:49

Perhaps you would be interested in the asynchronous host/device memory transfer capabilities of CUDA 4.0? You can overlap host/device memory transfers and kernels by using page-locked host memory. You could use this to...

  1. Copy working set #1 & #2 from host to device.
  2. Process #i, promote #i+1, and load #i+2, all concurrently.

So you could be streaming data in and out of the GPU and computing on it all at once (!). Please refer to the CUDA 4.0 Programming Guide and CUDA 4.0 Best Practices Guide for more detailed information. Good luck!
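
A hedged sketch of that pipeline, assuming a placeholder kernel `process`, two device buffers, and illustrative working-set sizes (none of these names are from the original answer):

```cuda
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real computation.
__global__ void process(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;   // elements per working set
    const int nSets = 4;
    const size_t bytes = (size_t)n * sizeof(float);

    // Page-locked host memory, required for overlapping async copies.
    float *h;
    cudaHostAlloc((void **)&h, nSets * bytes, cudaHostAllocDefault);

    float *d[2];
    cudaMalloc((void **)&d[0], bytes);
    cudaMalloc((void **)&d[1], bytes);

    cudaStream_t copyStream, computeStream;
    cudaStreamCreate(&copyStream);
    cudaStreamCreate(&computeStream);

    // Prime the pipeline: working set #0 must be resident before the loop.
    cudaMemcpy(d[0], h, bytes, cudaMemcpyHostToDevice);

    for (int i = 0; i < nSets; ++i) {
        int cur = i % 2, nxt = (i + 1) % 2;
        // Load working set #i+1 while #i is being processed.
        if (i + 1 < nSets)
            cudaMemcpyAsync(d[nxt], h + (size_t)(i + 1) * n, bytes,
                            cudaMemcpyHostToDevice, copyStream);
        process<<<(n + 255) / 256, 256, 0, computeStream>>>(d[cur], n);
        cudaStreamSynchronize(computeStream);  // #i done: buffer reusable
        cudaStreamSynchronize(copyStream);     // #i+1 resident for next round
    }

    cudaStreamDestroy(copyStream);
    cudaStreamDestroy(computeStream);
    cudaFree(d[0]);
    cudaFree(d[1]);
    cudaFreeHost(h);
    return 0;
}
```

The explicit synchronous copy of working set #0 primes the pipeline; after that, every iteration loads set #i+1 through the copy stream while set #i runs in the compute stream, matching steps 1 and 2 above (copying results back is elided for brevity).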

陪你搞怪i 2024-12-17 17:34:49

CUDA 6 will eliminate the need to copy, i.e. the copying will be automatic.
However, you may still benefit from prefetching.

In a nutshell, you want the data for the "next" computation to be transferring while you complete the current computation. To achieve that, you need at least two threads on the CPU and some kind of signalling scheme (so you know when to send the next data). Chunking will of course play a big role and affect performance.

The above may be easier on an APU (CPU+GPU on the same die), since the need to copy is eliminated: both processors can access the same memory.

If you want to find papers on GPU prefetching, just use Google Scholar.
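
For what it's worth, a small sketch of how that looks with managed memory: CUDA 6's `cudaMallocManaged` makes the copies automatic, and later toolkits (CUDA 8 and newer, on supported GPUs) added `cudaMemPrefetchAsync` to recover the explicit-prefetching benefit mentioned above. The kernel and sizes are illustrative:

```cuda
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real computation.
__global__ void process(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 0.5f;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = (size_t)n * sizeof(float);

    // One allocation visible to both CPU and GPU: no explicit copies.
    float *data;
    cudaMallocManaged((void **)&data, bytes);

    for (int i = 0; i < n; ++i) data[i] = (float)i;  // produce on the CPU

    int dev = 0;
    cudaGetDevice(&dev);

    // Migrate pages to the GPU ahead of the kernel instead of taking
    // page faults on first touch (needs a GPU that supports this).
    cudaMemPrefetchAsync(data, bytes, dev);
    process<<<(n + 255) / 256, 256>>>(data, n);

    // Prefetch the results back before the CPU reads them.
    cudaMemPrefetchAsync(data, bytes, cudaCpuDeviceId);
    cudaDeviceSynchronize();

    float checksum = data[0] + data[n - 1];          // consume on the CPU
    (void)checksum;

    cudaFree(data);
    return 0;
}
```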
