Why are GPU threads in CUDA and OpenCL allocated in a grid?

Posted on 2024-08-03 02:19:36


I'm just learning OpenCL, and I'm at the point of trying to launch a kernel. Why is it that GPU threads are managed in a grid?

I'm going to read more about this in detail, but it would be nice to have a simple explanation. Is it always like this when working with GPGPUs?
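For reference, this is roughly the launch call I'm looking at (a minimal sketch; the queue, kernel and sizes are just placeholders). The global_work_size array is what I mean by "the grid":

    // Hypothetical host-side launch: a 512x512 2D grid of work-items,
    // one work-item per output element.
    size_t global_work_size[2] = {512, 512};
    cl_int err = clEnqueueNDRangeKernel(queue,             // command queue
                                        kernel,            // compiled kernel object
                                        2,                 // work_dim: a 2D grid
                                        NULL,              // no global offset
                                        global_work_size,  // the grid dimensions
                                        NULL,              // let the runtime pick the local size
                                        0, NULL, NULL);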


Comments (4)

绅刃 2024-08-10 02:19:36


This is a common approach, which is used in CUDA, OpenCL, and I think ATI Stream.

The idea behind the grid is to provide a simple, but flexible, mapping between the data being processed and the threads doing the data processing. In the simple version of the GPGPU execution model, one GPU thread is "allocated" for each output element in a 1D, 2D or 3D grid of data. To process this output element, the thread will read one (or more) elements from the corresponding location or adjacent locations in the input data grid(s). By organizing the threads in a grid, it's easier for the threads to figure out which input data elements to read and where to store the output data elements.
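A minimal sketch of that mapping in OpenCL (the kernel name, arguments and the blur itself are made up purely for illustration): each work-item uses its 2D grid coordinates to find the single output element it owns and the neighbouring input elements it reads.

    // Each work-item in a width x height grid produces one output element.
    __kernel void blur_x(__global const float* in, __global float* out, int width)
    {
        int x = get_global_id(0);   // column of the element this thread owns
        int y = get_global_id(1);   // row of the element this thread owns
        int i = y * width + x;

        if (x == 0 || x == width - 1) {   // at the row borders, just copy
            out[i] = in[i];
            return;
        }
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;   // read neighbours, write one output
    }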

This contrasts with the common multi-core CPU threading model, where one thread is allocated per CPU core and each thread processes many input and output elements (e.g. 1/4 of the data in a quad-core system).
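For contrast, a rough sketch of that CPU-style partitioning (plain C; the function and array names are illustrative): a handful of threads each loop over a large contiguous chunk rather than owning a single element.

    /* One CPU worker thread handles elements [begin, end); with 4 cores,
       thread t would be given begin = t*n/4 and end = (t+1)*n/4. */
    void cpu_worker(const float* in, float* out, size_t begin, size_t end)
    {
        for (size_t i = begin; i < end; ++i)   /* one thread, many elements */
            out[i] = in[i] * 2.0f;
    }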

平安喜乐 2024-08-10 02:19:36


The simple answer is that GPUs are designed to process images and textures that are 2D grids of pixels. When you render a triangle in DirectX or OpenGL, the hardware rasterizes it into a grid of pixels.

心头的小情儿 2024-08-10 02:19:36


I will invoke the classic analogy of putting a square peg in a round hole. Well, in this case the GPU is a very square hole and not as well rounded as GP (general purpose) would suggest.

The above explanations put forward the ideas of 2D textures, etc. The architecture of the GPU is such that all processing is done in streams with the pipeline being identical in each stream, so the data being processed need to be segmented like that.

清秋悲枫 2024-08-10 02:19:36


One reason why this is a nice API is that typically you are working with an algorithm that has several nested loops. If you have one, two or three loops, then a grid of one, two or three dimensions maps nicely onto the problem, giving you one thread for each combination of index values.

So values that you need in your kernel (index values) are naturally expressed in the API.
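A sketch of what that looks like in practice (the arrays and the scale kernel are hypothetical): the two nested CPU loops become a 2D global work size, and the loop indices become get_global_id calls inside the kernel.

    /* CPU version: two nested loops over rows and columns. */
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            out[y * width + x] = in[y * width + x] * 2.0f;

    /* OpenCL version: launch with global_work_size = {width, height};
       the grid replaces the loops and each work-item handles one (x, y). */
    __kernel void scale(__global const float* in, __global float* out)
    {
        size_t x = get_global_id(0);       // inner loop index
        size_t y = get_global_id(1);       // outer loop index
        size_t width = get_global_size(0); // number of columns in the grid
        out[y * width + x] = in[y * width + x] * 2.0f;
    }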
