Is there a limit on OpenCL local memory?

Published 10-21 01:36 · 247 words · 8 views · 0 comments


Today I added four more __local variables to my kernel to dump intermediate results into. But just adding the four variables to the kernel's signature and setting the corresponding kernel arguments renders all output of the kernel as "0"s. None of the cl functions returns an error code.

I further tried adding only the two smaller variables. If I add just one of them, it works, but if I add both of them, it breaks down.

So could this behavior of OpenCL mean that I allocated too much __local memory? How do I find out how much __local memory I can use?

Comments (3)

遥远的她 2024-10-28 01:36:13


The amount of local memory which a device offers on each of its compute units can be queried by using the CL_DEVICE_LOCAL_MEM_SIZE flag with the clGetDeviceInfo function:

cl_ulong size;
clGetDeviceInfo(deviceID, CL_DEVICE_LOCAL_MEM_SIZE, sizeof(cl_ulong), &size, 0);

The size returned is in bytes. Each workgroup can allocate this much memory strictly for itself. Note, however, that if a workgroup does allocate the maximum, this may prevent other workgroups from being scheduled concurrently on the same compute unit.

玩心态 2024-10-28 01:36:13


Of course there is, since local memory is physical rather than virtual.

From working with a virtual address space on CPUs, we are used to theoretically having as much memory as we want, potentially failing only at very large sizes when the paging file / swap partition runs out, or perhaps not even then, until we actually try to use more memory than can be mapped to the physical RAM and the disk.

This is not the case for things like a computer's OS kernel (or its lower-level parts), which need to access specific areas of the actual RAM.

It is also not the case for GPU global and local memory. There is no* memory paging (remapping of perceived thread addresses to physical memory addresses), and no swapping. Specifically regarding local memory, every compute unit (= every symmetric multiprocessor on a GPU) has a bank of RAM used as local memory; the green slabs here:

[figure omitted: per-compute-unit memory layout with the local-memory slabs shown in green]

The size of each such slab is what you get with

clGetDeviceInfo( · , CL_DEVICE_LOCAL_MEM_SIZE, · , · ).

To illustrate: on nVIDIA Kepler GPUs, the local memory size is either 16 KBytes or 48 KBytes (and the complement to 64 KBytes is used for caching accesses to global memory). So, as of today, GPU local memory is very small relative to the global device memory.


* On nVIDIA GPUs beginning with the Pascal architecture, paging is supported; but that's not the common way of using device memory.

几味少女 2024-10-28 01:36:13


I'm not sure, but I felt this had to be shared.

Just go through the following link and read it.

A great read: OpenCL – Memory Spaces.

Some related material:
