Physical memory on AMD devices: local vs. private memory
I'm writing an algorithm in OpenCL in which every work-item needs to remember a fair amount of data, somewhere between a long[70] and a long[200] per work-item.
Recent AMD devices have 32 KiB of __local memory, which (for the given amount of data per work-item) is enough to store the data for 20 to 58 work-items. However, from what I understand of the architecture (and especially from this drawing), each shader core also has a dedicated amount of private memory. So far I have failed to find its size.
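The 20 to 58 figure can be checked with a little arithmetic. This is a minimal sketch, assuming `long` means the 8-byte OpenCL C `long` and that nothing else is stored in __local memory; the function name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: "long" is the 64-bit OpenCL C long (8 bytes),
 * not whatever size the host compiler gives to long. */
enum { OPENCL_LONG_BYTES = 8 };

/* Hypothetical helper: how many work-items can each keep
 * `longs_per_item` 64-bit values in `local_bytes` of __local memory.
 * Integer division rounds down, so a partial slot does not count. */
static size_t items_fitting_in_local(size_t local_bytes, size_t longs_per_item)
{
    assert(longs_per_item > 0);
    return local_bytes / (longs_per_item * OPENCL_LONG_BYTES);
}
```

With 32 KiB (32768 bytes), a long[70] takes 560 bytes and a long[200] takes 1600 bytes per work-item, which is where the bounds of 58 and 20 come from.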
Can anyone tell me how to find out how much private memory each work-item has?
I'm particularly curious about the HD7970, since I plan to buy some of these soon.
Edit: Problem solved, the answer is here in appendix D.
Comments (3)
The answer was given by user talonmies in the comments, so I'll write it in a new answer here to close the question.
These values can be found in Appendix D of the AMD APP OpenCL Programming Guide: http://developer.amd.com/sdks/amdappsdk/assets/amd_accelerated_parallel_processing_opencl_programming_guide.pdf (a similar document exists for NVIDIA). Apparently a register is 128 bits (4 x 32) on AMD devices, and all modern high-end devices have 16384 registers, so that's a remarkable 256 KB per compute unit.
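The 256 KB figure follows directly from the two numbers quoted above. Below is a small sketch of that calculation, plus a rough estimate of how many work-items' worth of data would fit. The estimate is an idealization on my part: it ignores any per-work-item register limit and the registers the kernel needs for everything else, and the function names are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in the answer: 16384 registers of 128 bits (4 x 32-bit). */
enum { REGISTER_COUNT = 16384, REGISTER_BITS = 128 };

/* Total register file size in bytes for one compute unit. */
static size_t register_file_bytes(size_t registers, size_t bits_per_register)
{
    assert(bits_per_register % 8 == 0);
    return registers * (bits_per_register / 8);
}

/* Idealized upper bound (an assumption, not a hardware guarantee):
 * how many work-items could hold `bytes_per_item` purely in registers
 * if the whole register file were available for that data. */
static size_t resident_items(size_t file_bytes, size_t bytes_per_item)
{
    assert(bytes_per_item > 0);
    return file_bytes / bytes_per_item;
}
```

16384 registers of 16 bytes each give 262144 bytes, i.e. 256 KiB. Under the idealized bound, that is room for 468 work-items with a long[70] (560 bytes) or 163 with a long[200] (1600 bytes); real occupancy will be lower.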
I think you are looking for __local memory. That is what the 32 KB of local data storage refers to. I don't think you can query the device for the amount of private memory.
You can pass NULL as the value of a __local long* kernel argument, giving only its size, to allocate the memory. I think it is best to use a static amount of memory per work-item. Assuming that a long[200] will be required for each work-item, you would use the code below. It would also be a good idea to divide the work into groups that have the same (or similar) memory requirements, in order to get the most out of the LDS.
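The snippet this answer refers to is not shown here. A minimal sketch of the idea might look like the following; the kernel, its name `work`, the argument layout and the helper functions are all hypothetical. Only the sizing arithmetic is executable host code, and the `clSetKernelArg` call appears in a comment because it needs a live OpenCL context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel source: each work-item owns a 200-element slice
 * of one dynamically sized __local buffer. All names are illustrative. */
static const char *kernel_source =
    "#define LONGS_PER_ITEM 200\n"
    "__kernel void work(__global long *out, __local long *scratch)\n"
    "{\n"
    "    __local long *mine = scratch + get_local_id(0) * LONGS_PER_ITEM;\n"
    "    for (int i = 0; i < LONGS_PER_ITEM; ++i)\n"
    "        mine[i] = i;\n"
    "    out[get_global_id(0)] = mine[LONGS_PER_ITEM - 1];\n"
    "}\n";

/* Bytes to request for the __local argument (OpenCL long is 8 bytes).
 * On the host the value would be passed as
 *     clSetKernelArg(kernel, 1, local_bytes, NULL);
 * where a NULL arg_value with a nonzero arg_size sizes the __local buffer. */
static size_t local_buffer_bytes(size_t work_group_size, size_t longs_per_item)
{
    return work_group_size * longs_per_item * 8;
}

/* Largest work-group size whose buffer still fits in `local_mem_bytes`. */
static size_t max_work_group_size(size_t local_mem_bytes, size_t longs_per_item)
{
    assert(longs_per_item > 0);
    return local_mem_bytes / (longs_per_item * 8);
}
```

With 32 KiB of local memory and a long[200] per work-item, this caps the work-group at 20 work-items (32000 bytes), so the work-group size has to be chosen with the buffer size in mind.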
To answer how large the register file is on a 79xx-series card: since it is based on the GCN architecture, it is 64 KB according to the diagram in the AnandTech article: http://www.anandtech.com/print/5261
To answer your question about how to find out how much memory each kernel uses: you can run the AMD APP Profiler on your kernel. Its kernel occupancy section tells you how much space the kernel uses.