What is wrong with Metal simdgroup_load or simdgroup_store?

Posted on 2025-01-18 00:33:58

OS: macOS 12.2.1

Hardware: MacBook Pro 2020, M1

Metal: 2.4

Xcode: 13.2.1

Here is my test compute kernel, which reads the input buffer with simdgroup_load and writes the output buffer with simdgroup_store:

kernel void fun(
const device half * Src                 [[ buffer(0) ]],
constant uint4 & SrcShape               [[ buffer(1) ]],
device half * Dst                       [[ buffer(2) ]],
constant uint4 & DstShape               [[ buffer(3) ]],
const device half * Weight              [[ buffer(4) ]],
ushort3 threadgroup_position_in_grid    [[ threadgroup_position_in_grid ]],
ushort3 thread_position_in_threadgroup  [[ thread_position_in_threadgroup ]],
ushort3 threads_per_threadgroup         [[ threads_per_threadgroup ]],
ushort3 thread_position_in_grid         [[ thread_position_in_grid ]])
{

    const int SrcSlices = (int)SrcShape[0];
    const int SrcHeight = (int)SrcShape[1];
    const int SrcWidth  = (int)SrcShape[2];
    const int DstSlices = (int)DstShape[0];
    const int DstHeight = (int)DstShape[1];
    const int DstWidth  = (int)DstShape[2];
    const int Kernel_X = 3;
    const int KernelElemNum = 3 * 3;
    const int N_Pack = 8;

    // Test only 1 thread
    if (thread_position_in_grid.z != 0 || thread_position_in_grid.y != 0 || thread_position_in_grid.x * N_Pack != 0) return;

    simdgroup_half8x8 sgMatY;
    simdgroup_load(sgMatY, Src);

    simdgroup_store(sgMatY, Dst);

}

It's a simple shader; however, the output buffer only receives the first 2 values from the input buffer, and the other 62 values are all zero.

Here is the result from Xcode Metal Capture

How can I debug or fix it?
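
One possible explanation, offered as a sketch rather than a confirmed answer: the Metal Shading Language specification describes the simdgroup_matrix functions as executed cooperatively by all threads in a SIMD-group. With 32 threads per SIMD-group on M1, an 8x8 matrix works out to 64 / 32 = 2 elements per thread, which would match only 2 values being copied when the "test only 1 thread" guard returns early for every other thread. The variant below replaces that guard with one that is uniform across the SIMD-group. The kernel name copy_8x8 is hypothetical, the unused parameters are dropped for brevity, and it assumes the host dispatches at least one full SIMD-group (for example, a single threadgroup of 32 threads) and that Src and Dst each hold a contiguous row-major 8x8 block of half values.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical variant of the kernel above (not from the original post).
// Every thread of the SIMD-group reaches simdgroup_load and simdgroup_store.
kernel void copy_8x8(
    const device half * Src              [[ buffer(0) ]],
    device half * Dst                    [[ buffer(2) ]],
    ushort3 threadgroup_position_in_grid [[ threadgroup_position_in_grid ]])
{
    // Guard on the threadgroup position only, so all threads in a
    // SIMD-group take the same branch and none of them drops out early.
    if (threadgroup_position_in_grid.x != 0 ||
        threadgroup_position_in_grid.y != 0 ||
        threadgroup_position_in_grid.z != 0) return;

    simdgroup_half8x8 sgMatY;

    // Assumption: contiguous 8x8 tile, so the default elements_per_row (8) applies.
    simdgroup_load(sgMatY, Src);
    simdgroup_store(sgMatY, Dst);
}
```

If this reading is right, Dst should receive all 64 values; if it still does not, the threadgroup size used in the dispatch call would be the next thing to check.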
