How do I compute the outer product of two vectors in a Metal shader?

Posted 2025-01-22 04:49:05

I'm developing a neural network that runs on the GPU on iOS. Working in matrix notation, I need to compute the outer product of two vectors in order to backpropagate the errors.


// Outer product of vector A and Vector B
kernel void outerProduct(const device float *inVectorA [[ buffer(0) ]],
                         const device float *inVectorB [[ buffer(1) ]],
                         device float *outVector [[ buffer(2) ]],
                         uint id [[ thread_position_in_grid ]]) {
    
    outVector[id] = inVectorA[id] * inVectorB[***?***]; // How to find this position on the thread group (or grid)?
}

Comments (2)

在你怀里撒娇 2025-01-29 04:49:05

You are using thread_position_in_grid incorrectly. If you are dispatching a 2D grid, it should be declared as uint2 or ushort2; otherwise you only get the x coordinate. Refer to table 5.7 in the Metal Shading Language specification.

I'm not sure which outer product we are talking about, but I think the output should be a matrix. If you are storing it linearly, then the code to calculate outVector should look something like this:

kernel void outerProduct(const device float *inVectorA [[ buffer(0) ]],
                         const device float *inVectorB [[ buffer(1) ]],
                         uint2 gridSize [[ threads_per_grid ]],
                         device float *outVector [[ buffer(2) ]],
                         uint2 id [[ thread_position_in_grid ]]) {
    
    outVector[id.y * gridSize.x + id.x] = inVectorA[id.x] * inVectorB[id.y];
}
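
To sanity-check the indexing in the kernel above, a CPU reference can be handy. The following is a minimal C++ sketch, not Metal code: the function name `outerProductReference` and the variables `width` and `height` are made up for illustration, and each loop iteration stands in for one GPU thread, assuming the grid is dispatched with exactly `a.size()` threads in x and `b.size()` threads in y.

```cpp
#include <cstddef>
#include <vector>

// CPU stand-in for the 2D kernel: one (x, y) loop iteration plays the role of one thread.
// outerProductReference, width, and height are illustrative names, not part of Metal.
std::vector<float> outerProductReference(const std::vector<float>& a,
                                         const std::vector<float>& b) {
    const std::size_t width = a.size();   // corresponds to gridSize.x
    const std::size_t height = b.size();  // corresponds to gridSize.y
    std::vector<float> out(width * height);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            // Same expression as the kernel body: row y, column x.
            out[y * width + x] = a[x] * b[y];
        }
    }
    return out;
}
```

Note the layout this produces: element (row y, column x) holds a[x] * b[y], so each row of the linear buffer is the whole of vector A scaled by one element of B. If your backpropagation code expects the transposed layout, swap the roles of the two vectors.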

Also, if you are dispatching a grid exactly the size of inVectorA × inVectorB (one thread per output element), you can use the threads_per_grid attribute on a kernel argument to find out how big the grid is.

Alternatively, you can just pass the sizes of the vectors alongside the vectors themselves.
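
As a rough illustration of passing the sizes explicitly, here is a C++ sketch of what a single thread would do if it only had a flat 1D index, as in the original question. Everything here is hypothetical scaffolding: `outerProductThread`, `runPaddedGrid`, `lenA`, and `lenB` are invented names, and in a real kernel the two lengths would arrive through additional buffer arguments. The bounds check matters when the dispatched grid is padded beyond the number of output elements.

```cpp
#include <cstddef>
#include <vector>

// Work done by one hypothetical thread with flat index `id`.
// lenA and lenB are the vector lengths passed in alongside the data.
void outerProductThread(const float* a, std::size_t lenA,
                        const float* b, std::size_t lenB,
                        float* out, std::size_t id) {
    if (id >= lenA * lenB) {
        return;  // padded thread: nothing to write
    }
    const std::size_t x = id % lenA;  // column, indexes into a
    const std::size_t y = id / lenA;  // row, indexes into b
    out[id] = a[x] * b[y];
}

// Simulates dispatching `threadCount` threads, which may exceed lenA * lenB.
std::vector<float> runPaddedGrid(const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 std::size_t threadCount) {
    std::vector<float> out(a.size() * b.size(), 0.0f);
    for (std::size_t id = 0; id < threadCount; ++id) {
        outerProductThread(a.data(), a.size(), b.data(), b.size(), out.data(), id);
    }
    return out;
}
```

The modulo and division recover the two coordinates from the flat index, which is one way to fill in the missing index in the original 1D kernel without changing how it is dispatched.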

Bonjour°[大白 2025-01-29 04:49:05

I was surprised to learn that Metal doesn't have a 2D cross product (also known as the perp dot product), so here it is:

float cross( float2 A, float2 B )
{
    float2 C = A.xy * B.yx;  // <- note B's swizzle
    return C.x - C.y;
}

So to answer your question:

float X = cross( inVectorA.read( id ), inVectorB.read( id ) );
outVector.write( X, id );