Is there a standard, strided version of memcpy?

Posted 2024-11-07 02:57:13 · 355 words · 9 views · 0 comments

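I have a column vector A with 10 elements and a 10 x 10 matrix B. B is stored in column-major order. I would like to overwrite the first row of B with the column vector A.

Clearly, I can do: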

for ( int i=0; i < 10; i++ )
{
    B[0 + 10 * i] = A[i];
}

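where I've left the zero in 0 + 10 * i to highlight that B uses column-major storage (the zero is the row index).

After some shenanigans in CUDA-land tonight, it occurred to me that there might be a CPU function that performs a strided memcpy. I guess that at a low level, performance would depend on the existence of a strided load/store instruction, and I don't recall whether x86 assembly has one.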
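To make concrete what I mean by a "strided memcpy", here is a rough sketch of the kind of helper I have in mind. The names strided_copy and set_row_colmajor are hypothetical (they are not standard library functions), strides are assumed to be given in bytes, and the source and destination are assumed not to overlap. It is only meant to illustrate the interface, not to be a tuned implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (NOT part of the C standard library): copy `count`
 * elements of `elem_size` bytes each, advancing by `dst_stride` bytes in the
 * destination and `src_stride` bytes in the source between elements.
 * Assumption: the two regions do not overlap. */
static void strided_copy(void *dst, size_t dst_stride,
                         const void *src, size_t src_stride,
                         size_t elem_size, size_t count)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++) {
        /* Each element is copied individually at its strided offset. */
        memcpy(d + i * dst_stride, s + i * src_stride, elem_size);
    }
}

/* Hypothetical wrapper: overwrite row `row` of an n x n column-major
 * matrix B with the n-element vector A. */
static void set_row_colmajor(double *B, const double *A, size_t n, size_t row)
{
    /* In column-major storage, consecutive elements of one row
     * are n doubles apart; A itself is contiguous. */
    strided_copy(B + row, n * sizeof(double),
                 A, sizeof(double),
                 sizeof(double), n);
}
```

With this, the loop above would become set_row_colmajor(B, A, 10, 0). Whether this is any faster than the plain loop is exactly what I am unsure about, since it still touches one element at a time.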

