How does SIMD behave in this situation?

Posted on 2024-12-27 13:49:17

I am using an engine that lets me write SIMD code, and it runs fast. However, all of the code has to go in a single block.

I understand that this code is run independently on each entity concurrently, but when only one thing changes per entity, is it still faster to recalculate everything anyway? Is that the idea behind SIMD parallelism?

For instance:

void simdFunction ()
{
    center = mesh.center();    // always the same
    vert.pos.x = center.x;    // run on each vertex
}

In this case, the center is always the same, so will it be recalculated for each vertex under SIMD? If so, is this still efficient?

Basically, in general SIMD programming, does the benefit of running this in parallel outweigh the cost of recomputing it every time?


_蜘蛛 2025-01-03 13:49:17

"this code is run independently on each entity concurrently"

No, that's not how SIMD works.

With SIMD, all arithmetic units are working in lock-step, performing identical operations. There's no independence whatsoever.
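To make the lock-step point concrete, here is a minimal scalar model in C of a 4-lane SIMD unit. The names `vec4`, `vec4_add`, `vec4_splat`, and the width `LANES` are hypothetical and chosen only for illustration; real hardware performs each operation on all lanes in one instruction, whereas the loops below merely imitate that.

```c
#include <stddef.h>

#define LANES 4  /* assumed SIMD width: 4 floats per register (hypothetical) */

/* A SIMD register modelled as LANES floats. */
typedef struct { float lane[LANES]; } vec4;

/* One "instruction": the same add is applied to every lane.
   No lane can choose a different operation; that is the lock-step. */
vec4 vec4_add(vec4 a, vec4 b) {
    vec4 r;
    for (size_t i = 0; i < LANES; ++i)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* Broadcast one scalar into every lane, which is how a shared constant
   typically reaches the SIMD unit. */
vec4 vec4_splat(float s) {
    vec4 r;
    for (size_t i = 0; i < LANES; ++i)
        r.lane[i] = s;
    return r;
}
```

In this model, adding a shared offset to four values is one `vec4_add` of the data with `vec4_splat(offset)`: every lane does the same add on its own element.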

Generally though, you're better off computing shared constants just once, in sequential code. That way the SIMD engine will spend less time on each slice of vertices.
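As a rough sketch of that advice, the C code below computes the shared value once in ordinary scalar code and leaves only the per-vertex assignment in the loop. It assumes the x coordinates are stored in a flat array and that the center is their mean; `center_x` and `snap_x_to_center` are hypothetical stand-ins for the engine's `mesh.center()` and the body of `simdFunction`, not the engine's actual API.

```c
#include <stddef.h>

/* Hypothetical stand-in for mesh.center(): the mean of the x coordinates. */
float center_x(const float *pos_x, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += pos_x[i];
    return n ? sum / (float)n : 0.0f;
}

/* Compute the shared constant once, then do the per-vertex work. */
void snap_x_to_center(float *pos_x, size_t n) {
    const float cx = center_x(pos_x, n);  /* same for every vertex: computed once */
    for (size_t i = 0; i < n; ++i)
        pos_x[i] = cx;                    /* per-vertex part: a simple store a SIMD unit can do in slices */
}
```

With the constant hoisted out, the loop body is a single store per vertex, so each slice of vertices costs the SIMD unit as little as possible.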

The exception would be if the computation is short, the SIMD unit is a co-processor (as with GPGPU), and the data is already on that co-processor. Then computing it with SIMD might easily beat moving the data to the sequential processor and back.
