How does SIMD behave in this case?
I am using an engine that allows SIMD code to be written, and it runs fast. However, all of the code has to live in a single block.
I understand that this code runs independently on each entity at the same time, but when only one thing actually changes, is it still faster to compute everything regardless? Is that the idea behind SIMD parallelism?
For instance:
void simdFunction ()
{
center = mesh.center(); // always the same
vert.pos.x = center.x; // run on each vertex
}
In this case, the center is always the same, so will it be recalculated for each vertex under SIMD? If so, is that still efficient?
Basically, in the general SIMD programming sense, does being able to run this in parallel outweigh the cost of recalculating it every time?
No, that's not how SIMD works.
With SIMD, all arithmetic units are working in lock-step, performing identical operations. There's no independence whatsoever.
Generally though, you're better off computing shared constants just once, in sequential code. That way the SIMD engine will spend less time on each slice of vertices.
The exception would be if the computation is short, the SIMD unit is a co-processor (like GPGPU), and the data is already on that co-processor. Then computing it with SIMD might easily beat moving the data to the sequential processor and back.
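To make the general advice concrete, here is a minimal C++ sketch of computing the shared value once in sequential code and leaving only the uniform per-vertex step in the loop. The names `Vec3`, `meshCenter` and `snapXToCenter` are made up for illustration, and taking the center to be the average of the vertex positions is an assumption; the engine's own `mesh.center()` may well do something different.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical vertex type; a real engine will have its own layout.
struct Vec3 { float x, y, z; };

// Hypothetical stand-in for mesh.center(): the average of all positions.
// This is the shared value that does not depend on which vertex we are on.
Vec3 meshCenter(const std::vector<Vec3>& verts) {
    Vec3 sum{0.0f, 0.0f, 0.0f};
    if (verts.empty()) return sum;
    for (const Vec3& v : verts) {
        sum.x += v.x;
        sum.y += v.y;
        sum.z += v.z;
    }
    const float n = static_cast<float>(verts.size());
    return Vec3{sum.x / n, sum.y / n, sum.z / n};
}

// Compute the shared constant once, then do the per-vertex work.
void snapXToCenter(std::vector<Vec3>& verts) {
    const Vec3 center = meshCenter(verts);  // sequential, done a single time

    // The same operation is applied to every element, which is the
    // lock-step pattern that SIMD hardware (or an auto-vectorizer) handles.
    for (std::size_t i = 0; i < verts.size(); ++i) {
        verts[i].x = center.x;
    }
}
```

If `meshCenter` were called inside the loop instead, every slice of vertices would repeat the same reduction, which is the extra per-slice time the answer warns about.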