Best way to vectorize C code by hand
I want to vectorize some C code by hand in order to speed it up. The target is the SPE on the Cell processor (CBE), so I want to use SIMD math. The code originally performs physical vector calculations (velocity, acceleration, etc.), so some parts of it contain many operations like:
ax=a*vx+b*rx;
ay=a*vy+b*ry;
az=a*vz+b*rz;
So at this point I thought about converting the v's and r's to vectors (on the SPE, one vector holds four single-precision floats). In pseudocode it would be something like:
vector V,R,A;
V.x=vx;
R.x=rx; (and the same for the other components, "y,z")
A=spu_sum(spu_prod(a,V),spu_prod(b,R));
ax=A.x; (and same for the others "y,z")
So do you think this approach is worthwhile, or can you think of a better one?
Thanks
Comments (1)
If you have to pack and unpack the components at every SIMD calculation, you're unlikely to get much, if any, speedup at all.
You really need to see if you can make deeper changes, so that the components are normally kept in vector form and passed around as vectors as much as possible.
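To illustrate what "kept in vector form" could look like, here is a minimal sketch, not a definitive implementation. It assumes one scalar a multiplies every velocity component and one scalar b every position component, and it assumes the data can be reorganized so that four particles share each vector (a structure-of-arrays layout). The names v4f, ParticleBlock and update_accel are hypothetical, and the vector type uses the GCC/Clang vector extension as a portable stand-in for the SPE's vector float.

```c
#include <assert.h>
#include <stddef.h>

/* Four packed floats. GCC/Clang vector extension, used here as a
   stand-in for the SPE's `vector float` (an assumption of this sketch). */
typedef float v4f __attribute__((vector_size(16)));

/* Hypothetical structure-of-arrays block: lane i of every field
   belongs to particle i, so data never has to be packed per operation. */
typedef struct {
    v4f vx, vy, vz;   /* velocities of 4 particles    */
    v4f rx, ry, rz;   /* positions of 4 particles     */
    v4f ax, ay, az;   /* accelerations of 4 particles */
} ParticleBlock;

/* Computes A = a*V + b*R for all 4 particles in each block. */
void update_accel(ParticleBlock *blocks, size_t nblocks, float a, float b)
{
    assert(blocks != NULL || nblocks == 0);

    /* Splat the scalars once, outside the loop. */
    const v4f va = { a, a, a, a };
    const v4f vb = { b, b, b, b };

    for (size_t i = 0; i < nblocks; ++i) {
        ParticleBlock *p = &blocks[i];
        /* All four lanes do useful work; no pack or unpack step. */
        p->ax = va * p->vx + vb * p->rx;
        p->ay = va * p->vy + vb * p->ry;
        p->az = va * p->vz + vb * p->rz;
    }
}
```

With this layout every lane carries real data, whereas packing one particle's x, y, z into a single vector leaves the fourth lane unused and pays for the packing each time. On the SPE itself, each line of the loop body would presumably map to intrinsics along the lines of spu_madd(va, p->vx, spu_mul(vb, p->rx)), with spu_splats(a) producing the splatted scalar.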