SIMD (SSE) instruction for division in GCC
I'd like to optimize the following snippet using SSE instructions if possible:
/*
 * the data structure
 */
typedef struct v3d v3d;
struct v3d {
    double x;
    double y;
    double z;
} tmp = { 1.0, 2.0, 3.0 };

/*
 * the part that should be "optimized"
 */
tmp.x /= 4.0;
tmp.y /= 4.0;
tmp.z /= 4.0;
Is this possible at all?
Comments (3)
I've used SIMD extensions under Windows, but not yet under Linux. That being said, you should be able to take advantage of the DIVPS SSE operation, which divides a vector of 4 floats by another vector of 4 floats. But you are using doubles, so you'll want the SSE2 version, DIVPD, which operates on 2 doubles at a time. I almost forgot: make sure to build with the -msse2 switch.
I found a page which details some SSE GCC builtins. It looks kind of old, but should be a good start.
http://ds9a.nl/gcc-simd/
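As a rough illustration of the builtin-style approach, here is a minimal sketch using the GCC/Clang vector extension rather than raw builtins. The names v2df and div2_inplace are made up for this example and do not come from the linked page. On x86 with SSE2 enabled, the compiler can typically lower the vector division to a single DIVPD, though that depends on the compiler and flags.

```c
#include <assert.h>
#include <string.h>

/* GCC/Clang vector extension: two doubles packed into one 16-byte vector.
 * The type name v2df is an illustrative choice. */
typedef double v2df __attribute__((vector_size(16)));

/* Hypothetical helper: divide two adjacent doubles in place by one divisor. */
void div2_inplace(double *p, double divisor)
{
    assert(p != NULL);

    v2df v;
    v2df d = { divisor, divisor };   /* same divisor in both lanes */

    memcpy(&v, p, sizeof v);         /* load without assuming alignment */
    v = v / d;                       /* element-wise division */
    memcpy(p, &v, sizeof v);         /* store the two quotients back */
}
```

Calling div2_inplace(&tmp.x, 4.0) would handle x and y together; z would still need a scalar division. On x86-64 targets SSE2 is part of the baseline, so -msse2 mainly matters for 32-bit builds.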
Is tmp.x *= 0.25; enough?
Note that for SSE instructions (in case you want to use them) it's important that:
1) all memory accesses are 16-byte aligned
2) the operations are performed in a loop
3) no int <-> float or float <-> double conversions are performed
4) divisions are avoided if possible
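The points above can be sketched roughly as follows: an aligned array, a loop, no type conversions, and a multiplication by 0.25 in place of the division. The function name scale_quarter and its preconditions (16-byte aligned pointer, even element count) are assumptions made for this example. Because 0.25 is an exact power of two, multiplying by it gives the same result as dividing by 4.0; for a divisor such as 3.0 the reciprocal is inexact and results may differ in the last bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Hypothetical helper: scale n doubles in place by 0.25.
 * Assumes data is 16-byte aligned and n is even. */
void scale_quarter(double *data, size_t n)
{
    assert(((uintptr_t)data % 16) == 0);   /* aligned loads/stores below */
    assert(n % 2 == 0);                    /* two doubles per iteration */

    const __m128d quarter = _mm_set1_pd(0.25);

    for (size_t i = 0; i < n; i += 2) {
        __m128d v = _mm_load_pd(data + i);               /* aligned load */
        _mm_store_pd(data + i, _mm_mul_pd(v, quarter));  /* multiply, aligned store */
    }
}
```

A three-element struct like the one in the question does not fit this pattern directly; it pays off mostly when many values are stored contiguously.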
The intrinsic you are looking for is _mm_div_pd. Here is a working example which should be enough to steer you in the right direction:
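The answer's own example is not included in this copy, so the following is only a stand-in sketch, not the author's code. It applies _mm_div_pd to the struct from the question; the helper name v3d_div is invented here. It assumes x, y and z are laid out contiguously with no padding, which holds for a struct of three doubles on common ABIs, and it uses unaligned loads because the struct is not guaranteed to be 16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>   /* SSE2 intrinsics: _mm_div_pd and friends */

typedef struct v3d v3d;
struct v3d {
    double x;
    double y;
    double z;
};

/* Hypothetical helper: divide all three components by the same divisor. */
void v3d_div(v3d *v, double divisor)
{
    assert(v != NULL);

    __m128d d  = _mm_set1_pd(divisor);   /* divisor in both lanes */
    __m128d xy = _mm_loadu_pd(&v->x);    /* x and y are adjacent in memory */
    __m128d z  = _mm_load_sd(&v->z);     /* z in the low lane, high lane zero */

    _mm_storeu_pd(&v->x, _mm_div_pd(xy, d));   /* packed divide: x/d, y/d */
    _mm_store_sd(&v->z, _mm_div_sd(z, d));     /* scalar divide: z/d */
}
```

With the data from the question, v3d_div(&tmp, 4.0) leaves tmp holding 0.25, 0.5 and 0.75.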