if/else statement in SSE intrinsics
I am trying to optimize a small piece of code with SSE intrinsics (I am a complete beginner on the topic), but I am a little stuck on the use of conditionals.
My original code is:
unsigned long c;
unsigned long constant = 0x12345678;
unsigned long table[256];
int n, k;
for( n = 0; n < 256; n++ )
{
    c = n;
    for( k = 0; k < 8; k++ )
    {
        if( c & 1 ) c = constant ^ (c >> 1);
        else c >>= 1;
    }
    table[n] = c;
}
The goal of this code is to compute a CRC table (the constant can be any polynomial; it doesn't play a role here).
I suppose my optimized code would be something like:
__m128 x;
__m128 y;
__m128 *table;
x = _mm_set_ps(3, 2, 1, 0);
y = _mm_set_ps(3, 2, 1, 0);
//offset for incrementation
offset = _mm_set1_ps(4);
for( n = 0; n < 64; n++ )
{
    y = x;
    for( k = 0; k < 8; k++ )
    {
        //if do something with y
        //else do something with y
    }
    table[n] = y;
    x = _mm_add_epi32 (x, offset);
}
I have no idea how to handle the if-else statement with intrinsics, but I suspect there is a clever trick. Does anybody have an idea how to do that?
(Aside from this, my optimization is probably quite poor - any advice or correction on it would be treated with the greatest sympathy.)
Comments (3)
You can get rid of the if/else entirely. Back in the days when I produced MMX assembly code, that was a common programming activity. Let me start with a series of transformations on the "false" statement:
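A minimal sketch of what such a chain of rewrites could look like, assuming `c` is a 32-bit unsigned value (the function names `false_v1` to `false_v3` are illustrative, not from the original post). Each function is one spelling of the "false" branch, and all three return the same result:

```c
#include <stdint.h>

/* The "false" branch as written in the question. */
uint32_t false_v1(uint32_t c) { c >>= 1; return c; }

/* The compound assignment spelled out. */
uint32_t false_v2(uint32_t c) { c = c >> 1; return c; }

/* Xor with zero changes nothing: x == 0 ^ x. */
uint32_t false_v3(uint32_t c) { c = 0 ^ (c >> 1); return c; }
```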
Why did I introduce the exclusive-or? Because exclusive-or is also found in the "true" statement, c = constant ^ (c >> 1).
Note the similarity? In the "true" part, we xor with a constant, and in the "false" part, we xor with zero.
Now I'm going to show you a series of transformations on the entire if/else statement:
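One way those steps might be written out, as a sketch assuming 32-bit unsigned values (the names `step1` to `step3` are hypothetical). Every version computes the same result; only the way the two branches are spelled changes, until they differ in nothing but the mask applied to `constant`:

```c
#include <stdint.h>

/* Step 1: same as the question, with the false branch written as xor with 0. */
uint32_t step1(uint32_t c, uint32_t constant)
{
    if (c & 1) c = constant ^ (c >> 1);
    else       c =        0 ^ (c >> 1);
    return c;
}

/* Step 2: replace 0 by (constant & 0), since 0 == x & 0. */
uint32_t step2(uint32_t c, uint32_t constant)
{
    if (c & 1) c =  constant      ^ (c >> 1);
    else       c = (constant & 0) ^ (c >> 1);
    return c;
}

/* Step 3: replace constant by (constant & all-ones), since x == x & -1. */
uint32_t step3(uint32_t c, uint32_t constant)
{
    if (c & 1) c = (constant & (uint32_t)-1) ^ (c >> 1);
    else       c = (constant & (uint32_t)0)  ^ (c >> 1);
    return c;
}
```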
Now the two branches only differ in the second argument to the binary-and, which can be calculated trivially from the condition itself, thus enabling us to get rid of the if/else:
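A sketch of the resulting branch-free step, next to the original branchy version for comparison. It assumes `uint32_t` instead of the question's `unsigned long`, and the function names are made up for illustration:

```c
#include <stdint.h>

/* The step from the question, with the branch. */
uint32_t crc_step_branchy(uint32_t c, uint32_t constant)
{
    if (c & 1) return constant ^ (c >> 1);
    return c >> 1;
}

/* Branch-free: for uint32_t, -(c & 1u) is 0xFFFFFFFF when the
   low bit of c is set and 0 otherwise, so it selects constant or 0. */
uint32_t crc_step_branchless(uint32_t c, uint32_t constant)
{
    return (constant & -(c & 1u)) ^ (c >> 1);
}

/* The table loop from the question, using the branch-free step. */
void crc_table_branchless(uint32_t table[256], uint32_t constant)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = crc_step_branchless(c, constant);
        table[n] = c;
    }
}
```

Calling `crc_table_branchless(table, 0xEDB88320u)` should fill the table with the same values the branchy loop produces for that polynomial.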
Disclaimer: This solution only works on a two's complement architecture where -1 means "all bits set".
The idea in SSE is to build both results and then blend the results together.
E.g.:
Side note: this is not complete code; it only demonstrates the principle.
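A fuller sketch of the build-both-and-blend idea, under a few assumptions: SSE2 is available (`<emmintrin.h>`), the table holds 32-bit entries, and integer vectors (`__m128i`) are used instead of the float `__m128` from the question, because the shifts and xors here are integer operations. The function name `crc_table_sse2` is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 integer intrinsics */
#include <stdint.h>

/* Computes four table entries per outer iteration. */
void crc_table_sse2(uint32_t table[256], uint32_t constant)
{
    const __m128i vconst = _mm_set1_epi32((int)constant);
    const __m128i one    = _mm_set1_epi32(1);
    const __m128i four   = _mm_set1_epi32(4);
    __m128i x = _mm_set_epi32(3, 2, 1, 0);  /* lanes hold n, n+1, n+2, n+3 */

    for (int n = 0; n < 256; n += 4) {
        __m128i c = x;
        for (int k = 0; k < 8; k++) {
            /* Per lane: all ones where the low bit of c is set, else zero. */
            __m128i mask = _mm_cmpeq_epi32(_mm_and_si128(c, one), one);
            /* Build both results for every lane. */
            __m128i if_false = _mm_srli_epi32(c, 1);
            __m128i if_true  = _mm_xor_si128(vconst, if_false);
            /* Blend: take if_true where mask is set, if_false elsewhere. */
            c = _mm_or_si128(_mm_and_si128(mask, if_true),
                             _mm_andnot_si128(mask, if_false));
        }
        /* Unaligned store, so table needs no special alignment. */
        _mm_storeu_si128((__m128i *)&table[n], c);
        x = _mm_add_epi32(x, four);
    }
}
```

Because the two results differ only by an xor with the constant, the blend could also collapse into `_mm_xor_si128(if_false, _mm_and_si128(mask, vconst))`, which is the vector form of the branch-free expression from the first answer. On SSE4.1, `_mm_blendv_epi8` would be another way to do the selection.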
The first step in efficiently computing CRC is using a wider basic unit than the bit. See here for an example of how to do this byte by byte.
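As a rough illustration of byte-at-a-time processing (not the example the link pointed to), here is a sketch of the usual table-driven update for a right-shifting CRC like the one in the question. The names `crc_init` and `crc_update` are made up, and the initial value and final xor of 0xFFFFFFFF in the usage note are the CRC-32 conventions, which is an assumption about the intended CRC variant:

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];

/* Builds the 256-entry table, one entry per possible byte value. */
void crc_init(uint32_t constant)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (constant & -(c & 1u)) ^ (c >> 1);
        crc_table[n] = c;
    }
}

/* Consumes one byte per iteration instead of one bit:
   a single table lookup replaces eight shift/xor steps. */
uint32_t crc_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        crc = crc_table[(crc ^ buf[i]) & 0xFF] ^ (crc >> 8);
    return crc;
}
```

With the CRC-32 polynomial 0xEDB88320, calling `crc_init(0xEDB88320u)` and then `crc_update(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu` should give the standard CRC-32 of `data`. The table is built once, so the per-byte loop matters far more for throughput than the table generation itself.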