How can SIMD be used to compute the sum of squared differences between the 8-bit components of RGBA pixels?
The code below extracts the red, green, and blue channels (plus alpha) of a pixel value and performs arithmetic with the corresponding channels of another set of values.
The code seems to be slow around the logic where it performs the squaring and addition.
Is it possible to replace it with a faster version? This logic doesn't seem to use SIMD capabilities at all.
    typedef struct {
        unsigned char b, g, r, a;
    } pixel;

    register pixel *pPixel;
    register int i, red1, green1, blue1, alpha1;
    register int red2, green2, blue2, alpha2;
    register long oldD, newD;

    red1 = GetRed( *pPixel );
    green1 = GetGreen( *pPixel );
    blue1 = GetBlue( *pPixel );
    alpha1 = GetAlpha( *pPixel );
    oldD = 2000000000;
    for ( i = 0; i < newcolors; ++i ) {
        red2 = GetRed( mycolormap[i].acolor );
        green2 = GetGreen( mycolormap[i].acolor );
        blue2 = GetBlue( mycolormap[i].acolor );
        alpha2 = GetAlpha( mycolormap[i].acolor );
        newD = ( red1 - red2 ) * ( red1 - red2 ) +
               ( green1 - green2 ) * ( green1 - green2 ) +
               ( blue1 - blue2 ) * ( blue1 - blue2 ) +
               ( alpha1 - alpha2 ) * ( alpha1 - alpha2 );
        if ( newD < oldD ) {
            oldD = newD;
        }
    }
The section of code below seems to need improvement:

    newD = ( red1 - red2 ) * ( red1 - red2 ) +
           ( green1 - green2 ) * ( green1 - green2 ) +
           ( blue1 - blue2 ) * ( blue1 - blue2 ) +
           ( alpha1 - alpha2 ) * ( alpha1 - alpha2 );
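
For reference, the loop above can be restated as a small self-contained scalar baseline, which is handy for checking any SIMD replacement against. This is only a sketch under assumptions: the GetRed/GetGreen/GetBlue/GetAlpha accessors are taken to simply read the struct fields, mycolormap is modeled as a plain array of pixel values, and the function names squaredDistance and minSquaredDistance are hypothetical.

```cpp
#include <climits>
#include <cstddef>

// Same memory layout as the struct in the question.
struct pixel {
    unsigned char b, g, r, a;
};

// Sum of squared differences over all four 8-bit channels.
// The largest possible value is 4 * 255 * 255 = 260100, so int arithmetic is safe.
inline long squaredDistance(pixel p, pixel q) {
    const int db = p.b - q.b;
    const int dg = p.g - q.g;
    const int dr = p.r - q.r;
    const int da = p.a - q.a;
    return static_cast<long>(db * db + dg * dg + dr * dr + da * da);
}

// Smallest distance from `query` to any of the `count` palette entries.
// Returns LONG_MAX for an empty palette (assumed convention, not from the question).
inline long minSquaredDistance(pixel query, const pixel* palette, std::size_t count) {
    long best = LONG_MAX;
    for (std::size_t i = 0; i < count; ++i) {
        const long d = squaredDistance(query, palette[i]);
        if (d < best) {
            best = d;
        }
    }
    return best;
}
```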
Comments (1)
It's harder than it seems. Unfortunately for you, automatic vectorizers in C++ compilers very rarely do a good job with integer arithmetic like you have there.
The following implementation only needs SSE4.1. If you have AVX2, it is possible to improve substantially by upgrading all these vectors to 32-byte ones; however, this will complicate a couple of things: the remainder handling and the final reduction.
I assumed you want not only the minimum dot product but also the index of the pixel. If you only want the minimum dot product, remove the bestIndices field and the code which handles that field.
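
The answer's implementation is not reproduced above, so what follows is only a minimal sketch of how an SSE4.1 search of this kind might look; it is not the answerer's code. It assumes the palette is stored as a contiguous array of packed 32-bit pixels (one byte per channel), which may differ from the layout of mycolormap in the question. The names Nearest, findNearest, squaredDistance, and the SSE41_TARGET macro are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <smmintrin.h>  // SSE4.1 intrinsics; also provides the SSE2 and SSSE3 ones used here

// Lets GCC and Clang compile the function without -msse4.1; other compilers ignore it.
#if defined(__GNUC__) || defined(__clang__)
#define SSE41_TARGET __attribute__((target("sse4.1")))
#else
#define SSE41_TARGET
#endif

struct Nearest {
    int32_t distance;  // smallest sum of squared channel differences found
    int32_t index;     // palette index of that entry, or -1 for an empty palette
};

// Scalar distance between two packed pixels, used for the remainder entries.
inline int32_t squaredDistance(uint32_t p, uint32_t q) {
    int32_t sum = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        const int32_t d = int32_t((p >> shift) & 0xFF) - int32_t((q >> shift) & 0xFF);
        sum += d * d;
    }
    return sum;
}

SSE41_TARGET
inline Nearest findNearest(uint32_t pixel, const uint32_t* palette, std::size_t count) {
    assert(palette != nullptr || count == 0);
    const __m128i query = _mm_set1_epi32(static_cast<int>(pixel));
    const __m128i zero = _mm_setzero_si128();
    const __m128i four = _mm_set1_epi32(4);
    __m128i best = _mm_set1_epi32(INT32_MAX);
    __m128i bestIndices = _mm_set1_epi32(-1);
    __m128i indices = _mm_setr_epi32(0, 1, 2, 3);

    std::size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        const __m128i colors = _mm_loadu_si128(reinterpret_cast<const __m128i*>(palette + i));
        // Per-byte absolute difference: one of the two saturating subtractions is zero.
        const __m128i diff = _mm_or_si128(_mm_subs_epu8(query, colors),
                                          _mm_subs_epu8(colors, query));
        // Widen bytes to 16 bits: pixels 0 and 1 in `lo`, pixels 2 and 3 in `hi`.
        const __m128i lo = _mm_unpacklo_epi8(diff, zero);
        const __m128i hi = _mm_unpackhi_epi8(diff, zero);
        // Square and add adjacent pairs, giving two 32-bit partial sums per pixel.
        const __m128i sqLo = _mm_madd_epi16(lo, lo);
        const __m128i sqHi = _mm_madd_epi16(hi, hi);
        // Horizontal add leaves one 32-bit distance per pixel, in palette order.
        const __m128i dist = _mm_hadd_epi32(sqLo, sqHi);
        // Strict less-than keeps the earliest index within each lane on ties.
        const __m128i better = _mm_cmplt_epi32(dist, best);
        best = _mm_min_epi32(best, dist);
        bestIndices = _mm_blendv_epi8(bestIndices, indices, better);
        indices = _mm_add_epi32(indices, four);
    }

    // Final reduction across the four lanes, preferring the lowest index on ties.
    alignas(16) int32_t d[4];
    alignas(16) int32_t idx[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(d), best);
    _mm_store_si128(reinterpret_cast<__m128i*>(idx), bestIndices);
    Nearest result{INT32_MAX, -1};
    for (int lane = 0; lane < 4; ++lane) {
        if (idx[lane] < 0) continue;  // lane never saw a palette entry
        if (d[lane] < result.distance ||
            (d[lane] == result.distance && idx[lane] < result.index)) {
            result = {d[lane], idx[lane]};
        }
    }
    // Remainder: up to three trailing entries handled one at a time.
    for (; i < count; ++i) {
        const int32_t dd = squaredDistance(pixel, palette[i]);
        if (dd < result.distance) result = {dd, static_cast<int32_t>(i)};
    }
    return result;
}
```

The sketch processes four palette entries per iteration. Dropping bestIndices, the indices counter, and the blend leaves a version that tracks only the minimum distance, as the answer suggests.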