SSE: extracting integers from a __m128 to index an array
In some code I have converted to SSE, I perform some ray tracing, tracing 4 rays at a time using __m128 data types.
In the method where I determine which objects are hit first, I loop through all objects, test for intersection, and create a mask representing which rays had an intersection earlier than any previously found.
I also need to maintain data on the id of the objects which correspond to the best hit times. I do this by maintaining a __m128 data type called objectNo and I use the mask determined from the intersection times to update objectNo as follows:
objectNo = _mm_blendv_ps(objectNo,_mm_set1_ps((float)pobj->getID()),mask);
Where pobj->getID() will return an integer representing the id of the current object. Making this cast and using the blend seemed to be the most efficient way of updating the objectNo for all 4 rays.
After all intersections are tested I try to extract the objectNo's individually and use them to access an array to register the intersection. Most commonly I have tried this:
int o0 = _mm_extract_ps(objectNo, 0);
prv_noHits[o0]++;
However, this crashes with EXC_BAD_ACCESS, because extracting a float with value 1.0 yields an int with value 1065353216.
How do I correctly unpack the __m128 into ints which can be used to index an array?
There are two SSE2 conversion intrinsics which seem to do what you want:
_mm_cvtps_epi32()
_mm_cvttps_epi32()
http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse2_int_conversion.htm
These will convert 4 single-precision FP to 4 32-bit integers. The first one does it with rounding. The second one uses truncation.
So they can be used like this:
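A minimal sketch of what this could look like, assuming every lane of objectNo holds a small non-negative whole number stored as a float. The helper name `laneIdsFromFloats` is hypothetical, and writing the lanes to an array is just one way of getting at all four values:

```cpp
#include <emmintrin.h>  // SSE2: _mm_cvtps_epi32, _mm_storeu_si128
#include <array>

// Hypothetical helper: converts four float-encoded object IDs to ints.
// Assumes each lane holds a whole number, so rounding (_mm_cvtps_epi32)
// and truncation (_mm_cvttps_epi32) would give the same result.
inline std::array<int, 4> laneIdsFromFloats(__m128 objectNo) {
    // Numeric conversion: a lane holding 1.0f becomes the integer 1.
    __m128i asInts = _mm_cvtps_epi32(objectNo);

    std::array<int, 4> ids;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(ids.data()), asInts);
    return ids;
}
```

With that in place, the lookup from the question would become something like `prv_noHits[laneIdsFromFloats(objectNo)[0]]++;`.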
EDIT: Based on what you're trying to do, I feel this can be better optimized as follows:
This version gets rid of the unnecessary conversions. But you will need to use a different mask vector.
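One way this might look, assuming SSE4.1 is available: keep the IDs in a `__m128i` and blend with `_mm_blendv_epi8`. The function names and the `SSE41_FN` macro are made up for this sketch, and the mask is assumed to be all ones or all zeros in each 32-bit lane (for example a float compare result passed through `_mm_castps_si128`):

```cpp
#include <smmintrin.h>  // SSE4.1: _mm_blendv_epi8, _mm_extract_epi32

// Assumption for this sketch: on GCC/Clang, enable SSE4.1 per function
// so the code builds without -msse4.1. Other compilers get an empty macro.
#if defined(__GNUC__)
#define SSE41_FN __attribute__((target("sse4.1")))
#else
#define SSE41_FN
#endif

// Hypothetical helper: in every lane whose mask is all ones, replace the
// stored object ID with `id`; the other lanes keep their previous ID.
SSE41_FN inline __m128i updateObjectNo(__m128i objectNo, int id, __m128i mask) {
    return _mm_blendv_epi8(objectNo, _mm_set1_epi32(id), mask);
}

// Hypothetical helper: reads lane 0 as a plain int. The lanes already
// hold integers, so no float-to-int conversion is involved.
SSE41_FN inline int firstObjectNo(__m128i objectNo) {
    return _mm_extract_epi32(objectNo, 0);
}
```

The array update from the question would then read `prv_noHits[firstObjectNo(objectNo)]++;`.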
EDIT 2: Here's a way so that you won't have to change your mask:
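A sketch of this approach, again assuming SSE4.1. The integer ID is placed into the float lanes bit for bit, so `_mm_blendv_ps` and the existing float mask can stay as they are. The function names and the `SSE41_FN` macro are hypothetical:

```cpp
#include <smmintrin.h>  // SSE4.1: _mm_blendv_ps, _mm_extract_ps

// Assumption for this sketch: on GCC/Clang, enable SSE4.1 per function
// so the code builds without -msse4.1. Other compilers get an empty macro.
#if defined(__GNUC__)
#define SSE41_FN __attribute__((target("sse4.1")))
#else
#define SSE41_FN
#endif

// Hypothetical helper: stores the raw integer bits of `id` in the lanes
// selected by the float mask. _mm_castsi128_ps only reinterprets the
// register, so no int-to-float conversion takes place.
SSE41_FN inline __m128 updateObjectNoBits(__m128 objectNo, int id, __m128 mask) {
    __m128 ids = _mm_castsi128_ps(_mm_set1_epi32(id));
    return _mm_blendv_ps(objectNo, ids, mask);
}

// _mm_extract_ps returns the lane's bit pattern as an int, which is now
// exactly the ID that was stored.
SSE41_FN inline int firstObjectNoBits(__m128 objectNo) {
    return _mm_extract_ps(objectNo, 0);
}
```

For this to work, the initial value of objectNo also has to be an integer bit pattern, for example `_mm_castsi128_ps(_mm_set1_epi32(-1))` as a "no hit" marker (an assumption, not something stated in the question).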
Note that the _mm_castsi128_ps() intrinsic doesn't map to any instruction. It's just a bit-wise datatype conversion from __m128i to __m128 to get around the "typeness" in C/C++.