Calling a function lower in the script from a function higher in the script
I'm trying to come up with a way to make the computer do some work for me. I'm using SIMD (SSE2 & SSE3) to calculate the cross product, and I was wondering if it could go any faster. Currently I have the following:
#include <xmmintrin.h> // SSE intrinsics: __m128, _MM_SHUFFLE, _mm_shuffle_ps, _mm_mul_ps, _mm_sub_ps

const int maskShuffleCross1 = _MM_SHUFFLE(3,0,2,1); // y z x
const int maskShuffleCross2 = _MM_SHUFFLE(3,1,0,2); // z x y

__m128 QuadCrossProduct(__m128* quadA, __m128* quadB)
{
    // (y * other.z) - (z * other.y)
    // (z * other.x) - (x * other.z)
    // (x * other.y) - (y * other.x)
    return _mm_sub_ps(
        _mm_mul_ps(
            _mm_shuffle_ps(*quadA, *quadA, maskShuffleCross1),   // a.yzx
            _mm_shuffle_ps(*quadB, *quadB, maskShuffleCross2)),  // b.zxy
        _mm_mul_ps(
            _mm_shuffle_ps(*quadA, *quadA, maskShuffleCross2),   // a.zxy
            _mm_shuffle_ps(*quadB, *quadB, maskShuffleCross1))); // b.yzx
}
As you can see, there are four _mm_shuffle_ps calls in there, and I wondered if I could replace them with a combination of _mm_unpackhi_ps and _mm_unpacklo_ps, which return a2 b2 a3 b3 and a0 b0 a1 b1 respectively and are slightly faster.
I couldn't figure it out on paper, but I thought of a solution: what if I let the computer brute-force the steps required? It could just recursively step through the different options and see which one gives the correct answer.
I got it to work with multiply: when I want it to return (3, 12, 27, 0), it returns this:
startA = _mm_set_ps(1.00, 2.00, 3.00, 0.00);
startB = _mm_set_ps(3.00, 3.00, 3.00, 0.00);
result0 = _mm_mul_ps(startA, startB);
// (3.00, 6.00, 9.00, 0.00)
result1 = _mm_mul_ps(startA, result0);
// (3.00, 12.00, 27.00, 0.00)
Very nice, if I say so myself.
However, when I tried to implement divide, I ran into a problem. Multiply doesn't just have to call multiply; it also has to call divide. Okay, so we put divide above multiply. But divide doesn't just have to call divide; it also has to call multiply, which is lower in the script, so the compiler hasn't seen it yet.
I started with an empty console application in Visual C++ and put everything in QuadTests.cpp.
How do I make sure these two functions can call each other?
Thanks in advance.
Just to confirm: your problem is that the functions don't compile when getFoo is defined above doStuff, because doStuff hasn't been declared yet at the point where getFoo calls it.
To fix this, you need to make a forward declaration of int doStuff(int). Often this is done with a header file; either way, you just need to add a prototype like this somewhere before getFoo:
int doStuff(int);