Calling a function lower in the script from a function higher in the script

Posted on 2024-08-03 21:05:23

I'm trying to come up with a way to make the computer do some work for me. I'm using SIMD (SSE2 & SSE3) to calculate the cross product, and I was wondering if it could go any faster. Currently I have the following:

const int maskShuffleCross1 = _MM_SHUFFLE(3,0,2,1); // y z x
const int maskShuffleCross2 = _MM_SHUFFLE(3,1,0,2); // z x y

__m128 QuadCrossProduct(__m128* quadA, __m128* quadB)
{
   // (y * other.z) - (z * other.y)
   // (z * other.x) - (x * other.z)
   // (x * other.y) - (y * other.x)

   return
   (
      _mm_sub_ps
      (
         _mm_mul_ps
         (
            _mm_shuffle_ps(*quadA, *quadA, maskShuffleCross1),
            _mm_shuffle_ps(*quadB, *quadB, maskShuffleCross2)
         ),
         _mm_mul_ps
         (
            _mm_shuffle_ps(*quadA, *quadA, maskShuffleCross2),
            _mm_shuffle_ps(*quadB, *quadB, maskShuffleCross1)
         )
      )
   );
}

As you can see, there are four calls to _mm_shuffle_ps in there, and I wondered whether I could replace them with a combination of _mm_unpackhi_ps and _mm_unpacklo_ps, which interleave the high and low halves of their operands (returning a2 b2 a3 b3 and a0 b0 a1 b1 respectively) and which I believe are slightly faster.
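For reference, here is a minimal sketch for checking the lane order of the two unpack intrinsics before building a search around them. The helper names (Lanes, toLanes, demoUnpackLo, demoUnpackHi) and the sample values are made up for illustration; only the intrinsics themselves come from the post.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: _mm_setr_ps, _mm_unpacklo_ps, ...
#include <array>
#include <cassert>

using Lanes = std::array<float, 4>;  // hypothetical helper type

// Stores the four lanes of v to memory, lowest lane first.
Lanes toLanes(__m128 v) {
    Lanes out;
    _mm_storeu_ps(out.data(), v);
    return out;
}

// Sample inputs, lowest lane first:
//   a = (a0, a1, a2, a3) = (0, 1, 2, 3)
//   b = (b0, b1, b2, b3) = (10, 11, 12, 13)
Lanes demoUnpackLo() {
    const __m128 a = _mm_setr_ps(0.0f, 1.0f, 2.0f, 3.0f);
    const __m128 b = _mm_setr_ps(10.0f, 11.0f, 12.0f, 13.0f);
    return toLanes(_mm_unpacklo_ps(a, b));  // interleaves low halves: (a0, b0, a1, b1)
}

Lanes demoUnpackHi() {
    const __m128 a = _mm_setr_ps(0.0f, 1.0f, 2.0f, 3.0f);
    const __m128 b = _mm_setr_ps(10.0f, 11.0f, 12.0f, 13.0f);
    return toLanes(_mm_unpackhi_ps(a, b));  // interleaves high halves: (a2, b2, a3, b3)
}
```

Note that _mm_setr_ps takes its arguments lowest lane first, whereas _mm_set_ps takes them highest lane first, so the two are easy to mix up when reading printed results.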

I couldn't figure it out on paper, but I thought of a solution. What if I let the computer brute-force the steps required? Just recursively step through the different options and see which combination gives the correct answer.

I got it to work with multiply. This is what it returns when I ask for (3, 12, 27, 0):

startA = _mm_set_ps(1.00, 2.00, 3.00, 0.00);
startB = _mm_set_ps(3.00, 3.00, 3.00, 0.00);
result0 = _mm_mul_ps(startA, startB);
// (3.00, 6.00, 9.00, 0.00)
result1 = _mm_mul_ps(startA, result0);
// (3.00, 12.00, 27.00, 0.00)

Very nice, if I say so myself.

However, when I wanted to implement divide I stumbled on a problem. Multiply doesn't just have to call multiply, it also has to call divide. Okay, so we put divide above multiply. But divide doesn't just have to call divide, it also has to call multiply, which is defined lower in the file, so the compiler doesn't know about it yet.

I started with an empty console application in Visual C++ and put everything in QuadTests.cpp.

How do I make sure these two functions can call each other?

Thanks in advance.


Comments (1)

蓝戈者 2024-08-10 21:05:23

Just to confirm, your problem is that functions arranged like this don't work, because doStuff isn't declared by the time you call it from getFoo:

int getFoo(int bar) {
    // Compile error here: doStuff has not been declared yet.
    return doStuff(bar + 1);
}

int doStuff(int bar) {
    if (bar == 2) {
        return getFoo(bar);
    }

    return bar * 8;
}

To fix this, you need to make a forward declaration of int doStuff(int). Often, this is done with a header file. Either way, you just need to add something like this above the first function definition:

// #includes, etc. go here

int doStuff(int);
int getFoo(int);

// function definitions follow
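
Applied to the multiply/divide case from the question, the same pattern might look like the sketch below. The names tryMultiply and tryDivide, their parameters, and the constants are hypothetical stand-ins, since the actual search code isn't shown in the question; the point is only that both prototypes appear before either body.

```cpp
#include <cassert>

// Forward declarations: the compiler learns both signatures up front,
// so each function body can call the other regardless of definition order.
float tryMultiply(float value, int depth);
float tryDivide(float value, int depth);

// Hypothetical search step: multiply, then hand off to the divide step.
float tryMultiply(float value, int depth) {
    assert(depth >= 0);
    if (depth == 0) {
        return value;  // recursion limit reached
    }
    return tryDivide(value * 3.0f, depth - 1);  // defined further down the file
}

// Hypothetical search step: divide, then hand off to the multiply step.
float tryDivide(float value, int depth) {
    assert(depth >= 0);
    if (depth == 0) {
        return value;  // recursion limit reached
    }
    return tryMultiply(value / 2.0f, depth - 1);  // defined further up the file
}
```

With everything in a single QuadTests.cpp, putting the two prototypes near the top of that file is enough; moving them into a header only becomes useful once other source files need to call the same functions.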