Efficient evaluation of max(a,b) inside a loop, with respect to branch prediction?

Posted 2024-11-05 15:35:27

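What is an efficient way to compute the maximum of 2 floats inside a for loop in C, without using a logic statement that could stall the pipeline, such as a > b ? a : b?

I'm processing huge 3D arrays and have a great number of loop iterations.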

别理我 2024-11-12 15:35:27

Check what your compiler outputs; it's probably "optimal" already. For instance,

float foo(float a, float b)
{
    return (a>b?a:b);
}

Compiled with GCC 4.5 at -O3, this generates the following assembly on x86_64:

Disassembly of section .text:

0000000000000000 <foo>:
   0:   f3 0f 5f c1             maxss  %xmm1,%xmm0
   4:   c3                      retq

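That is, the compiler knows a lot about the instruction set you're targeting and about the semantics of your code. Let it do its job.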
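Since the question is about a loop over large arrays, here is a minimal sketch of the same ternary applied element by element. The function name max_arrays and its parameters are made up for illustration and are not from the original post. Whether a given compiler turns the loop body into maxss, or a packed form such as maxps, depends on the compiler, target and flags, so it is worth inspecting the generated assembly (for example with gcc -O3 -S) rather than assuming.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original answer): element-wise
 * maximum of two float arrays, written with the plain ternary.
 * restrict tells the compiler the three arrays do not overlap. */
void max_arrays(const float *restrict a, const float *restrict b,
                float *restrict out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* Same expression as in foo(); an optimizing compiler may
         * emit a max instruction here instead of a branch. */
        out[i] = (a[i] > b[i]) ? a[i] : b[i];
    }
}
```

Keeping the expression simple and the arrays non-overlapping gives the optimizer the best chance of recognizing the pattern.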

巴黎夜雨 2024-11-12 15:35:27

Well, I don't think this is faster than using branching, but this seems to work:

#include <stdio.h>

#define FasI(f)  (*((int *) &(f)))
#define FasUI(f) (*((unsigned int *) &(f)))

#define lt0(f)  (FasUI(f) > 0x80000000U)
#define le0(f)  (FasI(f) <= 0)
#define gt0(f)  (FasI(f) > 0)
#define ge0(f)  (FasUI(f) <= 0x80000000U)


int main()
{
    float a=11.0,b=4.6;
    float y=b-a;

    /* lt0(y) picks a when b-a<0, ge0(y) picks b otherwise, so a==b still prints the right value */
    printf("%f\n",lt0(y)*a+ge0(y)*b);
    return 0;
}

The defines were taken from The Aggregate Magic Algorithms
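The macros above read a float through a cast pointer, which C's aliasing rules do not guarantee to work once the optimizer is on. A possible variant, sketched below, copies the bits out with memcpy instead. The names float_bits, is_negative and max_bits are hypothetical, and the sketch assumes 32-bit IEEE 754 floats and finite inputs (infinities and NaN are not handled). It is offered as an illustration of the bit trick, not as something expected to beat the compiler's own maxss.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read the bit pattern of a float without
 * pointer casting. Assumes float is a 32-bit IEEE 754 value. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* For finite f: 1 if f is strictly negative, 0 otherwise
 * (both +0.0 and -0.0 give 0). Mirrors the lt0 macro. */
static int is_negative(float f)
{
    return float_bits(f) > 0x80000000u;
}

/* Branch-free max sketch: select a when b - a < 0, otherwise b.
 * Not valid for infinities or NaN, since 0 * inf is NaN. */
float max_bits(float a, float b)
{
    int pick_a = is_negative(b - a);
    return pick_a * a + (1 - pick_a) * b;
}
```

With equal inputs the difference is zero, pick_a is 0, and b is returned, so ties give the expected value.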
