Which is the fastest way to get the absolute value of a number?

Posted 2024-07-15 22:14:04, 264 words, 9 views, 0 comments

Which is the fastest way to implement an operation that returns the absolute value of a number?

x=root(x²)

or

if !isPositive(x):
    x=x*(-1)

This question really boils down to: how fast is an if, and why?

My college programming professors always told me to avoid ifs because they are extremely slow, but I always forgot to ask how slow and why. Does anybody here know?


Comments (16)

老街孤人 2024-07-22 22:14:05

I wonder whether anything is wrong with this solution.
It has

  • no branching
  • no bitwidth dependent shifting
  • no bit twiddling
  • no architecture dependency
  • no compiler dependency
  • optionally: no undefined behaviour for INT_MIN

Maybe too many instructions?

My solution

xabs = (x < 0)*(-x) + (x >=0)*x
  • 2 integer comparisons
  • 2 multiplications

Old solution

xtest = (x < 0)*x;           // xtest = x if is negative, otherwise zero
xabs = (x - xtest) - xtest;  // Order of instructions taken into account

Undefined behaviour of negating INT_MIN

A check against undefined behaviour (negation of INT_MIN) can be added,
if your value is not limited in the algorithm somewhere before.
But that makes it a little more complicated.
Maybe, someone finds a simpler logic.

xabs =   (x < -INT_MAX)*INT_MAX            //  x < -INT_MAX < 0  --> xabs = INT_MAX
       + ((x >= -INT_MAX)&&(x < 0))*(-x)   // -INT_MAX <= x < 0  --> xabs = -x
       + (x >= 0)*x                        // 0 <= x             --> xabs = +x
  • 5 integer comparisons
  • 3 integer multiplications

Unfortunately, I never did a speed comparison,
so I don't know whether it is really faster than the branching version:

if ( x < 0 )
{
  if ( x >= -INT_MAX )
  {
    x = -x;
  }
  else
  {
    x = INT_MAX;
  }
}
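
One caveat with the saturating expression above: multiplying by zero does not stop (-x) from being evaluated, so for x == INT_MIN the negation still overflows on paper. A minimal sketch that sidesteps this by doing the negation in unsigned arithmetic is shown here. The name abs_saturating is hypothetical, and the code assumes UINT_MAX == 2*INT_MAX + 1, which holds on mainstream platforms but is not guaranteed by the C standard.

```c
#include <limits.h>

/* Hypothetical helper: branch-free absolute value that saturates at INT_MAX.
   Assumes UINT_MAX == 2*INT_MAX + 1 (true on mainstream platforms). */
static int abs_saturating(int x)
{
    unsigned int ux   = (unsigned int)x;        /* well-defined modular conversion */
    unsigned int neg  = (unsigned int)(x < 0);  /* 1 if x is negative, else 0 */
    unsigned int mask = 0u - neg;               /* all ones if negative, else 0 */
    unsigned int mag  = (ux ^ mask) + neg;      /* |x| as unsigned; INT_MIN gives INT_MAX + 1 */

    mag -= (mag > (unsigned int)INT_MAX);       /* clamp the single out-of-range case */
    return (int)mag;
}
```

Whether this beats the nested if is still something only a measurement on the target compiler and CPU can answer.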
     
橘香 2024-07-22 22:14:05

For a list of negative numbers:

If you have zero stored in memory, simply use 0 - x, where x is the negative number.

Or if you do not have zero stored in memory:

x-x-x, where x is the negative number.

Or, with brackets for clarity:

(x) - (x) - (x) => (-n) - (-n) - (-n), where x = -n

i.e. subtract the negative number from itself to get zero, then subtract it from zero.

才能让你更想念 2024-07-22 22:14:04

There is a great trick to calculate the absolute value of a 2s-complement integer without using an if statement. The theory goes: if the value is negative you want to toggle the bits and add one, otherwise you want to pass the bits through as is. A XOR 1 happens to toggle A, and A XOR 0 happens to leave A intact. So you want to do something like this:

  uint32_t temp = value >> 31;     // make a mask of the sign bit
  value ^= temp;                   // toggle the bits if value is negative
  value += temp & 1;               // add one if value was negative

In principle, you can do it in as few as three assembly instructions (without a branch). And you'd like to think that the abs() function that you get with stdlib.h does it optimally.

No branches == better performance. Contrary to @paxdiablo's response above, this really matters in deep pipelines where the more branches you have in your code, the more likely you are to have your branch predictor get it wrong and have to roll-back, etc. If you avoid branching where possible, things will keep moving along at full throttle in your core :).
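
For anyone who wants to try the mask trick, here is a minimal self-contained sketch. The function name abs_mask is made up for illustration. It builds the mask from an unsigned shift, because right-shifting a negative signed value is implementation-defined in C, and it assumes a 32-bit two's-complement int32_t.

```c
#include <stdint.h>

/* Hypothetical helper illustrating the toggle-and-add-one trick. */
static int32_t abs_mask(int32_t value)
{
    uint32_t u    = (uint32_t)value;   /* work on the raw bit pattern */
    uint32_t mask = 0u - (u >> 31);    /* 0xFFFFFFFF if the sign bit is set, else 0 */

    u ^= mask;                         /* toggle the bits if value was negative */
    u += mask & 1u;                    /* add one if value was negative */
    return (int32_t)u;                 /* INT32_MIN ends up mapping to itself */
}
```

As with abs(), the most negative value has no positive counterpart, so the result for INT32_MIN is not meaningful.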

养猫人 2024-07-22 22:14:04

Conditionals are slower than plain arithmetic operations, but much, much faster than something as silly as calculating the square root.

Rules of thumb from my assembly days:

  • Integer or bitwise op: 1 cycle
  • Floating-point add/sub/mul: 4 cycles
  • Floating-point div: ~30 cycles
  • Floating-point exponentiation: ~200 cycles
  • Floating-point sqrt: ~60 cycles depending on implementation
  • Conditional branch: avg. 10 cycles, better if well-predicted, much worse if mispredicted
橘虞初梦 2024-07-22 22:14:04

Calculating the square root is probably one of the worst things you could do because it is really slow. Usually there is a library function for doing this; something like Math.Abs(). Multiplying with -1 is also unnecessary; just return -x. So a good solution would be the following.

(x >= 0) ? x : -x

The compiler will probably optimize this to a single instruction. Conditions may be quite expensive on modern processors because of the long execution pipelines: the calculations must be thrown away if a branch was mispredicted and the processor started executing the instructions from the wrong code path. But because of the mentioned compiler optimization you need not care in this case.
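
To check what your own compiler makes of the conditional expression, it can be wrapped in a small function and inspected in the generated assembly (for example with gcc -O2 -S). The sketch below uses the hypothetical name abs_ternary. Optimizing compilers for x86-64 commonly emit a branch-free sequence for it, but that depends on the compiler, flags, and target.

```c
/* Hypothetical wrapper around the conditional expression from the answer. */
static int abs_ternary(int x)
{
    /* Like abs(), this is undefined for INT_MIN because -x overflows. */
    return (x >= 0) ? x : -x;
}
```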

心舞飞扬 2024-07-22 22:14:04

For completeness, here's a way to do it for IEEE floats on x86 systems in C++:

*(reinterpret_cast<uint32_t*>(&foo)) &= 0xffffffff >> 1;
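
Note that the pointer cast above reads a float through a uint32_t pointer, which runs into the aliasing rules in both C and C++. A sketch of the same sign-bit clearing using memcpy, written in C, could look like this. The name fabs_bits is hypothetical, and the code assumes float is a 32-bit IEEE 754 value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: clear the sign bit of a 32-bit IEEE 754 float. */
static float fabs_bits(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);   /* copy the representation out */
    bits &= UINT32_C(0x7FFFFFFF);     /* keep everything except the sign bit */
    memcpy(&f, &bits, sizeof f);      /* copy it back */
    return f;
}
```

Compilers typically turn the two memcpy calls into plain register moves, so this form should not cost more than the cast.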
蘑菇王子 2024-07-22 22:14:04

Which is the fastest way to get the absolute value of a number

I think the "right" answer isn't here actually. The fastest way to get the absolute value is probably to use an Intel intrinsic. See https://software.intel.com/sites/landingpage/IntrinsicsGuide/ and look for 'vpabs' (or another intrinsic that does the job for your CPU). I'm pretty sure it'll beat all the other solutions here.

If you don't like intrinsics (or cannot use them or ...), you might want to check whether the compiler is smart enough to turn a call to the 'native absolute value' (std::abs in C++ or Math.Abs(x) in C#) into the intrinsic automagically. Basically that involves looking at the disassembled (compiled) code. If you're in a JIT, be sure that JIT optimizations aren't disabled.

If that also doesn't give you the optimized instructions, you can use the method described here: https://graphics.stanford.edu/~seander/bithacks.html#IntegerAbs .

感悟人生的甜 2024-07-22 22:14:04

The if variant will almost certainly be blindingly fast compared to the square root, since it normally translates to a conditional jump instruction at the machine code level (following the evaluation of the expression, which may be complex, but not in this case since it's a simple check for less than 0).

Taking the square root of a number is likely to be much slower (Newton's method, for example, would use many many if statements at the machine code level).

The likely source of confusion is the fact that an if invariably leads to changing the instruction pointer in a non-sequential manner. This can slow down processors that pre-fetch instructions into a pipeline, since they have to re-populate the pipeline when the address changes unexpectedly.

However, the cost of that would be minuscule compared to performing a square root operation as opposed to a simple check-and-negate.

此生挚爱伱 2024-07-22 22:14:04

The modulo operation is used to find a remainder, you mean absolute value. I modified the question because it should be if !pos(x) then x = x*-1. (not was missing)

I wouldn't worry about the efficiency of an if statement. Instead focus on the readability of your code. If you identify that there is an efficiency problem, then focus on profiling your code to find real bottlenecks.

If you want to keep an eye out for efficiency while you code, you should only worry about the big-O complexity of your algorithms.

An if statement is very efficient: it evaluates whatever expression it is given and then simply changes the program counter based on that condition. The program counter stores the address of the next instruction to be executed.

Multiplication by -1 and checking whether a value is greater than 0 can both be reduced to a single assembly instruction.

Finding the root of a number and squaring that number first is definitely more operations than the if with a negation.

花心好男孩 2024-07-22 22:14:04

The time taken to do a square root is much greater than the time taken to do a conditional. If you have been taught to avoid conditionals because they are slow, then you have been misinformed. They are a good deal slower than trivial operations like adding or subtracting integers or bit shifting, which is why unrolling loops can be of benefit only if you are doing such trivial operations. But in the grand scheme of things conditionals are good and fast, not bad and slow. To do something as complicated as calling a function or calculating a square root to avoid a conditional statement is crazy.

Also, instead of (x = x * -1) why not do (x = 0 - x)? Maybe the compiler will optimize them the same, but isn't the second one simpler anyway?

夜唯美灬不弃 2024-07-22 22:14:04

Are you using 8086 assembly? ;-)

                ; abs value of AX
   cwd          ; replicate the high bit into DX
   xor  ax, dx  ; take 1's complement if negative; no change if positive
   sub  ax, dx  ; AX is 2's complement if it was negative

The standard absolute value method works on any register but is much slower:

   or   bx, bx  ; see if number is negative
   jge  notneg  ; if it is negative...
   neg  bx      ; ...make it positive
notneg:         ; jump to here if positive

(flagrantly stolen)
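
For readers who do not write assembly, the cwd / xor / sub sequence can be mirrored in C roughly as follows. This is only a sketch: abs16 is a hypothetical name, dx stands in for the DX register, and the mask is computed from a comparison instead of a sign-extending instruction.

```c
#include <stdint.h>

/* Hypothetical C rendering of the CWD / XOR / SUB sequence for a 16-bit value. */
static int16_t abs16(int16_t ax)
{
    int dx = -(ax < 0);                 /* like CWD: -1 (all ones) if negative, else 0 */

    return (int16_t)((ax ^ dx) - dx);   /* XOR flips the bits, SUB adds the one back */
}
```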

用心笑 2024-07-22 22:14:04

You can try to use a single AND operator as a mask.

Here is a pseudocode as an example

i8 num = 10001101 = -13
u8 mask = 01111111 = 127;
i8 res = num & mask = 00001101 = 13

I believe this is the fastest way to calculate the absolute value on a computer. Correct me if I'm wrong.
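
A note on the pseudocode: the bit pattern 10001101 means -13 only in a sign-magnitude encoding. On two's-complement machines, which covers nearly all current hardware, -13 as an 8-bit integer is 11110011, and clearing the top bit gives 115. The small sketch below (mask_sign_bit is a hypothetical name) shows this. Clearing the sign bit does work for IEEE 754 floating-point values, which are stored as sign plus magnitude.

```c
#include <stdint.h>

/* Hypothetical helper: clear the top bit of an 8-bit two's-complement value. */
static int8_t mask_sign_bit(int8_t num)
{
    /* (uint8_t)-13 is 0xF3; 0xF3 & 0x7F is 0x73, i.e. 115, not 13. */
    return (int8_t)((uint8_t)num & 0x7Fu);
}
```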

木格 2024-07-22 22:14:04

If you are simply comparing the absolute values of two numbers (e.g. you don't need the absolute value of either after the comparison) then just square both values to make both positive (remove the sign of each value), the larger square will be greater than the smaller square.
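
A sketch of that comparison in C, assuming 32-bit inputs: the squares are computed in 64 bits so they cannot overflow. The name abs_less is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if |a| < |b|, decided by comparing squares. */
static bool abs_less(int32_t a, int32_t b)
{
    int64_t aa = (int64_t)a * a;   /* at most 2^62, fits in int64_t */
    int64_t bb = (int64_t)b * b;

    return aa < bb;
}
```

Without the widening cast the multiplication could overflow for large inputs, so the trick is only safe when a wider type is available or the value range is known to be small.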

寂寞花火° 2024-07-22 22:14:04

What is faster depends heavily on which compiler and which CPU you are targeting. On most CPUs and all compilers, x = (x>=0)? x:-x; is the fastest way to get the absolute value, and the standard functions often already provide this solution (e.g. fabs()). It is compiled into a compare followed by a conditional assignment instruction (CMOV), not into a conditional jump. Some platforms lack that instruction, though. The Intel compiler (but not the Microsoft or GCC compilers) would automatically convert an if() into a conditional assignment, and would even try to optimize cycles where possible.

Branching code is generally slower than conditional assignment if the CPU uses statistical prediction. An if() may be slower on average if the operation is repeated many times and the result of the condition keeps changing. CPUs like Intel's would start to calculate both branches and drop the invalid one; with large if() bodies or a large number of cycles, that can be critical.

sqr() and sqrt() on modern Intel CPUs are single built-in instructions and aren't slow, but they are imprecise, and loading the registers takes time as well.

Related question: Why is a CPU branch instruction slow?

Most likely the professor wanted the student to research this matter. It is a semi-provocative question/task that can only do good if the student learns to think independently and look for additional sources.

趴在窗边数星星i 2024-07-22 22:14:04

I'm doing some retro graphics programming in C for 8088/8086 and calling abs() is time consuming so I've replaced it with:

/* assuming 'i' is int; this WILL NOT WORK on floating point */
if (i < 0) {
    i = ~i + 1;
}

This is faster because it essentially trades a CALL in assembly for a JNE. Calling a function changes a couple of registers, pushes several more, pushes arguments onto the stack, and can flush the prefetch queue. These actions also need to be reversed at the end of the function, and all of this is very expensive for the CPU.

中二柚 2024-07-22 22:14:04

For completeness, if you are dealing with floating point numbers, you can always do something like n * sign(n), where sign is a function that returns +1 if the number is positive, -1 if negative. In C this would be something like copysign(1.0, n) or (n > 0) - (n < 0).

Most machines use IEEE 754 as their floating point format these days, so you can clear the sign bit directly:

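float fabs(float x) {
    /* Assumes a 32-bit IEEE 754 float on a little-endian machine,
       where the sign bit is the top bit of the last byte. */
    unsigned char *c = (unsigned char *)&x;
    c[3] &= 0x7F;   /* clear only the sign bit */
    return x;
}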

Given that the abs function likely does this exact thing, your best bet is to use it when available. If you are lucky the function will be a couple of instructions, and will be inlined.
