How do the likely/unlikely macros in the Linux kernel work, and what is their benefit?

Posted on 2024-07-04 21:41:55

I've been digging through some parts of the Linux kernel, and found calls like this:

if (unlikely(fd < 0))
{
    /* Do something */
}

or

if (likely(!err))
{
    /* Do something */
}

I've found the definition of them:

#define likely(x)       __builtin_expect(!!(x), 1)
#define unlikely(x)     __builtin_expect(!!(x), 0)

I know that they are for optimization, but how do they work? How much of a performance/size change can be expected from using them? And is it worth the hassle (and probably losing portability), at least in bottleneck code (in userspace, of course)?

Comments (10)

胡渣熟男 2024-07-11 21:41:56

They're hints to the compiler to generate the hint prefixes on branches. On x86/x64, they take up one byte, so you'll get at most a one-byte increase for each branch. As for performance, it entirely depends on the application -- in most cases, the branch predictor on the processor will ignore them, these days.

Edit: I forgot about one place where they can really help. They allow the compiler to reorder the control-flow graph to reduce the number of branches taken on the 'likely' path. This can give a marked improvement in loops where you're checking multiple exit cases.
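To make the "loop with multiple exit cases" point concrete, here is a minimal sketch. The function name sum_until_sentinel and its parameters are made up for illustration; the macros are the ones from the question and assume GCC or Clang. Whether the compiler actually moves the rare exits out of the loop body depends on the compiler version and optimization level, so check the generated assembly rather than taking it for granted.

```c
#include <stddef.h>

/* Same definitions as in the question (GCC/Clang builtin). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical helper: add up buf[0..len) until a negative sentinel is seen.
 * Returns -1 if the running sum would exceed limit.
 * Both early exits are marked as rare, so the compiler is free to lay out
 * the common "keep adding" path as straight-line code.
 */
long sum_until_sentinel(const int *buf, size_t len, long limit)
{
    long sum = 0;

    for (size_t i = 0; i < len; i++) {
        if (unlikely(buf[i] < 0))            /* rare exit: sentinel found */
            break;
        if (unlikely(sum + buf[i] > limit))  /* rare exit: limit exceeded */
            return -1;
        sum += buf[i];
    }
    return sum;
}
```

Comparing the output of objdump -d with and without the unlikely() wrappers at -O2 should show whether the exit paths were moved.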

乱了心跳 2024-07-11 21:41:56

These are GCC facilities that let the programmer tell the compiler which branch condition is most likely in a given expression. This allows the compiler to lay out the branch instructions so that the most common case takes the fewest instructions to execute.

How the branch instructions are built depends on the processor architecture.

近箐 2024-07-11 21:41:56

As per the comment by Cody, this has nothing to do with Linux, but is a hint to the compiler. What happens will depend on the architecture and compiler version.

This particular feature is somewhat misused in Linux drivers. As osgx points out in semantics of hot attribute, any hot or cold function called within a block can automatically hint whether the condition is likely or not. For instance, dump_stack() is marked cold, so this is redundant:

 if(unlikely(err)) {
     printk("Driver error found. %d\n", err);
     dump_stack();
 }

Future versions of gcc may selectively inline a function based on these hints. There have also been suggestions that the hint should not be boolean but a score, as in "most likely", etc. Generally, an alternative mechanism such as cold should be preferred. There is no reason to use it anywhere but hot paths. What a compiler does on one architecture can be completely different on another.
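As a rough sketch of the "use cold instead" suggestion: the function names report_error and handle_status below are hypothetical, and the cold attribute is a GCC/Clang extension. The idea is that the error path is marked once, at the definition of the rarely called function, instead of wrapping every call site in unlikely().

```c
#include <stdio.h>

/*
 * Hypothetical error reporter. The cold attribute (GCC/Clang extension)
 * tells the compiler this function is rarely executed; branches that lead
 * to a call of it can then be treated as unlikely without further hints.
 */
__attribute__((cold, noinline))
static int report_error(int err)
{
    fprintf(stderr, "Driver error found. %d\n", err);
    return -1;
}

/* Hypothetical caller: no unlikely() around the condition. */
int handle_status(int err)
{
    if (err)
        return report_error(err);
    return 0;
}
```

Whether a given compiler version really uses the callee's cold attribute to predict the branch is worth confirming in the disassembly.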

虐人心 2024-07-11 21:41:56

These are macros that give hints to the compiler about which way a branch may go. The macros expand to GCC specific extensions, if they're available.

GCC uses these to optimize for branch prediction. For example, if you have something like the following:

if (unlikely(x)) {
  dosomething();
}

return x;

Then it can restructure this code to be something more like:

if (!x) {
  return x;
}

dosomething();
return x;

The benefit of this is that when the processor takes a branch the first time, there is significant overhead, because it may have been speculatively loading and executing code further ahead. When it determines it will take the branch, then it has to invalidate that, and start at the branch target.

Most modern processors now have some sort of branch prediction, but that only assists when you've been through the branch before, and the branch is still in the branch prediction cache.

There are a number of other strategies that the compiler and processor can use in these scenarios. You can find more details on how branch predictors work at Wikipedia: http://en.wikipedia.org/wiki/Branch_predictor

流心雨 2024-07-11 21:41:56

They cause the compiler to emit the appropriate branch hints where the hardware supports them. This usually just means twiddling a few bits in the instruction opcode, so code size will not change. The CPU will start fetching instructions from the predicted location, and will flush the pipeline and start over if that turns out to be wrong when the branch is reached. When the hint is correct, this makes the branch much faster; precisely how much faster depends on the hardware, and how much this affects the performance of the code depends on what proportion of the time the hint is correct.

For instance, on a PowerPC CPU an unhinted branch might take 16 cycles, a correctly hinted one 8 and an incorrectly hinted one 24. In innermost loops good hinting can make an enormous difference.

Portability isn't really an issue - presumably the definition is in a per-platform header; you can simply define "likely" and "unlikely" to nothing for platforms that do not support static branch hints.

沙与沫 2024-07-11 21:41:56
long __builtin_expect(long EXP, long C);

This construct tells the compiler that the expression EXP
most likely will have the value C. The return value is EXP.
__builtin_expect is meant to be used in a conditional
expression. In almost all cases it will be used in the
context of boolean expressions, in which case it is much
more convenient to define two helper macros:

#define unlikely(expr) __builtin_expect(!!(expr), 0)
#define likely(expr) __builtin_expect(!!(expr), 1)

These macros can then be used as in:

if (likely(a > 1))

Reference
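A small sketch of the two properties described in the quoted text: the builtin evaluates to EXP itself, and the !! in the helper macros normalizes any nonzero value to 1 so that it can match the expected constant. The function names passthrough and is_set are invented for this illustration; GCC or Clang is assumed.

```c
/* Helper macros as defined in the quoted text. */
#define unlikely(expr) __builtin_expect(!!(expr), 0)
#define likely(expr)   __builtin_expect(!!(expr), 1)

/*
 * __builtin_expect evaluates to its first argument. The second argument
 * is only a hint, so a wrong hint never changes the result.
 */
long passthrough(long v)
{
    return __builtin_expect(v, 0);
}

/*
 * Without !!, a pointer or a value such as 4 would never equal the
 * expected constant 1. The double negation maps every nonzero value
 * to 1 before the builtin sees it.
 */
int is_set(const char *p)
{
    return likely(p) ? 1 : 0;
}
```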

枕花眠 2024-07-11 21:41:56

(general comment - other answers cover the details)

There's no reason that you should lose portability by using them.

You always have the option of creating a simple nil-effect "inline" or macro that will allow you to compile on other platforms with other compilers.

You just won't get the benefit of the optimization if you're on other platforms.
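One way such a fallback could look is sketched below. The compiler test and the example function clamp_fd are assumptions for illustration, not taken from any particular project. On compilers without the builtin, the macros reduce to the plain condition, so the code still compiles and behaves the same, just without the hint.

```c
/* Use the builtin where it is known to exist, otherwise fall back to a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#  define likely(x)   __builtin_expect(!!(x), 1)
#  define unlikely(x) __builtin_expect(!!(x), 0)
#else
#  define likely(x)   (!!(x))   /* no hint: just the normalized condition */
#  define unlikely(x) (!!(x))
#endif

/* Hypothetical example: the error path is hinted as rare. */
int clamp_fd(int fd)
{
    if (unlikely(fd < 0))
        return -1;
    return fd;
}
```

Keeping the !! in the fallback means the macros evaluate to 0 or 1 on every compiler, so code that relies on that value behaves the same everywhere.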

所有深爱都是秘密 2024-07-11 21:41:56

In many Linux releases you can find compiler.h in /usr/linux/, and you can simply include it. Another opinion: unlikely() is more useful than likely(), because

if ( likely( ... ) ) {
     doSomething();
}

can already be optimized well by many compilers.

By the way, if you want to observe the detailed behavior of the code, you can simply do the following:

gcc -c test.c
objdump -d test.o > obj.s

Then open obj.s and you will find the answer.

飘过的浮云 2024-07-11 21:41:56

Let's decompile to see what GCC 4.8 does with it

Without __builtin_expect

#include <stdio.h>
#include <time.h>

int main() {
    /* Use time to prevent it from being optimized away. */
    int i = !time(NULL);
    if (i)
        printf("%d\n", i);
    puts("a");
    return 0;
}

Compile and decompile with GCC 4.8.2 x86_64 Linux:

gcc -c -O3 -std=gnu11 main.c
objdump -dr main.o

Output:

0000000000000000 <main>:
   0:       48 83 ec 08             sub    $0x8,%rsp
   4:       31 ff                   xor    %edi,%edi
   6:       e8 00 00 00 00          callq  b <main+0xb>
                    7: R_X86_64_PC32        time-0x4
   b:       48 85 c0                test   %rax,%rax
   e:       75 14                   jne    24 <main+0x24>
  10:       ba 01 00 00 00          mov    $0x1,%edx
  15:       be 00 00 00 00          mov    $0x0,%esi
                    16: R_X86_64_32 .rodata.str1.1
  1a:       bf 01 00 00 00          mov    $0x1,%edi
  1f:       e8 00 00 00 00          callq  24 <main+0x24>
                    20: R_X86_64_PC32       __printf_chk-0x4
  24:       bf 00 00 00 00          mov    $0x0,%edi
                    25: R_X86_64_32 .rodata.str1.1+0x4
  29:       e8 00 00 00 00          callq  2e <main+0x2e>
                    2a: R_X86_64_PC32       puts-0x4
  2e:       31 c0                   xor    %eax,%eax
  30:       48 83 c4 08             add    $0x8,%rsp
  34:       c3                      retq

The instruction order in memory was unchanged: first the printf and then puts and the retq return.

With __builtin_expect

Now replace if (i) with:

if (__builtin_expect(i, 0))

and we get:

0000000000000000 <main>:
   0:       48 83 ec 08             sub    $0x8,%rsp
   4:       31 ff                   xor    %edi,%edi
   6:       e8 00 00 00 00          callq  b <main+0xb>
                    7: R_X86_64_PC32        time-0x4
   b:       48 85 c0                test   %rax,%rax
   e:       74 11                   je     21 <main+0x21>
  10:       bf 00 00 00 00          mov    $0x0,%edi
                    11: R_X86_64_32 .rodata.str1.1+0x4
  15:       e8 00 00 00 00          callq  1a <main+0x1a>
                    16: R_X86_64_PC32       puts-0x4
  1a:       31 c0                   xor    %eax,%eax
  1c:       48 83 c4 08             add    $0x8,%rsp
  20:       c3                      retq
  21:       ba 01 00 00 00          mov    $0x1,%edx
  26:       be 00 00 00 00          mov    $0x0,%esi
                    27: R_X86_64_32 .rodata.str1.1
  2b:       bf 01 00 00 00          mov    $0x1,%edi
  30:       e8 00 00 00 00          callq  35 <main+0x35>
                    31: R_X86_64_PC32       __printf_chk-0x4
  35:       eb d9                   jmp    10 <main+0x10>

The printf (compiled to __printf_chk) was moved to the very end of the function, after puts and the return to improve branch prediction as mentioned by other answers.

So it is basically the same as:

int main() {
    int i = !time(NULL);
    if (i)
        goto printf;
puts:
    puts("a");
    return 0;
printf:
    printf("%d\n", i);
    goto puts;
}

This optimization was not done with -O0.

But good luck on writing an example that runs faster with __builtin_expect than without, CPUs are really smart these days. My naive attempts are here.

C++20 [[likely]] and [[unlikely]]

C++20 has standardized those C++ built-ins: see "How to use C++20's likely/unlikely attribute in if-else statement". They will likely (a pun!) do the same thing.

[浮城] 2024-07-11 21:41:55

They are hints to the compiler to emit instructions that will cause branch prediction to favour the "likely" side of a jump instruction. This can be a big win: if the prediction is correct, the jump instruction is basically free and takes zero cycles. On the other hand, if the prediction is wrong, the processor pipeline needs to be flushed, which can cost several cycles. So long as the prediction is correct most of the time, this will tend to be good for performance.

Like all such performance optimisations you should only do it after extensive profiling to ensure the code really is in a bottleneck, and probably given the micro nature, that it is being run in a tight loop. Generally the Linux developers are pretty experienced so I would imagine they would have done that. They don't really care too much about portability as they only target gcc, and they have a very close idea of the assembly they want it to generate.


Note that most ISAs don't have a way for the machine code to actually hint the hardware branch predictor, other than static prediction (backward taken / forward not-taken) on some. And on modern implementations like x86 since 2013 or so, even that's not a thing anymore:

The likely and unlikely macros or C++ [[likely]] / [[unlikely]] annotations can hint the compiler's branch layout to favour I-cache locality for the fast path, and minimize taken branches on the fast path. Also to hint the decision to make branchy vs. branchless asm when that's possible.
