How do the likely/unlikely macros in the Linux kernel work, and what is their benefit?

Posted 2024-07-04 21:41:55

I've been digging through some parts of the Linux kernel, and found calls like this:

if (unlikely(fd < 0))
{
    /* Do something */
}

if (likely(!err))
{
    /* Do something */
}

I've found the definition of them:

#define likely(x)       __builtin_expect(!!(x), 1)
#define unlikely(x)     __builtin_expect(!!(x), 0)

I know that they are for optimization, but how do they work? And how much performance/size decrease can be expected from using them? And is it worth the hassle (and losing the portability probably) at least in bottleneck code (in userspace, of course).

胡渣熟男 2024-07-11 21:41:56

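They are hints to the compiler to emit hint prefixes on the branches. On x86/x64 a prefix takes up one byte, so you get at most a one-byte increase per branch. As for performance, it depends entirely on the application; in most cases the branch predictor on the processor ignores them these days.

Edit: I forgot one place where they can really help. The hint can allow the compiler to reorder the control-flow graph to reduce the number of branches taken on the "likely" path. This can give a marked improvement in loops that check multiple exit cases.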
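A minimal sketch of the kind of loop this answer refers to, assuming GCC or Clang. The function sum_until_stop and its parameters are hypothetical and only serve as illustration; both early exits are assumed to be rare, which is what the unlikely() hints express. Whether the compiler actually lays the loop out differently depends on the compiler version and optimization level.

```c
#include <stddef.h>

/* Same definitions as in the question (GCC/Clang builtin). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical helper: adds up buf[0..n) but stops early on a negative
 * value or when the running total would exceed limit. Both early exits
 * are assumed to be rare, so they are marked unlikely().
 * Stores the sum in *total and returns the number of elements consumed.
 */
size_t sum_until_stop(const int *buf, size_t n, long limit, long *total)
{
    long acc = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (unlikely(buf[i] < 0))            /* rare exit 1: negative value */
            break;
        if (unlikely(acc + buf[i] > limit))  /* rare exit 2: limit reached */
            break;
        acc += buf[i];                       /* common path: keep summing */
    }
    *total = acc;
    return i;
}
```

Comparing the output of objdump -d with and without the hints at -O2 shows whether the exit paths were moved out of the loop body on a given compiler.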


乱了心跳 2024-07-11 21:41:56

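These are GCC functions that let the programmer hint to the compiler which branch condition is most likely in a given expression. This allows the compiler to build the branch instructions so that the most common case takes the fewest instructions to execute.

How the branch instructions are built depends on the processor architecture.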


近箐 2024-07-11 21:41:56

As per the comment by Cody, this has nothing to do with Linux, but is a hint to the compiler. What happens will depend on the architecture and compiler version.

This particular feature in Linux is somewhat misused in drivers. As osgx points out in "semantics of hot attribute", any hot or cold function called within a block can automatically hint whether the condition is likely or not. For instance, dump_stack() is marked cold, so this is redundant:

 if(unlikely(err)) {
     printk("Driver error found. %d\n", err);
     dump_stack();
 }

Future versions of gcc may selectively inline a function based on these hints. There have also been suggestions that it is not boolean, but a score as in most likely, etc. Generally, it should be preferred to use some alternate mechanism like cold. There is no reason to use it in any place but hot paths. What a compiler will do on one architecture can be completely different on another.
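As a rough illustration of the cold alternative, assuming GCC or Clang: the function report_error below is a hypothetical stand-in for something like dump_stack(). Because it carries the cold attribute, GCC documents that paths leading to calls of it are treated as unlikely, so the if needs no explicit unlikely().

```c
#include <stdio.h>

/*
 * Hypothetical error reporter. The cold attribute tells the compiler the
 * function is rarely executed; noinline keeps it out of the caller's body.
 */
static __attribute__((cold, noinline)) void report_error(int err)
{
    fprintf(stderr, "Driver error found. %d\n", err);
}

/* Hypothetical caller: returns 0 on success, -1 if err is non-zero. */
int process(int err)
{
    if (err) {              /* no unlikely() here: the block calls a cold function */
        report_error(err);
        return -1;
    }
    return 0;
}
```

The attribute is written once on the rarely used function instead of at every call site, which is the design argument made in this answer.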


虐人心 2024-07-11 21:41:56

These are macros that give hints to the compiler about which way a branch may go. The macros expand to GCC specific extensions, if they're available.

GCC uses these to optimize for branch prediction. For example, if you have something like the following

if (unlikely(x)) {
  dosomething();
}

return x;

Then it can restructure this code to be something more like:

if (!x) {
  return x;
}

dosomething();
return x;

The benefit of this is that when the processor takes a branch the first time, there is significant overhead, because it may have been speculatively loading and executing code further ahead. When it determines it will take the branch, then it has to invalidate that, and start at the branch target.

Most modern processors now have some sort of branch prediction, but that only assists when you've been through the branch before, and the branch is still in the branch prediction cache.

There are a number of other strategies that the compiler and processor can use in these scenarios. You can find more details on how branch predictors work at Wikipedia: http://en.wikipedia.org/wiki/Branch_predictor


流心雨 2024-07-11 21:41:56

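They cause the compiler to emit the appropriate branch hints where the hardware supports them. This usually just means twiddling a few bits in the instruction opcode, so code size will not change. The CPU starts fetching instructions from the predicted location, and flushes the pipeline and starts over if that turns out to be wrong when the branch is reached. When the hint is correct, this makes the branch much faster; precisely how much faster depends on the hardware, and how much this affects the performance of the code depends on what proportion of the time the hint is correct.

For instance, on a PowerPC CPU an unhinted branch might take 16 cycles, a correctly hinted one 8 cycles, and an incorrectly hinted one 24 cycles. In innermost loops, good hinting can make an enormous difference.

Portability isn't really an issue: presumably the definition lives in a per-platform header, and you can simply define "likely" and "unlikely" to nothing for platforms that do not support static branch hints.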
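One way the fallback could look, as a sketch rather than the kernel's actual header. The compiler check via __GNUC__ and __clang__ is an assumption about which compilers provide __builtin_expect, and check_fd is a hypothetical function used only to exercise the macros. The fallback expands to !!(x) rather than to nothing, so the macro yields 0 or 1 on every compiler.

```c
/* Use the builtin where it is assumed to exist, otherwise fall back to a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#  define likely(x)   __builtin_expect(!!(x), 1)
#  define unlikely(x) __builtin_expect(!!(x), 0)
#else
#  define likely(x)   (!!(x))   /* no hint, same truth value */
#  define unlikely(x) (!!(x))
#endif

/* Hypothetical example: returns -1 for an invalid descriptor, else fd itself. */
int check_fd(int fd)
{
    if (unlikely(fd < 0))
        return -1;
    return fd;
}
```

Code written against these macros compiles unchanged on other compilers; it just loses the optimization there.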


沙与沫 2024-07-11 21:41:56

long __builtin_expect(long EXP, long C);

This construct tells the compiler that the expression EXP most likely will have the value C. The return value is EXP. __builtin_expect is meant to be used in a conditional expression. In almost all cases it will be used in the context of boolean expressions, in which case it is much more convenient to define two helper macros:

#define unlikely(expr) __builtin_expect(!!(expr), 0)
#define likely(expr) __builtin_expect(!!(expr), 1)

These macros can then be used as in:

if (likely(a > 1))

Reference
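A small sketch of why the helper macros wrap the expression in !!, assuming GCC or Clang. The functions expect_raw, expect_norm and has_data are hypothetical names for illustration. The builtin returns its first argument unchanged, so a truthy value such as 5 would not equal the expected constant 1 unless it is first normalized to 0 or 1.

```c
#include <stddef.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* The builtin returns EXP itself: the value passes through unchanged. */
long expect_raw(long v)
{
    return __builtin_expect(v, 1);
}

/* With !! the value is normalized to 0 or 1 before it is compared with C. */
long expect_norm(long v)
{
    return __builtin_expect(!!v, 1);
}

/* !! also lets the macro accept pointers, since the builtin takes a long. */
int has_data(const char *p)
{
    return likely(p) ? 1 : 0;
}
```

The hint never changes the result of the condition; it only tells the compiler which outcome to favour when laying out the code.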


枕花眠 2024-07-11 21:41:56

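(General comment; other answers cover the details.)

There is no reason you should lose portability by using them.

You always have the option of creating a simple nil-effect "inline" or macro that lets you compile on other platforms with other compilers.

You just won't get the benefit of the optimization on those platforms.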


所有深爱都是秘密 2024-07-11 21:41:56

In many Linux releases you can find compiler.h in /usr/linux/ and simply include it. Another opinion: unlikely() is more useful than likely(), because

if ( likely( ... ) ) {
     doSomething();
}

can be optimized just as well by many compilers.

By the way, if you want to observe the detailed behavior of the code, you can simply do the following:

gcc -c test.c
objdump -d test.o > obj.s

Then open obj.s and you can find the answer.


飘过的浮云 2024-07-11 21:41:56

Let's disassemble to see what GCC 4.8 does with it

Without __builtin_expect

#include <stdio.h>
#include <time.h>

int main() {
    /* Use time to prevent it from being optimized away. */
    int i = !time(NULL);
    if (i)
        printf("%d\n", i);
    puts("a");
    return 0;
}

Compile and disassemble with GCC 4.8.2 x86_64 Linux:

gcc -c -O3 -std=gnu11 main.c
objdump -dr main.o

Output:

0000000000000000 <main>:
   0:       48 83 ec 08             sub    $0x8,%rsp
   4:       31 ff                   xor    %edi,%edi
   6:       e8 00 00 00 00          callq  b <main+0xb>
                    7: R_X86_64_PC32        time-0x4
   b:       48 85 c0                test   %rax,%rax
   e:       75 14                   jne    24 <main+0x24>
  10:       ba 01 00 00 00          mov    $0x1,%edx
  15:       be 00 00 00 00          mov    $0x0,%esi
                    16: R_X86_64_32 .rodata.str1.1
  1a:       bf 01 00 00 00          mov    $0x1,%edi
  1f:       e8 00 00 00 00          callq  24 <main+0x24>
                    20: R_X86_64_PC32       __printf_chk-0x4
  24:       bf 00 00 00 00          mov    $0x0,%edi
                    25: R_X86_64_32 .rodata.str1.1+0x4
  29:       e8 00 00 00 00          callq  2e <main+0x2e>
                    2a: R_X86_64_PC32       puts-0x4
  2e:       31 c0                   xor    %eax,%eax
  30:       48 83 c4 08             add    $0x8,%rsp
  34:       c3                      retq

The instruction order in memory was unchanged: first the printf and then puts and the retq return.

With __builtin_expect

Now replace if (i) with:

if (__builtin_expect(i, 0))

and we get:

0000000000000000 <main>:
   0:       48 83 ec 08             sub    $0x8,%rsp
   4:       31 ff                   xor    %edi,%edi
   6:       e8 00 00 00 00          callq  b <main+0xb>
                    7: R_X86_64_PC32        time-0x4
   b:       48 85 c0                test   %rax,%rax
   e:       74 11                   je     21 <main+0x21>
  10:       bf 00 00 00 00          mov    $0x0,%edi
                    11: R_X86_64_32 .rodata.str1.1+0x4
  15:       e8 00 00 00 00          callq  1a <main+0x1a>
                    16: R_X86_64_PC32       puts-0x4
  1a:       31 c0                   xor    %eax,%eax
  1c:       48 83 c4 08             add    $0x8,%rsp
  20:       c3                      retq
  21:       ba 01 00 00 00          mov    $0x1,%edx
  26:       be 00 00 00 00          mov    $0x0,%esi
                    27: R_X86_64_32 .rodata.str1.1
  2b:       bf 01 00 00 00          mov    $0x1,%edi
  30:       e8 00 00 00 00          callq  35 <main+0x35>
                    31: R_X86_64_PC32       __printf_chk-0x4
  35:       eb d9                   jmp    10 <main+0x10>

The printf (compiled to __printf_chk) was moved to the very end of the function, after puts and the return, to improve branch prediction as mentioned by other answers.

So it is basically the same as:

int main() {
    int i = !time(NULL);
    if (i)
        goto printf;
puts:
    puts("a");
    return 0;
printf:
    printf("%d\n", i);
    goto puts;
}

This optimization was not done with -O0.

But good luck on writing an example that runs faster with __builtin_expect than without, CPUs are really smart these days. My naive attempts are here.

C++20 [[likely]] and [[unlikely]]

C++20 has standardized these built-ins as attributes; see "How to use C++20's likely/unlikely attribute in if-else statement". They will likely (a pun!) do the same thing.
