Is "if" expensive?

Posted on 07-09 02:23 · 231 words · 9 views · 0 comments

I can't, for the life of me, remember what exactly our teacher said that day and I'm hoping you would probably know.

The module is "Data Structures and Algorithms" and he told us something along the lines of:

The if statement is the most expensive
[something]. [something] registers
[something].

Yes, I do have a horrible memory and I'm really really sorry, but I've been googling for hours and nothing has come up. Any ideas?


Comments (18)

美男兮2024-07-16 02:23:03

As pointed out by many, conditional branches can be very slow on a modern computer.

That being said, there are a whole lot of conditional branches that don't live in if statements, you can't always tell what the compiler will come up with, and worrying about how long basic statements will take is virtually always the wrong thing to do. (If you can tell what the compiler will generate reliably, you may not have a good optimizing compiler.)

当爱已成负担2024-07-16 02:23:03

The only thing I can imagine this might be referring to is the fact that an if statement generally can result in a branch. Depending on the specifics of the processor architecture, branches can cause pipeline stalls or other less than optimal situations.

However, this is extremely situation specific - most modern processors have branch prediction capabilities that attempt to minimize the negative effects of branching. Another example would be how the ARM architecture (and probably others) can handle conditional logic - the ARM has instruction level conditional execution, so simple conditional logic results in no branching - the instructions simply execute as NOPs if the conditions are not met.

All that said - get your logic correct before worrying about this stuff. Incorrect code is as unoptimized as you can get.

○愚か者の日2024-07-16 02:23:03

CPUs are deeply pipelined. Any branch instruction (if/for/while/switch/etc) means that the CPU doesn't really know what instruction to load and run next.

The CPU either stalls while waiting to know what to do, or the CPU takes a guess. In the case of an older CPU, or if the guess is wrong, you'll have to suffer a pipeline stall while it goes and loads the correct instruction. Depending on the CPU this can be as high as 10-20 instructions worth of stall.

Modern CPUs try to avoid this by doing good branch prediction, and by executing multiple paths at the same time, and only keeping the actual one. This helps out a lot, but can only go so far.

Good luck in the class.

Also, if you have to worry about this in real life, you're probably doing OS design, realtime graphics, scientific computing, or something similarly CPU-bound. Profile before worrying.

泛滥成性2024-07-16 02:23:03

Write your programs in the clearest, simplest, cleanest way that isn't obviously inefficient. That makes the best use of the most expensive resource: you, whether you are writing the program or later debugging it (which requires understanding it). If the performance isn't enough, measure where the bottlenecks are and see how to mitigate them. Only on extremely rare occasions will you have to worry about individual (source) instructions when doing so. Performance is about selecting the right algorithms and data structures in the first place, careful programming, and getting a fast enough machine. Use a good compiler; you'd be surprised by the kind of code restructuring a modern compiler does. Restructuring code for performance is a last-resort measure: the code gets more complex (thus buggier), harder to modify, and thus all-around more expensive.

奢欲2024-07-16 02:23:03

Some CPUs (like x86) expose branch-prediction hints at the programming level to reduce the latency of mispredicted branches.

Some compilers (like GCC) expose these as an extension to higher-level programming languages (like C/C++).

See the question "likely()/unlikely() macros in the Linux kernel - how do they work? What's their benefit?".
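As a rough illustration of the likely()/unlikely() idea, here is a minimal sketch built on `__builtin_expect`, which is a GCC/Clang extension. The macro definitions follow the commonly seen pattern, and `sum_bytes` is a hypothetical function made up for this example. The hint may influence how the compiler lays out the code; it never changes the result.

```c
#include <stddef.h>

/* Hint macros in the style of likely()/unlikely().
 * __builtin_expect is a GCC/Clang extension; on other compilers the
 * macros fall back to the plain condition (assumption for portability). */
#if defined(__GNUC__) || defined(__clang__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Hypothetical example: sums n bytes, returning -1 for a NULL buffer.
 * The NULL check is marked as the rare path. */
long sum_bytes(const unsigned char *buf, size_t n)
{
    if (unlikely(buf == NULL))   /* error path, expected to be rare */
        return -1;

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```

Whether such a hint helps at all depends on the compiler and the target, so it is worth measuring before and after.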

仅一夜美梦2024-07-16 02:23:03

The most expensive in terms of ALU usage? It uses up CPU registers to store the values to be compared and takes up time to fetch and compare the values each time the if statement is run.

Therefore an optimization of that is to do one comparison and store the result as a variable before the loop is run.
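One way to read that optimization, as a sketch only: if the condition does not change inside the loop, evaluate it once up front. The function names and the `mode`/`factor` parameters below are hypothetical. An optimizing compiler may well do this transformation on its own.

```c
#include <stddef.h>

/* Version 1: the comparison (mode == 1) is evaluated on every iteration. */
void scale_naive(int *v, size_t n, int mode, int factor)
{
    for (size_t i = 0; i < n; i++) {
        if (mode == 1)
            v[i] *= factor;
        else
            v[i] += factor;
    }
}

/* Version 2: the comparison is done once and its result is stored in a
 * variable before the loops run, so the loop bodies contain no test. */
void scale_hoisted(int *v, size_t n, int mode, int factor)
{
    const int multiply = (mode == 1);

    if (multiply) {
        for (size_t i = 0; i < n; i++)
            v[i] *= factor;
    } else {
        for (size_t i = 0; i < n; i++)
            v[i] += factor;
    }
}
```

Both versions produce the same output; the second trades a little duplication for a loop body without a test.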

Just trying to interpret your missing words.

请持续率性2024-07-16 02:23:03

I had this argument with a friend of mine once. He was using a very naive circle algorithm, but claimed his to be faster than mine (The kind that only calculates 1/8th of the circle) because mine used if. In the end, the if statement was replaced with sqrt and somehow that was faster. Perhaps because the FPU has sqrt built in?

﹂绝世的画2024-07-16 02:23:03

Your code should be predictable and likely.

If your whole program is this:

int apple = 1;

if (apple == 1) then that is predictable and likely code.

It is also optimized code because you have made it easy for the compiler and cpu; they don't have to predict anything therefore there are no mispredictions aka Branch Mispredictions which are costly.

So you try to write a program in which each line is a self-fulfilling prophecy.
You have 3 kinds of chips: Truth, False and Unknown.
You are trying to build a program with only Truth chips.

Towards that end:

If else: the if branch should be the more likely one, and if there is a return, it should be in the else.

For and while should be replaced by do while, except if there is a continue.

That continue should then become an if do while, in that order.

If it is absolutely necessary to test at the beginning, use if do while.

If there are fewer than 5 cases, switch to if else, ordered from most likely to least likely.

Cases should be of similar likelihood; otherwise the likely ones should be expressed as if else before the switch.

Bitwise operators are better than logical operators.

“Simple integer operations such as addition, subtraction, comparison, bit operations and shift operations (and increment operators) take only one clock cycle on most microprocessors.”

Increment operators: i++ is better than ++i.

Boolean operands:

  1. In && statement put most likely to be true last
  2. In || put most likely to be true first.
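The ordering advice for && rests on short-circuit evaluation: the right operand is only evaluated when the left one is true. The sketch below counts operand evaluations for the two orderings. The predicates `is_rare` and `is_common` and the `evaluations` counter are hypothetical names introduced only for this illustration.

```c
#include <stdbool.h>

static int evaluations = 0;   /* counts how many operand tests actually ran */

/* True for multiples of 100 only, so rarely true. */
static bool is_rare(int x)   { evaluations++; return x % 100 == 0; }

/* True for every non-negative x, so almost always true here. */
static bool is_common(int x) { evaluations++; return x >= 0; }

/* Rarely-true test first: && usually stops after one evaluation. */
int count_rare_first(int n)
{
    int hits = 0;
    evaluations = 0;
    for (int x = 1; x <= n; x++)
        if (is_rare(x) && is_common(x))
            hits++;
    return hits;
}

/* Likely-true test first: && has to evaluate both operands every time. */
int count_common_first(int n)
{
    int hits = 0;
    evaluations = 0;
    for (int x = 1; x <= n; x++)
        if (is_common(x) && is_rare(x))
            hits++;
    return hits;
}
```

For n = 1000 both functions find the same 10 hits, but the first ordering runs 1010 operand tests and the second runs 2000. Fewer evaluations does not automatically mean fewer mispredictions, so this only illustrates the evaluation-count side of the advice.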

So to answer your question: the if statement is not that expensive if the condition is true or likely to be true; otherwise it runs into branch misprediction.

_蜘蛛2024-07-16 02:23:03

On many older processors, one could identify circumstances where "if" would be expensive and circumstances where it wouldn't, but modern high-performance processors include circuitry to predict which branches will and won't be taken, and branches are only costly if such circuitry guesses wrong. Unfortunately, this often makes it very difficult to determine the optimal way of writing a piece of code, since it's entirely possible that a processor might correctly predict branch outcomes when processing contrived test data, but then guess many of them wrong when processing real-world data, or vice versa.

Unless one is trying to optimize performance on a particular target whose branch timings are well understood, the best approach is usually to assume that the branch timings are unlikely to be an important factor in overall performance unless or until one can demonstrate otherwise. Branch timings may be influenced by subtle differences in input data, and there's often no practical way to ensure that test data includes all variations that might affect performance.
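To see how much input data matters on a given machine, one can time the same branchy loop over differently ordered data. The following is a minimal sketch with hypothetical helper names (`count_at_least`, `fill_data`, `time_count`); it uses only the standard `clock`, `rand` and `qsort` functions. No timings are quoted here because they depend on the CPU, the compiler and the optimization level, and an optimizing compiler may remove the branch altogether.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Counts elements >= threshold; the if in the loop is the branch under test. */
size_t count_at_least(const int *v, size_t n, int threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (v[i] >= threshold)
            count++;
    return count;
}

/* qsort comparison callback for ascending order. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Fills v with pseudo-random values in [0, 255] from a fixed seed,
 * then sorts them if sorted is non-zero. */
void fill_data(int *v, size_t n, int sorted)
{
    srand(12345);
    for (size_t i = 0; i < n; i++)
        v[i] = rand() % 256;
    if (sorted)
        qsort(v, n, sizeof v[0], compare_ints);
}

/* Returns the CPU seconds spent running count_at_least `repeats` times
 * and stores the last count in *result. */
double time_count(const int *v, size_t n, int repeats, size_t *result)
{
    clock_t start = clock();
    size_t r = 0;
    for (int k = 0; k < repeats; k++)
        r = count_at_least(v, n, 128);
    *result = r;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `fill_data` once with `sorted = 0` and once with `sorted = 1`, then comparing the two `time_count` results, gives a first impression; the counts themselves are identical because both arrays hold the same values.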

橘和柠2024-07-16 02:23:02

At the very lowest level (in the hardware), yes, ifs are expensive. In order to understand why, you have to understand how pipelines work.

The current instruction to be executed is stored in something typically called the instruction pointer (IP) or program counter (PC); these terms are synonymous, but different terms are used with different architectures. For most instructions, the PC of the next instruction is just the current PC plus the length of the current instruction. For most RISC architectures, instructions are all a constant length, so the PC can be incremented by a constant amount. For CISC architectures such as x86, instructions can be variable-length, so the logic that decodes the instruction has to figure out how long the current instruction is to find the location of the next instruction.

For branch instructions, however, the next instruction to be executed is not the next location after the current instruction. Branches are gotos - they tell the processor where the next instruction is. Branches can either be conditional or unconditional, and the target location can be either fixed or computed.

Conditional vs. unconditional is easy to understand - a conditional branch is only taken if a certain condition holds (such as whether one number equals another); if the branch is not taken, control proceeds to the next instruction after the branch like normal. For unconditional branches, the branch is always taken. Conditional branches show up in if statements and the control tests of for and while loops. Unconditional branches show up in infinite loops, function calls, function returns, break and continue statements, the infamous goto statement, and many more (these lists are far from exhaustive).

The branch target is another important issue. Most branches have a fixed branch target - they go to a specific location in code that is fixed at compile time. This includes if statements, loops of all sorts, regular function calls, and many more. Computed branches compute the target of the branch at runtime. This includes switch statements (sometimes), returning from a function, virtual function calls, and function pointer calls.
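The difference between a fixed and a computed target can be sketched in C as follows. All names here (`binary_op`, `apply_fixed`, `apply_computed`) are hypothetical, and a compiler is free to generate different code than the comments suggest, for example by inlining.

```c
/* Function pointer type used for the computed call. */
typedef int (*binary_op)(int, int);

static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }
static int mul(int a, int b) { return a * b; }

/* Fixed target: the callee is known at compile time. */
int apply_fixed(int a, int b)
{
    return add(a, b);
}

/* Computed target: which function runs depends on a runtime value,
 * so the call goes through a pointer loaded from a table. */
int apply_computed(unsigned op, int a, int b)
{
    static const binary_op table[] = { add, sub, mul };
    return table[op % 3](a, b);
}
```

For the computed call the processor has to predict the destination as well, not only whether a branch happens.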

So what does this all mean for performance? When the processor sees a branch instruction appear in its pipeline, it needs to figure out how to continue to fill up its pipeline. In order to figure out what instructions come after the branch in the program stream, it needs to know two things: (1) if the branch will be taken and (2) the target of the branch. Figuring this out is called branch prediction, and it's a challenging problem. If the processor guesses correctly, the program continues at full speed. If instead the processor guesses incorrectly, it just spent some time computing the wrong thing. It now has to flush its pipeline and reload it with instructions from the correct execution path. Bottom line: a big performance hit.

Thus, the reason why if statements are expensive is due to branch mispredictions. This is only at the lowest level. If you're writing high-level code, you don't need to worry about these details at all. You should only care about this if you're writing extremely performance-critical code in C or assembly. If that is the case, writing branch-free code can often be superior to code that branches, even if several more instructions are needed. There are some cool bit-twiddling tricks you can do to compute things such as abs(), min(), and max() without branching.
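For reference, here is a sketch of the kind of bit-twiddling meant for abs(), min() and max() on 32-bit integers. The function names are hypothetical. It assumes two's complement integers and that `>>` on a negative `int32_t` is an arithmetic shift, which is implementation-defined in C but holds on mainstream compilers.

```c
#include <stdint.h>

/* Branch-free absolute value.
 * mask is 0 for x >= 0 and all ones for x < 0 (assumes arithmetic shift).
 * Unsigned arithmetic avoids signed overflow; for INT32_MIN the result
 * wraps back to INT32_MIN on typical implementations. */
int32_t abs_branchless(int32_t x)
{
    uint32_t mask = (uint32_t)(x >> 31);
    return (int32_t)(((uint32_t)x + mask) ^ mask);
}

/* Branch-free minimum: mask is -1 (all ones) if x < y, else 0. */
int32_t min_branchless(int32_t x, int32_t y)
{
    int32_t mask = -(int32_t)(x < y);
    return y ^ ((x ^ y) & mask);
}

/* Branch-free maximum, using the same mask the other way round. */
int32_t max_branchless(int32_t x, int32_t y)
{
    int32_t mask = -(int32_t)(x < y);
    return x ^ ((x ^ y) & mask);
}
```

Modern compilers often turn a plain `x < y ? x : y` into a conditional move anyway, so these tricks are worth using only after measuring.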

装迷糊2024-07-16 02:23:02

"Expensive" is a very relative term, especially in relation to an "if" statement, since you also have to take into account the cost of the condition. That could range anywhere from a few short CPU instructions to testing the result of a function that calls out to a remote database.

I wouldn't worry about it. Unless you're doing embedded programming you probably shouldn't be concerned about the cost of "if" at all. For most programmers it's just not going to ever be the driving factor in your app's performance.

农村范ル2024-07-16 02:23:02

Branches, especially on RISC architecture microprocessors, are some of the most expensive instructions. This is because on many architectures, the compiler predicts which path of execution will be taken most likely and puts those instructions next in the executable, so they'll already be in the CPU cache when the branch happens. If the branch goes the other way, it has to go back out to main memory and fetch the new instructions -- that's fairly expensive. On many RISC architectures, all instructions are one cycle except for branch (which is often 2 cycles). We're not talking about a major cost here, so don't worry about it. Also, the compiler will optimize better than you do 99% of the time :) One of the really awesome things about the EPIC architecture (Itanium is an example) is that it caches (and begins processing) instructions from both sides of the branch, then discards the set it doesn't need once the outcome of the branch is known. This saves the extra memory access of a typical architecture in the event that it branches along the unpredicted path.

盗梦空间2024-07-16 02:23:02

Check out the article Better Performance Through Branch Elimination on Cell Performance. Another fun one is this post about branchless selections on the Real Time Collision Detection Blog.

In addition to the excellent answers already posted in response to this question, I'd like to put in a reminder that although "if" statements are considered expensive low-level operations, trying to utilize branch-free programming techniques in a higher level environment, such as a scripting language or a business logic layer (regardless of language), may be ridiculously inappropriate.

The vast majority of the time, programs should be written for clarity first and optimized for performance second. There are numerous problem domains where performance is paramount, but the simple fact is that most developers are not writing modules for use deep in the core of a rendering engine or a high performance fluid dynamics simulation that runs for weeks on end. When the top priority is for your solution to "just work" the last thing on your mind should be whether or not you can save on the overhead of a conditional statement in your code.

呆萌少年2024-07-16 02:23:02

if in itself is not slow. Slowness is always relative; I'd bet my life that you have never felt the "overhead" of an if statement. If you are going to write high-performance code, you might want to avoid branches anyway. What makes if slow is that the processor preloads code from after the if based on some heuristic. It also stops pipelines from executing code directly after the if branch instruction in the machine code, since the processor doesn't yet know which path will be taken (in a pipelined processor, multiple instructions are interleaved and executed). Code that was already executed might have to be undone if the other branch was taken (this is called branch misprediction), or no-ops have to be filled in at those places so that this doesn't happen.

If if is evil, then switch is evil too, and &&, || too. Don't worry about it.

能否归途做我良人2024-07-16 02:23:02

On the lowest possible level if consists of (after computing all the app-specific prerequisites for particular if):

  • some test instruction
  • jump to some place in the code if test succeeds, proceed forwards otherwise.

Costs associated with that:

  • a low level comparison -- usually 1 cpu operation, super cheap
  • potential jump -- which can be expensive

Reasons why jumps are expensive:

  • you can jump to arbitrary code that lives anywhere in memory; if it turns out that it is not cached by the CPU, we have a problem, because we need to access main memory, which is slower
  • modern CPUs do branch prediction. They try to guess whether the if will succeed or not and execute code ahead in the pipeline to speed things up. If the prediction fails, all computation done ahead by the pipeline has to be invalidated. That is also an expensive operation

So to sum up:

  • If can be expensive, if you really, really, really care about performance.
  • You should care about it if and only if you are writing a real-time raytracer, a biological simulation or something similar. There is no reason to care about it in most of the real world.

只为守护你2024-07-16 02:23:02

Modern processors have long execution pipelines which means that several instructions are executed in various stages at the same time. They may not always know the outcome of one instruction when the next one begins to run. When they run into a conditional jump (if) they sometimes have to wait until the pipeline is empty before they can know which way the instruction pointer should go.

I think of it as a long freight train. It can carry a lot of cargo fast in a straight line, but it corners badly.

Pentium 4 (Prescott) had a famously long pipeline of 31 stages.

More on Wikipedia

萌能量女王2024-07-16 02:23:02

Maybe the branching kills the CPU instruction prefetching?

红墙和绿瓦2024-07-16 02:23:02

Also note that an if inside a loop is not necessarily very expensive.

On its first visit to an if statement, a modern CPU assumes that the "if-body" is to be taken (or, said the other way: it also assumes a loop body will be taken multiple times) (*). On the second and further visits, the CPU may be able to look into the Branch History Table and see how the condition turned out the last time (was it true? was it false?). If it was false the last time, then speculative execution will proceed to the "else" of the if, or beyond the loop.

(*) The rule is actually "forward branch not taken, backward branch taken". In an if statement, there is only a [forward] jump (to the point after the if-body) if the condition evaluates to false (remember: the CPU assumes by default that a branch/jump is not taken), but in a loop, there may be a forward branch to the position after the loop (not to be taken) and a backward branch upon repetition (to be taken).

This is also one of the reasons why a call to a virtual function or a call through a function pointer is not as bad as many assume (http://phresnel.org/blog/).
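To make the forward and backward branches of a loop visible, here is a sketch that writes a simple loop with explicit gotos, roughly the shape a compiler might give it. The function `sum_below` and its labels are hypothetical, and real compilers may lay the loop out differently.

```c
/* Sums the integers 0 .. n-1, written with gotos to expose the branches. */
int sum_below(int n)
{
    int i = 0;
    int total = 0;

    /* Forward conditional branch: skips the loop when it would not run
     * at all. Under the static rule it is assumed not taken. */
    if (i >= n)
        goto done;

body:
    total += i;
    i++;

    /* Backward conditional branch: jumps back up for the next repetition.
     * Under the static rule it is assumed taken, which is right for
     * every iteration except the last. */
    if (i < n)
        goto body;

done:
    return total;
}
```

For a loop with many iterations, the backward branch is therefore guessed correctly almost every time, which fits the point that an if or a loop test is not necessarily expensive.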
