Are bit shifts O(1) or O(n)?

Posted 2025-01-01 02:21:56 · 205 words · 5 views · 0 comments

Are shift operations O(1) or O(n)?

Does it make sense that computers generally require more operations to shift 31 places instead of shifting 1 place?

Or does it make sense the number of operations required for shifting is constant regardless of how many places we need to shift?

PS: wondering if hardware is an appropriate tag.

Comments (8)

猫七 2025-01-08 02:21:56

Some instruction sets are limited to one bit shift per instruction. And some instruction sets allow you to specify any number of bits to shift in one instruction, which usually takes one clock cycle on modern processors (modern being an intentionally vague word). See dan04's answer about a barrel shifter, a circuit that shifts more than one bit in one operation.

It all boils down to the logic algorithm. Each bit in the result is a logic function based on the input. For a single right shift, the algorithm would be something like:

  • If the instruction is [shift right] and bit 1 of the input is 1, then bit 0 of the result is 1, else bit 0 is 0.
  • If the instruction is [shift right], then result bit 1 = input bit 2.
  • etc.

But the logic equation could just as easily be:

  • If the instruction is [shift right] and the amount operand is 1, then result bit 0 = input bit 1.
  • If the amount is 2, then result bit 0 = input bit 2.
  • and so on.

The logic gates, being combinational rather than clocked, can do all of this in one clock cycle. It is true, though, that the single-bit shift allows a faster clock and has fewer gates to settle, if all you are comparing is these two flavors of the instruction. The alternative is to let the logic take longer to settle, so the instruction takes 2 or 3 clocks or whatever, and the logic counts to 3 and then latches the result.
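As a rough software model of the per-bit equations above, each result bit can be viewed as a selector that picks one input bit based on the amount operand. The function name `shift_right_bitwise` and the 16-bit width are assumptions made for illustration; real hardware evaluates all result bits in parallel, whereas this sketch loops over them.

```c
#include <assert.h>
#include <stdint.h>

/* Result bit i of a logical right shift is input bit (i + amount),
   or 0 when that index falls off the top of the 16-bit word. */
uint16_t shift_right_bitwise(uint16_t input, unsigned amount)
{
    assert(amount < 16);
    uint16_t result = 0;
    for (unsigned i = 0; i < 16; i++) {
        unsigned src = i + amount;                 /* which input bit feeds result bit i */
        unsigned bit = (src < 16) ? ((input >> src) & 1u) : 0u;
        result |= (uint16_t)(bit << i);
    }
    return result;
}
```

In this model the amount operand only changes which input bit each selector picks, not how many selectors there are, which is why a multi-bit shift can still settle in a single pass through the logic.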

The MSP430, for example, only has single bit rotate right instructions (because you can perform a single bit shift or a rotate left with another instruction, which I will leave to the reader to figure out).

The ARM instruction set allows both immediate and register-based multi-bit rotates, arithmetic shifts, and logical shifts. I think there is only one actual rotate instruction and the other is an alias: on a 32-bit register, rotate left by 1 is the same as rotate right by 31, so you only need a one-direction barrel shifter to implement a multi-bit rotate.

SHL in the x86 allows more than one bit per instruction, but it used to take more than one clock.

And so on; you can easily examine any of the instruction sets out there.

The answer to your question is that it is not fixed. Sometimes it is one operation, one cycle, one instruction. Sometimes it is one instruction, multiple clock cycles. Sometimes it is multiple instructions, multiple clock cycles.

Compilers often optimize for these sorts of things. Say you have a 16-bit register instruction set with a swap-bytes instruction and an AND instruction with immediate, but only a single-bit shift. You may think shifting 8 bits would require 8 shift instruction cycles, but you could just swap bytes (one instruction) and then AND the lower half to zeros (which might take two instructions, or might be a variable-word-length instruction of two words, or might encode into a single instruction), so it only takes 2 or 3 instruction/clock cycles instead of 8. For a shift of 9 bits, you can do the same thing and add one shift, making it 3 or 4 clocks instead of 9. Also, on some architectures, it is faster to multiply by 256 than to shift by 8, and so on. Each instruction set has its own limitations and tricks.
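The swap-and-mask trick can be sketched in C as follows. This is only an illustration: `swap_bytes` stands in for the hypothetical single-instruction byte swap (here it is emulated with ordinary C shifts, which the imagined machine would not have), and the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a hypothetical one-instruction byte swap on a 16-bit register. */
static uint16_t swap_bytes(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Left shift by 8: swap the bytes, then AND the lower half to zeros. */
uint16_t shift_left_8(uint16_t x)
{
    return (uint16_t)(swap_bytes(x) & 0xFF00u);
}

/* Left shift by 9: the same two steps plus one single-bit shift. */
uint16_t shift_left_9(uint16_t x)
{
    return (uint16_t)(shift_left_8(x) << 1);
}
```

For example, `shift_left_8(0x1234)` yields `0x3400` in two steps rather than eight single-bit shifts.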

It is not even the case that most instruction sets provide multi-bit shifts, or that most are limited to single-bit shifts. Processors in the "computer" category, like x86, ARM, PowerPC, and MIPS, lean toward a single operation per shift. Expand to all processors, not just the "computers" commonly used today, and the balance tips the other way: I would say more of them are single-bit than multi-bit, so multiple operations are needed to perform a multi-bit shift.

余生共白头 2025-01-08 02:21:56

A barrel shifter allows the shift to be performed in O(log n) passes, where n is the register width in bits. All of those passes may complete within the same clock cycle, making shifting an O(1) operation.
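A minimal software sketch of the log-stage idea, assuming a 32-bit word (so 5 stages): stage k either passes the value through or shifts it by 2^k, depending on bit k of the shift amount. The name `barrel_shift_right` is made up for this example, and the loop here stands in for what would be five fixed layers of multiplexers in hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift of a 32-bit word, modeled as a 5-stage barrel shifter. */
uint32_t barrel_shift_right(uint32_t value, unsigned amount)
{
    assert(amount < 32);
    for (unsigned stage = 0; stage < 5; stage++) {
        /* Bit `stage` of the amount selects "shift by 2^stage" or "pass through". */
        if (amount & (1u << stage)) {
            value >>= (1u << stage);
        }
    }
    return value;
}
```

The number of stages depends only on the word width, never on the shift amount, so a shift by 31 and a shift by 1 traverse the same five stages.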

-柠檬树下少年和吉他 2025-01-08 02:21:56

As already noted, a barrel shifter can shift an operand an arbitrary distance in constant time. A barrel shifter, however, consumes a fair amount of space on a CPU die, so they're not included in all CPU designs.

Just for one fairly well known example, the Intel Pentium III included a barrel shifter -- but the Pentium IV did not. Code written for a Pentium III assuming a barrel shifter was present sometimes slowed down quite a bit on a Pentium IV. I had some encryption code (which includes lots of shifting and rotating) that ran about 4 times faster on a 1.2 GHz Pentium III than it did on a 2.8 GHz Pentium IV.

日记撕了你也走了 2025-01-08 02:21:56

Bit shifting is O(1) on practically every current processor.

Take a look, for example, at the x86 "shrw" instruction. The first operand (in AT&T syntax) is the number of bits to shift.
How a compiler implements shifting is dependent on the compiler, but it would be silly to put shifts in a loop when a processor can shift N bits in one go.

Addendum:
Re: "Do they require more operations to shift left 31?"
There are different kinds of shifts (if you're wondering why, consider what to do with the bits that are shifted off the register), but most processors can do a single-instruction shift of as many bits as the GPR can store. To do a 40-bit shift with 32-bit registers would require shifting across multiple registers (this assumes a 64-bit number is stored across two 32-bit registers), which on every processor I know of will require more instructions. It would still be O(1), just probably not 1 clock.
As an interesting side note, the Pentium IV processor is amazingly slow at bit shifts. This is ironic because Intel has historically recommended optimizing power-of-two divides and multiplies by way of bit shifting. See: this PDF and Google for more info if interested.
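The multi-register case from the addendum might look something like the sketch below, which shifts a 64-bit value held in two 32-bit halves using only 32-bit operations. The `pair32` type and `shift_left_64` name are assumptions for illustration; actual compilers and instruction sets have their own sequences for this.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held in two 32-bit "registers". */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} pair32;

/* Logical left shift by n (0..63) using only 32-bit operations. */
pair32 shift_left_64(pair32 v, unsigned n)
{
    assert(n < 64);
    pair32 r;
    if (n == 0) {
        r = v;                                   /* avoid an out-of-range shift by 32 below */
    } else if (n < 32) {
        r.hi = (v.hi << n) | (v.lo >> (32 - n)); /* bits carried from lo into hi */
        r.lo = v.lo << n;
    } else {
        r.hi = v.lo << (n - 32);                 /* everything moves into hi */
        r.lo = 0;
    }
    return r;
}
```

A 40-bit shift takes the last branch: the low word moves into the high word shifted by 8, and the low word becomes zero. The instruction count is larger than for a single-register shift, but it is still a fixed number that does not grow with the shift amount.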

屋顶上的小猫咪 2025-01-08 02:21:56

Ahem, tested that out of curiosity in C# and got funny results.

var sw = Stopwatch.StartNew();
long l = 1;
for (long i = 0; i < 20000000; i++) {
    l = l << 60; l = l >> 60;
    l = l << 60; l = l >> 60;
    l = l << 60; l = l >> 60;
    //...
    // 50 of ^them^ total

}
Console.WriteLine(l + " " + sw.Elapsed);

That takes 1.2 secs on my PC. But if I replace

l = l << 60; l = l >> 60;

with

l = l << 1; l = l >> 1;

then the time increases to 2.0 secs. Have no idea what kind of optimizations are in play here, but it looks weird.

扮仙女 2025-01-08 02:21:56

For normal hardware with fixed-size registers, it's constant regardless of how many places you shift.

Also note that the usage of O notation is quite weird here: you would normally use it to denote algorithmic complexity based on the size of the number being shifted, not the number of places to shift.

荒路情人 2025-01-08 02:21:56

As a concrete example, according to Table C-17, "General Purpose Instructions", of the Intel® 64 and IA-32 Architectures Optimization Reference Manual:

SAL/SAR/SHL/SHR reg, imm   1 cycle latency
SAL/SAR/SHL/SHR reg, cl    1.5 cycles latency

So that's still a constant factor and O(1.5) = O(1). There may be simpler microarchitectures as outliers but in general, O(1).

苏别ゝ 2025-01-08 02:21:56

Any implementation, even if it requires n operations, will be considered O(1).

You have the number of bits capped, so there is a maximum number of bits that can be shifted.

Even if it takes more operations for bit shifts, one operation will require X time. The maximum will be MAX_BITS*X.

So, the maximum time of operation is constant.
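To make the bound concrete, here is a small C sketch of a machine that can only shift one place at a time. The names `shift_result` and `shift_right_stepwise`, and the 32-bit `MAX_BITS`, are assumptions for illustration rather than anything from a real instruction set.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BITS 32u /* assumed register width */

typedef struct {
    uint32_t value; /* shifted result */
    unsigned steps; /* single-bit shifts performed */
} shift_result;

/* Shift right one place at a time and count the single-bit steps used. */
shift_result shift_right_stepwise(uint32_t value, unsigned places)
{
    assert(places < MAX_BITS);
    shift_result r = { value, 0 };
    while (r.steps < places) {
        r.value >>= 1;
        r.steps++;
    }
    return r;
}
```

However the caller picks `places`, `steps` never exceeds `MAX_BITS - 1`, so the worst-case cost is a constant fixed by the register width.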

O notation should explain how time grows with the size of the task.

Example:
You perform random bit shifts in a loop.

If you run it 1000 or 100000 times, the time you need will be strictly O(n), where n is the number of iterations. If the bit shift itself were considered O(n), the loop should have taken O(n^2), which is not the case.
