当前位置：文江博客话题详情

为什么IMUL用于乘以无符号数字？

发布于 2025-01-25 08:34:12 字数 575 浏览 2 评论 0 原文

我编制了以下程序：

#include <stdint.h>

uint64_t usquare(uint32_t x) {
  return (uint64_t)x * (uint64_t)x;
}

此分配为：

 0: 89 f8                   mov    eax,edi
 2: 48 0f af c0             imul   rax,rax
 6: c3                      ret

但是 imul 是乘以签名数字的指令。为什么使用 GCC 使用它？

/编辑：使用 uint64_t 时，汇编相似：

0:  48 0f af ff             imul   rdi,rdi
4:  48 89 f8                mov    rax,rdi
7:  c3                      ret

原文

I compiled the following program:

#include <stdint.h>

uint64_t usquare(uint32_t x) {
  return (uint64_t)x * (uint64_t)x;
}

This disassembles to:

 0: 89 f8                   mov    eax,edi
 2: 48 0f af c0             imul   rax,rax
 6: c3                      ret

But imul is the instruction for multiplying signed numbers. Why is it used by gcc then?

/edit: when using uint64_t the assembly is similar:

0:  48 0f af ff             imul   rdi,rdi
4:  48 89 f8                mov    rax,rdi
7:  c3                      ret

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

度的依靠╰つ 2025-02-01 08:34:12

TL：DR：因为当我们不关心高半时，这是获得正确结果的一种方法（即输出仅与2个输入一样宽）。以及更灵活的寄存器分配，而不是强制使用RAX和RDX。

如果不适合此使用，英特尔可能还会添加 mul 的两动作版本。但这不是必需的，正如该答案所解释的那样。

警告这个答案很长！

...而且它充满了不需要的解释 - 但是我一直想写一些关于乘法更长的东西。

在乘以两个数字 a 和 b 的长度 n 的结果

时，结果是长度为2 n < sup>†，最重要的是， k - 数字仅取决于最低 k 数字（给出证明在附录A中。

x86 `imul`

x86乘法指令 imul 有两种形式：完整形式和部分形式。

第一种形式是类型 n × n →2 n ，这意味着它产生的结果是操作数的两倍 - 我们知道从为什么这有意义的理论中。
例如，

imul ax         ;16x16->32, Result is dx:ax
imul rax        ;64x64->128, Result is rdx:rax

第二种形式是 n × n → n ，这必然会删除一些信息。
特别是，此形式仅采用结果的下部 n 。

imul ax, ax          ;16x16->16, Lower WORD of the result is ax
imul rax, rax        ;64x64->64, Lower QWORD of the result is rax

仅单操作版本是第一种形式。

（还有一个3-动作表格， imul R64，r/m64，imm8/32 ，它允许您通过一个指令中的常数复制和杂音。在任何地方都不会写高半，因此我们可以将其视为等同于 imul R64，R/M64 dst *= src 表格。）

两个说明：<代码> imul vs `mul` mul

不管使用的表格如何，处理器始终用操作数的尺寸两倍（即第一个形式）计算结果。
为了能够做到这一点，操作数首先从其大小 n 转换为2 n （例如64位到128位）。
有关此的更多信息，请参见附录B。

乘法完成了，完整或部分结果存储在目的地中。

imul 和 mul 之间的区别在于操作数的转换方式。
由于大小扩展，因此这种特定类型的转换称为 Extension 。

mul 指令简单地用零填充上部 - 零扩展。
imul 指令复制高阶位（左侧第一个） - 这称为符号扩展名，它具有转换a n 位的数量（即它做正确的事情，读者留给读者以找到零扩展情况的反例）。

     How mul extends              How imul extends       
       an operand                   an operand

     +----+       +----+          +----+       +----+
     |0...|       |1...|          |0...|       |1...|
     +----+       +----+          +----+       +----+  

+----+----+  +----+----+     +----+----+  +----+----+
|0000|0...|  |0000|1...|     |0000|0...|  |1111|1...|
+----+----+  +----+----+     +----+----+  +----+----+

论文

imul 和 mul 仅在（ n +1）中明显出现。
对于32位操作数，这意味着最终只有完整结果的上部32位部分将有所不同。

这很容易看出，因为这两个指示都相同，而且从理论中可以知道，结果的第一个 n 仅取决于第一个<操作数的em> n 。

因此，论文： imul 的部分形式的结果与 mul 的结果相同。

那么，为什么 IMUL 存在？

原始8086只有 mul 和 imul 的单场版本。后来的X86版本仅添加了更灵活的两个和三个操作数版本的 imul ，该版本旨在用于不希望双宽结果的常见用例。

他们只编写一个输出寄存器，对于现代X86，这意味着他们可以解码为一个UOP： https：//agner.org/优化/。（在现代X86微体系结构中，每个UOP最多可以写1个寄存器。）One-Operand Imul R32 是Intel CPU上的3个UOPS：大概一个是乘坐的，另一个将64位产品分为2一半并写下一半，而另一半则在高半上做同样的事情。 IMUL R64 是2个UOPS；据推测，128位的结果来自已经分成64位的一半乘数。

mul 仍然只存在于非常古老的单开启形式中，固定寄存器是界面的一部分。

imul 根据签名的乘法 - cf 和设置标志：部分结果的符号扩展与完整结果不同），例如在溢出的情况下。这也是为什么两个和三个操作数表单不称为 mul ，否则这将是一个非常适合的名称。

做法

在实践中测试所有这些的我们可以问一个编译器^{[ live ]}对于以下程序的组装

#include <stdint.h>

uint64_t foo(uint32_t a)
{
    return a*(uint64_t)a;
}

，我们知道对于64位目标，生成的代码 imul ，因为A unint64_t 适合寄存器，因此适用于64×64→64乘法可作为 imul＆lt; reg64＆gt; ,, reg64＆gt;

foo(unsigned int):
        mov     eax, edi        ;edi = a
        imul    rax, rax        ;64x64->64
        ret

在32位代码中使用，使用 imul 。
imul＆lt; reg32＆gt; 或 imul＆lt; reg32＆gt; reg32＆gt;，gt; reg32＆gt; 是必不可少的>结果！完整的签名结果通常不等于完整的 unsigned 结果。
实际上，编译器恢复回到 mul ：

foo(unsigned int):
        mov     eax, DWORD PTR [esp+4]
        mul     eax
        ret

附录A

不会损失一般性，我们可以假设基本2，并且数字是 n + 1位长（因此索引从0到 n ） - 然后

c = a·b = ∑ _{i = 0..n}（a _i·2 ⁱ）··∑ _{j = 0..n}（b _j·2 ^J）=）=
∑ _{i = 0..n} [a _i· 2 ^i+j）]（通过分布属性），

我们看到结果的 k - th数字是所有附录的总和，以便 i + j = k 加上最终携带的

c _k = ∑ _{i，j = 0..n; i+j = k} a _i·b

术语c k 是携带，随着它传播到更高的位，它仅取决于较低的位。
第二项不能具有A _i或B _J，而 i 或 j ＆gt; k好像第一个是true，那么 i = k + e ，对于正，non null， e 因此E = - e
但是 j 不能负面！
第二种情况相似，并留给读者。

附录B

正如Beeonrope在评论中指出的那样，如果仅需要部分结果，则处理器可能不会计算完整的结果。

您可能意味着这只是一种概念上思考的方式。当您使用64x64 - ＆gt;时，处理器不一定会完成全部128位乘法。 64形式。实际上，截断的表格仅在最近的英特尔上使用1个UOP，但是完整的表格需要2个UOP，因此正在完成一些额外的工作

另外，符号扩展也可能在概念上也是如此。

同样，符号扩展可能“从概念上”发生，但可能不在硬件中。他们不会仅仅为了执行符号或零扩展名而拥有额外的电线和晶体管，这会给已经很大的乘数增添大量，但会使用其他一些技巧来完成乘法“仿佛”。

< sup>†长度 n 的二进制数量为2 ⁿ，因此两个这样的数字的乘法是按数量级的顺序2 ⁿ·2 ⁿ = 2 ^{n + n} = 2 ^{2 n}。就像多个长度2 n 一样。

TL:DR: because it's a faster way of getting the correct result when we don't care about the high half (i.e. the output is only as wide as the 2 inputs). And more flexible register-allocation instead of forced use of RAX and RDX.

If it wasn't usable for this, Intel probably would have added two-operand versions of mul as well. But that wasn't necessary, as this answer explains.

WARNING This answer is long!

... and it's full of unneeded explanations - but I have always wanted to write something more lengthy about the multiplication.

A bit of theory

When multiplying two numbers a and b of length n the result is of length 2 n^† and, most importantly, the k-th digit only depends on the lowest k digits (a proof is given in Appendix A).

x86 `imul`'s two forms

The x86 multiplication instruction imul comes in two form: the full form and the partial form.

The first form is of the kind n×n→2 n, meaning that it produces a result twice the size of the operands - we know from the theory why this makes sense.
For example

imul ax         ;16x16->32, Result is dx:ax
imul rax        ;64x64->128, Result is rdx:rax

The second form is of the kind n×n→n, this necessarily cuts out some information.
Particularly, this form takes only the lower n bits of the result.

imul ax, ax          ;16x16->16, Lower WORD of the result is ax
imul rax, rax        ;64x64->64, Lower QWORD of the result is rax

Only the single-operand version is of the first form.

(There's also a 3-operand form, imul r64, r/m64, imm8/32, which allows you to copy-and-multiply by a constant in one instruction. It has no implicit operands and again doesn't write the high half anywhere so we can just treat it as equivalent to the imul r64, r/m64 dst *= src form.)

The two instructions: `imul` vs `mul`

Regardless of the form used, the processor always calculates the result with a size twice the operands' (i.e. like the first form).
In order to be able to do that, the operands are first converted from their size n to size 2 n (e.g. from 64 to 128 bits).
See Appendix B for more on this.

The multiplication is done and the full, or partial, result is stored in the destination.

The difference between imul and mul is in how the operands are converted.
Since the size is extended, this particular type of conversion is called extension.

The mul instruction simply fills the upper part with zeros - it zero extends.
The imul instructions replicate the high-order bit (the first from the left) - this is called sign extension and it has the interesting property of transforming a two's complement signed number of n bits into a signed number of 2 n bits with the same sign and modulus (i.e. it does the right thing, it is left to the reader to find a counter-example for the zero-extension case).

     How mul extends              How imul extends       
       an operand                   an operand

     +----+       +----+          +----+       +----+
     |0...|       |1...|          |0...|       |1...|
     +----+       +----+          +----+       +----+  

+----+----+  +----+----+     +----+----+  +----+----+
|0000|0...|  |0000|1...|     |0000|0...|  |1111|1...|
+----+----+  +----+----+     +----+----+  +----+----+

The thesis

The difference between imul and mul is noticeable only from the (n+1)-th bit onward.
For a 32-bit operand, it means that only the upper 32-bit part of the full result will eventually be different.

This is easy to see as the lower n bits are the same for both instructions and as we know from the theory the first n bits of the result only depend on the first n bits of the operands.

Thus the thesis: The result of the partial form of imul is identical to that of mul.

Then why does imul exist?

Original 8086 only had one-operand versions of mul and imul. Later versions of x86 added more flexible two and three operand versions of imul only, intended for the common use-case where you don't want the double-width result.

They only write one output register, which for modern x86 means they can decode to a single uop: https://agner.org/optimize/. (In modern x86 microarchitectures, each uop can write at most 1 register.) One-operand imul r32 is 3 uops on Intel CPUs: presumably one to multiply, another to split the 64-bit product into 2 halves and write the low half, and another to do the same for the high half. imul r64 is 2 uops; presumably the 128-bit result comes out of the multiplier already split in 64-bit halves.

mul still only exists in the very ancient one-operand form with fixed registers as part of the interface.

imul sets the flags according to a signed multiplication - CF and OF are set if the partial result has discarded any significant information (the technical condition being: the sign extension of the partial result is different from the full result) such as in case of overflow. This is also why the two and three operand forms are not called mul, which otherwise would have been a perfectly fit name.

The practice

To test all this in practice we can ask a compiler^[live] for the assembly of the following program

#include <stdint.h>

uint64_t foo(uint32_t a)
{
    return a*(uint64_t)a;
}

While we know that for 64-bit target the code generated uses imul because a unint64_t fits a register and thus a 64×64→64 multiplication is available as imul <reg64>, <reg64>

foo(unsigned int):
        mov     eax, edi        ;edi = a
        imul    rax, rax        ;64x64->64
        ret

in 32-bit code there is no such multiplication using imul.
An imul <reg32> or imul <reg32>, <reg32>, <reg32> is necessary but that would produce a full result! And a full signed result is not generally equal to a full unsigned result.
In fact, the compiler reverts back to mul:

foo(unsigned int):
        mov     eax, DWORD PTR [esp+4]
        mul     eax
        ret

Appendix A

Without loss of generality, we can assume base 2 and that the numbers are n + 1 bits long (so that the indices run from 0 to n) - then

c = a·b = ∑_i=0..n (a_i·2ⁱ) · ∑_j=0..n(b_j·2^j) =
∑_i=0..n [a_i·∑_j=0..n (b_j·2^i+j)] (by the distributive property)

We see that the k-th digit of the result is the sum of all the addends such that i + j = k plus an eventual carry

c_k = ∑_{i,j=0..n; i+j=k} a_i·b_j·2^i+j + C_k

The term C_k is the carry and, as it propagates towards higher bits, it depends only on the lower bits.
The second term cannot have a a_i or b_j with i or j > k as if the first were true then i = k + e, for a positive, non null, e and thus j = k - i = k - k -e = -e
But j cannot be negative!
The second case is similar and left to the reader.

Appendix B

As BeeOnRope pointed out in the comments the processor probably doesn't compute a full result if only the partial result is needed.

You probably means that this is only a way of thinking about it, conceptually. The processor does not necessarily do a full 128-bit multiplication when you use the 64x64 -> 64 form. Indeed, the truncated form takes only 1 uop on recent Intel, but the full form takes 2 uops, so some extra work is being done

Comment from BeeOnRope

Also, the sign extension is probably conceptually too.

Similarly the sign extension may happens "conceptually", but probably not in hardware. They won't have the extra wires and transistors just to do the sign or zero extension, which would add a lot of bulk to an already huge multiplier, but will use some other tricks to do the multiplication "as if" that had happened.

Comment from BeeOnRope

^† Binary numbers of length n are in the order of magnitude of 2ⁿ, thus the multiplication of two such numbers is in the order of magnitude 2ⁿ · 2ⁿ = 2ⁿ⁺ⁿ = 2^{2 n}. Just like a number of length 2 n.

回复收藏 0 原文

纸伞微斜 2025-02-01 08:34:12

#include <stdint.h>

uint64_t fun0 ( uint32_t x )
{
    return (uint64_t)x * (uint64_t)x;
}
uint64_t fun1 ( uint32_t x )
{
    return ((uint64_t)x) * ((uint64_t)x);
}
uint64_t fun2 ( uint64_t x )
{
    return (x * x);
}



0000000000000000 <fun0>:
   0:   89 f8                   mov    %edi,%eax
   2:   48 0f af c0             imul   %rax,%rax
   6:   c3                      retq   
   7:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
   e:   00 00 

0000000000000010 <fun1>:
  10:   89 f8                   mov    %edi,%eax
  12:   48 0f af c0             imul   %rax,%rax
  16:   c3                      retq   
  17:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  1e:   00 00 

0000000000000020 <fun2>:
  20:   48 89 f8                mov    %rdi,%rax
  23:   48 0f af c7             imul   %rdi,%rax
  27:   c3                      retq

符号

即使您指定了所有64位未签名，它也会生成相同的结果

0x00FF * 0x00FF = 0xFE01
0xFFFF * 0xFFFF = 0xFFFE0001
so
0xFF * 0xFF = 0x01

扩展名，对于较低的64位来说并不重要，因此您可以使用IMUL进行8、16、32和64位操作数签名或未签名。

#include <stdint.h>

uint64_t fun0 ( uint32_t x )
{
    return (uint64_t)x * (uint64_t)x;
}
uint64_t fun1 ( uint32_t x )
{
    return ((uint64_t)x) * ((uint64_t)x);
}
uint64_t fun2 ( uint64_t x )
{
    return (x * x);
}



0000000000000000 <fun0>:
   0:   89 f8                   mov    %edi,%eax
   2:   48 0f af c0             imul   %rax,%rax
   6:   c3                      retq   
   7:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
   e:   00 00 

0000000000000010 <fun1>:
  10:   89 f8                   mov    %edi,%eax
  12:   48 0f af c0             imul   %rax,%rax
  16:   c3                      retq   
  17:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  1e:   00 00 

0000000000000020 <fun2>:
  20:   48 89 f8                mov    %rdi,%rax
  23:   48 0f af c7             imul   %rdi,%rax
  27:   c3                      retq

EDIT

even if you specify all 64 bit unsigned it generates the same result

0x00FF * 0x00FF = 0xFE01
0xFFFF * 0xFFFF = 0xFFFE0001
so
0xFF * 0xFF = 0x01

sign extension doesnt matter for the lower 64 bits so you can use imul for 8, 16, 32 and 64 bit operands signed or unsigned.

回复收藏 0 原文

~没有更多了~

关于作者

风筝有风，海豚有海

暂无简介

文章

29 人气

关注发私信

友情链接

文江博客

为什么IMUL用于乘以无符号数字？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

在乘以两个数字 a 和 b 的长度 n 的结果

x86 `imul`

两个说明：<代码> imul vs `mul` mul

论文

做法

附录A

附录B

A bit of theory

x86 `imul`'s two forms

The two instructions: `imul` vs `mul`

The thesis

The practice

Appendix A

Appendix B

关于作者

相关话题

热门标签

推荐作者

alipaysp_snBf0MSZIv

梦断已成空

瞎闹

凯凯我们等你回来

寄意

似梦非梦

友情链接

为什么IMUL用于乘以无符号数字？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

在乘以两个数字 a 和 b 的长度 n 的结果

x86 imul

两个说明：<代码> imul vs mul mul

论文

做法

附录A

附录B

A bit of theory

x86 imul's two forms

The two instructions: imul vs mul

The thesis

The practice

Appendix A

Appendix B

关于作者

相关话题

热门标签

推荐作者

alipaysp_snBf0MSZIv

梦断已成空

瞎闹

凯凯我们等你回来

寄意

似梦非梦

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

x86 `imul`

两个说明：<代码> imul vs `mul` mul

x86 `imul`'s two forms

The two instructions: `imul` vs `mul`