Problem understanding the mul and imul instructions in assembly language

Posted on 2024-08-15 23:03:57

I'm learning 80386 assembly from PC Assembly Language by Paul Carter.

  mul source

If the operand is byte sized, it is multiplied by the byte in the AL
register and the result is stored in
the 16 bits of AX.

fine.

If the source is 16-bit, it is multiplied by the word in AX and the
32-bit result is stored in DX:AX.

Q1: Why DX:AX? Why can't it store the result in EAX / EDX?

imul is really confusing

imul dest, source1
imul dest, source1, source2

I have trouble understanding the table.

Q2: In the 2nd entry of the table, again, why DX:AX? Why not EAX or EDX?

Now consider following code snippet:

imul eax ; edx:eax = eax * eax
mov ebx, eax ; save answer in ebx
mov eax, square_msg ; square_msg db "Square of input is ", 0
call print_string ; prints the string eax
mov eax, ebx 
call print_int ;  prints the int stored in eax
call print_nl ; prints new line

Q3: It was previously said that the notation EDX:EAX means to think of the EDX and EAX registers as one 64-bit register with the upper 32 bits in EDX and the lower bits in EAX. So part of the answer is also stored in EDX, right? In the above code we never look at EDX; we only refer to EAX. How does this still work?

Q4: I have a problem with all the remaining entries in the table. In the worst case, multiplying two n-bit numbers (n = 8/16/32) gives a 2n-bit result. How can the result of a 16- or 32-bit multiplication be stored in a register of the same size?

把回忆走一遍 2024-08-22 23:03:57

There are lots of different variations of the imul instruction.

The variant you've stumbled upon is the 16-bit multiplication. It multiplies the AX register by whatever operand you pass to imul and stores the result in DX:AX.

The 32-bit variant works the same way but writes the result into EDX:EAX. To use it, all you have to do is use a 32-bit source operand.

E.g.:

  ; a 16 bit multiplication:
  mov ax, [factor1]
  mov bx, [factor2]
  imul bx              ; 32-bit result in DX:AX
  ; or  imul  word [factor2]

  ; a 32 bit multiplication:
  mov eax, [factor1]
  mov ebx, [factor2] 
  imul ebx             ; 64-bit result in EDX:EAX
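
To make the DX:AX notation concrete, here is a rough C model of the 16-bit one-operand form. The names `dx_ax_t`, `imul16_widening` and `dx_ax_value` are made up for this illustration, and the final conversion back to a signed value assumes the usual two's-complement behaviour of mainstream compilers.

```c
#include <stdint.h>

/* Hypothetical type for this sketch: the two 16-bit halves of the result. */
typedef struct {
    uint16_t dx;  /* high 16 bits of the product */
    uint16_t ax;  /* low 16 bits of the product  */
} dx_ax_t;

/* Models one-operand `imul r/m16`: AX * src as a signed 32-bit product,
   split into DX (high half) and AX (low half). */
dx_ax_t imul16_widening(int16_t ax, int16_t src) {
    int32_t product = (int32_t)ax * (int32_t)src;  /* always fits in 32 bits */
    uint32_t bits = (uint32_t)product;
    dx_ax_t r = { (uint16_t)(bits >> 16), (uint16_t)(bits & 0xFFFFu) };
    return r;
}

/* Reassembles DX:AX into the signed 32-bit value it represents
   (assumes two's-complement conversion from uint32_t to int32_t). */
int32_t dx_ax_value(dx_ax_t r) {
    uint32_t bits = ((uint32_t)r.dx << 16) | (uint32_t)r.ax;
    return (int32_t)bits;
}
```

For example, 300 * 300 = 90000 does not fit in 16 bits, so the model puts 1 in `dx` and 0x5F90 in `ax`. The 32-bit form does the same thing with EDX:EAX and a 64-bit product.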

On a 386 or later, you can also write imul in the two-operand form, which is much more flexible and easier to work with. In this form you can freely choose any two registers as source and destination, the CPU doesn't spend time writing a high-half result anywhere, and EDX is not destroyed.

  mov   ecx, [factor1]
  imul  ecx, [factor2]    ; result in ecx, no other registers affected
  imul  ecx, ecx          ; and square the result

Or, for signed 16-bit inputs to match your imul (use movzx for unsigned inputs):

  movsx   ecx, word [factor1]
  movsx   eax, word [factor2]  ; sign-extend inputs to 32-bit
  imul    eax, ecx             ; 32-bit multiply, result in EAX
  imul    eax, eax             ; and square the result

This two-operand form of imul was introduced with the 386 and is available with 16-bit and 32-bit operand size (and 64-bit operand size in 64-bit mode).

In 32-bit code you can always assume that 386 instructions such as imul reg, reg/mem are available, and you can also use them in 16-bit code if you don't care about CPUs older than the 386.

The 186 introduced a 3-operand form with an immediate operand.

  imul  cx, bx, 123        ; requires 186

  imul  ecx, ebx, 123      ; requires 386

白云不回头 2024-08-22 23:03:57

Q1/Q2: Why DX:AX ? Why can't it store in EAX / EDX?

As others have said, that's for backward compatibility. The original (i)mul instructions come from the 16-bit 8086, long before the 32-bit extensions appeared, so they couldn't store the result in EAX/EDX: the E registers didn't exist yet. The 16-bit operand-size form still behaves the same way on later CPUs.

Q3: In the above code we never look at EDX; we only refer to EAX. How does this still work?

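You've entered small values whose square fits in EAX, so you didn't see the difference. With a large enough input the square no longer fits in 32 bits and the printed result will be incorrect. Assuming print_int prints EAX as a signed 32-bit integer, any input with a magnitude above 46340 already prints wrongly, and from 65536 upward you'll also see EDX != 0.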
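
The effect can be sketched in C. This is only a model of what the snippet does, not the book's code; the function names are invented for the illustration, and the narrowing conversion assumes the usual two's-complement wraparound.

```c
#include <stdint.h>

/* Full 64-bit square: the value held in EDX:EAX after `imul eax`. */
int64_t square_full(int32_t x) {
    return (int64_t)x * (int64_t)x;
}

/* What the snippet prints: only EAX, read as a signed 32-bit integer
   (assumes two's-complement conversion when narrowing). */
int32_t square_low32(int32_t x) {
    uint64_t bits = (uint64_t)square_full(x);
    return (int32_t)(uint32_t)bits;  /* keep the low 32 bits, drop EDX */
}

/* Nonzero when printing EAX alone still gives the right answer. */
int square_fits_in_eax(int32_t x) {
    return square_full(x) == (int64_t)square_low32(x);
}
```

With this model, `square_fits_in_eax(46340)` holds but `square_fits_in_eax(46341)` does not, and `square_low32(65536)` is 0 because the whole result has moved into the upper half.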

Q4: How can the result of a 16- or 32-bit multiplication be stored in a register of the same size?

It's not that the product has the same size as the operands. Multiplying two n-bit values can produce up to 2n bits. But in imul r16, r/m16[, imm8/16] and their 32/64-bit counterparts the high n bits of the result are discarded. These forms are used when you only need the lower 16/32/64 bits of the result (i.e. non-widening multiplication), or when you can ensure that the result does not overflow.

Two-operand form — With this form the destination operand (the first operand) is multiplied by the source operand (second operand). The destination operand is a general-purpose register and the source operand is an immediate value, a general-purpose register, or a memory location. The intermediate product (twice the size of the input operand) is truncated and stored in the destination operand location.
[... Same for Three-operand form]
https://www.felixcloutier.com/x86/IMUL.html
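
As a rough illustration of that truncation, the 16-bit two-operand form can be modelled in C like this. The function names are hypothetical, and the narrowing cast assumes two's-complement wraparound.

```c
#include <stdint.h>

/* Models `imul r16, r/m16`: the 32-bit intermediate product is truncated
   to its low 16 bits before being written to the destination. */
int16_t imul16_truncating(int16_t dest, int16_t src) {
    int32_t product = (int32_t)dest * (int32_t)src;
    return (int16_t)(uint16_t)(uint32_t)product;
}

/* Nonzero when the truncation lost information. The CPU reports the
   same condition by setting CF and OF. */
int imul16_overflowed(int16_t dest, int16_t src) {
    int32_t product = (int32_t)dest * (int32_t)src;
    return product != (int32_t)imul16_truncating(dest, src);
}
```

So 100 * 200 = 20000 comes through intact, while 300 * 300 = 90000 is stored as 24464 (90000 mod 65536) and flagged as an overflow.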

Modern compilers almost exclusively use the multi-operand imul for both signed and unsigned multiplication, because:

- The lower bits of the product are the same in both cases, and in C multiplying two variables produces a result of the same size (int × int → int, long × long → long, ...), which fits imul's operands nicely. The only way to get a compiler to emit single-operand mul or imul is to use a result type twice the register size.
- It's very uncommon to need a product wider than the register size, as in int64_t a, b; __int128_t p = (__int128_t)a * b;, so single-operand (i)mul is rarely needed.
- Calculating only the lower bits is generally no slower, and often faster, than producing the whole result.
- The various forms of imul are much more flexible to use:
  - In the 2-operand form you don't need to save/restore EDX and EAX.
  - The 3-operand form further allows non-destructive multiplication.
- Modern CPUs are often optimized for the multi-operand forms of imul, since that is what compilers emit, so those forms tend to be at least as fast as single-operand (i)mul.
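
The claim that the low half is the same for signed and unsigned multiplication, while the high half is not, can be checked with a small C sketch. The helper names are made up for this example and it assumes a platform where int is 32 bits.

```c
#include <stdint.h>

/* Low 32 bits of the product using unsigned arithmetic. */
uint32_t mul_low_unsigned(uint32_t a, uint32_t b) {
    return a * b;  /* wraps modulo 2^32 */
}

/* Low 32 bits of the product taken from the signed 64-bit product. */
uint32_t mul_low_signed(int32_t a, int32_t b) {
    int64_t product = (int64_t)a * (int64_t)b;
    return (uint32_t)(uint64_t)product;
}

/* High 32 bits of the unsigned product (what `mul` leaves in EDX). */
uint32_t mul_high_unsigned(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* High 32 bits of the signed product (what one-operand `imul` leaves in EDX). */
uint32_t mul_high_signed(int32_t a, int32_t b) {
    int64_t product = (int64_t)a * (int64_t)b;
    return (uint32_t)((uint64_t)product >> 32);
}
```

Taking the bit pattern 0xFFFFFFFF times 2: both low halves are 0xFFFFFFFE, but the unsigned high half is 1 while the signed high half (for -1 * 2 = -2) is 0xFFFFFFFF. That is why a single truncating imul serves both signed and unsigned C multiplication, and separate mul/imul forms are only needed for widening results.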