
Assembly Language - How to do Modulo?

Published 2024-12-13 14:47:51

Is there something like a modulo operator or instruction in x86 assembly?


作死小能手 2024-12-20 14:47:51

If your modulus / divisor is a known constant, and you care about performance, see this and this. A multiplicative inverse is even possible for loop-invariant values that aren't known until runtime, e.g. see https://libdivide.com/ (But without JIT code-gen, that's less efficient than hard-coding just the steps necessary for one constant.)
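As a rough illustration of the multiplicative-inverse idea, here is a sketch assuming unsigned 32-bit inputs and the constant divisor 10 (div10 and mod10 are hypothetical helper names). It mirrors the multiply-and-shift sequence compilers typically emit for x / 10, and recovers the remainder as x - q*10:

```c
#include <stdint.h>

/* Hypothetical helpers: unsigned 32-bit division and remainder by the
   constant 10 without a div instruction.
   0xCCCCCCCD == ceil(2^35 / 10); the rounding error is small enough that
   (x * 0xCCCCCCCD) >> 35 == x / 10 for every 32-bit unsigned x, as long as
   the product is computed in 64 bits. */
static inline uint32_t div10(uint32_t x) {
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

static inline uint32_t mod10(uint32_t x) {
    /* remainder = x - quotient * divisor */
    return x - div10(x) * 10u;
}
```

For example, div10(1234) is 123 and mod10(1234) is 4, the same values the div example further down produces. Other divisors need a different magic constant and shift count, which is what libdivide computes at runtime.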

Never use div for known powers of 2: it's much slower than an and instruction for the remainder, or a right shift for the quotient. Look at C compiler output for examples of unsigned or signed division by powers of 2, e.g. on the Godbolt compiler explorer (signed division needs an extra correction so that negative results round toward zero). If you know a runtime input is a power of 2, use lea eax, [esi-1] ; and eax, edi or something like that to do x & (y-1). Modulo 256 is even more efficient: movzx eax, cl has zero latency on recent Intel CPUs (mov-elimination), as long as the two registers are separate.
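The unsigned power-of-2 tricks can be sketched in C as follows. This is a minimal sketch assuming unsigned operands and a divisor already known to be a power of 2; mod_pow2, div_pow2 and mod256 are hypothetical names, and the signed case (which needs the rounding correction mentioned above) is deliberately left out:

```c
#include <stdint.h>

/* Hypothetical helpers for unsigned x and a power-of-2 divisor
   (y != 0 and (y & (y - 1)) == 0). */
static inline uint32_t mod_pow2(uint32_t x, uint32_t y) {
    return x & (y - 1);          /* like lea eax, [esi-1] ; and eax, edi */
}

static inline uint32_t div_pow2(uint32_t x, unsigned k) {
    return x >> k;               /* x / 2^k for unsigned x, requires k < 32 */
}

static inline uint32_t mod256(uint32_t x) {
    return (uint8_t)x;           /* keep only the low byte, like movzx eax, cl */
}
```

For example, mod_pow2(1234, 16) gives 2 and div_pow2(1234, 4) gives 77, matching 1234 % 16 and 1234 / 16.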

In the simple/general case: unknown value at runtime

The DIV instruction (and its counterpart IDIV for signed numbers) gives both the quotient and the remainder. For unsigned division, remainder and modulus are the same thing. For signed idiv, you get the remainder (not the modulus), which can be negative:
e.g. -5 / 2 = -2 rem -1. x86 division truncates toward zero, so its semantics exactly match C99's / and % operators.
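If you need a true modulus rather than the truncated remainder, one common fix-up is to add the divisor back when the signs disagree. A minimal sketch, assuming b != 0 and not the overflowing INT_MIN % -1 case; floor_mod is a hypothetical name:

```c
/* C99 (like x86 idiv) truncates the quotient toward zero, so a % b takes
   the sign of the dividend. floor_mod (hypothetical) converts it into a
   modulus that takes the sign of the divisor instead. */
static int floor_mod(int a, int b) {
    int r = a % b;                       /* the value idiv leaves in EDX */
    if (r != 0 && ((r < 0) != (b < 0)))
        r += b;                          /* move into the divisor's sign range */
    return r;
}
```

With this, floor_mod(-5, 2) is 1 while -5 % 2 stays -1.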

DIV r32 divides a 64-bit number in EDX:EAX by a 32-bit operand (in any register or memory) and stores the quotient in EAX and the remainder in EDX. It faults on overflow of the quotient.
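To make the register roles concrete, here is a C model of div r32. It is only a sketch (div_r32 is a hypothetical name, and a false return stands in for the hardware #DE fault, in which case the "registers" are left unchanged):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C model of x86 "div r32":
   dividend = EDX:EAX (64 bits), divisor = 32-bit operand.
   Returns false where the CPU would raise #DE: divisor == 0,
   or a quotient that does not fit in 32 bits. */
static bool div_r32(uint32_t *eax, uint32_t *edx, uint32_t divisor) {
    uint64_t dividend = ((uint64_t)*edx << 32) | *eax;
    if (divisor == 0)
        return false;                            /* divide by zero */
    uint64_t q = dividend / divisor;
    if (q > UINT32_MAX)
        return false;                            /* quotient overflow */
    *eax = (uint32_t)q;                          /* quotient  -> EAX */
    *edx = (uint32_t)(dividend % divisor);       /* remainder -> EDX */
    return true;
}
```

Starting from EAX = 1234, EDX = 0 and dividing by 10 leaves EAX = 123 and EDX = 4, while EDX = 10 with the same divisor overflows the quotient.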

Unsigned 32-bit example (works in any mode)

mov eax, 1234          ; dividend low half
mov edx, 0             ; dividend high half = 0.  prefer  xor edx,edx

mov ebx, 10            ; divisor can be any register or memory

div ebx       ; Divides 1234 by 10.
        ; EDX =   4 = 1234 % 10  remainder
        ; EAX = 123 = 1234 / 10  quotient

In 16-bit assembly you can do div bx to divide a 32-bit dividend in DX:AX by BX; the quotient goes to AX and the remainder to DX. See Intel's 64 and IA-32 Architectures Software Developer's Manuals for more information.

Before an unsigned div you should normally use xor edx,edx to zero-extend EAX into EDX:EAX. This is how you do "normal" 32-bit / 32-bit => 32-bit division.

For signed division, use cdq before idiv to sign-extend EAX into EDX:EAX. See also Why should EDX be 0 before using the DIV instruction? For other operand-sizes, use cbw (AL->AX), cwd (AX->DX:AX), cdq (EAX->EDX:EAX), or cqo (RAX->RDX:RAX) to set the top half to 0 or -1 according to the sign bit of the low half.
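The difference between the two ways of preparing EDX can be modelled as bit patterns. This is a sketch with hypothetical helper names, returning the 64-bit EDX:EAX value each instruction would produce:

```c
#include <stdint.h>

/* xor edx, edx : upper half = 0 (correct before unsigned div) */
static uint64_t edx_eax_zero_extended(uint32_t eax) {
    return (uint64_t)eax;
}

/* cdq : upper half = 0 or 0xFFFFFFFF, copied from bit 31 of EAX
   (correct before signed idiv) */
static uint64_t edx_eax_sign_extended(uint32_t eax) {
    uint32_t edx = (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    return ((uint64_t)edx << 32) | eax;
}
```

With EAX = -5 (0xFFFFFFFB), cdq keeps the dividend at -5, but xor edx,edx turns it into +4294967291, so an idiv by 2 would quietly return 2147483645 remainder 1 instead of -2 remainder -1.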

div / idiv are available in operand-sizes of 8, 16, 32, and (in 64-bit mode) 64 bits. On Intel CPUs before Ice Lake, 64-bit operand-size is much slower than 32-bit or smaller, but AMD CPUs only care about the actual magnitude of the numbers, regardless of operand-size.

Note that 8-bit operand-size is special: the implicit dividend is AX (not DL:AL), and the results go to AL (quotient) and AH (remainder). See 8086 assembly on DOSBox: Bug with idiv instruction? for an example.
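A sketch of how div r8 packs its results back into AX (div_r8 is a hypothetical name; a false return stands in for the #DE fault):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of x86 "div r8": the whole of AX is the dividend,
   and the results are written back as AH = remainder, AL = quotient. */
static bool div_r8(uint16_t *ax, uint8_t divisor) {
    if (divisor == 0 || *ax / divisor > 0xFF)
        return false;                        /* #DE on real hardware */
    uint8_t al = (uint8_t)(*ax / divisor);   /* quotient  -> AL */
    uint8_t ah = (uint8_t)(*ax % divisor);   /* remainder -> AH */
    *ax = (uint16_t)((ah << 8) | al);
    return true;
}
```

For AX = 1234 and a divisor of 10, AL ends up as 123 and AH as 4; AX = 1000 divided by 2 faults because 500 does not fit in AL.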

Signed 64-bit division example (requires 64-bit mode)

   mov    rax,  0x8000000000000000   ; INT64_MIN = -9223372036854775808
   mov    ecx,  10           ; implicit zero-extension is fine for positive numbers

   cqo                       ; sign-extend into RDX, in this case = -1 = 0xFF...FF
   idiv   rcx
       ; quotient  = RAX = -922337203685477580 = 0xf333333333333334
       ; remainder = RDX = -8                  = 0xfffffffffffffff8
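Because C99 int64_t division also truncates toward zero, the register values in the example above can be cross-checked in C. A minimal sketch (idiv64 is a hypothetical name; the caller must still avoid a zero divisor and INT64_MIN / -1):

```c
#include <stdint.h>

/* C99 '/' and '%' on int64_t round toward zero like idiv, so they
   reproduce the RAX (quotient) and RDX (remainder) results. */
static void idiv64(int64_t dividend, int64_t divisor,
                   int64_t *quot, int64_t *rem) {
    *quot = dividend / divisor;   /* RAX */
    *rem  = dividend % divisor;   /* RDX */
}
```

Calling idiv64(INT64_MIN, 10, &q, &r) yields q = -922337203685477580 and r = -8, whose two's complement bit patterns are 0xf333333333333334 and 0xfffffffffffffff8.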

Limitations / common mistakes

There is no immediate-operand form of div, so div dword 10 cannot be encoded into machine code (your assembler will report an error about invalid operands).

Unlike mul/imul (where you should normally use the faster 2-operand imul r32, r/m32 or 3-operand imul r32, r/m32, imm8/32 forms, which don't waste time writing a high-half result), there is no newer opcode for division by an immediate, nor for 32-bit / 32-bit => 32-bit division or remainder without the high-half dividend input.

Division is slow and (hopefully) rare enough that the ISA designers didn't bother to add a way to avoid EAX and EDX, or to use an immediate directly.

div and idiv fault if the quotient doesn't fit into one register (AL / AX / EAX / RAX, the same width as the divisor). This includes division by zero, but it also happens with a non-zero EDX: for unsigned div, any high half greater than or equal to the divisor overflows the quotient. This is why C compilers just zero-extend or sign-extend instead of splitting up a 32-bit value into DX:AX.

This is also why INT_MIN / -1 is undefined behaviour in C: it overflows the signed quotient on two's complement systems like x86. See Why does integer division by -1 (negative one) result in FPE? for an example of x86 vs. ARM. x86 idiv does indeed fault in this case.
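Code that divides untrusted signed values therefore usually checks both fault cases first. A minimal sketch, assuming 32-bit int and a cdq-style sign-extended dividend (checked_div is a hypothetical name); under those assumptions these are exactly the inputs for which idiv raises #DE:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical wrapper: rejects a zero divisor and INT_MIN / -1 (the only
   signed 32/32 quotient that overflows) before dividing. */
static bool checked_div(int a, int b, int *quot, int *rem) {
    if (b == 0 || (a == INT_MIN && b == -1))
        return false;
    *quot = a / b;
    *rem  = a % b;
    return true;
}
```

checked_div(INT_MIN, -1, ...) returns false instead of crashing with SIGFPE, while checked_div(-5, 2, ...) yields -2 remainder -1.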

The x86 exception is #DE (divide error). On Unix/Linux systems, the kernel delivers a SIGFPE (arithmetic exception) signal to processes that cause a #DE exception. (See On which platforms does integer divide by zero trigger a floating point exception?)

For div, a dividend whose high half is less than the divisor is always safe. e.g. 0x11:23 / 0x12 = 0xF3 remainder 0x0D, and 0xF3 is no larger than 0xFF, so it fits in an 8-bit quotient.
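The rule can be checked exhaustively for the 8-bit case, where there are only about 16.7 million inputs. This sketch (high_half_rule_holds is a hypothetical name) confirms that the quotient fits in 8 bits exactly when high < divisor:

```c
#include <stdbool.h>

/* Exhaustively checks every 8-bit div input: high:low / d fits in an
   8-bit quotient if and only if high < d. */
static bool high_half_rule_holds(void) {
    for (unsigned d = 1; d <= 0xFF; d++)
        for (unsigned high = 0; high <= 0xFF; high++)
            for (unsigned low = 0; low <= 0xFF; low++) {
                unsigned dividend = (high << 8) | low;
                bool fits = dividend / d <= 0xFF;
                if (fits != (high < d))
                    return false;
            }
    return true;
}
```

The reasoning behind it: if high < d, the dividend is at most 256*d - 1, so the quotient is at most 255; if high >= d, the dividend is at least 256*d, so the quotient is at least 256.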

Extended-precision division of a huge number by a small number can be implemented by using the remainder from one chunk as the high-half dividend (EDX) for the next chunk. This is probably why they chose remainder=EDX quotient=EAX instead of the other way around.
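The chunked scheme can be sketched in C, with each loop iteration standing in for one div r32. The limb layout (32-bit limbs, most significant first) and the name bignum_divmod_u32 are assumptions for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: divides a big number stored as 32-bit limbs
   (most significant limb first) by a non-zero 32-bit divisor, in place,
   and returns the remainder. The remainder of each step becomes the high
   half (EDX) of the next dividend; since rem < divisor, every partial
   quotient fits in 32 bits, so the modelled div never faults. */
static uint32_t bignum_divmod_u32(uint32_t *limbs, size_t n, uint32_t divisor) {
    uint32_t rem = 0;                                    /* EDX = 0 to start */
    for (size_t i = 0; i < n; i++) {
        uint64_t cur = ((uint64_t)rem << 32) | limbs[i]; /* EDX:EAX */
        limbs[i] = (uint32_t)(cur / divisor);            /* quotient  -> EAX */
        rem = (uint32_t)(cur % divisor);                 /* remainder -> EDX */
    }
    return rem;
}
```

For example, dividing 2^64 (limbs {1, 0, 0}) by 10 leaves the limbs {0, 0x19999999, 0x99999999} (that is, 1844674407370955161) and returns 6.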
