How does GCC's implementation of modulo (%) work, and why does it not use the div instruction?

Posted on 2024-10-06 07:59:40

I was trying to work out how to calculate modulo 10 in assembly, so I compiled the following C code with gcc to see what it came up with.

unsigned int i=999;
unsigned int j=i%10;

To my surprise I got

movl    -4(%ebp), %ecx
movl    $-858993459, %edx
movl    %ecx, %eax
mull    %edx
shrl    $3, %edx
movl    %edx, %eax
sall    $2, %eax
addl    %edx, %eax
addl    %eax, %eax
movl    %ecx, %edx
subl    %eax, %edx
movl    %edx, %eax
movl    %eax, -12(%ebp)

Here -4(%ebp), or "i", is the input and -12(%ebp), or "j", is the answer. I've tested this and it works no matter what value is stored at -4(%ebp).

My question is: how does this code work, and how is it better than using the div instruction?

凡间太子 2024-10-13 07:59:40

Second question first: div is a very slow instruction (more than 20 clock cycles on many x86 processors; the exact cost depends on the microarchitecture). The sequence above consists of more instructions, but they're all relatively fast, so it's a net win in terms of speed.

The first five instructions (up to and including the shrl) compute i/10 (I'll explain how in a minute).

The next few instructions multiply the result by 10 again, but avoid the mul/imul instructions (whether this is a win or not depends on the exact processor you're targeting: newer x86s have very fast multipliers, but older ones don't).

movl    %edx, %eax   # eax = i/10
sall    $2, %eax     # eax = (i/10)*4
addl    %edx, %eax   # eax = (i/10)*4 + (i/10) = (i/10)*5
addl    %eax, %eax   # eax = (i/10)*5*2 = (i/10)*10

This is then subtracted from i again to obtain i - (i/10)*10 which is i % 10 (for unsigned numbers).
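The shift-and-add steps above can be written out in C as a rough sketch. The function names `times10_shift_add` and `mod10_from_quotient` are made up for illustration; the quotient `q` is assumed to have been computed already.

```c
#include <stdint.h>

/* Multiply by 10 with one shift and two adds, mirroring the
   sall/addl/addl sequence (hypothetical helper name). */
uint32_t times10_shift_add(uint32_t q)
{
    uint32_t t = q << 2;  /* q*4   (sall $2, %eax)    */
    t += q;               /* q*5   (addl %edx, %eax)  */
    t += t;               /* q*10  (addl %eax, %eax)  */
    return t;
}

/* Remainder from an already computed quotient q = i/10:
   i - (i/10)*10, which is i % 10 for unsigned values. */
uint32_t mod10_from_quotient(uint32_t i, uint32_t q)
{
    return i - times10_shift_add(q);
}
```

For i = 999 the quotient is 99, so the helper computes 999 - 990 = 9.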

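Finally, on the computation of i/10: The basic idea is to replace division by 10 with multiplication by 1/10. The compiler does a fixed-point approximation of this by multiplying with (2**35 / 10 + 1), using integer division, which gives 3435973837 (0xCCCCCCCD). That's the magic value loaded into edx; the assembly listing prints it as the signed value -858993459, but mull treats it as unsigned. The result is then right-shifted by 35: mull leaves the 64-bit product in edx:eax, so taking edx alone is already a shift by 32, and shrl $3 supplies the remaining 3. This turns out to give the right result for all 32-bit unsigned integers.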
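As a small sketch of the same reciprocal trick in C (the names `MAGIC_DIV10` and `div10_by_reciprocal` are mine, not anything GCC emits), assuming a 64-bit integer type is available for the product:

```c
#include <stdint.h>

/* 2**35 / 10 + 1 = 3435973837; the same bit pattern read as a
   signed 32-bit value is -858993459 (illustrative macro name). */
#define MAGIC_DIV10 0xCCCCCCCDu

/* Unsigned i/10 via multiply and shift, following the asm above. */
uint32_t div10_by_reciprocal(uint32_t i)
{
    uint64_t product = (uint64_t)i * MAGIC_DIV10; /* mull: edx:eax  */
    uint32_t high = (uint32_t)(product >> 32);    /* edx (shift 32) */
    return high >> 3;                             /* shrl $3 (35)   */
}
```

Splitting the shift into 32 + 3 matches what the generated code does; writing `product >> 35` directly would give the same value.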

There are algorithms to determine this kind of approximation which guarantee that the error is less than 1 (which for integers means it's the right value), and GCC obviously uses one :)
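To give a feel for how such a constant can be found, here is a minimal sketch of one textbook-style criterion. It is not claimed to be GCC's actual algorithm, and `find_magic_u32` is a hypothetical name. For a shift s it takes m = ceil(2^(32+s) / d) and accepts it when m*d - 2^(32+s) <= 2^s, which is enough to make (n*m) >> (32+s) equal n/d for every 32-bit unsigned n.

```c
#include <stdint.h>

/* Hypothetical helper: search shifts 0..31 for a multiplier that
   makes (n * magic) >> shift == n / d for all 32-bit unsigned n.
   Returns 1 on success, 0 if d == 0 or nothing fits in this range.
   Note: magic may need 33 bits for some divisors (e.g. d = 7). */
int find_magic_u32(uint32_t d, uint64_t *magic, int *shift)
{
    if (d == 0)
        return 0;
    for (int s = 0; s < 32; s++) {
        uint64_t pow2 = (uint64_t)1 << (32 + s);
        uint64_t m = (pow2 + d - 1) / d;   /* ceil(2^(32+s) / d) */
        uint64_t err = m * d - pow2;       /* rounding excess    */
        if (err <= ((uint64_t)1 << s)) {   /* error stays below 1 */
            *magic = m;
            *shift = 32 + s;
            return 1;
        }
    }
    return 0;
}
```

For d = 10 this search stops at s = 3 with the multiplier 3435973837 and a total shift of 35, the same pair seen in the assembly above.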

Final remark: If you want to actually see GCC compute a modulo with div, make the divisor a variable (e.g. a function parameter) so it can't do this kind of optimization. Anyway, on x86, you compute modulo using div. div expects the 64-bit dividend in edx:eax (high 32 bits in edx, low 32 bits in eax; clear edx to zero if you're working with a 32-bit number) and divides that by whatever operand you specify (e.g. div ebx divides edx:eax by ebx). It returns the quotient in eax and the remainder in edx. idiv does the same for signed values.
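A rough C illustration of both points follows. `mod_runtime` takes the divisor as a parameter so the compiler has no constant to precompute a reciprocal for, and `model_div32` is a portable model (my own naming, not a real intrinsic) of what a 32-bit div does with edx:eax. It assumes the divisor is nonzero and the quotient fits in 32 bits; the real instruction raises a divide error otherwise.

```c
#include <stdint.h>

/* Result pair modelled on the registers div writes (illustrative). */
typedef struct {
    uint32_t quotient;   /* what div leaves in eax */
    uint32_t remainder;  /* what div leaves in edx */
} div_result;

/* Portable model of a 32-bit unsigned div: the dividend is the
   64-bit value edx:eax. Assumes divisor != 0 and that the quotient
   fits in 32 bits. */
div_result model_div32(uint32_t edx, uint32_t eax, uint32_t divisor)
{
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    div_result r;
    r.quotient = (uint32_t)(dividend / divisor);
    r.remainder = (uint32_t)(dividend % divisor);
    return r;
}

/* Divisor only known at run time, so no magic constant applies. */
uint32_t mod_runtime(uint32_t i, uint32_t d)
{
    return i % d;
}
```

Calling `model_div32(0, 999, 10)` corresponds to clearing edx, loading 999 into eax and dividing by 10: the quotient is 99 and the remainder is 9.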