32位机器如何处理大于2^32的数字?
我试图了解涉及大于 232 的数字的计算如何在 32 位计算机上进行。
C 代码
$ cat size.c
#include<stdio.h>
#include<math.h>
int main() {
printf ("max unsigned long long = %llu\n",
(unsigned long long)(pow(2, 64) - 1));
}
$
gcc 输出
$ gcc size.c -o size
$ ./size
max unsigned long long = 18446744073709551615
$
相应的汇编代码
$ gcc -S size.c -O3
$ cat size.s
.file "size.c"
.section .rodata.str1.4,"aMS",@progbits,1
.align 4
.LC0:
.string "max unsigned long long = %llu\n"
.text
.p2align 4,,15
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $16, %esp
movl $-1, 8(%esp) #1
movl $-1, 12(%esp) #2
movl $.LC0, 4(%esp) #3
movl $1, (%esp) #4
call __printf_chk
leave
ret
.size main, .-main
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",@progbits
$
第 1 - 4 行到底发生了什么?
这是汇编级别的某种字符串连接吗?
I am trying to understand how calculations involving numbers greater than 232 happen on a 32 bit machine.
C code
$ cat size.c
#include<stdio.h>
#include<math.h>
int main() {
printf ("max unsigned long long = %llu\n",
(unsigned long long)(pow(2, 64) - 1));
}
$
gcc output
$ gcc size.c -o size
$ ./size
max unsigned long long = 18446744073709551615
$
Corresponding assembly code
$ gcc -S size.c -O3
$ cat size.s
.file "size.c"
.section .rodata.str1.4,"aMS",@progbits,1
.align 4
.LC0:
.string "max unsigned long long = %llu\n"
.text
.p2align 4,,15
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $16, %esp
movl $-1, 8(%esp) #1
movl $-1, 12(%esp) #2
movl $.LC0, 4(%esp) #3
movl $1, (%esp) #4
call __printf_chk
leave
ret
.size main, .-main
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",@progbits
$
What exactly happens on the lines 1 - 4?
Is this some kind of string concatenation at the assembly level?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
__printf_chk
是printf
的包装器,它检查堆栈溢出,并采用一个额外的第一个参数,一个标志(例如,参见 此处。)pow(2 , 64) - 1
已优化为0xffffffffffffffff
因为参数是常量。根据通常的调用约定,
__printf_chk()
的第一个参数(int flag
)是堆栈上的 32 位值(位于%esp 在
call
指令时)。下一个参数 const char * format 是一个 32 位指针(堆栈上的下一个 32 位字,即位于%esp+4
处)。正在打印的 64 位数量占据了接下来的两个 32 位字(在%esp+8
和%esp+12
处):编译器已有效重写this:
...into this:
...并且在运行时,调用的堆栈布局如下所示(将堆栈显示为 32 位字,地址从图的底部向上递增):
__printf_chk
is a wrapper aroundprintf
which checks for stack overflow, and takes an additional first parameter, a flag (e.g. see here.)pow(2, 64) - 1
has been optimised to0xffffffffffffffff
as the arguments are constants.As per the usual calling conventions, the first argument to
__printf_chk()
(int flag
) is a 32-bit value on the stack (at%esp
at the time of thecall
instruction). The next argument,const char * format
, is a 32-bit pointer (the next 32-bit word on the stack, i.e. at%esp+4
). And the 64-bit quantity that is being printed occupies the next two 32-bit words (at%esp+8
and%esp+12
):The compiler has effectively rewritten this:
...into this:
...and, at runtime, the stack layout for the call looks like this (showing the stack as 32-bit words, with addresses increasing from the bottom of the diagram upwards):
类似于我们处理大于 9 的数字的方式,仅包含数字 0 - 9。
(使用位置数字)。假设这个问题是一个概念性的问题。
similar to the way as we handle numbers greater than 9, with only digits 0 - 9.
(using positional digits). presuming the question is a conceptual one.
在您的情况下,编译器知道 2^64-1 只是 0xffffffffffffffff,因此它已将 -1(低位双字)和 -1(高位双字)压入堆栈作为 printf 的参数。这只是一个优化。
一般来说,64 位数字(甚至更大的值)可以用多个字来存储,例如一个
unsigned long long
使用两个dword
。要添加两个 64 位数字,需要执行两次加法 - 一次在低 32 位上,一次在高 32 位上,再加上进位:您可以重复此
add
和序列adc 可以随意添加任意大的数字。减法也可以完成同样的事情 - 只需使用
sub
和sbb
(借位减法)。乘法和除法要复杂得多,每当您将 64 位数字相乘时,编译器通常会生成一些小的辅助函数来处理这些函数。像 GMP 这样支持非常非常大的整数的软件包使用 SSE/SSE2 来加快速度。请查看这篇维基百科文章,了解有关乘法算法的更多信息。
In your case, the compiler knows that 2^64-1 is just 0xffffffffffffffff, so it has pushed -1 (low dword) and -1 (high dword) onto the stack as your argument to printf. It's just an optimization.
In general, 64-bit numbers (and even greater values) can be stored with multiple words, e.g. an
unsigned long long
uses twodword
s. To add two 64-bit numbers, two additions are performed - one on the low 32 bits, and one on the high 32 bits, plus the carry:You can repeat this sequence of
add
andadc
s as much as you like to add arbitrarily big numbers. The same thing can be done with subtraction - just usesub
andsbb
(subtract with borrow).Multiplication and division are much more complicated, and the compiler usually produces some small helper functions to deal with these whenever you multiply 64-bit numbers together. Packages like GMP which support very, very large integers use SSE/SSE2 to speed things up. Take a look at this Wikipedia article for more information on multiplication algorithms.
正如其他人指出的那样,您的示例中的所有 64 位算术都已被优化掉。这个答案主要针对标题中的问题。
基本上,我们将每个 32 位数字视为一个数字,并以 4294967296 为基数进行工作。通过这种方式,我们可以处理任意大的数字。
加法和减法是最简单的。我们从最不重要的数字开始,一直到最重要的数字,一次处理一个数字。通常,第一个数字是使用普通的加/减指令完成的,后面的数字是使用特定的“带进位的加法”或“带借位的减法”指令完成的。状态寄存器中的进位标志用于将进位/借位从一位数字转移到下一位数字。由于二进制补码,有符号和无符号加法和减法是相同的。
乘法有点棘手,两个 32 位数字相乘可以产生 64 位结果。大多数 32 位处理器都有将两个 32 位数字相乘并在两个寄存器中生成 64 位结果的指令。然后需要进行加法将结果组合成最终答案。由于二进制补码有符号和无符号乘法是相同的,只要所需的结果大小与参数大小相同。如果结果大于参数,则需要特别小心。
为了进行比较,我们从最高有效数字开始。如果相等,我们向下移动到下一个数字,直到结果相等。
除法对于我来说太复杂了,无法在这篇文章中描述,但是有很多算法的例子。例如 http://www.hackersdelight.org/hdcodetxt/divDouble.c.txt
来自 gcc 的一些实际示例 https://godbolt.org/g/NclqXC ,汇编器是在英特尔语法中。
首先补充一下。将两个 64 位数字相加并产生 64 位结果。签名和未签名版本的 asm 是相同的。
这非常简单,将一个参数加载到 eax 和 edx 中,然后使用 add 和带有进位的 add 来添加另一个参数。结果保留在 eax 和 edx 中以返回给调用者。
现在将两个 64 位数字相乘以产生 64 位结果。同样,代码不会从签名更改为未签名。我添加了一些评论以使其更易于理解。
在查看代码之前,让我们考虑一下数学。 a 和 b 是 64 位数字,我将使用 lo() 表示 64 位数字的低 32 位,使用 hi() 表示 64 位数字的高 32 位。
(a * b) = (lo(a) * lo(b)) + (hi(a) * lo(b) * 2^32) + (hi(b) * lo(a) * 2^32) + (hi(b) * hi(a) * 2^64)
(a * b) mod 2^64 = (lo(a) * lo(b)) + (lo(hi(a) * lo(b)) * 2^32) + (lo(hi(b) * lo(a)) * 2^32)
lo((a * b) mod 2^64) = lo(lo(a) * lo(b))
hi ((a * b) mod 2^64) = hi(lo(a) * lo(b)) + lo(hi(a) * lo(b)) + lo(hi(b) * lo(a))
最后,当我们尝试除法时,我们看到了。
编译器认为除法过于复杂而无法实现内联,而是调用库例程。
As others have pointed out all 64-bit aritmetic in your example has been optimised away. This answer focuses on the question int the title.
Basically we treat each 32-bit number as a digit and work in base 4294967296. In this manner we can work on arbiterally big numbers.
Addition and subtraction are easiest. We work through the digits one at a time starting from the least significant and moving to the most significant. Generally the first digit is done with a normal add/subtract instruction and later digits are done using a specific "add with carry" or "subtract with borrow" instruction. The carry flag in the status register is used to take the carry/borrow bit from one digit to the next. Thanks to twos complement signed and unsigned addition and subtraction are the same.
Multiplication is a little trickier, multiplying two 32-bit digits can produce a 64-bit result. Most 32-bit processors will have instructions that multiply two 32-bit numbers and produces a 64-bit result in two registers. Addition will then be needed to combine the results into a final answer. Thanks to twos complement signed and unsigned multiplication are the same provided the desired result size is the same as the argument size. If the result is larger than the arguments then special care is needed.
For comparision we start from the most significant digit. If it's equal we move down to the next digit until the results are equal.
Division is too complex for me to describe in this post, but there are plenty of examples out there of algorithms. e.g. http://www.hackersdelight.org/hdcodetxt/divDouble.c.txt
Some real-world examples from gcc https://godbolt.org/g/NclqXC , the assembler is in intel syntax.
First an addition. adding two 64-bit numbers and producing a 64-bit result. The asm is the same for both signed and unsigned versions.
This is pretty simple, load one argument into eax and edx, then add the other using an add followed by an add with carry. The result is left in eax and edx for return to the caller.
Now a multiplication of two 64-bit numbers to produce a 64-bit result. Again the code doesn't change from signed to unsigned. I've added some comments to make it easier to follow.
Before we look at the code lets consider the math. a and b are 64-bit numbers I will use lo() to represent the lower 32-bits of a 64-bit number and hi() to represent the upper 32 bits of a 64-bit number.
(a * b) = (lo(a) * lo(b)) + (hi(a) * lo(b) * 2^32) + (hi(b) * lo(a) * 2^32) + (hi(b) * hi(a) * 2^64)
(a * b) mod 2^64 = (lo(a) * lo(b)) + (lo(hi(a) * lo(b)) * 2^32) + (lo(hi(b) * lo(a)) * 2^32)
lo((a * b) mod 2^64) = lo(lo(a) * lo(b))
hi((a * b) mod 2^64) = hi(lo(a) * lo(b)) + lo(hi(a) * lo(b)) + lo(hi(b) * lo(a))
Finally when we try a division we see.
The compiler has decided that division is too complex to implement inline and instead calls a library routine.
编译器实际上对您的代码进行了静态优化。
#1 #2 #3 行是 printf() 的参数
The compiler actually made a static optimization of your code.
lines #1 #2 #3 are parameters for printf()
正如 @Pafy 提到的,编译器已将其计算为常量。
2 的第 64 次减 1 是
0xffffffffffffffff
。作为 2 个 32 位整数,这是:
0xffffffff
和0xffffffff
,如果将其视为一对 32 位有符号类型,则最终为:<代码>-1和<代码>-1。
因此,对于您的编译器来说,生成的代码恰好相当于:
在程序集中,它是这样写的:
顺便说一句,计算 2 的幂的更好方法是将
1
左移。例如。(1ULL << 64) - 1
。As @Pafy mentions, the compiler has evaluated this as a constant.
2 to the 64th minus 1 is
0xffffffffffffffff
.As 2 32-bit integers this is:
0xffffffff
and0xffffffff
,which if you take that as a pair of 32-bit signed types, ends up as:
-1
, and-1
.Thus for your compiler the code generated happens to be equivalent to:
In the assembly it's written like this:
By the way, a better way to calculate powers of 2 is to shift
1
left. Eg.(1ULL << 64) - 1
.此线程中没有人注意到 OP 要求解释前 4 行,而不是第 11-14 行。
前 4 行是:
这是前 4 行中发生的情况:
这是一个汇编指令,表示我们将启动一个名为“size.c”的新逻辑文件。
这也是程序中只读字符串的指令。
该指令将位置计数器设置为始终为 4 的倍数。
例如,这是一个可以跳转到的标签
LC0
。我希望我为这个问题提供了正确的答案,因为我准确地回答了OP的要求。
No one in this thread noticed that the OP asked to explain the first 4 lines, not lines 11-14.
The first 4 lines are:
Here's what happens in first 4 lines:
This is an assembler directive that says that we are about to start a new logical file called "size.c".
This is also a directive for read only strings in the program.
This directive sets the location counter to always be a multiple of 4.
This is a label
LC0
that can be jumped to, for example.I hope I provided the right answer to the question as I answered exactly what OP asked.