32位机器如何处理大于2^32的数字?

发布于 2024-09-26 23:02:37 字数 1103 浏览 5 评论 0原文

我试图了解涉及大于 232 的数字的计算如何在 32 位计算机上进行。

C 代码

$ cat size.c
#include<stdio.h>
#include<math.h>

int main() {

    printf ("max unsigned long long = %llu\n",
    (unsigned long long)(pow(2, 64) - 1));
}
$

gcc 输出

$ gcc size.c -o size
$ ./size
max unsigned long long = 18446744073709551615
$

相应的汇编代码

$ gcc -S size.c -O3
$ cat size.s
    .file   "size.c"
    .section    .rodata.str1.4,"aMS",@progbits,1
    .align 4
.LC0:
    .string "max unsigned long long = %llu\n"
    .text
    .p2align 4,,15
.globl main
    .type   main, @function
main:
    pushl   %ebp
    movl    %esp, %ebp
    andl    $-16, %esp
    subl    $16, %esp
    movl    $-1, 8(%esp)   #1
    movl    $-1, 12(%esp)  #2
    movl    $.LC0, 4(%esp) #3
    movl    $1, (%esp)     #4
    call    __printf_chk
    leave
    ret
    .size   main, .-main
    .ident  "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
    .section    .note.GNU-stack,"",@progbits
$

第 1 - 4 行到底发生了什么?

这是汇编级别的某种字符串连接吗?

I am trying to understand how calculations involving numbers greater than 232 happen on a 32 bit machine.

C code

$ cat size.c
#include<stdio.h>
#include<math.h>

int main() {

    printf ("max unsigned long long = %llu\n",
    (unsigned long long)(pow(2, 64) - 1));
}
$

gcc output

$ gcc size.c -o size
$ ./size
max unsigned long long = 18446744073709551615
$

Corresponding assembly code

$ gcc -S size.c -O3
$ cat size.s
    .file   "size.c"
    .section    .rodata.str1.4,"aMS",@progbits,1
    .align 4
.LC0:
    .string "max unsigned long long = %llu\n"
    .text
    .p2align 4,,15
.globl main
    .type   main, @function
main:
    pushl   %ebp
    movl    %esp, %ebp
    andl    $-16, %esp
    subl    $16, %esp
    movl    $-1, 8(%esp)   #1
    movl    $-1, 12(%esp)  #2
    movl    $.LC0, 4(%esp) #3
    movl    $1, (%esp)     #4
    call    __printf_chk
    leave
    ret
    .size   main, .-main
    .ident  "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
    .section    .note.GNU-stack,"",@progbits
$

What exactly happens on the lines 1 - 4?

Is this some kind of string concatenation at the assembly level?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

双马尾 2024-10-03 23:02:37

__printf_chkprintf 的包装器,它检查堆栈溢出,并采用一个额外的第一个参数,一个标志(例如,参见 此处。)

pow(2 , 64) - 1 已优化为 0xffffffffffffffff 因为参数是常量。

根据通常的调用约定,__printf_chk() 的第一个参数(int flag)是堆栈上的 32 位值(位于 %esp 在 call 指令时)。下一个参数 const char * format 是一个 32 位指针(堆栈上的下一个 32 位字,即位于 %esp+4 处)。正在打印的 64 位数量占据了接下来的两个 32 位字(在 %esp+8%esp+12 处):

pushl   %ebp                 ; prologue
movl    %esp, %ebp           ; prologue
andl    $-16, %esp           ; align stack pointer
subl    $16, %esp            ; reserve bytes for stack frame
movl    $-1, 8(%esp)   #1    ; store low half of 64-bit argument (a constant) to stack
movl    $-1, 12(%esp)  #2    ; store high half of 64-bit argument (a constant) to stack
movl    $.LC0, 4(%esp) #3    ; store address of format string to stack
movl    $1, (%esp)     #4    ; store "flag" argument to __printf_chk to stack
call    __printf_chk         ; call routine
leave                        ; epilogue
ret                          ; epilogue

编译器已有效重写this:

printf("max unsigned long long = %llu\n", (unsigned long long)(pow(2, 64) - 1));

...into this:

__printf_chk(1, "max unsigned long long = %llu\n", 0xffffffffffffffffULL);

...并且在运行时,调用的堆栈布局如下所示(将堆栈显示为 32 位字,地址从图的底部向上递增):

        :                 :
        :     Stack       :
        :                 :
        +-----------------+
%esp+12 |      0xffffffff | \ 
        +-----------------+  } <-------------------------------------.
%esp+8  |      0xffffffff | /                                        |
        +-----------------+                                          |
%esp+4  |address of string| <---------------.                        |
        +-----------------+                 |                        |
%esp    |               1 | <--.            |                        |
        +-----------------+    |            |                        |
                  __printf_chk(1, "max unsigned long long = %llu\n", |
                                                    0xffffffffffffffffULL);

__printf_chk is a wrapper around printf which checks for stack overflow, and takes an additional first parameter, a flag (e.g. see here.)

pow(2, 64) - 1 has been optimised to 0xffffffffffffffff as the arguments are constants.

As per the usual calling conventions, the first argument to __printf_chk() (int flag) is a 32-bit value on the stack (at %esp at the time of the call instruction). The next argument, const char * format, is a 32-bit pointer (the next 32-bit word on the stack, i.e. at %esp+4). And the 64-bit quantity that is being printed occupies the next two 32-bit words (at %esp+8 and %esp+12):

pushl   %ebp                 ; prologue
movl    %esp, %ebp           ; prologue
andl    $-16, %esp           ; align stack pointer
subl    $16, %esp            ; reserve bytes for stack frame
movl    $-1, 8(%esp)   #1    ; store low half of 64-bit argument (a constant) to stack
movl    $-1, 12(%esp)  #2    ; store high half of 64-bit argument (a constant) to stack
movl    $.LC0, 4(%esp) #3    ; store address of format string to stack
movl    $1, (%esp)     #4    ; store "flag" argument to __printf_chk to stack
call    __printf_chk         ; call routine
leave                        ; epilogue
ret                          ; epilogue

The compiler has effectively rewritten this:

printf("max unsigned long long = %llu\n", (unsigned long long)(pow(2, 64) - 1));

...into this:

__printf_chk(1, "max unsigned long long = %llu\n", 0xffffffffffffffffULL);

...and, at runtime, the stack layout for the call looks like this (showing the stack as 32-bit words, with addresses increasing from the bottom of the diagram upwards):

        :                 :
        :     Stack       :
        :                 :
        +-----------------+
%esp+12 |      0xffffffff | \ 
        +-----------------+  } <-------------------------------------.
%esp+8  |      0xffffffff | /                                        |
        +-----------------+                                          |
%esp+4  |address of string| <---------------.                        |
        +-----------------+                 |                        |
%esp    |               1 | <--.            |                        |
        +-----------------+    |            |                        |
                  __printf_chk(1, "max unsigned long long = %llu\n", |
                                                    0xffffffffffffffffULL);
猛虎独行 2024-10-03 23:02:37

类似于我们处理大于 9 的数字的方式,仅包含数字 0 - 9。
(使用位置数字)。假设这个问题是一个概念性的问题。

similar to the way as we handle numbers greater than 9, with only digits 0 - 9.
(using positional digits). presuming the question is a conceptual one.

最美不过初阳 2024-10-03 23:02:37

在您的情况下,编译器知道 2^64-1 只是 0xffffffffffffffff,因此它已将 -1(低位双字)和 -1(高位双字)压入堆栈作为 printf 的参数。这只是一个优化。

一般来说,64 位数字(甚至更大的值)可以用多个字来存储,例如一个unsigned long long 使用两个dword。要添加两个 64 位数字,需要执行两次加法 - 一次在低 32 位上,一次在高 32 位上,再加上进位:

; Add 64-bit number from esi onto edi:
mov     eax, [esi] ; get low 32 bits of source
add     [edi], eax ; add to low 32 bits of destination
; That add may have overflowed, and if it did, carry flag = 1.
mov     eax, [esi+4] ; get high 32 bits of source
adc     [edi+4], eax ; add to high 32 bits of destination, then add carry.

您可以重复此 add序列adc 可以随意添加任意大的数字。减法也可以完成同样的事情 - 只需使用 subsbb (借位减法)。

乘法和除法要复杂得多,每当您将 64 位数字相乘时,编译器通常会生成一些小的辅助函数来处理这些函数。像 GMP 这样支持非常非常大的整数的软件包使用 SSE/SSE2 来加快速度。请查看这篇维基百科文章,了解有关乘法算法的更多信息。

In your case, the compiler knows that 2^64-1 is just 0xffffffffffffffff, so it has pushed -1 (low dword) and -1 (high dword) onto the stack as your argument to printf. It's just an optimization.

In general, 64-bit numbers (and even greater values) can be stored with multiple words, e.g. an unsigned long long uses two dwords. To add two 64-bit numbers, two additions are performed - one on the low 32 bits, and one on the high 32 bits, plus the carry:

; Add 64-bit number from esi onto edi:
mov     eax, [esi] ; get low 32 bits of source
add     [edi], eax ; add to low 32 bits of destination
; That add may have overflowed, and if it did, carry flag = 1.
mov     eax, [esi+4] ; get high 32 bits of source
adc     [edi+4], eax ; add to high 32 bits of destination, then add carry.

You can repeat this sequence of add and adcs as much as you like to add arbitrarily big numbers. The same thing can be done with subtraction - just use sub and sbb (subtract with borrow).

Multiplication and division are much more complicated, and the compiler usually produces some small helper functions to deal with these whenever you multiply 64-bit numbers together. Packages like GMP which support very, very large integers use SSE/SSE2 to speed things up. Take a look at this Wikipedia article for more information on multiplication algorithms.

近箐 2024-10-03 23:02:37

正如其他人指出的那样,您的示例中的所有 64 位算术都已被优化掉。这个答案主要针对标题中的问题。

基本上,我们将每个 32 位数字视为一个数字,并以 4294967296 为基数进行工作。通过这种方式,我们可以处理任意大的数字。

加法和减法是最简单的。我们从最不重要的数字开始,一直到最重要的数字,一次处理一个数字。通常,第一个数字是使用普通的加/减指令完成的,后面的数字是使用特定的“带进位的加法”或“带借位的减法”指令完成的。状态寄存器中的进位标志用于将进位/借位从一位数字转移到下一位数字。由于二进制补码,有符号和无符号加法和减法是相同的。

乘法有点棘手,两个 32 位数字相乘可以产生 64 位结果。大多数 32 位处理器都有将两个 32 位数字相乘并在两个寄存器中生成 64 位结果的指令。然后需要进行加法将结果组合成最终答案。由于二进制补码有符号和无符号乘法是相同的,只要所需的结果大小与参数大小相同。如果结果大于参数,则需要特别小心。

为了进行比较,我们从最高有效数字开始。如果相等,我们向下移动到下一个数字,直到结果相等。

除法对于我来说太复杂了,无法在这篇文章中描述,但是有很多算法的例子。例如 http://www.hackersdelight.org/hdcodetxt/divDouble.c.txt


来自 gcc 的一些实际示例 https://godbolt.org/g/NclqXC ,汇编器是在英特尔语法中。

首先补充一下。将两个 64 位数字相加并产生 64 位结果。签名和未签名版本的 asm 是相同的。

int64_t add64(int64_t a, int64_t b) { return a +  b; }
add64:
    mov     eax, DWORD PTR [esp+12]
    mov     edx, DWORD PTR [esp+16]
    add     eax, DWORD PTR [esp+4]
    adc     edx, DWORD PTR [esp+8]
    ret

这非常简单,将一个参数加载到 eax 和 edx 中,然后使用 add 和带有进位的 add 来添加另一个参数。结果保留在 eax 和 edx 中以返回给调用者。

现在将两个 64 位数字相乘以产生 64 位结果。同样,代码不会从签名更改为未签名。我添加了一些评论以使其更易于理解。

在查看代码之前,让我们考虑一下数学。 a 和 b 是 64 位数字,我将使用 lo() 表示 64 位数字的低 32 位,使用 hi() 表示 64 位数字的高 32 位。

(a * b) = (lo(a) * lo(b)) + (hi(a) * lo(b) * 2^32) + (hi(b) * lo(a) * 2^32) + (hi(b) * hi(a) * 2^64)

(a * b) mod 2^64 = (lo(a) * lo(b)) + (lo(hi(a) * lo(b)) * 2^32) + (lo(hi(b) * lo(a)) * 2^32)

lo((a * b) mod 2^64) = lo(lo(a) * lo(b))

hi ((a * b) mod 2^64) = hi(lo(a) * lo(b)) + lo(hi(a) * lo(b)) + lo(hi(b) * lo(a))

uint64_t mul64(uint64_t a, uint64_t b) { return a*b; }
mul64:
    push    ebx                      ;save ebx
    mov     eax, DWORD PTR [esp+8]   ;load lo(a) into eax
    mov     ebx, DWORD PTR [esp+16]  ;load lo(b) into ebx
    mov     ecx, DWORD PTR [esp+12]  ;load hi(a) into ecx  
    mov     edx, DWORD PTR [esp+20]  ;load hi(b) into edx
    imul    ecx, ebx                 ;ecx = lo(hi(a) * lo(b))
    imul    edx, eax                 ;edx = lo(hi(b) * lo(a))
    add     ecx, edx                 ;ecx = lo(hi(a) * lo(b)) + lo(hi(b) * lo(a))
    mul     ebx                      ;eax = lo(low(a) * lo(b))
                                     ;edx = hi(low(a) * lo(b))
    pop     ebx                      ;restore ebx.
    add     edx, ecx                 ;edx = hi(low(a) * lo(b)) + lo(hi(a) * lo(b)) + lo(hi(b) * lo(a))
    ret

最后,当我们尝试除法时,我们看到了。

int64_t div64(int64_t a, int64_t b) { return a/b; }
div64:
    sub     esp, 12
    push    DWORD PTR [esp+28]
    push    DWORD PTR [esp+28]
    push    DWORD PTR [esp+28]
    push    DWORD PTR [esp+28]
    call    __divdi3
    add     esp, 28
    ret

编译器认为除法过于复杂而无法实现内联,而是调用库例程。

As others have pointed out all 64-bit aritmetic in your example has been optimised away. This answer focuses on the question int the title.

Basically we treat each 32-bit number as a digit and work in base 4294967296. In this manner we can work on arbiterally big numbers.

Addition and subtraction are easiest. We work through the digits one at a time starting from the least significant and moving to the most significant. Generally the first digit is done with a normal add/subtract instruction and later digits are done using a specific "add with carry" or "subtract with borrow" instruction. The carry flag in the status register is used to take the carry/borrow bit from one digit to the next. Thanks to twos complement signed and unsigned addition and subtraction are the same.

Multiplication is a little trickier, multiplying two 32-bit digits can produce a 64-bit result. Most 32-bit processors will have instructions that multiply two 32-bit numbers and produces a 64-bit result in two registers. Addition will then be needed to combine the results into a final answer. Thanks to twos complement signed and unsigned multiplication are the same provided the desired result size is the same as the argument size. If the result is larger than the arguments then special care is needed.

For comparision we start from the most significant digit. If it's equal we move down to the next digit until the results are equal.

Division is too complex for me to describe in this post, but there are plenty of examples out there of algorithms. e.g. http://www.hackersdelight.org/hdcodetxt/divDouble.c.txt


Some real-world examples from gcc https://godbolt.org/g/NclqXC , the assembler is in intel syntax.

First an addition. adding two 64-bit numbers and producing a 64-bit result. The asm is the same for both signed and unsigned versions.

int64_t add64(int64_t a, int64_t b) { return a +  b; }
add64:
    mov     eax, DWORD PTR [esp+12]
    mov     edx, DWORD PTR [esp+16]
    add     eax, DWORD PTR [esp+4]
    adc     edx, DWORD PTR [esp+8]
    ret

This is pretty simple, load one argument into eax and edx, then add the other using an add followed by an add with carry. The result is left in eax and edx for return to the caller.

Now a multiplication of two 64-bit numbers to produce a 64-bit result. Again the code doesn't change from signed to unsigned. I've added some comments to make it easier to follow.

Before we look at the code lets consider the math. a and b are 64-bit numbers I will use lo() to represent the lower 32-bits of a 64-bit number and hi() to represent the upper 32 bits of a 64-bit number.

(a * b) = (lo(a) * lo(b)) + (hi(a) * lo(b) * 2^32) + (hi(b) * lo(a) * 2^32) + (hi(b) * hi(a) * 2^64)

(a * b) mod 2^64 = (lo(a) * lo(b)) + (lo(hi(a) * lo(b)) * 2^32) + (lo(hi(b) * lo(a)) * 2^32)

lo((a * b) mod 2^64) = lo(lo(a) * lo(b))

hi((a * b) mod 2^64) = hi(lo(a) * lo(b)) + lo(hi(a) * lo(b)) + lo(hi(b) * lo(a))

uint64_t mul64(uint64_t a, uint64_t b) { return a*b; }
mul64:
    push    ebx                      ;save ebx
    mov     eax, DWORD PTR [esp+8]   ;load lo(a) into eax
    mov     ebx, DWORD PTR [esp+16]  ;load lo(b) into ebx
    mov     ecx, DWORD PTR [esp+12]  ;load hi(a) into ecx  
    mov     edx, DWORD PTR [esp+20]  ;load hi(b) into edx
    imul    ecx, ebx                 ;ecx = lo(hi(a) * lo(b))
    imul    edx, eax                 ;edx = lo(hi(b) * lo(a))
    add     ecx, edx                 ;ecx = lo(hi(a) * lo(b)) + lo(hi(b) * lo(a))
    mul     ebx                      ;eax = lo(low(a) * lo(b))
                                     ;edx = hi(low(a) * lo(b))
    pop     ebx                      ;restore ebx.
    add     edx, ecx                 ;edx = hi(low(a) * lo(b)) + lo(hi(a) * lo(b)) + lo(hi(b) * lo(a))
    ret

Finally when we try a division we see.

int64_t div64(int64_t a, int64_t b) { return a/b; }
div64:
    sub     esp, 12
    push    DWORD PTR [esp+28]
    push    DWORD PTR [esp+28]
    push    DWORD PTR [esp+28]
    push    DWORD PTR [esp+28]
    call    __divdi3
    add     esp, 28
    ret

The compiler has decided that division is too complex to implement inline and instead calls a library routine.

风轻花落早 2024-10-03 23:02:37

编译器实际上对您的代码进行了静态优化。
#1 #2 #3 行是 printf() 的参数

The compiler actually made a static optimization of your code.
lines #1 #2 #3 are parameters for printf()

妄想挽回 2024-10-03 23:02:37

正如 @Pafy 提到的,编译器已将其计算为常量。

2 的第 64 次减 1 是 0xffffffffffffffff

作为 2 个 32 位整数,这是:0xffffffff0xffffffff
如果将其视为一对 32 位有符号类型,则最终为:<代码>-1和<代码>-1。

因此,对于您的编译器来说,生成的代码恰好相当于:

printf("max unsigned long long = %llu\n", -1, -1);

在程序集中,它是这样写的:

movl    $-1, 8(%esp)   #Second -1 parameter
movl    $-1, 12(%esp)  #First -1 parameter
movl    $.LC0, 4(%esp) #Format string
movl    $1, (%esp)     #A one.  Kind of odd, perhaps __printf_chk
                       #in your C library expects this.
call    __printf_chk

顺便说一句,计算 2 的幂的更好方法是将 1 左移。例如。 (1ULL << 64) - 1

As @Pafy mentions, the compiler has evaluated this as a constant.

2 to the 64th minus 1 is 0xffffffffffffffff.

As 2 32-bit integers this is: 0xffffffff and 0xffffffff,
which if you take that as a pair of 32-bit signed types, ends up as: -1, and -1.

Thus for your compiler the code generated happens to be equivalent to:

printf("max unsigned long long = %llu\n", -1, -1);

In the assembly it's written like this:

movl    $-1, 8(%esp)   #Second -1 parameter
movl    $-1, 12(%esp)  #First -1 parameter
movl    $.LC0, 4(%esp) #Format string
movl    $1, (%esp)     #A one.  Kind of odd, perhaps __printf_chk
                       #in your C library expects this.
call    __printf_chk

By the way, a better way to calculate powers of 2 is to shift 1 left. Eg. (1ULL << 64) - 1.

携余温的黄昏 2024-10-03 23:02:37

此线程中没有人注意到 OP 要求解释前 4 行,而不是第 11-14 行。

前 4 行是:

    .file   "size.c"
    .section    .rodata.str1.4,"aMS",@progbits,1
    .align 4
.LC0:

这是前 4 行中发生的情况:

.file   "size.c"

这是一个汇编指令,表示我们将启动一个名为“size.c”的新逻辑文件。

.section    .rodata.str1.4,"aMS",@progbits,1

这也是程序中只读字符串的指令。

.align 4

该指令将位置计数器设置为始终为 4 的倍数。

.LC0:

例如,这是一个可以跳转到的标签 LC0

我希望我为这个问题提供了正确的答案,因为我准确地回答了OP的要求。

No one in this thread noticed that the OP asked to explain the first 4 lines, not lines 11-14.

The first 4 lines are:

    .file   "size.c"
    .section    .rodata.str1.4,"aMS",@progbits,1
    .align 4
.LC0:

Here's what happens in first 4 lines:

.file   "size.c"

This is an assembler directive that says that we are about to start a new logical file called "size.c".

.section    .rodata.str1.4,"aMS",@progbits,1

This is also a directive for read only strings in the program.

.align 4

This directive sets the location counter to always be a multiple of 4.

.LC0:

This is a label LC0 that can be jumped to, for example.

I hope I provided the right answer to the question as I answered exactly what OP asked.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文