Assembly - .data, .code, and registers...?
So this morning I posted a confused question about assembly and I received some great genuine help, which I really appreciate.
And now I'm starting to get into assembly and am beginning to understand how it works.
Things I feel I understand alright include the stack, interrupts, binary/hex, and in general what most of the basic operations do (jmp, push, mov, etc).
Concepts that I'm struggling to understand and would like help with are below - it would be a huge help if you could address any of the following:
- What exactly is happening in the .data section? Are those variables we're declaring?
- If so, can we declare variables later in the code section? If not, why not? If so, how, and why do we use the data section then?
- What's a register? How does it compare to a variable? I mean I know it's a location that stores a small piece of information... but that sounds exactly like a variable to me.
- How do I make an array? I know this seems kind of random, but I'm curious as to how I'd go about doing something like this.
- Is there a list somewhere of common practices for what each register should be used for? I still don't get them completely, but have noticed some people saying, for example, that a certain register should be used to store 'return values' from procedures - is there a comprehensive or at least informative list of such practices?
- One of the reasons I'm learning assembly is to better understand what's going on behind my high level code. With that in mind - when I'm programming in C++, I'm often thinking about the stack and the heap. In assembly I know what the stack is - where's the 'heap'?
Some info: I'm using masm32 with WinAsm as an IDE, and I'm working on Windows 7. I have a lot of prior experience programming in higher level languages such as C++ and Java.
edit: Thanks for the help everyone, extremely informative as usual! Great stuff! One last thing though - I'm wondering what the difference is between the Stack Pointer, and the Base pointer, or ESP and EBP. Can someone help me out?
edit: I think I get it now... ESP always points to the top of the stack. However, you can point EBP at whatever you want. ESP is automatically handled but you can do whatever you want with EBP. For example:
push 6
push 5
push 4
mov EBP, ESP    ; EBP = address of the 4 just pushed
push 3
push 2          ; ESP = address of the 2
In this scenario, EBP now points to the address holding 4, but ESP now points to the address holding 2.
In a real application 6, 5, and 4 could have been function arguments, whereas 3 and 2 could be local variables within that function.
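To check that reading, here is a toy model in C rather than real CPU behavior. It assumes a stack that grows toward lower addresses, and the names ToyStack, toy_push and replay_question are made up for this sketch. It replays the six instructions above and records where the two pointers end up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Toy model of the x86 stack: it grows toward lower indices, and esp
   always indexes the most recently pushed word. */
typedef struct {
    int32_t mem[STACK_WORDS];
    size_t esp;  /* index of the top of the stack; STACK_WORDS means empty */
    size_t ebp;  /* a marker we set ourselves, like mov ebp, esp */
} ToyStack;

static void toy_push(ToyStack *s, int32_t value) {
    assert(s->esp > 0);      /* refuse to overflow the toy stack */
    s->esp -= 1;             /* push first moves esp down ... */
    s->mem[s->esp] = value;  /* ... then stores at the new top */
}

/* Replays the six instructions from the question. */
static ToyStack replay_question(void) {
    ToyStack s = {0};
    s.esp = STACK_WORDS;
    s.ebp = STACK_WORDS;
    toy_push(&s, 6);
    toy_push(&s, 5);
    toy_push(&s, 4);
    s.ebp = s.esp;           /* mov ebp, esp */
    toy_push(&s, 3);
    toy_push(&s, 2);
    return s;
}
```

In this model the earlier pushes sit above EBP (higher indices) and the later ones below it, which is why a fixed EBP is handy for reaching both arguments and locals while ESP keeps moving.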
Let's try to answer in order!
The data section contains anything that you want to be automatically initialized for you by the system before it calls the entry point of your program. You're right, normally global variables end up here. Zero-initialized (ZI) data, which usually lives in a separate section such as .bss, is generally not stored in the executable file, since there's no reason to - a couple of directives to the program loader are all that's needed to generate that space. Once your program starts running, the ZI and data regions are generally interchangeable. Wikipedia has a lot more information.
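As a rough illustration in C, assuming a typical toolchain (exact section placement is up to the compiler and linker), the sketch below declares one initialized global and one zero-initialized global. The names initialized_counter, zeroed_buffer and touch_globals are invented for this example.

```c
#include <assert.h>

/* Initialized global: the value 42 has to be stored in the executable
   image, so toolchains typically place it in the data section. */
int initialized_counter = 42;

/* Zero-initialized global: only its size needs to be recorded, so it
   typically goes to .bss and the loader supplies zeroed memory. */
int zeroed_buffer[1024];

/* Once the program is running, both are ordinary read/write memory. */
int touch_globals(void) {
    assert(zeroed_buffer[1] == 0);            /* zeroed before startup */
    zeroed_buffer[0] = initialized_counter;   /* write into the ZI region */
    initialized_counter += 1;                 /* write into the data region */
    return zeroed_buffer[0] + initialized_counter;
}
```

In MASM the corresponding directives are .data for initialized storage and .data? for uninitialized storage.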
Variables don't really exist in assembly programming, at least not in the sense they do when you're writing C code. All you have is the decisions you've made about how to lay out your memory. Variables can be on the stack, somewhere else in memory, or live only in registers.
Registers are the internal data storage of the processor. You can, in general, only do operations on values in processor registers. You can load their contents from memory and store them back to memory, which is the basic operation of how your computer works. Here's a quick example. This C code:
Might get translated to some (simplified) assembly along the lines of:
In this case, you can think of the registers as variables, but in general it's not necessary that any one variable always stay in the same register; depending on how complicated your routine is, it may not even be possible. You'll need to push some data onto the stack, pop other data off, and so on. A 'variable' is that logical piece of data, not where it lives in memory or registers, etc.
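To make the load, operate, store pattern concrete, here is a minimal sketch of my own (the function store_sum is hypothetical). The comments show one plausible, unoptimized 32-bit x86 translation in Intel syntax, assuming a cdecl frame set up with ebp; a real compiler is free to pick other registers or fold the whole thing into a constant.

```c
#include <assert.h>
#include <stddef.h>

/* Adds two values and writes the sum through a pointer. */
void store_sum(int *dest) {
    assert(dest != NULL);
    int a = 5;        /* mov eax, 5       ; a lives only in eax        */
    int b = 6;        /* mov ecx, 6       ; b lives only in ecx        */
    *dest = a + b;    /* add eax, ecx     ; the sum replaces a in eax  */
                      /* mov edx, [ebp+8] ; fetch the pointer argument */
                      /* mov [edx], eax   ; store the sum to memory    */
}
```

Note that a and b never get a memory address in this translation; only the value written through dest ends up in memory.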
An array is just a contiguous block of memory - for a local array, you can just decrement the stack pointer appropriately. For a global array, you can declare that block in the data section.
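A small C sketch of both cases, with the usual hedge that the compiler decides the details; global_table and sum_local_copy are names made up for the example. For four 4-byte ints, the local array is typically reserved with something like sub esp, 16 on function entry.

```c
#include <assert.h>
#include <stddef.h>

/* Global array: the storage is reserved in the data section. */
int global_table[4] = {10, 20, 30, 40};

/* Local array: the storage is carved out of the stack for the duration
   of the call and released when the stack pointer is restored. */
int sum_local_copy(void) {
    int local_table[4];
    /* contiguous block: four elements occupy exactly 4 * sizeof(int) bytes */
    assert(sizeof local_table == 4 * sizeof(int));
    int total = 0;
    for (size_t i = 0; i < 4; i++) {
        local_table[i] = global_table[i];
        total += local_table[i];
    }
    return total;
}
```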
There are a bunch of conventions about registers - check your platform's ABI or calling convention document for details about how to use them correctly. Your assembler documentation might have information as well. Check the ABI article on Wikipedia.
Your assembly program can call the same library functions and system calls any C program could, so you can just call malloc() to get memory from the heap.
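For reference, this is what that call looks like from the C side; make_filled is a hypothetical helper. From 32-bit assembly you would make the same call by following the C calling convention: push the size, call malloc, and find the returned pointer in eax.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns a heap block holding count ints, each set to fill,
   or NULL if the allocation fails. The caller must free() it. */
int *make_filled(size_t count, int fill) {
    int *block = malloc(count * sizeof *block);
    if (block == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        block[i] = fill;
    }
    return block;
}
```

Unlike a local array, this block stays valid after the function returns, until free() is called on it.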
I'd like to add to this. Programs on a computer are typically split up into three sections, although there are others.
Code Segment - .code, .text : http://en.wikipedia.org/wiki/Code_segment
Data Segment - .data : http://en.wikipedia.org/wiki/Data_segment
BSS : http://en.wikipedia.org/wiki/.bss
Registers are, as described by others, facilities of the CPU to store data or a memory address. Operations are performed upon registers, such as add eax, ebx, and what that means depends on the assembly dialect. In NASM (Intel) syntax, this adds the contents of ebx to eax and stores the result in eax. The equivalent in GNU AS (AT&T syntax) is addl %ebx, %eax, with the source operand first and registers prefixed by %. Different dialects of assembly have different rules and operators. I'm not a fan of MASM for this reason - it is very different from NASM, YASM and GNU AS.
There isn't one general way for assembly to interact with C. ABIs designate how this happens; for example, on 32-bit x86 (Unix) you'll find a function's arguments pushed onto the stack, whereas in x86-64 on Unix the first few arguments are passed in registers. Both ABIs expect an integer result of the function to be stored in the eax/rax register.
Here's a 32-bit add routine that assembles for both Windows and Linux.
Here, you can see what I mean. The "return" value is found in eax. By contrast, the x64 version would look like this:
There are documents that define this sort of thing. Here's the UNIX x64 ABI: http://www.x86-64.org/documentation/abi-0.99.pdf. I'm sure you could probably find ABIs for any processor, platform etc you needed.
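As a sketch of how the two conventions differ for the same function, here is a hypothetical add_ints in C. The assembly in the comment is one plausible translation in Intel syntax under the stated ABIs, not necessarily what any given compiler emits.

```c
#include <assert.h>

/* 32-bit cdecl: arguments on the stack, result in eax
       mov eax, [esp+4]    ; first argument
       add eax, [esp+8]    ; second argument
       ret

   x86-64 System V: first two integer arguments in edi and esi,
   result in eax
       lea eax, [rdi+rsi]
       ret
*/
int add_ints(int a, int b) {
    return a + b;
}
```

Windows x64 uses a different register assignment again (rcx and rdx for the first two integer arguments), which is why the ABI document for your platform is the thing to check.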
How do you operate on an array in assembly? Pointer arithmetic. Given a base address in eax, the next stored integer would be at [eax+4] if the integer is 4 bytes in size. You could create this space by calling malloc/calloc, or by calling the memory allocation system call, whatever that is on your system.
What is the 'heap'? According to Wikipedia again, it's the area of memory reserved for dynamic memory allocation. You don't see it in your assembly program until you call calloc, malloc or the memory allocation system call, but it is there.
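The pointer arithmetic mentioned above can be spelled out by hand in C. This is only a sketch with made-up names (element_address, read_element); it computes base plus index times 4 in bytes, which is what an operand like [eax + ecx*4] does in a single instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Computes the address of element index as base + index * 4 bytes. */
int32_t *element_address(int32_t *base, size_t index) {
    assert(base != NULL);
    unsigned char *raw = (unsigned char *)base;          /* byte-granular view */
    return (int32_t *)(raw + index * sizeof(int32_t));   /* 4 bytes per element */
}

/* Reads element index through the hand-computed address. */
int32_t read_element(int32_t *base, size_t index) {
    return *element_address(base, index);
}
```

Ordinary C indexing, base[index], performs the same scaling implicitly.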
Sorry for the essay.