
Assembly - .data, .code, and registers...?

Posted on 2024-08-22 22:20:19

So this morning I posted a confused question about assembly and I received some great genuine help, which I really appreciate.

And now I'm starting to get into assembly and am beginning to understand how it works.

Things I feel I understand alright include the stack, interrupts, binary/hex, and in general what most of the basic operations do (jmp, push, mov, etc).

Concepts that I'm struggling to understand and would like help with are below - it would be a huge help if you could address any of the following:

1. What exactly is happening in the .data section? Are those variables we're declaring?
2. If so, can we declare variables later in the code section? If not, why not? If so, how, and why do we use the data section then?
3. What's a register? How does it compare to a variable? I mean I know it's a location that stores a small piece of information... but that sounds exactly like a variable to me.
4. How do I make an array? I know this seems kind of random, but I'm curious as to how I'd go about doing something like this.
5. Is there a list somewhere of common practices for what each register should be used for? I still don't get them completely, but have noticed some people saying, for example, that a certain register should be used to store 'return values' from procedures - is there a comprehensive or at least informative list of such practices?
6. One of the reasons I'm learning assembly is to better understand what's going on behind my high level code. With that in mind - when I'm programming in c++, I'm often thinking about the stack and the heap. In assembly I know what the stack is - where's the 'heap'?

Some info: I'm using masm32 with WinAsm as an IDE, and I'm working on Windows 7. I have a lot of prior experience programming in higher level languages such as c++/java.

edit: Thanks for the help everyone, extremely informative as usual! Great stuff! One last thing though - I'm wondering what the difference is between the Stack Pointer, and the Base pointer, or ESP and EBP. Can someone help me out?

edit: I think I get it now... ESP always points to the top of the stack. However, you can point EBP at whatever you want. ESP is automatically handled but you can do whatever you want with EBP. For example:

    push 6
    push 5
    push 4
    mov EBP, ESP
    push 3
    push 2

In this scenario, EBP now points to the address holding 4, but ESP now points to the address holding 2.

In a real application 6, 5, and 4 could have been function arguments, whereas 3 and 2 could be local variables within that function.
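
A minimal sketch of that push sequence as a toy C model, in case it helps to check the reasoning. Everything here (`ToyStack`, `toy_push`, `replay_question`) is a hypothetical name made up for illustration, array indices stand in for addresses, and each slot stands in for one 4-byte word. It is not how a real CPU is implemented, only a way to see which slot each pointer ends up on.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Toy model of a downward-growing x86 stack; indices stand in for addresses. */
typedef struct {
    int32_t mem[STACK_WORDS];
    int esp;   /* index of the most recently pushed word */
    int ebp;   /* frame anchor; only changes when we assign it */
} ToyStack;

void toy_init(ToyStack *s) {
    s->esp = STACK_WORDS;   /* empty stack: one past the highest slot */
    s->ebp = STACK_WORDS;
}

void toy_push(ToyStack *s, int32_t value) {
    assert(s->esp > 0);      /* running out of slots would be a stack overflow */
    s->esp -= 1;             /* push first moves ESP down by one word ... */
    s->mem[s->esp] = value;  /* ... then stores at the new top */
}

/* Replays the sequence from the question and reports what EBP and ESP see. */
void replay_question(int32_t *at_ebp, int32_t *at_esp, int32_t *first_pushed) {
    ToyStack s;
    toy_init(&s);
    toy_push(&s, 6);
    toy_push(&s, 5);
    toy_push(&s, 4);
    s.ebp = s.esp;                    /* mov EBP, ESP */
    toy_push(&s, 3);
    toy_push(&s, 2);
    *at_ebp = s.mem[s.ebp];           /* [EBP]   holds 4 */
    *at_esp = s.mem[s.esp];           /* [ESP]   holds 2 */
    *first_pushed = s.mem[s.ebp + 2]; /* [EBP+8] holds 6: two words above EBP */
}
```

The point of keeping EBP fixed is visible in the last line: the earlier pushes stay at a constant positive offset from EBP no matter how many more values are pushed below it.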

苄①跕圉湢 2024-08-29 22:20:19

Let's try to answer in order!

The data section contains anything that you want to be automatically initialized for you by the system before it calls the entry point of your program. You're right, normally global variables end up here. Zero-initialized (ZI) data is generally not included in the executable file, since there's no reason to store a block of zeros - a couple of directives to the program loader are all that's needed to generate that space. Once your program starts running, the ZI and data regions are generally interchangeable. Wikipedia has a lot more information.
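
As a rough illustration in C (the variable and function names are made up for this sketch), a global with a nonzero initializer typically ends up in the data section, while one without an initializer typically ends up in the zero-initialized region. Exact placement is up to the compiler and linker, so treat the comments as the usual case rather than a guarantee.

```c
int initialized_counter = 42;  /* nonzero initializer: typically placed in .data */
int zeroed_counter;            /* no initializer: typically placed in .bss, starts at 0 */

/* Once the program is running, both are ordinary writable memory. */
int bump_both(void) {
    initialized_counter += 1;
    zeroed_counter += 1;
    return initialized_counter + zeroed_counter;
}
```

Only the value 42 has to be stored in the executable; the loader just needs to know how much zeroed space to reserve for the second variable.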
Variables don't really exist in assembly programming, at least not in the sense they do when you're writing C code. All you have are the decisions you've made about how to lay out your memory. Variables can be on the stack, somewhere else in memory, or live only in registers.
Registers are the internal data storage of the processor. You can, in general, only do operations on values in processor registers. You can load their contents from memory and store them back to memory, which is the basic operation underlying how your computer works. Here's a quick example. This C code:
```
int a = 5;
int b = 6;
int *d = (int *)0x12345678; // assume 0x12345678 is a valid memory pointer
*d = a + b;
```
Might get translated to some (simplified) assembly along the lines of:
```
load  r1, 5
load  r2, 6
load  r4, 0x12345678
add   r3, r1, r2
store r4, r3
```
In this case, you can think of the registers as variables, but in general it's not necessary that any one variable always stay in the same register; depending on how complicated your routine is, it may not even be possible. You'll need to push some data onto the stack, pop other data off, and so on. A 'variable' is that logical piece of data, not where it lives in memory or registers, etc.
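
To make the load/add/store sequence concrete, here is a small sketch that replays it on a toy machine written in C. The `ToyCpu` type and `run_example` function are hypothetical, and because 0x12345678 is not a usable address in a test program, word index 3 of a tiny array stands in for the pointer `d`.

```c
#include <stdint.h>

enum { R1 = 1, R2, R3, R4, NUM_REGS };

typedef struct {
    uint32_t reg[NUM_REGS];  /* register file; slot 0 is unused */
    uint32_t mem[8];         /* tiny stand-in for RAM, addressed by word index */
} ToyCpu;

/* Replays the simplified assembly and returns what ended up in memory at d. */
uint32_t run_example(void) {
    ToyCpu cpu = {0};
    const uint32_t d = 3;                     /* stand-in for the pointer 0x12345678 */
    cpu.reg[R1] = 5;                          /* load  r1, 5 */
    cpu.reg[R2] = 6;                          /* load  r2, 6 */
    cpu.reg[R4] = d;                          /* load  r4, d */
    cpu.reg[R3] = cpu.reg[R1] + cpu.reg[R2];  /* add   r3, r1, r2 */
    cpu.mem[cpu.reg[R4]] = cpu.reg[R3];       /* store r4, r3 */
    return cpu.mem[d];
}
```

Note that `a` and `b` never get a memory location in this version; they exist only as the contents of r1 and r2.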
An array is just a contiguous block of memory - for a local array, you can just decrement the stack pointer appropriately. For a global array, you can declare that block in the data section.
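
A short C sketch of both cases, with hypothetical names. The global array has static storage and would typically be emitted into the data section; for the local one, a compiler typically reserves space by adjusting the stack pointer (an optimizer may do something else entirely).

```c
#include <stddef.h>

int global_table[4] = {10, 20, 30, 40};  /* static storage: typically in .data */

/* Byte distance between element i and element 0 of an int array. */
ptrdiff_t byte_offset(const int *base, size_t i) {
    return (const char *)&base[i] - (const char *)base;
}

/* Sums a local array, which typically lives in the function's stack frame. */
int sum_local(void) {
    int local[4] = {1, 2, 3, 4};
    int total = 0;
    for (size_t i = 0; i < 4; i++) {
        total += local[i];
    }
    return total;
}
```

Because the block is contiguous, element `i` always sits `i * sizeof(int)` bytes past the start, which is exactly the offset you would compute by hand in assembly.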
There are a bunch of conventions about registers - check your platform's ABI or calling convention document for details about how to use them correctly. Your assembler documentation might have information as well. Check the ABI article on Wikipedia.
Your assembly program can call the same library functions and make the same system calls any C program could, so you can just call malloc() (a C library function that manages the heap for you) to get memory from the heap.


半夏半凉 2024-08-29 22:20:19

I'd like to add to this. Programs on a computer are typically split up into three sections, although there are others.

Code Segment - .code, .text : http://en.wikipedia.org/wiki/Code_segment

In computing, a code segment, also known as a text segment or simply as text, is a phrase used to refer to a portion of memory or of an object file that contains executable instructions. It has a fixed size and is usually read-only. If the text section is not read-only, then the particular architecture allows self-modifying code. Read-only code is reentrant if it can be executed by more than one process at the same time. As a memory region, a code segment resides in the lower parts of memory or at its very bottom, in order to prevent heap and stack overflows from overwriting it.

Data Segment - .data : http://en.wikipedia.org/wiki/Data_segment

A data segment is one of the sections of a program in an object file or in memory, which contains the global variables and static variables that are initialized by the programmer. It has a fixed size, since all of the data in this section is set by the programmer before the program is loaded. However, it is not read-only, since the values of the variables can be altered at runtime. This is in contrast to the Rodata (constant, read-only data) section, as well as the code segment (also known as text segment).

BSS : http://en.wikipedia.org/wiki/.bss

In computer programming, .bss or bss (which originally stood for Block Started by Symbol) is used by many compilers and linkers as the name of a part of the data segment containing static variables and global variables that are filled solely with zero-valued data initially (i.e., when execution begins). It is often referred to as the "bss section" or "bss segment". The program loader initializes the memory allocated for the bss section when it loads the program.

Registers are, as described by others, facilities of the CPU that store data or a memory address. Operations are performed on registers, such as add eax, ebx, and how that is written depends on the assembly dialect. In Intel syntax (as used by NASM), this means: add the contents of ebx to eax and store the result in eax. The equivalent in GNU AS (AT&T syntax) is addl %ebx, %eax, with the source operand first and each register prefixed with %. Different dialects of assembly have different rules and operators. I'm not a fan of MASM for this reason - it is very different from NASM, YASM, and GNU AS.

There isn't really one general way to interact with C. ABIs designate how this happens; for example, on x86 (Unix) you'll find a function's arguments pushed onto the stack, whereas in x86-64 on Unix the first few arguments are passed in registers. Both ABIs expect an integer result of the function to be stored in the eax/rax register.

Here's a 32-bit add routine that assembles for both Windows and Linux.

_Add:
    push    ebp             ; create stack frame
    mov     ebp, esp
    mov     eax, [ebp+8]    ; grab the first argument
    mov     ecx, [ebp+12]   ; grab the second argument
    add     eax, ecx        ; sum the arguments
    pop     ebp             ; restore the base pointer
    ret

Here, you can see what I mean. The "return" value is found in eax. By contrast, the x64 version would look like this:

_Add:
    push    rbp             ; create stack frame
    mov     rbp, rsp
    mov     eax, edi        ; grab the first argument
    mov     ecx, esi        ; grab the second argument
    add     eax, ecx        ; sum the arguments
    pop     rbp             ; restore the base pointer
    ret
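
For comparison, this is roughly the C function that both routines correspond to. It is a sketch under the assumption that the routine is meant to be called as `int Add(int, int)`; whether the linker-level symbol is spelled `_Add` or `Add` depends on the platform's name-decoration rules, not on anything in the C source.

```c
/* With stack-based argument passing on 32-bit x86, a is read from [ebp+8]
   and b from [ebp+12]; with the Unix x86-64 ABI they arrive in edi and esi.
   In both cases the int result is handed back in eax. */
int Add(int a, int b) {
    return a + b;
}
```

A caller written in C simply writes `Add(2, 3)`; the compiler emits the argument setup the ABI requires and then reads the result out of eax.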

There are documents that define this sort of thing. Here's the UNIX x64 ABI: http://www.x86-64.org/documentation/abi-0.99.pdf. I'm sure you could probably find ABIs for any processor, platform etc you needed.

How do you operate on an array in assembly? Pointer arithmetic. Given a base address in eax, the next stored integer would be at [eax+4] if the integer is 4 bytes in size. You could create this space using calls to malloc/calloc, or you could call the memory allocation system call, whatever that is on your system.
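
Here is a minimal C sketch of the same byte-offset arithmetic on a heap block. The helper names (`make_int32_array`, `store_int32`, `load_int32`) are hypothetical; `base` plays the role of eax, and `base + 4 * index` is the C spelling of `[eax + 4*index]` for 4-byte integers.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocates zero-filled room for count 32-bit integers; returns NULL on failure. */
unsigned char *make_int32_array(size_t count) {
    return calloc(count, sizeof(int32_t));
}

/* Writes value at byte offset 4*index from base. */
void store_int32(unsigned char *base, size_t index, int32_t value) {
    memcpy(base + 4 * index, &value, sizeof value);
}

/* Reads the 32-bit integer at byte offset 4*index from base. */
int32_t load_int32(const unsigned char *base, size_t index) {
    int32_t value;
    memcpy(&value, base + 4 * index, sizeof value);
    return value;
}
```

The caller is responsible for checking the pointer against NULL and for passing it to `free()` when done, just as an assembly program calling calloc would be.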

What is the 'heap'? According to Wikipedia again, it's the area of memory reserved for dynamic memory allocation. You don't see it in your assembly program until you call calloc, malloc or the memory allocation system call, but it is there.

Sorry for the essay.
