Segment prefixes when using a pointer as a function parameter

Posted on 2024-10-15 11:42:52

I have an assembly/C question. I just read about segment prefixes, for example ds:varX and so on. The prefix matters for computing the logical address. I also read that the default is "ds", and that as soon as you use the ebp register to compute an address, "ss" is used. For code, "cs" is the default. That all makes sense.
Now I have the following in C:

int x; // some static var in ds

void test(int *p){
    ...
    *p = 5;
}

... main(){
    test(&x);
    // now x is 5
}

If you now think about the implementation of the test function... you get the pointer to x on the stack. To dereference the pointer, you first load the pointer value (the address of x) from the stack into, say, eax. Then you can dereference eax to change the value of x. But how does the C compiler know whether the given pointer (address) refers to memory on the stack (for example, if I call test from another function and push the address of a local variable as the argument) or in the data segment? How is the full logical address calculated? The function cannot know which segment the given offset relates to..?!

橘味果▽酱 2024-10-22 11:42:52

In the general case, on a segmented platform you can't just read the pointer value "into eax" as you suggest. On a segmented platform a pointer generally holds both a segment value and an offset value, so reading such a pointer means initializing at least two registers, a segment register and an offset register, not just eax.

In specific cases, though, it depends on the so-called memory model. Compilers for segmented platforms supported several memory models.

For starters, it does not matter which segment register you use as long as that register holds the correct value. For example, if the DS and ES registers hold the same value, then DS:<offset> points to the same location in memory as ES:<offset>.

In the so-called "tiny" memory model, for example, all segment registers held the same value, i.e. everything (code, data, stack) fit in one segment, which is why it was called "tiny". In this memory model each pointer was just an offset into that segment and, of course, it simply didn't matter which segment register was used with that offset.

In "larger" memory models you could have separate segments for code (CS), stack (SS) and data (DS). In such memory models a pointer object would normally hold both the offset and the segment part of the address. In your example, pointer p would actually be a two-part object holding a segment value and an offset value at the same time. To dereference such a pointer, the compiler would generate code that reads both the segment and the offset from p and uses both. For example, the segment value would be loaded into the ES register and the offset value into the DI register. The code would then access ES:[DI] to read or write *p.

There were also "intermediate" memory models, in which code was stored in one segment (CS) while data and stack were both stored in another, so DS and SS held the same value. In those models, obviously, there was no need to differentiate between DS and SS: a pointer that holds only an offset works whether it refers to a static variable or to a local variable on the stack.

In the largest memory models you could have multiple data segments. In that case it is rather obvious that proper data addressing in segmented mode is not really a matter of choosing the proper segment register (as you seem to believe), but rather a matter of taking pretty much any segment register and loading it with the correct value before performing the access.
