Segment prefixes when using pointers as function parameters

Posted 2024-10-15 11:42:52


I have an assembler/C question. I just read about segment prefixes, for example ds:varX and so on. The prefix is important for the calculation of the logical address. I also read that the default is "ds", and that as soon as you use the ebp register to calculate an address, "ss" is used. For code, "cs" is the default. That all makes sense.
Now I have the following in C:

int x; // some static var in ds

void test(int *p){
    /* ... */
    *p = 5;
}

int main(void){
    test(&x);
    // now x is 5
    return 0;
}

Now think about the implementation of the test function: you get the pointer to x on the stack. To dereference the pointer, you first load the pointer value (the address of x) from the stack and save it in, for example, eax. Then you can dereference eax to change the value of x. But how does the C compiler know whether the given pointer (address) refers to memory on the stack (for example, if I call test from another function and pass the address of a local variable as its parameter) or to the data segment? How is the full logical address calculated? The function cannot know which segment the given address offset relates to!
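A minimal sketch of the two call sites the question describes, assuming a flat-model target such as 32- or 64-bit x86 Linux or Windows. call_with_local is a hypothetical second caller and does not appear in the question. On such a target both callers hand test an ordinary address. The generated code for test typically stores through the pointer with no segment override, because DS and SS have the same base.

```c
#include <assert.h>

int x; /* static storage duration: placed in the data/BSS area */

/* The callee from the question: it only ever receives an address. */
void test(int *p)
{
    *p = 5;
}

/* Hypothetical second caller: passes the address of a local variable,
   which lives on the stack. */
int call_with_local(void)
{
    int local = 0;
    test(&local);
    assert(local == 5);
    return local;
}
```

Compiling this with -S and reading the output for test is a quick way to check what your own compiler emits.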


Answers (6)

橘味果▽酱 2024-10-22 11:42:52


In the general case, on a segmented platform you can't just read the pointer value "into eax" as you suggest. On a segmented platform the pointer would generally hold both the segment value and the offset value. Reading such a pointer therefore means initializing at least two registers (segment and offset), not just eax.
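To make the two-part pointer concrete, here is a sketch that models a real-mode 8086 far pointer as a C struct. far_ptr and linear_address are hypothetical names chosen for illustration. The CPU forms the 20-bit physical address as segment * 16 + offset, so the pointer value alone is enough to locate the target, wherever it lives.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a real-mode far pointer: the pointer object
   itself carries both halves of the address. */
typedef struct {
    uint16_t segment; /* value that would be loaded into a segment register */
    uint16_t offset;  /* value that would be loaded into e.g. DI or BX */
} far_ptr;

/* Real-mode translation: segment * 16 + offset, truncated to 20 bits
   as on an 8086 (address-line 20 is not modeled). */
static uint32_t linear_address(far_ptr p)
{
    return (((uint32_t)p.segment << 4) + p.offset) & 0xFFFFFu;
}
```

For example, 0x1234:0x0010 maps to physical address 0x12350.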

In specific cases, though, it depends on the so-called memory model. Compilers on segmented platforms supported several memory models.

For starters, for obvious reasons it does not matter which segment register you use as long as the segment register holds the correct value. For example, if DS and ES registers hold the same value inside, then DS:<offset> will point to the same location in memory as ES:<offset>.
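A small sketch of that point, assuming real-mode address translation. seg_regs and real_mode_linear are hypothetical names. The translation only ever uses the value stored in the segment register, never the register's name. So DS:offset and ES:offset coincide whenever DS == ES. Even different segment:offset pairs, such as 0x1000:0x0010 and 0x1001:0x0000, name the same byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the four 8086 segment registers. */
typedef struct {
    uint16_t cs, ds, es, ss;
} seg_regs;

/* Real-mode translation: only the value in the chosen segment
   register matters, not which register it came from. */
static uint32_t real_mode_linear(uint16_t segment_value, uint16_t offset)
{
    return (((uint32_t)segment_value << 4) + offset) & 0xFFFFFu;
}
```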

In the so-called "tiny" memory model, for example, all segment registers held the same value. Everything (code, data and stack) had to fit in one segment, which is why the model was called "tiny". In this memory model each pointer was just an offset in that segment, so it simply didn't matter which segment register was used with that offset.
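A sketch of the tiny model that simulates the single 64 KiB segment as a byte array. The names segment, near_ptr, store_int16 and load_int16 are hypothetical. int is taken to be 16 bits, as it was with 8086 compilers. Every segment register holds the same value, so one store routine serves a static variable and a stack variable alike. That is exactly the situation the question asks about.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The one 64 KiB segment shared by code, data and stack (CS == DS == SS). */
static uint8_t segment[65536];

/* In the tiny model a pointer is nothing but an offset into that segment. */
typedef uint16_t near_ptr;

/* Equivalent of "*p = value;" for a 16-bit int. No segment has to be
   chosen: whichever register the CPU uses holds the same value. */
static void store_int16(near_ptr p, int16_t value)
{
    assert((size_t)p + sizeof value <= sizeof segment);
    memcpy(&segment[p], &value, sizeof value);
}

/* Equivalent of "return *p;" for a 16-bit int. */
static int16_t load_int16(near_ptr p)
{
    int16_t value;
    assert((size_t)p + sizeof value <= sizeof segment);
    memcpy(&value, &segment[p], sizeof value);
    return value;
}
```

Here a "static" variable might sit at offset 0x0200 and a "stack" variable near the top of the segment, at 0xFFF0. Both are reached through the same near_ptr type.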

In "larger" memory models you could have separate segments for code (CS), stack (SS) and data (DS). In such memory models, however, a pointer object would normally hold both the offset and the segment part of the address. In your example, pointer p would actually be a two-part object holding both a segment value and an offset value. To dereference such a pointer, the compiler would generate code that reads both the segment and the offset from p and uses both of them. For example, the segment value would be read into the ES register and the offset value into the DI register (a single LES DI instruction does both). The code would then access ES:[DI] to read *p.
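The following sketch simulates that sequence. It assumes a 1 MiB byte array stands in for 8086 memory. It also assumes a far pointer stored with the offset in the low word and the segment in the high word, which is the layout LES reads. memory, far_ptr, cpu_regs and the helper functions are hypothetical names. Whether p points into the data segment or the stack segment, the callee just loads whatever segment the pointer carries.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the 8086's 1 MiB physical address space. */
static uint8_t memory[1u << 20];

/* A far pointer: offset in the low word, segment in the high word. */
typedef struct {
    uint16_t offset;
    uint16_t segment;
} far_ptr;

/* Only the registers this example needs. */
typedef struct {
    uint16_t es, di;
} cpu_regs;

/* Roughly what "les di, [p]" does: load both halves of the pointer. */
static void load_es_di(cpu_regs *cpu, far_ptr p)
{
    cpu->es = p.segment;
    cpu->di = p.offset;
}

/* Roughly what "mov es:[di], ax" does for a 16-bit value.
   Wraparound past 1 MiB is not modeled; the assert rejects it instead. */
static void store_es_di(const cpu_regs *cpu, int16_t value)
{
    uint32_t linear = ((uint32_t)cpu->es << 4) + cpu->di;
    assert(linear + sizeof value <= sizeof memory);
    memcpy(&memory[linear], &value, sizeof value);
}

/* "*p = value;" in a large model: the pointer itself names the segment,
   so the callee never has to guess between DS and SS. */
static void store_through(far_ptr p, int16_t value)
{
    cpu_regs cpu;
    load_es_di(&cpu, p);
    store_es_di(&cpu, value);
}

/* Read back a 16-bit value at a physical address, for checking. */
static int16_t load_linear16(uint32_t linear)
{
    int16_t value;
    assert(linear + sizeof value <= sizeof memory);
    memcpy(&value, &memory[linear], sizeof value);
    return value;
}
```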

There were also "intermediate" memory models, in which code was stored in one segment (CS) while data and stack were both stored in another segment, so DS and SS held the same value. In those models, obviously, there was no need to differentiate between DS and SS.

In the largest memory models you could have multiple data segments. In that case it is fairly clear that correct data addressing in segmented mode is not really a matter of choosing the right segment register (as you seem to believe). It is a matter of taking almost any segment register and loading it with the correct value before performing the access.

殤城〤 2024-10-22 11:42:52


What AndreyT described is what happened in the DOS days. Modern operating systems use the so-called flat memory model (or something very close to it). In that model all protected-mode segments are set up so that each one can access the whole address space, i.e., they have a base of 0 and a limit covering the whole address space.
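A simplified model of protected-mode translation, assuming only a base and a limit per descriptor. segment_descriptor and translate are hypothetical names, and granularity, type and privilege checks are left out. With the flat setup every descriptor has base 0 and a 4 GiB limit. The same offset therefore yields the same linear address through CS, DS or SS, and the segment choice stops mattering.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified protected-mode descriptor: only base and limit are modeled. */
typedef struct {
    uint32_t base;
    uint32_t limit; /* highest valid offset */
} segment_descriptor;

/* Linear address = base + offset, after a limit check.
   Returns 0 (and leaves *linear untouched) where a real CPU would fault. */
static int translate(segment_descriptor seg, uint32_t offset, uint32_t *linear)
{
    if (offset > seg.limit)
        return 0; /* a real CPU would raise a general-protection fault */
    *linear = seg.base + offset;
    return 1;
}

/* Flat model: every segment starts at 0 and spans the whole 4 GiB space. */
static const segment_descriptor flat_code  = {0u, 0xFFFFFFFFu};
static const segment_descriptor flat_data  = {0u, 0xFFFFFFFFu};
static const segment_descriptor flat_stack = {0u, 0xFFFFFFFFu};
```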

来日方长 2024-10-22 11:42:52


On a machine with a segmented memory model, the C implementation must do one of the following things to be conformant:

  • Store the full address (with segment) in each pointer, OR
  • Ensure that all stack addresses that will be used for variables whose addresses are taken can be accessed via the data segment, either at the same relative address or via some magic offset the compiler can apply when taking the address of local variables, OR
  • Not use the stack for local variables whose addresses are taken, and perform a hidden malloc/free on every function entry/return (with special handling for longjmp!).
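The second option can be sketched for real mode. The sketch assumes the compiler uses offset-only, DS-relative pointers. ss_offset_to_ds_offset is a hypothetical helper and not something a particular compiler is known to emit. When the address of a local is taken, the helper rebases its SS-relative offset by the distance between the two segment bases. If the result does not fit in 0..0xFFFF, the local is not reachable through DS, and the compiler would need one of the other strategies.

```c
#include <assert.h>
#include <stdint.h>

/* Convert SS:ss_offset into an equivalent DS-relative offset.
   Returns 1 on success, 0 if the byte cannot be reached through DS. */
static int ss_offset_to_ds_offset(uint16_t ds, uint16_t ss,
                                  uint16_t ss_offset, uint16_t *ds_offset)
{
    /* Distance between the two segment bases, in bytes (may be negative). */
    int32_t delta = ((int32_t)ss - (int32_t)ds) * 16;
    int32_t rebased = delta + ss_offset;

    if (rebased < 0 || rebased > 0xFFFF)
        return 0; /* outside the 64 KiB window that DS can see */
    *ds_offset = (uint16_t)rebased;
    return 1;
}
```

With DS = 0x1000 and SS = 0x1800, SS:0x0100 becomes DS:0x8100, and both name physical address 0x18100.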

Perhaps there are other ways of doing it, but these are the only ones I can think of. Segmented memory models really were a poor fit for C, and they were abandoned for good reason.

指尖凝香 2024-10-22 11:42:52


Segmentation is a legacy artifact of Intel's 16-bit 8086 processor. In practice you are probably running in virtual memory, where everything is just a linear address. Compile with the -S flag and look at the resulting assembly.

听闻余生 2024-10-22 11:42:52


Since you move the address into eax before dereferencing it, the access defaults to the ds segment. However, as Nikolai mentioned, in user-level code the segments probably all have the same base address.

勿忘心安 2024-10-22 11:42:52


Under x86, direct use of the stack (addressing through esp or ebp) uses the stack segment. Indirect use, through a pointer held in another register, is treated as a data-segment access. You can see this if you disassemble code that writes to a stack location through a pointer. Under x86, cs, ss and ds are treated pretty much the same (at least in non-kernel modes) because of linear addressing. The Intel reference manuals should also have a section on segment addressing.
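The rule behind this can be written as a small function. It is a simplified model of 32-bit x86 default-segment selection, and the enum and function names are hypothetical. String-instruction destinations, which always use ES, are not modeled. In the question's test function, loading p from the stack (typically [ebp+8] when a frame pointer is used) is an SS access, while the store through eax is a DS access. Under a flat model both resolve to the same linear address.

```c
#include <assert.h>

typedef enum { REG_NONE, REG_EAX, REG_EBX, REG_ECX, REG_EDX,
               REG_ESI, REG_EDI, REG_EBP, REG_ESP } base_reg;

typedef enum { SEG_DEFAULT, SEG_CS, SEG_DS, SEG_ES,
               SEG_FS, SEG_GS, SEG_SS } seg_reg;

/* Segment used for a memory operand with the given base register.
   SEG_DEFAULT as the override means "no segment-override prefix". */
static seg_reg effective_segment(base_reg base, seg_reg override)
{
    if (override != SEG_DEFAULT)
        return override;  /* e.g. "mov eax, es:[ebx]" */
    if (base == REG_EBP || base == REG_ESP)
        return SEG_SS;    /* stack-relative: locals and arguments */
    return SEG_DS;        /* everything else, including [eax] */
}
```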
