Why is a process's address space divided into four segments (text, data, stack and heap)?
Why does a process's address space have to be divided into four segments (text, data, stack and heap)? What is the advantage? Is it possible to have only one big segment?
Comments (3)
There are multiple reasons for splitting programs into parts in memory.
One of them is that instruction and data memories can be architecturally distinct and discontiguous: they are accessed with different instructions and different circuitry inside and outside of the CPU, and so form two separate address spaces. For example, reading code from address 0 and reading data from address 0 will typically return two different values, from different memories.
Another is reliability/security. You rarely want the program's code and constant data to change. When they do change, it is usually because something is wrong, either in the program itself or in its inputs, which may be maliciously constructed. You want to prevent that from happening and to know about any attempts. Likewise, you don't want writable data areas to be executable. If they are, and the program has security bugs (e.g. buffer overflows), malicious code that gets into the data areas as input can exploit those bugs and easily force the program to do something harmful.
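To make the protection idea concrete, here is a minimal sketch of a toy address space in which every segment carries its own permissions. All names (`Segment`, `AddressSpace`, `ProtectionFault`) and the addresses are made up for illustration; a real OS gets the same effect through the MMU and page permissions, not through Python checks.

```python
from dataclasses import dataclass


class ProtectionFault(Exception):
    """Raised when an access violates a segment's permissions."""


@dataclass
class Segment:
    name: str
    base: int
    data: bytearray
    perms: str  # any combination of "r" (read), "w" (write), "x" (execute)


class AddressSpace:
    def __init__(self, segments):
        self.segments = list(segments)

    def _locate(self, addr, need):
        """Return (segment, index) for addr, or fault if `need` is not permitted."""
        for seg in self.segments:
            if seg.base <= addr < seg.base + len(seg.data):
                if need not in seg.perms:
                    raise ProtectionFault(
                        f"{need!r} access denied in {seg.name} at {addr:#x}")
                return seg, addr - seg.base
        raise ProtectionFault(f"unmapped address {addr:#x}")

    def read(self, addr):
        seg, i = self._locate(addr, "r")
        return seg.data[i]

    def write(self, addr, value):
        seg, i = self._locate(addr, "w")
        seg.data[i] = value

    def fetch(self, addr):
        """Instruction fetch: only allowed from executable segments."""
        seg, i = self._locate(addr, "x")
        return seg.data[i]


# Hypothetical layout: code is read + execute, variables are read + write.
space = AddressSpace([
    Segment("text", 0x1000, bytearray(b"\x90\xc3"), "rx"),
    Segment("data", 0x2000, bytearray(16), "rw"),
])

space.write(0x2000, 0x90)  # fine: the data segment is writable

# Both of these are refused: patching code, and running bytes stored as data.
for attempt in (lambda: space.write(0x1000, 0x00),
                lambda: space.fetch(0x2000)):
    try:
        attempt()
    except ProtectionFault as fault:
        print("fault:", fault)
```

With a single segment that is readable, writable and executable at once, neither of the two refused accesses could be detected.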
Yet another is storage. In many programs a number of data areas aren't initialized at all or are initialized to one common predefined value (often 0). Memory has to be reserved for these areas when the program is loaded and is about to start, but they don't need to be stored on disk, because there's no meaningful data in them.
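The storage point can be sketched with a toy loader. The `SegmentHeader` type and `load_segment` function below are hypothetical, not any real file format's API; the idea is the same one ELF expresses with separate `p_filesz` and `p_memsz` fields in its program headers: a segment may occupy more memory than it occupies in the file, and the loader zero-fills the difference.

```python
from dataclasses import dataclass


@dataclass
class SegmentHeader:
    name: str
    file_bytes: bytes  # bytes actually stored in the executable file
    mem_size: int      # bytes the segment occupies once loaded


def load_segment(header: SegmentHeader) -> bytearray:
    """Build the in-memory image: stored bytes first, zeros for the rest."""
    if header.mem_size < len(header.file_bytes):
        raise ValueError("segment cannot be smaller in memory than on disk")
    image = bytearray(header.mem_size)  # bytearray(n) is zero-filled
    image[:len(header.file_bytes)] = header.file_bytes
    return image


# Hypothetical program: 4 bytes of initialized data, followed by a
# 1 MiB buffer that only needs to start out as zeros.
initialized = (1234).to_bytes(4, "little")
data = SegmentHeader("data", initialized, mem_size=4 + 1024 * 1024)

image = load_segment(data)
print("bytes on disk:  ", len(data.file_bytes))
print("bytes in memory:", len(image))
```

If everything lived in one undifferentiated segment, the file would have to carry the whole megabyte of zeros.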
On some systems you may have everything in one place (section/segment/etc). One notable example is MS-DOS, where .COM-style programs have no structure other than this: they must be smaller than about 64KB, the first executable instruction must appear at the very beginning of the file, and the code must assume that this location corresponds to IP=0x100 (where IP is the instruction pointer register). How code and data are placed and interleaved in a .COM program is unimportant and up to the programmer.
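As a rough sketch of how simple that makes loading, here is a toy .COM loader. `load_com` and the constants are my own names; the model ignores the stack and the contents of the first 0x100 bytes of the segment (which MS-DOS uses for the Program Segment Prefix), so the size limit below is only an upper bound.

```python
from typing import Tuple

SEGMENT_SIZE = 0x10000     # one 64KB segment holds the whole program
COM_LOAD_OFFSET = 0x100    # the file image starts here, and so does IP
MAX_COM_SIZE = SEGMENT_SIZE - COM_LOAD_OFFSET  # upper bound in this toy model


def load_com(image: bytes) -> Tuple[bytearray, int]:
    """Copy a .COM image into a fresh segment; return (segment, initial IP)."""
    if len(image) > MAX_COM_SIZE:
        raise ValueError("image does not fit into a single 64KB segment")
    segment = bytearray(SEGMENT_SIZE)
    segment[COM_LOAD_OFFSET:COM_LOAD_OFFSET + len(image)] = image
    return segment, COM_LOAD_OFFSET


# A tiny program: mov ax, 4C00h / int 21h (terminate), followed directly
# by a data string. Code and data simply sit next to each other.
program = bytes([0xB8, 0x00, 0x4C, 0xCD, 0x21]) + b"hello"

segment, ip = load_com(program)
print(f"first opcode at IP={ip:#x}: {segment[ip]:#04x}")
```

There are no headers, no relocation and no per-segment permissions: the loader copies the file and jumps to its first byte.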
There are other architectural artifacts, such as x86 segments. Again, MS-DOS is a good example of an OS that deals with them. Its .EXE-style programs may contain multiple segments that correspond directly to the x86 CPU segments of the real-mode addressing scheme, in which memory is viewed through 64KB-long "windows" known as segments. The position of each window is determined by the value of one of the CPU's segment registers, so altering a segment register moves the window. To access more than 64KB you need to use different segment register values, which often implies having multiple segments in the .EXE (not just one segment for code and one for data, but possibly several of either).
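The "window" arithmetic is short enough to write down. In real mode the CPU shifts the 16-bit segment value left by 4 bits and adds the 16-bit offset. The function name below is mine, and the 20-bit wrap-around is an assumption matching the original 8086; later CPUs with the A20 line enabled do not wrap.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode address translation: segment * 16 + offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF  # assumes 8086-style 20-bit wrap


# One segment register value gives a 64KB window...
window_start = linear_address(0x2000, 0x0000)
window_end = linear_address(0x2000, 0xFFFF)
print(f"window: {window_start:#07x}..{window_end:#07x}")

# ...and changing the segment register moves the window.
next_window = linear_address(0x3000, 0x0000)
print(f"next window starts at {next_window:#07x}")

# Windows may overlap: different segment:offset pairs can name the same byte.
assert linear_address(0x1234, 0x0010) == linear_address(0x1235, 0x0000)
```

Since the offset is only 16 bits wide, no single segment register value can reach beyond its own 64KB window, which is why larger programs end up with several segments.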
At least the text and data segments are separated to prevent malicious code that's stored inside a variable from being run.
Instructions (compiled code) are stored in the text segment, while the contents of your variables are stored in a data segment, the latter of which never gets executed, only read from and written to.
A little more info here.
Isn't this distinction just a big, hacky workaround for patching security into the von Neumann architecture, where data and instructions share the same memory?