How did 16-bit C compilers work?
C's memory model, with its use of pointer arithmetic and all, seems to model a flat address space. 16-bit computers used segmented memory access. How did 16-bit C compilers deal with this issue and simulate a flat address space from the perspective of the C programmer? For example, roughly what assembly language instructions would the following code compile to on an 8086?
long arr[65536]; // Assume 32 bit longs.
long i;
for(i = 0; i < 65536; i++) {
    arr[i] = i;
}
They didn't. Instead, they made segmentation visible to the C programmer, extending the language with multiple types of pointers: near, far, and huge. A near pointer was an offset only, while far and huge pointers were a combined segment and offset. There was a compiler option to set the memory model, which determined whether the default pointer type was near or far.
In Windows code, even today, you'll often see typedefs like LPCSTR (for const char*). The "LP" is a holdover from the 16-bit days; it stands for "Long (far) Pointer".
The C memory model does not in any way imply a flat address space. It never did. In fact, the C language specification is specifically designed to allow non-flat address spaces.
In the most trivial implementation with a segmented address space, the size of the largest contiguous object would be limited by the size of the segment (65536 bytes on a 16-bit platform). This means that size_t in such an implementation would be 16 bits wide, and that your code simply would not compile, since you are attempting to declare an object that is larger than the allowed maximum.
A more complex implementation would support the so-called huge memory model. There is really no problem addressing contiguous memory blocks of any size on a segmented memory model; it just requires some extra effort in pointer arithmetic. So, within the huge memory model, the implementation would make that extra effort, which would make the code a bit slower but would allow addressing objects of virtually any size. In that case your code would compile perfectly fine.
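The extra pointer arithmetic that a huge model needs can be sketched in portable C. This is only a simulation under stated assumptions, not what any particular compiler emitted: the names seg_off, linear_of, huge_add and huge_index_long are hypothetical, and the real-mode address space is modelled with plain integers.

```c
#include <stdint.h>

/* Hypothetical model of an 8086 real-mode address: 16-bit segment, 16-bit offset. */
struct seg_off {
    uint16_t seg;
    uint16_t off;
};

/* Physical address: segment * 16 + offset, truncated to 20 bits. */
uint32_t linear_of(struct seg_off p)
{
    return (((uint32_t)p.seg << 4) + p.off) & 0xFFFFFu;
}

/* Huge-style addition (assumed convention): do the arithmetic on the 20-bit
   linear address, then split it back into a segment:offset pair whose
   offset is below 16. */
struct seg_off huge_add(struct seg_off p, uint32_t nbytes)
{
    uint32_t lin = (linear_of(p) + nbytes) & 0xFFFFFu;
    struct seg_off r;
    r.seg = (uint16_t)(lin >> 4);
    r.off = (uint16_t)(lin & 0xFu);
    return r;
}

/* Address of element i in an array of 4-byte longs that starts at base. */
struct seg_off huge_index_long(struct seg_off base, uint32_t i)
{
    return huge_add(base, i * 4u);
}
```

With the array from the question placed at 2000:0000, element 16384 is the first one that a bare 16-bit offset cannot reach; huge_index_long puts it at 3000:0000. The full 256 KB array spans four 64K segments, which is why every index computation has to touch the segment part as well.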
The true 16-bit environments use 16-bit pointers which reach any address. Examples include the PDP-11, 6800 family (6802, 6809, 68HC11), and the 8085. This is a clean and efficient environment, just like a simple 32-bit architecture.
The 80x86 family forced upon us a hybrid 16-bit/20-bit address space in so-called "real mode", the native 8086 addressing space. The usual mechanism to deal with this was to split pointers into two basic types, near (16-bit pointer) and far (32-bit pointer). The default for code and data pointers can be set in bulk by a "memory model": tiny, small, compact, medium, large, and huge (some compilers do not support all models).
The tiny memory model is useful for small programs in which the entire space (code + data + stack) is less than 64K. All pointers are (by default) 16 bits, or near; a pointer is implicitly associated with a single segment value for the whole program.
The small model assumes that data + stack is less than 64K and in the same segment; the code segment contains only code, so it can have up to 64K as well, for a maximum memory footprint of 128K. Code pointers are near and implicitly associated with CS (the code segment). Data pointers are also near and associated with DS (the data segment).
The medium model has up to 64K of data + stack (like small), but can have any amount of code. Data pointers are 16 bits and are implicitly tied to the data segment. Code pointers are 32-bit far pointers and have a segment value depending on how the linker has set up the code groups (a yucky bookkeeping hassle).
The compact model is the complement of medium: less than 64K of code, but any amount of data. Data pointers are far and code pointers are near.
In the large or huge model, the default subtype of pointers is 32-bit, or far. The main difference is that huge pointers are always automatically normalized so that incrementing them avoids problems with 64K wraparound. See this.
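The difference between far and huge increments can be illustrated with a small simulation. It is a sketch under assumptions rather than a description of a specific compiler: seg_off, far_inc and huge_inc are hypothetical names, and the normalization convention used here (offset kept below 16) is one possible choice.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit segment:offset pointer. */
struct seg_off {
    uint16_t seg;
    uint16_t off;
};

/* Far-style increment: only the 16-bit offset changes, so it wraps at 64K
   and the pointer ends up back near the start of the same segment. */
struct seg_off far_inc(struct seg_off p, uint16_t nbytes)
{
    p.off = (uint16_t)(p.off + nbytes);   /* wraps modulo 65536 */
    return p;
}

/* Huge-style increment: compute the 20-bit linear address, then renormalize
   so that any carry out of the offset moves into the segment. */
struct seg_off huge_inc(struct seg_off p, uint16_t nbytes)
{
    uint32_t lin = (((uint32_t)p.seg << 4) + p.off + nbytes) & 0xFFFFFu;
    p.seg = (uint16_t)(lin >> 4);
    p.off = (uint16_t)(lin & 0xFu);
    return p;
}
```

Starting from 1000:FFFE and adding 4, far_inc yields 1000:0002, which is 64K below the intended location, while huge_inc yields 2000:0002. The price is the extra shift, add and split on every pointer update, which is why huge was the slowest model.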
In 16-bit DOS, I don't remember being able to do that. You could have multiple things that were each 64K (bytes), because the segment could be adjusted and the offset zeroed, but I don't remember whether you could cross the boundary with a single array. The flat memory space, where you could willy-nilly allocate whatever you wanted and reach as deep as you liked into an array, didn't happen until we could compile 32-bit DOS programs (on 386 or 486 processors). Perhaps operating systems and compilers other than Microsoft's and Borland's could generate flat arrays greater than 64 KB. On Win16 I don't remember having that freedom until Win32 hit; perhaps my memory is getting rusty. You were lucky or rich to have a megabyte of memory anyway; a 256 KB or 512 KB machine was not unheard of. Your floppy drive held a fraction of a megabyte, eventually 1.44 MB, and your hard disk, if you had one, held a dozen or a few megabytes, so you just didn't compute things that large that often.
I remember the particular challenge I had learning about DNS, when you could download the entire DNS database of all registered domain names on the planet; in fact you had to, in order to put up your own DNS server, which was almost required at the time to have a web site. That file was 35 megabytes, and my hard disk was 100 megabytes, with DOS and Windows chewing up some of that. I probably had 1 or 2 MB of memory, and might have been able to run 32-bit DOS programs at the time. Part of it was me wanting to parse the ASCII file, which I did in multiple passes, but on each pass the output had to go to another file, and I had to delete the prior file to make room on the disk for the next one. There were two disk controllers on a standard motherboard, one for the hard disk and one for the CD-ROM drive; here again this stuff wasn't cheap, and there were not a lot of spare ISA slots even if you could afford another hard disk and disk controller card.
There was even the problem of reading 64 KB with C. You passed fread the number of bytes you wanted to read in a 16-bit int, which meant 0 to 65535, not 65536 bytes, and performance dropped dramatically if you didn't read in multiples of the sector size, so you just read 32 KB at a time to maximize performance. 64K didn't come until well into the DOS32 days, when you were finally convinced that the value passed to fread was now a 32-bit number and that the compiler wasn't going to chop off the upper 16 bits and only use the lower 16 (which happened often if you used enough compilers and versions). We are currently suffering similar problems in the 32-bit to 64-bit transition as we did with the 16-bit to 32-bit transition. What is most interesting is the code from folks like me who learned that going from 16 to 32 bits changed the size of int, but not of unsigned char and unsigned long, so you adapted and rarely used int so that your programs would compile and work for both 16 and 32 bit. (The code from folks of that generation kind of stands out to others who also lived through it and used the same trick.) But for the 32 to 64 transition it is the other way around, and code not refactored to use uint32-style type declarations is suffering.
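The read-32-KB-at-a-time workaround can be sketched in standard C. CHUNK_SIZE and read_in_chunks are hypothetical names used only for illustration; the point is that 32768 fits in a 16-bit unsigned count while 65536 does not.

```c
#include <stdio.h>

#define CHUNK_SIZE 32768u   /* representable in 16 bits; 65536 would not be */

/* Reads the whole stream in CHUNK_SIZE pieces and returns the total number
   of bytes read. The buffer is reused for every chunk; a real program would
   process its contents inside the loop. */
unsigned long read_in_chunks(FILE *fp)
{
    static unsigned char buf[CHUNK_SIZE];
    unsigned long total = 0;
    size_t got;

    while ((got = fread(buf, 1, CHUNK_SIZE, fp)) > 0) {
        total += (unsigned long)got;
    }
    return total;
}
```

The running total is kept in an unsigned long, which was 32 bits on those compilers too, so the byte count of a file larger than 64 KB still fits even when each individual fread request does not exceed 32 KB.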
Reading wallyk's answer that just came in, the thing about huge pointers and wrapping around does ring a bell, as does not always being able to compile for huge. Small was the flat memory model we are comfortable with today, and as with today it was easy because you didn't have to worry about segments. So it was desirable to compile for small when you could. You still didn't have a lot of memory or disk or floppy space, so you just didn't normally deal with data that large.
And agreeing with another answer, the segment/offset thing was Intel's 8088/8086. The whole world was not yet dominated by Intel, so there were other platforms that just had a flat memory space, or used other tricks, perhaps in hardware outside the processor, to solve the problem. Because of segment/offset, Intel was able to ride the 16-bit thing longer than it probably should have. Segment/offset had some cool and interesting things you could do with it, but it was as much a pain as anything else. You either simplified your life and lived in a flat memory space, or you constantly worried about segment boundaries.
Really pinning down the address size on old x86s is sort of tricky. You could say that it's 16 bits, because the arithmetic you can perform on an address must fit in a 16-bit register. You could also say that it's 32 bits, because actual addresses are computed from a 16-bit general-purpose register and a 16-bit segment register (all 32 bits are significant). You could also just say it's 20 bits, because the segment register is shifted 4 bits left and added to the general-purpose register for hardware addressing.
It actually doesn't matter that much which one of these you choose, because they are all roughly equal approximations of the C abstract machine. Some compilers let you pick the memory model to use per compilation, while others just assumed 32-bit addresses and then carefully checked that operations that could overflow 16 bits emitted instructions that handled that case correctly.
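The shift-and-add rule can be written out directly. The following is a minimal sketch; physical_address and same_location are hypothetical helper names, and the 20-bit mask models the 8086's 20 address lines.

```c
#include <stdint.h>

/* Real-mode 8086 physical address: shift the segment left 4 bits, add the
   offset, and keep 20 bits (so FFFF:0010 wraps around to address 0). */
uint32_t physical_address(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}

/* Two segment:offset pairs alias when they resolve to the same physical address. */
int same_location(uint16_t seg_a, uint16_t off_a, uint16_t seg_b, uint16_t off_b)
{
    return physical_address(seg_a, off_a) == physical_address(seg_b, off_b);
}
```

For example, 1234:0010 and 1235:0000 both resolve to physical address 0x12350. This aliasing is why comparing two 32-bit segment:offset pairs bit for bit is not the same as comparing the addresses they refer to, and part of why the "address size" is hard to pin down.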
Check out the Wikipedia entry on far pointers. Basically, a far pointer can indicate both a segment and an offset, making it possible to jump to another segment.