x86-64 and far calls/jumps
Quick summary: in x86-64 mode, are far jumps as slow as in x86-32 mode?
On the x86 processor, jumps fall into three types:
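- short, with a PC-relative offset of -128 to +127 bytes (2-byte instruction)
- near, with a +/- 32k offset that "rolls around" the current segment (3-byte instruction in 16-bit code; 32-bit and 64-bit code use a 32-bit offset and a 5-byte instruction)
- far, which can jump anywhere (5-byte instruction in 16-bit code)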
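To make the size difference between the relative forms concrete, here is a minimal Python sketch that picks the shortest direct relative JMP encoding for a given displacement. The function name `jump_kind` and its `operand_bits` parameter are made up for illustration; it is not an assembler, and it only covers the unprefixed `EB` (rel8) and `E9` (rel16/rel32) opcodes.

```python
def jump_kind(displacement, operand_bits=16):
    """Return (kind, length_in_bytes) for a direct relative JMP.

    displacement is measured from the end of the jump instruction.
    operand_bits=16 models 16-bit code; operand_bits=32 models 32-bit
    code and also 64-bit code, where near jumps still carry a rel32.
    Hypothetical helper for illustration only.
    """
    if operand_bits not in (16, 32):
        raise ValueError("operand_bits must be 16 or 32")
    if -128 <= displacement <= 127:
        return ("short", 2)          # EB + 1-byte displacement
    limit = 1 << (operand_bits - 1)  # 32k for rel16, 2G for rel32
    if -limit <= displacement < limit:
        return ("near", 1 + operand_bits // 8)  # E9 + rel16 or rel32
    raise ValueError("displacement does not fit a relative jump")


print(jump_kind(100))        # short form
print(jump_kind(1000))       # near form in 16-bit code
print(jump_kind(1000, 32))   # near form in 32-bit or 64-bit code
```

A far jump is a different kind of instruction altogether: it carries (or points to) a new CS selector as well as an offset, which is where the extra work discussed below comes from.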
Short and near jumps take 1-2 clock cycles, while far jumps take 50-80 clock cycles, depending on processor. From my reading of the documentation, this is because they "go outside CS, the current code segment."
In x86-64 mode, code segments aren't used for addressing: the segment is effectively always 0..infinity. Ergo, there shouldn't be a penalty for going outside a segment.
Thus, the question: Does the number of clock cycles change for a far jump if the processor is in x86-64 mode?
Related bonus question: Most *nix-like operating systems running in 32-bit protected mode explicitly set the segment sizes to 0..infinity and manage the linear -> physical translation entirely through the page tables. Do they get a benefit from this in terms of the time for far calls (fewer clock cycles), or is the penalty really an internal CPU legacy of how segment registers have worked since the 8086?
Comments (1)
CS is used not only for base and limit, but also for permissions. The CPL is encoded there, as well as other descriptor fields.
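To make the "other fields" concrete, here is a minimal Python sketch that splits an 8-byte code-segment descriptor into its architectural fields. The function name and the two sample values (flat 64-bit and flat 32-bit code descriptors of the kind commonly seen in boot code) are assumptions for illustration, not something taken from the post.

```python
def decode_code_descriptor(raw):
    """Split an 8-byte code/data segment descriptor (as an int) into fields."""
    return {
        "base": ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24),
        "limit": (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16),
        "conforming": bool((raw >> 42) & 1),      # C bit (code segments)
        "executable": bool((raw >> 43) & 1),      # 1 = code, 0 = data
        "dpl": (raw >> 45) & 0x3,                 # descriptor privilege level
        "present": bool((raw >> 47) & 1),         # P bit
        "long_mode": bool((raw >> 53) & 1),       # L bit: 64-bit code segment
        "default_32": bool((raw >> 54) & 1),      # D bit: 32-bit default size
        "granularity_4k": bool((raw >> 55) & 1),  # G bit: limit in 4 KiB units
    }


# Sample values chosen for illustration only.
FLAT_CODE64 = 0x00AF9A000000FFFF
FLAT_CODE32 = 0x00CF9A000000FFFF

for name, raw in (("64-bit", FLAT_CODE64), ("32-bit", FLAT_CODE32)):
    fields = decode_code_descriptor(raw)
    print(name, "L =", fields["long_mode"], "D =", fields["default_32"],
          "DPL =", fields["dpl"])
```

Even when base and limit are ignored, a far transfer still has to fetch the descriptor and check fields like these before it can load CS.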
Far jumps can also go through a task gate, and far calls can also go through call gates. All of these have to be handled, regardless of 64-bit mode.
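Whatever the target turns out to be (a code segment or a gate), the processor first has to find the descriptor named by the selector before it can tell which case it is dealing with. A rough Python sketch of how a 16-bit selector breaks down, with a hypothetical helper name and an arbitrary example value:

```python
def decode_selector(selector):
    """Break a 16-bit segment selector into index, table, and RPL."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must fit in 16 bits")
    return {
        "index": selector >> 3,                       # descriptor slot number
        "table": "LDT" if selector & 0x4 else "GDT",  # TI bit
        "rpl": selector & 0x3,                        # requested privilege level
        "byte_offset": selector & 0xFFF8,             # index * 8 into the table
    }


# 0x33 is only an example value.
print(decode_selector(0x33))
```

Only after reading the descriptor at that offset does the processor know whether it is looking at a code segment or a gate, and which privilege checks apply.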
To sum up, a far jump in 64-bit mode is no faster than in 32-bit mode. If anything, it may be slower when it goes through a gate: in 64-bit mode, system descriptors such as call gates are 16 bytes rather than 8 (ordinary code-segment descriptors stay at 8 bytes), so those descriptor-table reads are twice as large, which may lengthen the time of the jump.