Determining CPU opcode cycle counts
I was wondering where one would go to get CPU opcode cycle counts for various machines. An example of what I'm talking about can be seen at this link:
https://web.archive.org/web/20150217051448/http://www.obelisk.demon.co.uk/6502/reference.html
If you examine the MAME source code, especially under src\emu\cpu, you'll see that most of the CPU models keep track of the cycle count in a similar way. My question is: where does one get this information, or how does one reverse engineer it if it's not available? I've never seen any 'official' ASM programmer's guide that contains cycle count info. My initial guess is that a small program is loaded into the real hardware's boot ROM and, if the CPU has an opcode equivalent to RDTSC, something like this is done:
RDTSC
// opcode of your choosing
RDTSC
But what would you do if such support wasn't available? I know that for older hardware the MAME team has access to nothing but the ROMs and scattered documentation.
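For context, the table-driven cycle accounting that emulator cores tend to use can be sketched as follows. This is a minimal illustration, not MAME's actual code: the `OPCODES` table and `count_cycles` function are hypothetical names, the table covers only a handful of 6502 opcodes, and it ignores penalties such as page crossings and taken branches.

```python
# Hypothetical, heavily reduced 6502 table:
# opcode byte -> (mnemonic, instruction length in bytes, base cycle count).
OPCODES = {
    0xA9: ("LDA #imm", 2, 2),
    0xA5: ("LDA zp", 2, 3),
    0x8D: ("STA abs", 3, 4),
    0x69: ("ADC #imm", 2, 2),
    0xE8: ("INX", 1, 2),
    0xEA: ("NOP", 1, 2),
}


def count_cycles(program: bytes) -> int:
    """Walk a straight-line program and sum the base cycles of each opcode."""
    pc = 0
    total = 0
    while pc < len(program):
        _mnemonic, length, cycles = OPCODES[program[pc]]
        total += cycles
        pc += length  # skip the operand bytes
    return total


# LDA #$01 ; ADC #$02 ; STA $0200 ; NOP
program = bytes([0xA9, 0x01, 0x69, 0x02, 0x8D, 0x00, 0x02, 0xEA])
print(count_cycles(program))  # 2 + 2 + 4 + 2 = 10
```

A real core would add the conditional extra cycles on top of these base values, which is exactly the information the question is asking how to obtain.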
Up through about the Pentium, cycle counts were easy to find for Intel and AMD processors (and most competitors). Starting with the Pentium Pro and AMD K5, however, CPUs moved to a dynamic execution model, in which instructions can be executed out of order. In this model, the time taken to execute an instruction depends heavily upon the data it uses, and on whether (for example) it depends on data from a previous instruction (in which case it has to wait for that instruction to complete before it can execute).
There are also constraints on things like how many instructions can be decoded per cycle (e.g. at least one, plus two more as long as they're "simple") and how many can be retired per cycle (usually around three or four).
As a result, on a modern CPU it's almost meaningless to talk about the cycles for a given instruction in isolation. Meaningful results require a stream of instructions, so you look not only at that instruction, but also at what comes before and after it. An instruction that's a serious bottleneck in one instruction stream might be essentially free in another stream (e.g. if you have one multiplication mixed in with a lot of adds, the multiplication might be almost free -- but if it's surrounded by a lot of other multiplications, it might be relatively expensive).
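A toy model can make the point about instruction streams concrete. The sketch below is not a simulator of any real CPU: the latencies in `LATENCY`, the issue width of 3, and the `estimate_cycles` helper are all assumptions chosen for illustration. It estimates a lower bound on cycles as the larger of the longest dependency chain and the issue-width limit.

```python
import math

# Assumed, illustrative latencies in cycles (not taken from any datasheet).
LATENCY = {"add": 1, "mul": 3}


def estimate_cycles(instrs, issue_width=3):
    """Lower-bound estimate for a stream of (op, deps) tuples.

    deps lists the indices of earlier instructions whose results are needed.
    """
    finish = []
    for op, deps in instrs:
        # An instruction can start once all of its inputs are ready.
        start = max((finish[d] for d in deps), default=0)
        finish.append(start + LATENCY[op])
    critical_path = max(finish, default=0)
    issue_bound = math.ceil(len(instrs) / issue_width)
    return max(critical_path, issue_bound)


adds_only = [("add", [])] * 9                       # nine independent adds
one_mul = [("add", [])] * 8 + [("mul", [])]         # one multiply among adds
mul_chain = [("mul", [])] + [("mul", [i - 1]) for i in range(1, 9)]

print(estimate_cycles(adds_only))  # 3
print(estimate_cycles(one_mul))    # 3: the multiply is effectively free
print(estimate_cycles(mul_chain))  # 27: each multiply waits for the previous one
```

Under these assumptions the same multiply costs nothing extra in one stream and dominates the other, which is why a single per-instruction cycle number stops being meaningful.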
The RDTSC count should be preceded by a serializing instruction (such as CPUID) to ensure that all previous instructions have retired before the count is read. This adds overhead to the count, but you can simply "count" zero instructions and subtract that value from the measured result.
Agner Fog's optimization manuals (PDF) cover this very well:
http://www.agner.org/optimize/#manuals
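The "measure nothing, then subtract" calibration described above can be sketched in a few lines. This is only an analogy: Python cannot issue RDTSC or a serializing instruction, so `time.perf_counter_ns` stands in for the counter, and `measure_ns`, `empty`, and `work` are hypothetical names introduced for the example.

```python
import time


def measure_ns(fn, repeats=2000):
    """Return the smallest observed wall time of fn() in nanoseconds."""
    best = None
    for _ in range(repeats):
        t0 = time.perf_counter_ns()
        fn()
        t1 = time.perf_counter_ns()
        elapsed = t1 - t0
        if best is None or elapsed < best:
            best = elapsed
    return best


def empty():
    """The 'zero instructions' case: only measurement overhead remains."""
    pass


def work():
    """The code under test (an arbitrary stand-in workload)."""
    sum(range(1000))


overhead = measure_ns(empty)      # cost of the timing harness itself
raw = measure_ns(work)            # harness + workload
net = max(raw - overhead, 0)      # clamp in case of timer noise
print(f"overhead={overhead} ns, raw={raw} ns, net={net} ns")
```

Taking the minimum over many repeats is one common way to reduce noise from interrupts and scheduling; on real hardware the same structure applies with the serialized RDTSC pair in place of the timer calls.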