How can I determine the execution time of x86 and PowerPC instructions?
I have to approximate the execution time of PowerPC and x86 assembler code. I understand that I cannot compute it exactly, because it depends on many factors (the current processor state, the way an x86 processor splits instructions into internal micro-operations, memory access time that depends on whether code is fetched from cache or from slower memory, and so on).
I found some information in the Intel Optimization Reference (Appendix C), but it does not cover all general-purpose instructions. Is there a complete reference for this?
What about PowerPC processors? Where can I find such information?
Comments (4)
PowerPC is pretty well documented, but it depends on which processor you're talking about. IBM produced a pretty good manual for the 970 (G5). Intel is a little less forthcoming when it comes to the details of its micro-architecture.
Having said that, what you want to do is quite tricky. Both x86 and PowerPC are superscalar: they have multiple execution units and pipelines, so it's not like the old days, when you might execute one instruction per clock cycle. The PowerPC 970, for example, can have up to 215 instructions "in flight" at any given time. Ideally you need a simulator if you want to measure exact cycle counts for small sections of code.
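To make the superscalar point concrete, here is a minimal sketch of the kind of rough static estimate one could build from a latency table. It is not a simulator: it only computes a lower bound for a straight-line block as the larger of the longest dependency chain and the issue-width limit. The `Instr` type, the `cycle_lower_bound` function, the instruction names, and every latency value are hypothetical, chosen for illustration rather than taken from any Intel or IBM manual.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    latency: int      # cycles until the result is ready (hypothetical value)
    deps: tuple = ()  # indices of earlier instructions this one depends on


def cycle_lower_bound(instrs, issue_width):
    """Return a lower bound on the cycles needed for a straight-line block.

    The bound is the larger of:
      * the critical path (longest chain of dependent latencies), and
      * the issue limit (instruction count / issue width, rounded up).
    Caches, branches, and port conflicts are ignored entirely.
    """
    finish = []  # finish[i]: earliest cycle at which instruction i's result is ready
    for ins in instrs:
        start = max((finish[d] for d in ins.deps), default=0)
        finish.append(start + ins.latency)
    critical_path = max(finish, default=0)
    issue_limit = -(-len(instrs) // issue_width)  # ceiling division
    return max(critical_path, issue_limit)


# Hypothetical block: one load feeding two independent adds, then a multiply.
block = [
    Instr("load r1, [mem]", latency=4),
    Instr("add  r2, r1, 1", latency=1, deps=(0,)),
    Instr("add  r3, r1, 2", latency=1, deps=(0,)),
    Instr("mul  r4, r2, r3", latency=3, deps=(1, 2)),
]
print(cycle_lower_bound(block, issue_width=2))
```

With these made-up latencies the dependency chain (load, add, multiply) dominates, so the bound comes from latency rather than from the number of instructions. That is why a simple per-instruction cycle sum tends to mislead on such processors.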
This must be very hard to do on a modern, general-purpose OS without either controlling the execution environment extremely tightly or making assumptions that won't hold at least some of the time.
For example, if some hardware resource is overloaded, either by one very hungry competing process or by several competing processes, then the elapsed time to execute a given piece of code will depend on how fairly the OS can share the overloaded resource between them. Even if the OS can share the resource perfectly fairly, you have to be able to limit the number of competing processes in order to determine a finite time bound.
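One common way to reduce (not eliminate) interference from competing processes when measuring empirically is to repeat the measurement and keep the smallest value, on the assumption that interference only ever adds time. The sketch below illustrates that idea; `min_elapsed_ns` and `work` are hypothetical names, and `work` is just a stand-in for the code under test.

```python
import time


def min_elapsed_ns(func, repeats=50):
    """Run func several times and return the smallest wall-clock time in ns.

    Taking the minimum assumes that scheduling and contention only ever
    make a run slower, so the fastest run is the least disturbed one.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = None
    for _ in range(repeats):
        t0 = time.perf_counter_ns()
        func()
        elapsed = time.perf_counter_ns() - t0
        if best is None or elapsed < best:
            best = elapsed
    return best


def work():
    # Hypothetical stand-in for the code being measured.
    return sum(range(10_000))


print(min_elapsed_ns(work), "ns (best of 50 runs)")
```

This gives a best-case figure for one machine under one load, not a guaranteed upper bound; bounding the worst case still requires controlling the environment as described above.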
Modern processors spend most of their time waiting for memory, or finding other work to do while the current thread waits for memory.
I think you should probably just try optimising your memory usage.
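A back-of-the-envelope way to see why memory dominates is the standard average-access-time estimate: hit time plus miss rate times miss penalty. The sketch below applies it with hypothetical numbers (a 4-cycle hit, a 200-cycle miss, 50 misses per 1000 accesses); none of these figures come from a specific processor manual, and `average_access_cycles` is an illustrative name.

```python
def average_access_cycles(hit_cycles, misses_per_1000, miss_penalty_cycles):
    """Estimate average cycles per memory access.

    average = hit time + miss rate * miss penalty,
    with the miss rate given as misses per 1000 accesses.
    """
    return hit_cycles + misses_per_1000 * miss_penalty_cycles / 1000


# Hypothetical figures, for illustration only.
typical = average_access_cycles(4, misses_per_1000=50, miss_penalty_cycles=200)
tuned = average_access_cycles(4, misses_per_1000=10, miss_penalty_cycles=200)
print(typical, tuned)
```

Under these assumed numbers, cutting the miss rate changes the average far more than shaving a cycle off any single instruction would, which is the reasoning behind focusing on memory usage first.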
You'd have to do an extremely rigorous analysis, taking into account all the caches, alignment, pipelining, time slicing, and so on. Does x86 even have fixed clock-cycle counts per instruction any more? You are better off simply writing code optimized for speed along the lines the CPU manual suggests.