Concise SSE and MMX instruction reference with latencies and throughput
I am trying to optimize some arithmetic by using the MMX and SSE instruction sets with inline assembly. However, I have been unable to find good references for the timings and usages of these enhanced instruction sets. Could you please help me find references that contain information about the throughput, latency, operands, and perhaps short descriptions of the instructions?
So far, I have found:
Intel Instruction References
Intel 64 and IA-32 Architectures Software Developer's Manual: Vol. 2A and
Intel 64 and IA-32 Architectures Software Developer's Manual: Vol. 2B
Intel Optimization Guide
http://www.intel.com/Assets/PDF/manual/248966.pdf
Timings of Integer Operations
http://gmplib.org/~tege/x86-timing.pdf
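For context, here is a minimal sketch of the kind of inline assembly the question is about, assuming GCC or Clang extended asm (AT&T syntax) on x86/x86-64. The function name `add4_i16_mmx` is hypothetical. It adds four signed 16-bit integers with the MMX `PADDW` instruction and finishes with `EMMS`, which MMX code has to execute before any later x87 floating-point code runs, because the MMX registers alias the x87 register stack.

```c
#include <stdint.h>

/* Hypothetical helper (GCC/Clang, x86/x86-64 only):
   dst[i] = a[i] + b[i] for four int16_t lanes using MMX PADDW.
   PADDW wraps around on overflow; it does not saturate. */
static void add4_i16_mmx(int16_t *dst, const int16_t *a, const int16_t *b)
{
    __asm__ volatile(
        "movq  (%1), %%mm0\n\t"   /* mm0 = a[0..3] */
        "paddw (%2), %%mm0\n\t"   /* mm0 += b[0..3], lane by lane */
        "movq  %%mm0, (%0)\n\t"   /* store the four sums to dst */
        "emms\n\t"                /* leave MMX state so x87 code works again */
        :                          /* no register outputs; results go through memory */
        : "r"(dst), "r"(a), "r"(b)
        : "mm0", "memory");
}
```

Calling `add4_i16_mmx(d, a, b)` with `a = {1, 2, 3, 32767}` and `b = {10, 20, 30, 1}` leaves `d = {11, 22, 33, -32768}`; the last lane shows the wraparound.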
Comments (3)
I would have thought the Intel Instruction Reference provides an adequate guide to what these instructions actually do. It has pseudocode for each one, a description of its operation, and in some cases even a small diagram of a representative case.
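As an illustration of how that pseudocode can be put to use: the manual describes `PADDUSB` (paraphrasing) as adding each pair of unsigned bytes and saturating the result to the unsigned-byte range. The sketch below, assuming SSE2 and the `<emmintrin.h>` header shipped with GCC, Clang, and MSVC, turns that description into a scalar reference model and checks it against the `_mm_adds_epu8` intrinsic, which maps to `PADDUSB` on an XMM register. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Scalar reference model written from the manual's description of PADDUSB:
   each byte lane is added and saturated to the range 0..255. */
static void paddusb_ref(uint8_t dst[16], const uint8_t a[16], const uint8_t b[16])
{
    for (int i = 0; i < 16; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}

/* SIMD version: _mm_adds_epu8 performs PADDUSB on 16 bytes at once. */
static void paddusb_simd(uint8_t dst[16], const uint8_t a[16], const uint8_t b[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)dst, _mm_adds_epu8(va, vb));
}

/* Returns 1 if the reference model and the SIMD version agree on these inputs. */
static int paddusb_matches(const uint8_t a[16], const uint8_t b[16])
{
    uint8_t r[16], s[16];
    paddusb_ref(r, a, b);
    paddusb_simd(s, a, b);
    return memcmp(r, s, sizeof r) == 0;
}
```

A scalar model like this is cheap to write from the manual and gives you something to test hand-written SIMD code against.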
For timings, there's no official guide that I'm aware of. Agner Fog's page is the standard reference:
http://www.agner.org/optimize/
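Tables such as Agner Fog's list latency and throughput separately because they limit different code. Latency matters when each instruction needs the previous result; throughput matters when instructions are independent. A minimal sketch, assuming SSE and arrays whose length is a multiple of 16 (function names hypothetical): the one-accumulator sum forms a single `ADDPS` dependency chain, while the four-accumulator sum gives the CPU independent additions it can overlap. How many accumulators are needed to reach the throughput limit depends on the latency and throughput of `ADDPS` on the target CPU; four is an arbitrary choice here.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* One accumulator: every ADDPS depends on the previous one,
   so the loop is bounded by ADDPS latency. n must be a multiple of 4. */
static float sum_one_acc(const float *x, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(x + i));
    float t[4];
    _mm_storeu_ps(t, acc);
    return (t[0] + t[1]) + (t[2] + t[3]);
}

/* Four independent accumulators: the four ADDPS chains can overlap in the
   pipeline, so the loop can approach ADDPS throughput. n must be a multiple of 16. */
static float sum_four_acc(const float *x, size_t n)
{
    __m128 a0 = _mm_setzero_ps(), a1 = a0, a2 = a0, a3 = a0;
    for (size_t i = 0; i < n; i += 16) {
        a0 = _mm_add_ps(a0, _mm_loadu_ps(x + i));
        a1 = _mm_add_ps(a1, _mm_loadu_ps(x + i + 4));
        a2 = _mm_add_ps(a2, _mm_loadu_ps(x + i + 8));
        a3 = _mm_add_ps(a3, _mm_loadu_ps(x + i + 12));
    }
    __m128 acc = _mm_add_ps(_mm_add_ps(a0, a1), _mm_add_ps(a2, a3));
    float t[4];
    _mm_storeu_ps(t, acc);
    return (t[0] + t[1]) + (t[2] + t[3]);
}
```

Both functions compute the same sum (up to floating-point rounding order); only the dependency structure differs, which is exactly what the latency and throughput columns describe.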
Intel's Intrinsics Guide (at the bottom left of the AVX page) is a well-organized, searchable tool in which you can narrow results by SSE version and/or instruction type, e.g., FP Arithmetic or Integer Logical.
For each instruction, it also shows a latency/throughput table by CPU and by parameters.
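The guide's entries are intrinsics, which the compiler maps to the corresponding instructions while handling register allocation itself, so they are an alternative to inline assembly. For example, `_mm_add_ps` (listed under SSE) corresponds to `ADDPS`. A minimal sketch, assuming an SSE-capable x86 target and the standard `<xmmintrin.h>` header; the function name `add_arrays` is hypothetical.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n).
   Four floats per ADDPS, then a scalar loop for the last n % 4 elements. */
static void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load, 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
    for (; i < n; i++)                     /* scalar tail */
        dst[i] = a[i] + b[i];
}
```

Looking up `_mm_add_ps` in the guide then gives the same latency/throughput table for the instruction the compiler emits.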
The timings are in the "Intel Optimization Guide"; see Appendix C for the throughput and latency of each instruction on each CPU architecture.