Approximate number of CPU cycles for various operations

Posted 2024-08-30 05:56:39

I am trying to find a reference for approximately how many CPU cycles various operations require.

I don't need exact numbers (as this is going to vary between CPUs) but I'd like something relatively credible that gives ballpark figures that I could cite in discussion with friends.

As an example, we all know that floating-point division takes more CPU cycles than, say, a bit shift.

I'd guess that the division is around 100 cycles whereas a shift is 1, but I'm looking for something to cite to back that up.

Can anyone recommend such a resource?

Comments (4)

恍梦境° 2024-09-06 05:56:39

For x86 processors, see the Intel® 64 and IA-32 Architectures Optimization Reference Manual, probably Appendix C.

However, it is far from easy to work out how many cycles an instruction takes on a modern x86 processor, because the answer depends heavily on factors such as whether the data is in cache, whether accesses are aligned, whether branch prediction fails, whether the instruction pipeline stalls, and quite a lot of other things.

柠檬色的秋千 2024-09-06 05:56:39

I wrote a small app to test this, a very approximate one built with the free edition of SynthMaker. In the readings below, e stands for empty, and the numbers are very approximate cycle counts.

  divide|e:115|10
    mult|e: 48|10
     add|e: 48|10
    subs|e: 50|10
compare>|e: 50|10
     sin|e:135|10

The readings in the cycle analyser vary wildly, from 50 to 100, and are usually either one or two times the expected amount; the figures above are averages. The cycle analyser is a very rough tool, but it gives fair results. For example, a user-made workaround exponent, coded in ASM, that calculates both the exp and the base at audio rate comes to around 800 cycles, so I'd say the figures above are within about 50 percent. I thought the divide would cost far more! It seems to be only about twice as much. If you want the file I made, to run in the SM free version, mail me. I was going to save an exe, which is why I did it, but you can't save in the free version, silly me! I am not going to code it from square one in version 1.17 :/
ant.stewart at the place yahoo dotty com.
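
To make the "about twice as much" comparison concrete, here is a minimal Python sketch that expresses each reading relative to add. It assumes the middle number in each row of the readings above is the approximate cycle count; that reading of the columns is an assumption, and `relative_cost` is a hypothetical helper name.

```python
# Middle column of the readings posted above, taken as approximate cycles.
# Interpreting that column as a cycle count is an assumption of this sketch.
readings = {
    "divide": 115,
    "mult": 48,
    "add": 48,
    "subs": 50,
    "compare>": 50,
    "sin": 135,
}


def relative_cost(table, baseline="add"):
    """Return each operation's reading divided by the baseline operation's reading."""
    base = table[baseline]
    return {op: cycles / base for op, cycles in table.items()}


if __name__ == "__main__":
    for op, ratio in relative_cost(readings).items():
        print(f"{op:>9}: {ratio:.2f}x the cost of add")
```

On these readings divide comes out at roughly 2.4 times add, which is consistent with the "about twice" impression and far from a 100-to-1 gap. Given how rough the analyser is, the ratios are indicative at best.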

怪我鬧 2024-09-06 05:56:39

Agner Fog has published research on this:

1. Instruction tables: lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD, and VIA CPUs.

Last updated 2022-11-04
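
Latency and throughput answer different questions, which matters when quoting a single "cycles per instruction" figure. The Python sketch below is a simplified model, not a description of any real CPU: it assumes throughput is given as a reciprocal (cycles between the starts of successive independent instructions), and the two constants are hypothetical placeholders rather than values taken from the tables.

```python
def dependent_chain_cycles(count, latency):
    """Each operation needs the previous result, so the latencies add up."""
    return count * latency


def independent_ops_cycles(count, reciprocal_throughput, latency):
    """Independent operations overlap: a new one starts every
    `reciprocal_throughput` cycles, and the last one still takes its
    full latency to complete."""
    if count == 0:
        return 0
    return (count - 1) * reciprocal_throughput + latency


# Hypothetical placeholder values for illustration only.
HYPOTHETICAL_LATENCY = 4
HYPOTHETICAL_RECIPROCAL_THROUGHPUT = 1

if __name__ == "__main__":
    n = 100
    chained = dependent_chain_cycles(n, HYPOTHETICAL_LATENCY)
    overlapped = independent_ops_cycles(
        n, HYPOTHETICAL_RECIPROCAL_THROUGHPUT, HYPOTHETICAL_LATENCY
    )
    print(f"{n} dependent operations:   about {chained} cycles")
    print(f"{n} independent operations: about {overlapped} cycles")
```

Under this model the same instruction can look several times cheaper or more expensive depending on whether successive operations depend on each other, so it helps to say which of the two figures is being cited.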

峩卟喜欢 2024-09-06 05:56:39

This is going to be hardware-dependent. The best thing to do is to run some benchmarks on the particular hardware you want to test.

A benchmark would go roughly like this:

• Run a primitive operation a million times (say, adding two integers).
• Record the time it took to run (say, in seconds).
• Multiply that time by the number of cycles your machine executes per second. This gives the total number of cycles spent.
• Divide 1,000,000 by the number from the previous step. This gives the number of instructions per cycle; invert it to get cycles per instruction. Keep in mind that on a pipelined, superscalar CPU the instructions-per-cycle figure can exceed 1, so an operation can appear to cost less than one cycle.
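
The steps above can be sketched in Python as follows. This is a rough illustration under stated assumptions: `ASSUMED_CLOCK_HZ` is a hypothetical 3 GHz clock that you would replace with your own machine's frequency, and the function names are made up for this example. In Python the interpreter and loop overhead dominate, so the result reflects the cost of one loop iteration rather than a single add instruction; the same arithmetic applies unchanged to a benchmark written in C or assembly.

```python
import time


def cycles_per_operation(elapsed_seconds, clock_hz, operations):
    """Total cycles spent (time * clock rate) divided by the operation count."""
    total_cycles = elapsed_seconds * clock_hz
    return total_cycles / operations


def time_additions(operations=1_000_000):
    """Run `operations` integer additions and return the elapsed seconds."""
    a, b = 3, 4
    start = time.perf_counter()
    for _ in range(operations):
        c = a + b  # the primitive operation being timed
    return time.perf_counter() - start


# Hypothetical clock frequency; substitute the value for your own machine.
ASSUMED_CLOCK_HZ = 3.0e9

if __name__ == "__main__":
    n = 1_000_000
    elapsed = time_additions(n)
    cpo = cycles_per_operation(elapsed, ASSUMED_CLOCK_HZ, n)
    print(f"{elapsed:.4f} s for {n} additions")
    print(f"about {cpo:.1f} cycles per iteration ({1 / cpo:.4f} iterations per cycle)")
```

Running the loop several times and taking the minimum tends to give steadier numbers, since frequency scaling and other processes add noise to any single run.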