Relative performance of code generated by splat compilers
Have advances in CPU design, such as dynamic instruction scheduling, narrowed the performance gap between code generated by splat compilers and code generated by optimizing compilers? In other words, can compilers get away with being dumber these days?
Comments (1)
On the contrary, optimizing compilers achieve more on contemporary CPUs. Automatic vectorization can make code several times faster. Modern instruction sets also open up optimization opportunities, for example using CMOV instead of a conditional branch on x86.
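To make the vectorization and CMOV points concrete, here is a minimal sketch in C. The function name `sum_clamped` is hypothetical. The loop body is a simple select with no side effects, which is the kind of code an optimizing compiler may turn into a conditional move or a vector minimum rather than a branch. Whether it actually does so depends on the compiler, its flags, and the target, so treat this as an illustration rather than a guarantee.

```c
#include <stddef.h>

/* Hypothetical example: sums the array, treating any value above
   `limit` as `limit`. The ternary is a pure select, so a compiler is
   free to emit CMOV (or a SIMD min) instead of a conditional branch. */
long sum_clamped(const int *a, size_t n, int limit) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        int v = (a[i] > limit) ? limit : a[i];
        sum += v;
    }
    return sum;
}
```

A compiler that translates each statement in isolation would most likely emit a compare and a jump here, leaving the outcome to the branch predictor.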
There are some areas where the performance gap has narrowed. CPUs perform function calls faster, so function inlining may not be as beneficial as it used to be. Loop unrolling may sometimes make code slightly slower. But in most cases compiler optimizations and CPU optimizations are orthogonal to each other. CPUs cannot do loop fusion or common subexpression elimination. Compilers cannot provide a good alternative to dynamic instruction scheduling, branch prediction, or data prefetching.
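As a rough sketch of what loop fusion and common subexpression elimination mean, the two hypothetical C functions below produce the same results. The first is what a straightforward translation of the source would execute. The second is what an optimizing compiler might effectively produce. The names are invented for this example, and the sketch assumes the three arrays do not overlap.

```c
#include <stddef.h>

/* As written: two passes over x, and x[i] * scale is computed twice. */
void transform_naive(const int *x, int *y, int *z, size_t n, int scale) {
    for (size_t i = 0; i < n; i++) {
        y[i] = x[i] * scale + 1;
    }
    for (size_t i = 0; i < n; i++) {
        z[i] = x[i] * scale - 1;
    }
}

/* After loop fusion and common subexpression elimination, done by hand:
   one pass over x, and the shared product is computed once.
   Assumes x, y and z do not alias each other. */
void transform_fused(const int *x, int *y, int *z, size_t n, int scale) {
    for (size_t i = 0; i < n; i++) {
        int t = x[i] * scale;  /* shared subexpression */
        y[i] = t + 1;
        z[i] = t - 1;
    }
}
```

Out-of-order hardware can overlap the instructions it is given, but it still executes both loops and both multiplications in the first version. Only the compiler, or the programmer, can remove that work.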