Where can I learn the low-level fundamentals of performance?

Posted 2024-12-02 05:32:16

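This is really a two-part question:

For people who want to squeeze out every last clock cycle, you hear talk of pipelining, cache locality, and so on.

1. I have seen these low-level performance techniques mentioned all over the place, but I have not yet seen a good start-to-finish introduction to the subject. Are there any resources you would recommend? (Google gives me definitions and papers; I would really appreciate some kind of worked examples or tutorials, real-life hands-on material.)
2. How do people actually measure this kind of thing? With some kind of profiler? I know we can always change the code, see an improvement, and theorize in hindsight; I am just wondering whether there are established tools for the job.

(I know algorithmic optimization is where the orders of magnitude are. I am interested in the bare metal here.)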



凉月流沐 2024-12-09 05:32:16

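The standard answer is "don't optimize prematurely." As you mentioned, you will get more performance from a better design than from a better loop, and your maintainers will appreciate it too.

That said, to answer your questions: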
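Learn assembly. Lots and lots of assembly. Don't multiply by a power of 2 when you can shift. Learn the weird uses of XOR to copy and clear registers. For specific references, see
http://www.mark.masmcode.com/ and http://www.agner.org/optimize/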
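A small C sketch of the tricks mentioned above, assuming 32-bit unsigned arithmetic; all function names are hypothetical. Note that optimizing compilers such as GCC and Clang already turn x * 8 into a shift (or an equivalent instruction) on their own, so in C the value of knowing this lies mostly in reading the generated assembly. Reading "weird uses of XOR" as the XOR swap and XOR zeroing idioms is an interpretation, not something the answer spells out.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by 8 (2^3) with a shift. For unsigned values x << 3 and x * 8
   agree exactly, including wraparound modulo 2^32. */
static uint32_t times8_shift(uint32_t x) { return x << 3; }
static uint32_t times8_mul(uint32_t x) { return x * 8u; }

/* XOR zeroing: x ^ x is always 0. On x86, `xor eax, eax` is the usual
   idiom for clearing a register. */
static uint32_t xor_clear(uint32_t x) { return x ^ x; }

/* XOR swap: exchanges two values without a temporary. If a and b point to
   the same object the value is wiped to 0, hence the assert. */
static void xor_swap(uint32_t *a, uint32_t *b) {
    assert(a != b);
    *a ^= *b; /* a = A ^ B */
    *b ^= *a; /* b = B ^ (A ^ B) = A */
    *a ^= *b; /* a = (A ^ B) ^ A = B */
}
```

The XOR swap is best treated as a curiosity: on modern CPUs it is generally no faster than a swap through a temporary, and the temporary version leaves the compiler freer to allocate registers.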

Yes, you need to time your code. On *nix it can be as simple as time { commands ; }, but you will probably want a full-featured profiler. GNU gprof is open source: http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html
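For timing inside a program rather than around it, a minimal sketch using POSIX clock_gettime with CLOCK_MONOTONIC follows. The workload function is a hypothetical stand-in; writing its result to a volatile variable keeps the compiler from discarding the work as dead code.

```c
#define _POSIX_C_SOURCE 199309L /* exposes clock_gettime in strict ISO C modes */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical workload: a chain of multiply-adds, each depending on the last. */
static uint64_t workload(uint64_t n) {
    uint64_t acc = 0;
    for (uint64_t i = 0; i < n; i++)
        acc = acc * 31 + i;
    return acc;
}

/* Storing the result in a volatile keeps it from being optimized away. */
static volatile uint64_t sink;

/* Elapsed wall-clock nanoseconds for one call of fn(arg). CLOCK_MONOTONIC
   is used because it is not affected by changes to the system time. */
static int64_t time_call_ns(uint64_t (*fn)(uint64_t), uint64_t arg) {
    struct timespec t0, t1;
    assert(fn != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    sink = fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000 + (t1.tv_nsec - t0.tv_nsec);
}
```

For the profiler route, the usual GNU gprof workflow is to compile and link with gcc -pg, run the program once so that it writes gmon.out, and then run gprof ./prog gmon.out to get a flat profile and call graph.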

If this really is your thing, go for it, have fun, and remember: lots and lots of bit-level math. Your maintainers will hate you ;)
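A few examples of the kind of bit-level math meant here, as a C sketch with hypothetical helper names and 32-bit unsigned inputs assumed. GCC and Clang also provide __builtin_popcount, which can compile to a single instruction when the target supports one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if x is a nonzero power of two: such an x has exactly one bit set,
   and x - 1 clears that bit while setting all the bits below it. */
static bool is_pow2(uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }

/* x % m for a power-of-two m, computed as a mask (valid for unsigned x). */
static uint32_t mod_pow2(uint32_t x, uint32_t m) {
    assert(is_pow2(m));
    return x & (m - 1);
}

/* Round x up to the next power of two (x in 1..2^31): smear the highest
   set bit of x - 1 into every lower position, then add 1. */
static uint32_t next_pow2(uint32_t x) {
    assert(x >= 1 && x <= 0x80000000u);
    x--;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1;
}

/* Count set bits: each x &= x - 1 clears the lowest set bit (Kernighan's trick). */
static int popcount32(uint32_t x) {
    int n = 0;
    for (; x; x &= x - 1)
        n++;
    return n;
}
```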


沙与沫 2024-12-09 05:32:16


EDIT/REWRITE:

If it is books you need, Michael Abrash did a good job in this area: Zen of Assembly Language, a number of magazine articles, the Graphics Programming Black Book, etc. Much of what he was tuning for is no longer a problem; the problems have changed. What you will get out of his work is a sense of the kinds of things that cause bottlenecks and the kinds of ways to solve them. Most important is to time everything, and to understand how your timing measurements work so that you are not fooling yourself by measuring incorrectly. Time the different solutions and try crazy, weird ones; you may find an optimization you were not aware of and would not have realized until you exposed it.
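One way to avoid fooling yourself, sketched in C under POSIX assumptions: check the clock's resolution with clock_getres, time a large batch of calls instead of a single short one, and keep the fastest of several batches to filter out noise from interrupts and scheduling. The operation tiny_op and the iteration counts are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L /* exposes clock_gettime/clock_getres */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Resolution of the monotonic clock in ns: intervals not much longer than
   this cannot be measured meaningfully. */
static int64_t clock_resolution_ns(void) {
    struct timespec res;
    clock_getres(CLOCK_MONOTONIC, &res);
    return (int64_t)res.tv_sec * 1000000000 + res.tv_nsec;
}

static int64_t now_ns(void) {
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return (int64_t)t.tv_sec * 1000000000 + t.tv_nsec;
}

/* Hypothetical operation, far too short to time on its own. */
static volatile uint32_t sink;
static void tiny_op(uint32_t i) { sink = i * 2654435761u; }

/* Times `iters` calls as one batch, repeats `reps` batches, and returns the
   fastest batch divided by iters. The figure includes loop and call overhead,
   which is part of knowing what the measurement really contains. */
static double per_call_ns(void (*op)(uint32_t), uint32_t iters, int reps) {
    assert(op != NULL && iters > 0 && reps > 0);
    int64_t best = INT64_MAX;
    for (int r = 0; r < reps; r++) {
        int64_t start = now_ns();
        for (uint32_t i = 0; i < iters; i++)
            op(i);
        int64_t elapsed = now_ns() - start;
        if (elapsed < best)
            best = elapsed;
    }
    return (double)best / (double)iters;
}
```

Taking the minimum rather than the mean is a design choice: noise only ever adds time, so the fastest batch is the closest estimate of the code's own cost, though it hides variance you may also care about.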

I have only just started reading it, but See MIPS Run (early/first edition) looks good so far (note that ARM overtook MIPS as the leader in the processor market, so the MIPS and RISC hype is a bit dated). There are a number of textbooks about MIPS, old and new. MIPS was designed for performance (at the cost of the software engineer in some ways).

The bottlenecks today fall into two categories: the processor itself, and the I/O around it along with whatever is connected to that I/O. The insides of the processor chips themselves (for higher-end systems) run much faster than the I/O can handle, so you can only tune so far before you have to go off chip and wait forever. Getting from the train to your destination half a minute faster when the train ride was 3 hours is not necessarily a worthwhile optimization.
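The train analogy can be made quantitative with Amdahl's law, a standard formula the answer does not name: if a fraction p of the total time is made s times faster, the overall speedup is 1 / ((1 - p) + p / s). The C sketch below assumes the last leg of the trip takes 10 minutes, a hypothetical figure, since the answer gives only the 3-hour ride and the half-minute saving.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction p of the total time
   is made s times faster; the other (1 - p) is untouched. */
static double amdahl_speedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

/* Limit as s grows without bound: even infinitely fast, the improved part
   only removes its own share of the time. */
static double amdahl_limit(double p) {
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}

/* Train example with a hypothetical 10-minute last leg after a 180-minute
   ride: the last leg is p = 10/190 of the trip, and shaving 30 s off it
   makes that leg s = 10/9.5 times faster. */
static double train_trip_speedup(void) {
    return amdahl_speedup(10.0 / 190.0, 10.0 / 9.5);
}
```

With those numbers the whole trip gets only about 0.26% faster (190 / 189.5 ≈ 1.0026), and even an instantaneous last leg could never do better than 190 / 180 ≈ 1.056.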

It is all about learning the hardware. You can probably stay within the world of ones and zeros without getting into the actual electronics, but without really knowing the interfaces and internals you cannot do much performance tuning. You might rearrange or change a few instructions and get a small boost, but to make something several hundred times faster you need more than that. Learning many different instruction sets (assembly languages) helps you get inside the processors. I would recommend simulating HDL, for example the processors at OpenCores, to get a feel for how some people do their designs and a solid handle on how to really squeeze clocks out of a task. There is a lot of processor knowledge to absorb; memory interfaces are a huge deal and need to be learned, as do media (flash, hard disks, etc.), displays and graphics, networking, and all the kinds of interfaces between those things. Understanding things at the clock level, or as close to it as you can get, is what it takes.
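Memory interfaces are where this shows up most directly in everyday code. The C sketch below sums the same row-major matrix in two loop orders; both return the same total, but the column-order walk strides across memory and, on typical cached hardware, tends to be noticeably slower for large n. The matrix contents, size limit, and helper names are hypothetical, and the actual ratio depends on the machine, so it is worth timing rather than taking on faith.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* The matrix is n x n, row-major: element (r, c) lives at m[r * n + c]. */

/* Row order: consecutive addresses, so every byte of each fetched cache
   line gets used before moving on. */
static long long sum_rows(const int *m, size_t n) {
    long long s = 0;
    for (size_t r = 0; r < n; r++)
        for (size_t c = 0; c < n; c++)
            s += m[r * n + c];
    return s;
}

/* Column order: successive accesses are n ints apart, so for large n
   nearly every access lands on a different cache line. */
static long long sum_cols(const int *m, size_t n) {
    long long s = 0;
    for (size_t c = 0; c < n; c++)
        for (size_t r = 0; r < n; r++)
            s += m[r * n + c];
    return s;
}

/* Hypothetical helper: allocate an n x n matrix filled with m[i] = i % 7.
   The size cap keeps n * n * sizeof(int) far from size_t overflow. */
static int *make_matrix(size_t n) {
    assert(n > 0 && n <= 4096);
    int *m = malloc(n * n * sizeof *m);
    if (!m)
        return NULL;
    for (size_t i = 0; i < n * n; i++)
        m[i] = (int)(i % 7);
    return m;
}
```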
