Where can I learn low-level, core performance knowledge?
This is actually a two-part question:
1. For those who want to squeeze out every clock cycle, people talk about pipelines, cache locality, etc.
I have seen these low-level performance techniques mentioned here and there, but I have not seen a good introduction to the subject, from start to finish. Any resource recommendations? (Google gave me definitions and papers, whereas I'd really appreciate some kind of worked examples/tutorials, real-life, hands-on material.)
2. How does one actually measure this kind of thing? With a profiler of some sort? I know we can always change the code, see the improvement and theorize in retrospect; I am just wondering if there are established tools for the job.
(I know algorithm optimization is where the orders of magnitude are. I am interested in the metal here.)
The chorus of replies is, "Don't optimize prematurely." As you mention, you will get a lot more performance out of a better design than a better loop, and your maintainers will appreciate it, as well.
That said, to answer your question:
Learn assembly. Lots and lots of assembly. Don't MUL by a power of two when you can shift. Learn the weird uses of xor to copy and clear registers. For specific references,
http://www.mark.masmcode.com/ and http://www.agner.org/optimize/
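A minimal C sketch of the shift and XOR tricks mentioned above. The function names are hypothetical, chosen for illustration. The XOR swap is the classic exchange-without-a-temporary trick, and `x ^ x` is the C-level counterpart of the `xor eax, eax` register-clearing idiom on x86. Keep in mind that modern optimizing compilers usually make these substitutions on their own, so the main value is in recognizing them when reading generated assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by 8 (2^3) with a left shift instead of a multiply.
   Unsigned operands only: left-shifting a negative signed value is
   undefined behaviour in C, so for signed code write x * 8 and let
   the compiler choose the shift. */
uint32_t times8_shift(uint32_t x) {
    return x << 3;
}

/* Divide by 16 (2^4) with a right shift; exact truncating division
   for unsigned operands. */
uint32_t div16_shift(uint32_t x) {
    return x >> 4;
}

/* XOR swap: exchanges two values without a temporary.
   It fails if a and b point to the same object (the value becomes 0),
   hence the assertion. */
void xor_swap(uint32_t *a, uint32_t *b) {
    assert(a != b);
    *a ^= *b;   /* a = a ^ b           */
    *b ^= *a;   /* b = b ^ (a ^ b) = a */
    *a ^= *b;   /* a = (a ^ b) ^ a = b */
}

/* Any value XORed with itself is zero: the idea behind xor-clearing
   a register. */
uint32_t xor_clear(uint32_t x) {
    return x ^ x;
}
```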
Yes, you need to time your code. On *nix, it can be as easy as
time { commands ; }
but you'll probably want to use a full-featured profiler. GNU gprof is open source: http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html
If this really is your thing, go for it, have fun, and remember: lots and lots of bit-level math. And your maintainers will hate you ;)
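For reference, the shell `time` keyword reports real (wall-clock) time alongside user and sys CPU time. A rough in-process equivalent, sketched in C and assuming a POSIX system (for `clock_gettime` and `CLOCK_MONOTONIC`), measures a hypothetical `workload` function both ways:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Wall-clock seconds from a monotonic clock, which is not affected
   by changes to the system date/time. */
double wall_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* CPU time consumed by this process so far, via ISO C clock(). */
double cpu_seconds(void) {
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Hypothetical workload standing in for "commands": sums 0..n-1.
   The volatile accumulator keeps the compiler from folding the loop
   into a closed-form constant. */
uint64_t workload(uint64_t n) {
    volatile uint64_t sum = 0;
    for (uint64_t i = 0; i < n; i++)
        sum += i;
    return sum;
}
```

Bracketing a call with both clocks shows the same split `time` makes: wall time includes waiting (I/O, other processes), CPU time does not. To use gprof itself, the usual workflow is to compile with `gcc -pg`, run the program once so it writes `gmon.out`, and then run `gprof` on the executable.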
EDIT/REWRITE:
If it is books you need, Michael Abrash did a good job in this area: Zen of Assembly Language, a number of magazine articles, the big black book of graphics programming (Graphics Programming Black Book), etc. Much of what he was tuning for is no longer a problem; the problems have changed. What you will get out of these is an idea of the kinds of things that can cause bottlenecks and the kinds of ways to solve them. Most important is to time everything, and to understand how your timing measurements work so that you do not fool yourself by measuring incorrectly. Time the different solutions and try crazy, weird ones; you may find an optimization you were not aware of and would not have noticed until you exposed it.
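One common way to avoid fooling yourself, sketched here in C as an illustration rather than the author's own method: run the code once untimed to warm caches and page in memory, time several repetitions and keep the fastest, and store each result in a `volatile` variable so the call is not discarded as dead code. `bench_min` and `sum_harmonic` are hypothetical names, and POSIX `clock_gettime` is assumed.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Results are written here so the compiler sees them as used. */
volatile double bench_sink;

static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Runs fn once untimed (warm-up), then times it `reps` times and
   returns the fastest run in seconds. The minimum is the run least
   disturbed by interrupts, scheduling and other processes. */
double bench_min(double (*fn)(void), int reps) {
    assert(reps > 0);
    bench_sink = fn();                    /* warm-up run */
    double best = -1.0;
    for (int r = 0; r < reps; r++) {
        double t0 = now_seconds();
        bench_sink = fn();
        double dt = now_seconds() - t0;
        if (best < 0.0 || dt < best)
            best = dt;
    }
    return best;
}

/* Hypothetical function under test: partial harmonic sum H(100000). */
double sum_harmonic(void) {
    double s = 0.0;
    for (int i = 1; i <= 100000; i++)
        s += 1.0 / i;
    return s;
}
```

Timing two candidate solutions with the same harness, on the same machine and with the same compiler flags, gives a fairer comparison than a single run of each.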
I have only just started reading it, but See MIPS Run (early/first edition) looks good so far (note that ARM took over from MIPS as the leader in the processor market, so the MIPS and RISC hype is a bit dated). There are a number of textbooks, old and new, about MIPS. MIPS was designed for performance (at the cost of the software engineer in some ways).
Today's bottlenecks fall into two categories: the processor itself, and the I/O around it plus whatever is connected to that I/O. The insides of the processor chips themselves (for higher-end systems) run much faster than the I/O can handle, so you can only tune so far before you have to go off chip and wait forever. Getting from the train to your destination half a minute faster, when the train ride itself was 3 hours, is not necessarily a worthwhile optimization.
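The train analogy is essentially Amdahl's law. A worked version, assuming (these numbers are not in the original) that the final leg from the train to the destination takes 10 minutes and is trimmed to 9.5 minutes, with $p = 10/190$ the fraction of the trip spent on that leg and $s = 10/9.5$ that leg's own speedup:

```latex
\begin{aligned}
T_{\text{old}} &= 180 + 10 = 190\ \text{min}, \qquad T_{\text{new}} = 180 + 9.5 = 189.5\ \text{min},\\
S &= \frac{T_{\text{old}}}{T_{\text{new}}} = \frac{190}{189.5} \approx 1.0026,\\
S &= \frac{1}{(1 - p) + p/s} = \frac{1}{\frac{180}{190} + \frac{9.5}{190}} = \frac{190}{189.5}.
\end{aligned}
```

Even an infinitely fast final leg ($s \to \infty$) caps the overall gain at $1/(1-p) = 190/180 \approx 1.056$, which is why tuning the fast on-chip part has limited payoff when most of the time is spent waiting on I/O.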
It is all about learning the hardware. You can probably stay within the world of ones and zeros and not have to get into the actual electronics, but without really knowing the interfaces and internals you cannot do much performance tuning. You might rearrange or change a few instructions and get a little boost, but to make something several hundred times faster you need more than that. Learning a lot of different instruction sets (assembly languages) helps you get into the processors. I would recommend simulating HDL, for example the processors at OpenCores, to get a feel for how some folks do their designs and to get a solid handle on how to really squeeze clocks out of a task. Processor knowledge is big, memory interfaces are a huge deal and need to be learned, and so do media (flash, hard disks, etc.), displays and graphics, networking, and all the types of interfaces between all of those things. Understanding things at the clock level, or as close to it as you can get, is what it takes.
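Since memory interfaces come up here (and cache locality in the question), a minimal C sketch of the classic loop-order effect may help. Both hypothetical functions sum the same row-major n-by-n matrix; one walks memory sequentially, the other strides across rows. The results are identical, and the point is the access pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Cache-friendly: consecutive iterations touch consecutive addresses,
   so each cache line fetched from memory is fully used. */
long long sum_row_major(const int *m, size_t n) {
    assert(m != NULL);
    long long s = 0;
    for (size_t i = 0; i < n; i++)        /* rows outer */
        for (size_t j = 0; j < n; j++)    /* columns inner: stride 1 */
            s += m[i * n + j];
    return s;
}

/* Cache-hostile: each step jumps n * sizeof(int) bytes, so for large n
   nearly every access lands on a different cache line. */
long long sum_col_major(const int *m, size_t n) {
    assert(m != NULL);
    long long s = 0;
    for (size_t j = 0; j < n; j++)        /* columns outer */
        for (size_t i = 0; i < n; i++)    /* rows inner: stride n */
            s += m[i * n + j];
    return s;
}
```

Timing both with n in the thousands, so the matrix no longer fits in cache, usually makes the gap visible; the exact ratio depends on the machine, so measure it on your own hardware.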
Intel and AMD provide optimization manuals for x86 and x86-64.
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html/
http://developer.amd.com/documentation/guides/pages/default.aspx
Another excellent resource is Agner Fog's optimization site:
http://www.agner.org/optimize/
Some of the key points (in no particular order):
Yes, measure, and yes, know all those techniques.
Experienced people will tell you "don't optimize prematurely", which I translate simply as "don't guess".
They will also say "use a profiler to find the bottleneck", but I have a problem with that. I hear lots of stories of people using profilers and either liking them a lot or being confused by their output.
Stack Overflow is full of them.
What I don't hear much of is success stories, with the speedup factors achieved.
The method I use is very simple, and I've tried to give lots of examples, including this case.
I'd suggest "Optimizing subroutines in assembly language: An optimization guide for x86 platforms".
It's quite heavy stuff though ;)