Optimization, compilers and their effects
(i) If a program is optimised for one CPU class (e.g. a multi-core Core i7) by compiling the code on that machine, will its performance be sub-optimal on CPUs from older generations (e.g. a Pentium 4)? In other words, can optimising for one CPU harm performance on other CPUs?
(ii) For optimisation, compilers may use x86 extensions (like SSE4) which are not available in older CPUs. So is there a fallback to some non-extension-based routine on older CPUs?
(iii) Is the Intel C++ Compiler more optimising than the Visual C++ Compiler or GCC?
(iv) Will a truly multi-core, threaded application perform efficiently on older CPUs (like a Pentium III or 4)?
Comments (4)
Compiling on a platform does not mean optimizing for that platform. (Maybe it's just bad wording in your question.)
In all compilers I've used, optimizing for platform X does not affect the instruction set, only how it is used, e.g. optimizing for i7 does not enable SSE2 instructions.
Also, optimizers in most cases avoid "pessimizing" non-optimized platforms, e.g. when optimizing for i7, a small improvement on i7 will typically not be chosen if it means a major hit for another common platform.
It also depends on the performance differences between the instruction sets - my impression is that they have become much smaller in the last decade (but I haven't delved too deep lately - I might be wrong for the latest generations). Also consider that optimizations make a notable difference only in a few places.
To illustrate the options an optimizer faces, consider the possible ways to implement a switch statement: as a chain of comparisons of the form

if (x==c) goto label

or as a jump table (both are sketched below).
The "best" algorithm depends on the relative cost of comparisons, jumps by fixed offsets, and jumps to an address read from memory. They don't differ much on modern platforms, but even small differences can create a preference for one or the other implementation.
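To make that concrete, here is a rough hand-written sketch of the two lowerings (the function and handler names are invented for the example; a real compiler emits this at the instruction level, not as C++ source):

    #include <cstdio>

    // Hypothetical handlers standing in for the bodies of four switch cases.
    static int handle0() { return 10; }
    static int handle1() { return 11; }
    static int handle2() { return 12; }
    static int handle3() { return 13; }

    // Method A: a chain of compare-and-branch, i.e. "if (x==c) goto label"
    // repeated per case. Cost: up to one comparison and jump per case.
    int dispatch_chain(int x) {
        if (x == 0) return handle0();
        if (x == 1) return handle1();
        if (x == 2) return handle2();
        if (x == 3) return handle3();
        return -1;
    }

    // Method B: a jump table, i.e. one bounds check plus one jump to an
    // address read from memory, regardless of how many cases there are.
    int dispatch_table(int x) {
        static int (*const table[])() = { handle0, handle1, handle2, handle3 };
        if (x < 0 || x > 3) return -1;   // the range check a compiler would emit
        return table[x]();
    }

    int main() {
        std::printf("%d %d\n", dispatch_chain(2), dispatch_table(2)); // prints "12 12"
        return 0;
    }

Which method wins depends on exactly the costs named above, and those costs differ between, say, a Pentium 4 and a Core i7.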
(i) It is probably true that optimising code for execution on CPU X will make that code less optimal on CPU Y than the same code optimised for execution on CPU Y. Probably.
(ii) Probably not.
(iii) Impossible to generalise. You have to test your code and come to your own conclusions.
(iv) Probably not.
For every argument about why X should be faster than Y under some set of conditions (choice of compiler, choice of CPU, choice of optimisation flags for compilation), some clever SOer will find a counter-argument, and for every example a counter-example. When the rubber meets the road, the only recourse you have is to test and measure. If you want to know whether compiler X is 'better' than compiler Y, first define what you mean by better, then run a lot of experiments, then analyse the results (a minimal harness is sketched below).
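For what it's worth, such a measurement harness can be as simple as the sketch below; work() is a made-up stand-in for whatever routine you are comparing, and the idea is to build the same source with each compiler or flag set and compare the printed times:

    #include <chrono>
    #include <cstdio>

    volatile long sink;   // volatile so the optimizer cannot delete the loop

    // Stand-in workload; replace with the code whose builds you are comparing.
    static void work() {
        long s = 0;
        for (long i = 0; i < 1000000; ++i) s += i * i;
        sink = s;
    }

    int main() {
        const auto t0 = std::chrono::steady_clock::now();
        for (int rep = 0; rep < 100; ++rep) work();
        const auto t1 = std::chrono::steady_clock::now();
        const std::chrono::duration<double, std::milli> ms = t1 - t0;
        std::printf("100 reps: %.2f ms\n", ms.count());
        return 0;
    }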
I) If you did not tell the compiler which CPU type to favor, the odds are that it will be slightly sub-optimal on all CPUs. On the other hand, if you let the compiler know to optimize for your specific type of CPU, then it can definitely be sub-optimal on other CPU types.
II) No (for Intel and MS at least). If you tell the compiler to compile with SSE4, it will feel safe using SSE4 anywhere in the code without testing. It becomes your responsibility to ensure that your platform is capable of executing SSE4 instructions, otherwise your program will crash. You might want to compile two libraries and load the proper one (see the first sketch after point IV). An alternative to compiling for SSE4 (or any other instruction set) is to use intrinsics; these will check internally for the best performing set of instructions (at the cost of a slight overhead). Note that I am not talking about instruction intrinsics here (those are specific to an instruction set), but intrinsic functions.
III) That is a whole other discussion in itself. It changes with every version, and may be different for different programs. So the only solution here is to test. Just a note, though: Intel compilers are known not to compile well for running on anything other than Intel (e.g.: intrinsic functions may not recognize the instruction set of an AMD or Via CPU).
IV) If we ignore the on-die efficiencies of newer CPUs and the obvious architecture differences, then yes, it may perform just as well on an older CPU. Multi-core processing is not dependent per se on the CPU type. But the performance is VERY dependent on the machine architecture (e.g.: memory bandwidth, NUMA, chip-to-chip bus) and on differences in the multi-core communication (e.g.: cache coherency, bus locking mechanism, shared cache). All this makes it impossible to compare newer and older CPU efficiencies in MP, but that is not what you are asking, I believe. So on the whole, an MP program made for newer CPUs should not use the MP aspects of older CPUs less efficiently. Or in other words, just tweaking the MP aspects of a program specifically for an older CPU will not do much. Obviously you could rewrite your algorithm to use a specific CPU more efficiently (e.g.: a shared cache may permit you to use an algorithm that exchanges more data between working threads, but that algorithm will die on a system with no shared cache, full bus locks and poor memory latency/bandwidth), but it involves a lot more than just MP-related tweaks. (A second sketch below shows the portable MP skeleton.)
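On the "compile two routines and pick one at run time" idea from point II, a minimal sketch could look like the following. It assumes GCC or Clang on x86, whose __builtin_cpu_supports does the CPU check (MSVC would use __cpuid from <intrin.h> instead), and the routines themselves are hypothetical stand-ins:

    #include <cstdio>

    // In a real project this would live in a file compiled with -msse4.2
    // and use SSE4 intrinsics; here it is a plain stand-in.
    static int sum_sse4(const int* p, int n) {
        int s = 0;
        for (int i = 0; i < n; ++i) s += p[i];
        return s;
    }

    // Fallback that any x86 CPU can execute.
    static int sum_scalar(const int* p, int n) {
        int s = 0;
        for (int i = 0; i < n; ++i) s += p[i];
        return s;
    }

    int sum(const int* p, int n) {
        // Runtime check: only call the SSE4 path on CPUs that support it,
        // so the binary does not crash on older processors.
        if (__builtin_cpu_supports("sse4.2"))
            return sum_sse4(p, n);
        return sum_scalar(p, n);
    }

    int main() {
        int data[] = {1, 2, 3, 4};
        std::printf("%d\n", sum(data, 4));   // prints "10"
        return 0;
    }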
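And on point IV, the portable part of an MP program is easy to keep portable: size the worker count from the hardware instead of hard-coding it, as in this small sketch (C++11 threads, built with -pthread on Linux; the workload is a placeholder). The same binary then runs with one worker on a single-core machine and with eight on an i7:

    #include <cstdio>
    #include <thread>
    #include <vector>

    int main() {
        // hardware_concurrency() may return 0 when the count is unknown.
        unsigned n = std::thread::hardware_concurrency();
        if (n == 0) n = 1;

        std::vector<std::thread> workers;
        for (unsigned i = 0; i < n; ++i)
            workers.emplace_back([i] {
                std::printf("worker %u running\n", i);  // placeholder workload
            });
        for (auto& t : workers) t.join();
        return 0;
    }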
(1) Not only is it possible, but it has been documented on pretty much every generation of x86 processor. Go back to the 8088 and work your way forward, every generation. Clock for clock, the newer processor was slower for the then-current mainstream applications and operating systems (including Linux). The 32-to-64-bit transition is not helping; more cores and lower clock speeds are making it even worse. And this is true going backward as well, for the same reason.
(2) Bank on your binaries failing or crashing. Sometimes you get lucky; most of the time you don't. Yes, there are new instructions, and supporting them would probably mean trapping on an undefined instruction and emulating that instruction in software, which would be horribly slow, and the lack of demand for it means the emulation is probably not done well or just not there. Optimization can use new instructions, but more than that, the bulk of the optimization that I am guessing you are talking about has to do with reordering the instructions so that the various pipelines do not stall. So when you arrange them to be fast on one generation of processor, they will be slower on another, because in the x86 family the cores change too much. AMD had a good run there for a while, as they would make the same code just run faster instead of trying to invent new processors that would eventually be faster once the software caught up. That is no longer true; both AMD and Intel are struggling to just keep chips running without crashing.
(3) Generally, yes. For example, gcc is a horrible compiler: one size fits all and fits no one well, and it can never and will never be any good at optimizing. For example, gcc 4.x code is slower than gcc 3.x code for the same processor (yes, all of this is subjective; it all depends on the specific application being compiled). The in-house compilers I have used were leaps and bounds ahead of the cheap or free ones (I am not limiting myself to x86 here). Are they worth the price, though? That is the question.
In general, because of the horrible new programming languages and gobs of memory, storage and layers of caching, software engineering skills are at an all-time low. Which means the pool of engineers capable of making a good compiler, much less a good optimizing compiler, decreases with time; this has been going on for at least 10 years. So even the in-house compilers are degrading with time, or their companies just have their employees work on and contribute to the open-source tools instead of having an in-house tool. Also, the tools the hardware engineers use are degrading for the same reason, so we now have processors that we hope just run without crashing, rather than trying too hard to optimize for them. There are so many bugs and chip variations that most of the compiler work is avoiding the bugs. Bottom line: gcc has single-handedly destroyed the compiler world.
(4) See (2) above. Don't bank on it. The operating system you want to run this on will likely not install on the older processor anyway, saving you the pain. Code written to work well on multi-core processors will run slower on single-core processors than if you had optimized the same application for a single-core processor, for the same reason that binaries optimized for your Pentium III ran slower on your Pentium 4 and vice versa.
The root of the problem is that the x86 instruction set is dreadful. So many far superior instruction sets have come along that do not require hardware tricks to make them faster every generation, but the wintel machine created two monopolies and the others couldn't penetrate the market. My friends keep reminding me that these x86 machines are microcoded, such that you really don't see the instruction set inside, which angers me even more: the horrible ISA is just an interpretation layer. It is kinda like using Java. The problems you have outlined in your question will continue so long as Intel stays on top; if the replacement does not become the monopoly, then we will be stuck forever in the Java model, where you are on one side or the other of a common denominator: either you emulate the common platform on your specific hardware, or you are writing apps and compiling to the common platform.