We've always been an Intel shop. All the developers use Intel machines, the recommended platform for end users is Intel, and if end users want to run on AMD it's their lookout. Maybe the test department had an AMD machine somewhere to check we didn't ship anything completely broken, but that was about it.
Up until a few years ago we just used the MSVC compiler, and since it doesn't really offer a lot of processor tuning options beyond SSE level, no one worried too much about whether the code might favour one x86 vendor over another. However, more recently we've been using the Intel compiler a lot. Our stuff definitely gets some significant performance benefits from it (on our Intel hardware), and its vectorization capabilities mean less need to go to asm/intrinsics. However, people are starting to get a bit nervous about whether the Intel compiler may actually not be doing such a good job for AMD hardware. Certainly if you step into the Intel CRT or IPP libraries you see a lot of cpuid queries, apparently to set up jump tables to optimised functions. It seems unlikely that Intel goes to much trouble to do anything good for AMD's chips, though.
Can anyone with experience in this area comment on whether it's a big deal in practice? (We've yet to do any performance testing on AMD ourselves.)
Update 2010-01-04: Well the need to support AMD never became concrete enough for me to do any testing myself. There are some interesting reads on the issue here, here and here though.
Update 2010-08-09: It seems the Intel-FTC settlement has something to say about this issue; see the "Compilers and Dirty Tricks" section of this article.
Comments (7)
Buy an AMD box and run it on that. That seems like the only responsible thing to do, rather than trusting strangers on the internet ;)
Apart from that, I believe part of AMD's lawsuit against Intel is based on the claim that Intel's compiler specifically produces code that runs inefficiently on AMD processors. I don't know whether that's true or not, but AMD seems to believe so.
But even if they don't willfully do that, there's no doubt that Intel's compiler optimizes specifically for Intel processors and nothing else.
That said, I doubt it'd make a huge difference. AMD CPUs would still benefit from all the auto-vectorization and other clever features of the compiler.
I'm surely stating the obvious, but if performance is crucial for your application, you'd better do some testing on all combinations of hardware and compiler. There are no guarantees. As outsiders, we can only give you our guesses and biases. Your software may have unique characteristics that are unlike what we've seen.
My experience:
I used to work at Intel, and developed an in-house (C++) application where performance was critical. We tried to use Intel's C++ compiler, and it always underperformed gcc, even after doing profile runs, recompiling using the profile information (which icc supposedly uses to optimize) and re-running on the exact same dataset (this was in 2005-2007; things may be different now). So, based on my experience, you might want to try gcc (in addition to icc and MSVC); it's possible you will get better performance that way and side-step the question. It shouldn't be too hard to switch compilers (if your build process is reasonable).
Now I work at a different company where the IT folks do extensive hardware testing. For a while Intel and AMD hardware was relatively comparable, but the latest generation of Intel hardware significantly outperformed AMD's. As a result, I believe they purchased significant amounts of Intel CPUs and recommend the same to our customers who run our software.
But, back to the question as to whether the Intel compiler specifically targets AMD hardware to run slowly. I doubt Intel bothers with that. It could be that certain optimizations that use knowledge about the internals of Intel CPU architecture or chipsets could run slower on AMD hardware, but I doubt they specifically target AMD hardware.
What we have seen is that wherever the Intel compiler must make a runtime choice about the available instruction set, if it does not recognize an Intel CPU it falls back to its "standard" code path (which, as you might expect, may not be optimal).
Note that even though I used the word "compiler" above, this mainly happens in their supplied (pre-compiled) libraries and intrinsics, which check the instruction set and call the best code.
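To make the distinction concrete, here is a minimal Python model of the two dispatch policies. It is only a sketch of the behaviour described above, not Intel's actual code: the `Cpu` class, the kernel names and the feature list are all made up for illustration. The only real-world values are the CPUID vendor strings `GenuineIntel` and `AuthenticAMD`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    vendor: str          # CPUID vendor string, e.g. "GenuineIntel" or "AuthenticAMD"
    features: frozenset  # instruction-set extensions the CPU reports


# Hypothetical code paths, ordered from most to least specialised.
CODE_PATHS = [("ssse3", "ssse3_kernel"), ("sse2", "sse2_kernel")]
BASELINE = "generic_kernel"


def dispatch_by_feature(cpu):
    """Pick the best path the CPU claims to support, whoever made it."""
    for feature, kernel in CODE_PATHS:
        if feature in cpu.features:
            return kernel
    return BASELINE


def dispatch_by_vendor(cpu):
    """Vendor-gated variant: anything that is not Intel gets the baseline path."""
    if cpu.vendor != "GenuineIntel":
        return BASELINE
    return dispatch_by_feature(cpu)


intel = Cpu("GenuineIntel", frozenset({"sse2", "ssse3"}))
amd = Cpu("AuthenticAMD", frozenset({"sse2", "ssse3"}))

for cpu in (intel, amd):
    print(cpu.vendor, dispatch_by_vendor(cpu), dispatch_by_feature(cpu))
```

With identical feature sets, the two CPUs get the same kernel under feature-based dispatch, while the vendor-gated version sends the AMD part to the generic path. That gap is what you would want to measure on real hardware.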
Sorry if you hit my general button.
This is on the subject of low-level optimization, so it only matters for code that 1) the program counter spends much time in, and 2) the compiler actually sees. For example, if the PC spends most of its time in library routines that you don't compile, it shouldn't matter very much.
Whether or not conditions 1 & 2 are met, here's my experience of how optimization goes:
Several iterations of sampling and fixing are done. In each of these, a problem is identified and most often it is not about where the program counter is. Rather it is that there are function calls at mid-levels of the call stack that, since performance is paramount, could be replaced. To find them quickly, I do this.
Keep in mind that if there is a function call instruction that is on the stack for a significant fraction of execution time, whether in a few long invocations or a great many short ones, that call is responsible for that fraction of time, so removing it or executing it less often can save a lot of time. And that saving far exceeds what any low-level optimization can deliver.
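One way to read the paragraph above, sketched in Python: given a handful of stack samples, count the fraction of samples in which each call appears anywhere on the stack. The sample data and function names are invented, and plain function names stand in for call instructions; this is an illustration of the arithmetic, not necessarily the exact method the answer links to.

```python
from collections import Counter


def call_site_fractions(samples):
    """Return, for each call site, the fraction of samples whose stack contains it."""
    counts = Counter()
    for stack in samples:
        # Count a site once per sample, even if it recurs on the same stack.
        for site in set(stack):
            counts[site] += 1
    total = len(samples)
    if total == 0:
        return {}
    return {site: n / total for site, n in counts.items()}


# Hypothetical stack samples, outermost frame first.
samples = [
    ("main", "load_config", "parse_xml"),
    ("main", "process", "format_log"),
    ("main", "process", "format_log"),
    ("main", "process", "compute"),
]

fractions = call_site_fractions(samples)
for site, fraction in sorted(fractions.items(), key=lambda item: -item[1]):
    print(f"{site:12s} {fraction:.2f}")
```

In this toy data `format_log` is on the stack in half of the samples, so removing that call could save up to roughly half the run time, regardless of where the program counter happened to be.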
The program can now be many times faster than it was to begin with.
I've never seen any good-sized program, no matter how carefully written, that could not benefit from this process.
If the process has not been done, it should not be assumed that low-level optimization is the only way to speed up the program.
After this process has been done to the point where it simply can't be done any more, and if samples show that the PC is in code that the compiler sees, then the low-level optimization can make a difference.
At the time this thread was started, Microsoft C++ defaulted to code generation which was good in some cases for AMD and bad for Intel. Their more recent compilers default to the blend option which is good for both, particularly after both brands of CPUs had worked out their peculiar performance bugs.
When I first worked at Intel, their compilers reserved some optimizations for Intel-specific architecture settings. I guess that might have been a topic of some FTC depositions, although it didn't come up in my 10 hours of testimony. The practice was already on the way out, due to the convergence of optimization requirements between up-to-date CPU models and the need for more productive use of compiler development time.
If you used one of those obsolete compilers on an up-to-date Intel CPU, you might see some of the same performance deficiencies.
It's pointless to worry if you can't act. Possible actions are: Not buying AMD, or using a different compiler. So the obvious things to do are:
(1) Buy one AMD box, and measure the speed of the code compiled with the Intel compiler. Is it fast enough? If yes, you're done, you can buy AMD, don't worry.
(2) If no: Compile the code with a different compiler and run it on the AMD box. Is it fast enough? If no, you're done, you can't buy AMD, don't worry.
(3) If yes: Run the same code on an Intel box. Is it fast enough? If yes, you're done, you can buy AMD but have to switch compilers, don't worry.
(4) If no: Possibilities are: Don't buy AMD, throw all Intel computers out, or compile with two different compilers. Pick one.
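The four steps above amount to a small decision procedure. Here is a sketch in Python, assuming you have already reduced each measurement to a yes/no "fast enough" answer; the function and parameter names are made up for illustration.

```python
def decide(amd_intel_compiler_ok, amd_other_compiler_ok=False,
           intel_other_compiler_ok=False):
    """Walk steps (1) to (4); each argument is the answer to 'is it fast enough?'."""
    # (1) Intel-compiled code on the AMD box.
    if amd_intel_compiler_ok:
        return "buy AMD, keep the Intel compiler"
    # (2) Code built with a different compiler, on the AMD box.
    if not amd_other_compiler_ok:
        return "don't buy AMD"
    # (3) The same differently-compiled code, on an Intel box.
    if intel_other_compiler_ok:
        return "buy AMD, switch compilers"
    # (4) No single build is fast enough on both vendors.
    return "pick one: no AMD, no Intel, or two builds with two compilers"


print(decide(True))
print(decide(False, amd_other_compiler_ok=True, intel_other_compiler_ok=True))
```

The later measurements only matter when the earlier ones fail, which is why the last two arguments are optional.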
I have directly experienced purposeful crippling of technology when a vendor attempted to prevent a Lotus product from reaching market before their offering. A working technology was available, but Lotus was forbidden to use it. Ah well...
A few years back there were blogs that showed users that patching a single byte in the Intel compiler caused it to emit "optimal" code that was not crippled when used on AMD. I have not looked for those blog entries in years.
I am inclined to believe that such competitive behavior continues. I have no other evidence to offer.