I'm building a workstation and want to get into some heavy CUDA programming. I don't want to go all out and buy Tesla cards, and I have pretty much narrowed it down to either the Quadro 4000 or the GeForce GTX 480, but I don't really understand the difference. On paper the 480 has more cores (480 vs. 256 for the 4000), yet the 4000 costs almost twice as much as the 480. Can someone explain what difference justifies the higher price?
I will be doing scientific computing on it, so everything will be in double precision, if that makes a difference between them.
If you care about neither visualization nor rendering (drawing final results on screen, e.g. raytracing), then the answer to your question is slightly simpler, but not trivial.
I'm not going to go into detail about the differences between Quadro and GeForce cards; I will just underline the significant points that can help in choosing between them.
In general:
If you need lots of memory, then you need a Tesla or Quadro. Consumer cards currently top out at 1.5 GB (GTX 480), while Teslas and Quadros offer up to 6 GB.
GF10x series GeForce cards have their double precision (FP64) performance capped at 1/8 of the single precision (FP32) performance, while the architecture is capable of 1/2. This is yet another market segmentation trick, quite popular nowadays among hardware manufacturers. Crippling the GeForce line is meant to give the Tesla line an advantage in HPC; in FP32 throughput and memory bandwidth the GTX 480 is in fact faster than the Tesla 20x0: 1.34 TFlops vs. 1.03 TFlops and 177.4 GB/s vs. 144 GB/s (peak).
Tesla and Quadro cards are (supposed to be) more thoroughly tested and therefore less prone to produce errors. Such errors are pretty much irrelevant in gaming, but in scientific computing a single bit flip can trash the results. NVIDIA claims that Tesla cards are QC'd for 24/7 use.
A recent paper (Haque and Pande, Hard Data on Soft Errors: A Large-Scale Assessment of Real-World Error Rates in GPGPU) suggests that Tesla is indeed less error prone.
My experience is that GeForce cards tend to be less reliable, especially under constant high load. Proper cooling is very important, as is avoiding overclocked cards, including factory-overclocked models (see Figure 1 of the previously mentioned paper).
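To make the "single bit flip" point concrete, here is a minimal sketch in Python that inverts one bit in the IEEE-754 binary64 encoding of a double. The helper name `flip_bit` is made up for this illustration, and the code only simulates a soft error on the host; it says nothing about how often such errors occur on any particular card.

```python
import struct

def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit (0 = lowest mantissa bit, 63 = sign) inverted."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    # Flip the requested bit and reinterpret the result as a double again.
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped

value = 1.0
for bit in (0, 52, 62, 63):
    print(f"bit {bit:2d}: {value} -> {flip_bit(value, bit)}")
```

A flip in the low mantissa bits is barely visible, while a flip in the exponent or sign bits changes the value completely, and in a long iterative computation either kind can propagate silently.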
So as a rule of thumb:
for production HPC/scientific computing:
Quadro if you need FP64 and/or also need advanced rendering features (the new "Fermi" Teslas have rendering capabilities similar to a GeForce)
If you want to use FP64 intensively, forget about GeForce, otherwise
Back to the specifics of your question:
The two cards you mention are in entirely different leagues and therefore not directly comparable. If you need the Quadro's rendering features, get a Quadro. Otherwise, a Quadro is not really worth it, especially not the 4000, which is even slower than a GTX 460 while costing ~3.5x more. I think you're better off with a GTX 470 or 480; just make sure that you buy one with standard frequencies.
Note that the crippled GeForce double precision performance is not an issue in this comparison, but let me elaborate. The Quadro 4000 is a low-end model with, AFAIR, only 450 MHz shaders (I can't find the reference at the moment, but it should definitely be lower than the 5000, which is clocked at 513 MHz), which gives it around 115 GFlops FP64. At the same time, the capped GTX 480 reaches around 168 GFlops FP64, and even a GTX 460 reaches around 113 GFlops (peak).
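For anyone who wants to check these peak numbers, the arithmetic can be sketched as cores x shader clock x 2 FLOP per cycle (one fused multiply-add), scaled by the FP64 ratio. The function name `peak_gflops` is made up for this sketch. The 450 MHz Quadro 4000 clock is the value recalled above, not a verified spec, and the 1401 MHz GTX 480 shader clock is an assumption taken from the card's published specification rather than from this thread.

```python
def peak_gflops(cores: int, shader_mhz: float, fp64_ratio: float = 1.0) -> float:
    """Theoretical peak GFlops: cores x clock x 2 FLOP/cycle, scaled for FP64."""
    return cores * shader_mhz * 2 / 1000.0 * fp64_ratio

# Quadro 4000: 256 cores, ~450 MHz (recalled in the answer), FP64 at 1/2 rate.
quadro_4000_fp64 = peak_gflops(256, 450, fp64_ratio=1 / 2)

# GTX 480: 480 cores, 1401 MHz shader clock (assumed), FP64 capped at 1/8 rate.
gtx_480_fp32 = peak_gflops(480, 1401)
gtx_480_fp64 = peak_gflops(480, 1401, fp64_ratio=1 / 8)

print(f"Quadro 4000 FP64: {quadro_4000_fp64:.1f} GFlops")
print(f"GTX 480 FP32:     {gtx_480_fp32:.1f} GFlops")
print(f"GTX 480 FP64:     {gtx_480_fp64:.1f} GFlops")
```

Under those assumptions the results land on roughly 115 GFlops FP64 for the Quadro 4000 and roughly 168 GFlops FP64 (1.34 TFlops FP32) for the GTX 480, matching the figures quoted above. These are theoretical peaks; real kernels will see less.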
Both the FP32 performance and the memory bandwidth are much lower on the Quadro 4000 compared to the GTX 480 (86.9 vs. 177.4 GB/s)!
Note that, from the point of view of theoretical peak performance, the GTX 480 (data sheet) is considerably faster than both the Tesla C2050/2070 and the Quadro 6000, which is reflected in most applications.
There are some small advantages to the Quadro/Tesla cards not mentioned above:
Certainly these advantages don't make any difference for most people. For certain uses, though, they're critical.
The advantages of a "gamer" GPU (GeForce GTX series, like the GTX 780) over a "professional" GPU (Tesla series and Quadro series) for CUDA programming are:
But
Clearly the choice of GPU depends on what your application needs. But I think that for most applications a GTX is the better choice. For example, in many image processing applications single precision is enough, and a GTX is clearly the better choice considering its performance and price. For instance, in this article written by the main developers of the OpenCV GPU library, the authors used an NVIDIA GTX 580 to benchmark their results against the CPU. I'd say go with a Quadro or Tesla if you need better double precision performance or more memory.
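Whether you actually need "more memory" is easy to estimate up front. Below is a rough sketch in Python that checks whether a dense double-precision array fits on a card, using the 1.5 GB (GTX 480) and 6 GB (Tesla/Quadro) capacities quoted earlier in this thread. The helper names and the 20,000 x 20,000 problem size are hypothetical, and the check ignores everything else that occupies device memory (other buffers, the CUDA context, ECC overhead), so treat it as an optimistic bound.

```python
def array_bytes(n_elements: int, bytes_per_element: int = 8) -> int:
    """Size in bytes of a dense array; 8 bytes per element means FP64."""
    return n_elements * bytes_per_element

def fits(n_elements: int, capacity_gib: float, bytes_per_element: int = 8) -> bool:
    """True if the array alone fits in capacity_gib GiB of device memory."""
    return array_bytes(n_elements, bytes_per_element) <= capacity_gib * 2**30

n = 20_000  # hypothetical problem: one dense n x n FP64 matrix
size_gib = array_bytes(n * n) / 2**30

print(f"{n} x {n} FP64 matrix: {size_gib:.2f} GiB")
print("fits in 1.5 GiB:", fits(n * n, 1.5))
print("fits in 6 GiB:  ", fits(n * n, 6.0))
```

If your working set only fits on the larger cards even in single precision, the price comparison stops mattering much.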
It's not obvious looking at the specs, but I think that your need for double precision suggests the Quadro 4000 is a better match. Although the GeForce 480 has more cores and twice as much memory bandwidth, at its heart it's a gaming card. Quadros are targeted at professional work, and better supported as a result. Also, the fact that the Quadro can do 64x antialiasing (vs. 32x on the GeForce) suggests a more capable card.