An In-Depth Look at the Differences Between CPUs and GPUs

Posted on 2024-12-08 11:41:54

I've been searching for the major differences between a CPU and a GPU, more precisely the fine line that separates the two. For example, why not use multiple CPUs instead of a GPU, and vice versa? Why is the GPU "faster" at crunching calculations than the CPU? What are some types of things that one of them can do and the other can't do, or can't do efficiently, and why? Please don't reply with answers like "central processing unit" and "graphics processing unit". I'm looking for an in-depth technical answer.



Comments (2)

來不及說愛妳 2024-12-15 11:41:54


GPUs are basically massively parallel computers. They work well on problems that can be solved with large-scale data decomposition, and they offer order-of-magnitude speedups on those problems.
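To make "data decomposition" concrete, here is a minimal sketch (the function names are hypothetical, not from any GPU API) of splitting a workload into independent chunks that all receive the identical operation. On a real GPU, each element would map to its own thread; the point here is only that the chunks never need to communicate:

```python
# Data decomposition: the same operation applied independently to every
# element. On a GPU each element would get its own thread; here we just
# illustrate the decomposition itself with chunks.

def scale(chunk, factor):
    # The per-element work: identical for every item, no cross-item dependencies.
    return [x * factor for x in chunk]

def data_parallel_map(data, factor, num_workers=4):
    # Split the input into independent chunks (hypothetical helper).
    size = (len(data) + num_workers - 1) // num_workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Each chunk could run on a separate processing unit; results
    # concatenate back in order because chunks never interact.
    out = []
    for chunk in chunks:
        out.extend(scale(chunk, factor))
    return out

print(data_parallel_map([1, 2, 3, 4, 5, 6, 7, 8], 10))
# → [10, 20, 30, 40, 50, 60, 70, 80]
```

Because the chunks are independent, adding more processing units speeds this up almost linearly, which is exactly the shape of problem GPUs are built for.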

However, individual processing units in a GPU cannot match a CPU for general-purpose performance. They are much simpler and do not have optimizations like long pipelines, out-of-order execution, and instruction-level parallelism.

They also have other drawbacks. First, your users have to have one, which you cannot rely on unless you control the hardware. Also, there is overhead in transferring data from main memory to GPU memory and back.

So it depends on your requirements: in some cases GPUs or dedicated processing units like Tesla are the clear winners, but in other cases your work cannot be decomposed to make full use of a GPU, and the overheads then make CPUs the better choice.
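The transfer-overhead point can be captured in a toy break-even model. All the numbers below are illustrative assumptions, not measured figures; the sketch only shows the shape of the trade-off: a fixed copy cost plus a much lower per-item cost on the GPU, versus no copy cost but a higher per-item cost on the CPU:

```python
# Toy cost model (all numbers are illustrative assumptions) for deciding
# whether offloading to a GPU pays off once transfer overhead is counted.

def cpu_time(n, per_item=1.0):
    # CPU: no transfer cost, but slower per-item work.
    return n * per_item

def gpu_time(n, per_item=0.01, transfer=500.0):
    # GPU: fixed cost to copy data to device memory and back,
    # plus much faster per-item work.
    return transfer + n * per_item

def faster_on_gpu(n):
    return gpu_time(n) < cpu_time(n)

print(faster_on_gpu(100))     # small workload → False (transfer dominates)
print(faster_on_gpu(10_000))  # large workload → True (parallel work dominates)
```

Small jobs lose to the fixed transfer cost; only once the workload is large enough does the GPU's per-item advantage pay for the copies.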

温柔少女心 2024-12-15 11:41:54


First watch this demonstration:

http://www.nvidia.com/object/nvision08_gpu_v_cpu.html

That was fun!

So what's important here is that a CPU can be directed to perform essentially any calculation on command. For calculations that are unrelated to each other, or where each computation depends strongly on what came before it (rather than merely being the same operation), you usually need a full CPU. As an example, consider compiling a large C/C++ project. The compiler has to read each token of each source file in sequence before it can understand the meaning of the next; even though there are lots of source files to process, they all have different structure, so the same calculation doesn't apply across the source files.
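The kind of dependency that defeats parallelism can be sketched with a minimal dependency chain (an illustrative stand-in for a parser's internal state, not a real compiler): each step's result feeds the next, so no two iterations can run at the same time, no matter how many processors you have.

```python
# A sequential dependency chain: each state depends on the previous one,
# so the loop cannot be split across workers (illustrative sketch).

def hash_chain(tokens):
    state = 0
    for t in tokens:
        # The meaning of each token depends on everything seen before it,
        # like a parser's state while reading a source file.
        state = (state * 31 + t) % 1_000_003
    return state

print(hash_chain([1, 2, 3]))  # → 1026
```

Reordering or splitting the loop changes the answer, which is precisely why this workload needs one fast sequential processor rather than many slow parallel ones.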

You could speed that up by having several independent CPUs, each working on separate files. Improving the speed by a factor of X means you need X CPUs, which will cost X times as much as one CPU.


Some kinds of tasks involve doing exactly the same calculation on every item in a dataset. Some physics simulations look like this: in each step, each 'element' in the simulation moves a little bit, according to the 'sum' of the forces applied to it by its immediate neighbors.
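A minimal sketch of one such simulation step (a made-up 1-D smoothing rule, chosen only for illustration): every element is updated by the identical formula using just its immediate neighbors, so a GPU could run one thread per element in lockstep.

```python
# One step of a toy 1-D simulation: every element gets the same update,
# computed from only its immediate neighbors. Because the rule is
# identical at every index, each element could be one GPU thread.

def step(values):
    n = len(values)
    new = [0.0] * n
    for i in range(n):
        left = values[i - 1] if i > 0 else values[i]
        right = values[i + 1] if i < n - 1 else values[i]
        # Same formula at every index: average with the neighbors.
        new[i] = (left + values[i] + right) / 3.0
    return new

print(step([0.0, 3.0, 0.0]))  # → [1.0, 1.0, 1.0]
```

Note that the loop body reads only the old array and writes only the new one, so the iterations are independent and could all execute simultaneously.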

Since you're doing the same calculation on a big set of data, you can duplicate some parts of a CPU but share the others. (In the linked demonstration, the air system, valves, and aiming are shared; only the barrels are duplicated for each paintball.) Doing X calculations requires less than X times the cost in hardware.

The obvious disadvantage is that the shared hardware means you can't tell one subset of the parallel processors to do one thing while another subset does something unrelated. The extra parallel capacity would go to waste while the GPU performs one task and then another, different task.
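The cost of that shared control shows up when elements take different branches. A simplified sketch of how lockstep hardware handles it (modeled loosely on SIMD/SIMT predication; the function names are made up for illustration): both sides of the branch are computed for every lane, and a mask selects which result each lane keeps, so the work spent on the discarded side is wasted capacity.

```python
# Shared control means all lanes execute the same instruction stream.
# When a branch splits the lanes, a lockstep machine runs BOTH sides
# and masks out the inactive lanes (simplified sketch of predication).

def masked_branch(values):
    mask = [v > 0 for v in values]
    # "Then" side is computed for every lane, kept only where mask is True.
    then_side = [v * 2 for v in values]
    # "Else" side is also computed for every lane, kept where mask is False.
    else_side = [v - 1 for v in values]
    return [t if m else e for m, t, e in zip(mask, then_side, else_side)]

print(masked_branch([3, -1, 5, 0]))  # → [6, -2, 10, -1]
```

When all lanes agree on the branch, nothing is wasted; when they diverge, the machine pays for both paths, which is why branchy, irregular code maps poorly onto GPUs.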
