How did the first GPUs get CPU support?

Posted 2025-02-13

I imagine CPUs have to have features that allow it to communicate and work with the GPU, and I can imagine this exists today, but in the early days of GPUs, how did companies get support from large CPU companies to have their devices be supported, and what features did CPU companies add to enable this?

Comments (1)

二智少女 2025-02-20 00:26:20

You mean special support beyond just being devices on a bus like PCI? (Or even older, ISA or VLB.)

TL:DR: All the special features CPUs have which are useful for improved bandwidth to write (and sometimes read) video memory came after 3D graphics cards were commercially successful. They weren't necessary, just a performance boost.

Once GPUs were commercially successful and popular, and a necessary part of a gaming PC, it made obvious sense for CPU vendors to add features to make things better.


The same IO busses that let you plug in a sound card or network card already provide access to device memory (MMIO) and device IO ports, which is all a video driver needs to make a graphics card do things.
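
To make that concrete, here's a minimal sketch (mine, not from the original answer) of the kind of device access any ISA or PCI bus already allowed: an ordinary store into the legacy VGA framebuffer. It assumes a real-mode DOS compiler with far pointers (e.g. Borland/Turbo C, where MK_FP from dos.h builds a far pointer) and a card already switched to mode 13h (320x200, 8 bits per pixel, framebuffer at physical 0xA0000):

    /* Hedged sketch: plot one pixel via plain MMIO into the mode-13h
     * VGA framebuffer. Assumes real-mode DOS and a Borland-style
     * compiler; 'far' and MK_FP are compiler extensions, not ISO C. */
    #include <dos.h>

    void put_pixel(int x, int y, unsigned char color)
    {
        /* Video memory is just memory on the bus: a normal store
         * through a pointer into segment 0xA000 writes a pixel. */
        unsigned char far *vga = (unsigned char far *)MK_FP(0xA000, 0);
        vga[y * 320 + x] = color;
    }

No special CPU feature is involved; the store goes out on the bus like any other memory write.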

Modern GPUs are often the highest-bandwidth devices in a system (especially non-servers), so they benefit from fast buses, hence AGP for a while, until PCI Express (PCIe) unified everything again.


Anyway, graphics cards could work on standard busses; it was only once 3D graphics became popular and commercially important (and fast enough for the PCI bus to be a bottleneck) that things needed to change. At that point, CPU / motherboard companies were fully aware that consumers cared about 3D games, and thus it made sense to develop a new bus specifically for graphics cards.

(Along with a GART, graphics address/aperture remapping table, an IOMMU that made it much easier / safer for drivers to let an AGP or PCIe video card read directly from system memory. Including I think with addresses under control of user-space, without letting user-space read arbitrary system memory, thanks to it being an IOMMU that only allows a certain address range.)
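
As a rough illustration of what a GART does, here's a conceptual model (my sketch, not real chipset or driver code; all names and addresses are made up): a small page table remaps a contiguous "aperture" of bus addresses onto scattered physical pages, and refuses any access outside that window:

    /* Conceptual model of GART address translation, not real hardware
     * or driver code. APERTURE_BASE and the table contents are
     * illustrative assumptions. */
    #include <stdint.h>

    #define PAGE_SIZE      4096u
    #define APERTURE_BASE  0xE0000000u   /* example aperture bus address */
    #define APERTURE_PAGES 1024u         /* a 4 MiB aperture in this sketch */

    /* Per-page physical address, filled in by the driver. */
    static uint32_t gart_table[APERTURE_PAGES];

    /* Translate a bus address from the card into a physical address,
     * the way the chipset's GART would. Returns 0 on a rejected access. */
    uint32_t gart_translate(uint32_t bus_addr)
    {
        if (bus_addr < APERTURE_BASE)
            return 0;                    /* outside the aperture: refuse */
        uint32_t page = (bus_addr - APERTURE_BASE) / PAGE_SIZE;
        if (page >= APERTURE_PAGES)
            return 0;                    /* outside the aperture: refuse */
        return gart_table[page] + (bus_addr - APERTURE_BASE) % PAGE_SIZE;
    }

The point is that the card only ever sees aperture addresses, so the driver controls exactly which physical pages are reachable.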

Before the GART was a thing, I assume drivers for PCI GPUs needed to have the host CPU initiate DMA to the device. Or if bus-master DMA by the GPU did happen, it could read any byte of physical memory in the system if it wanted, so drivers would have to be careful not to let programs pass arbitrary pointers.

Anyway, having a GART was new with AGP, which post-dates early 3D graphics cards like 3dfx's Voodoo and ATI 3D Rage. I don't know enough details to be sure I'm accurately describing the functionality a GART enables.

So most of the support for GPUs was in terms of busses, and thus a chipset thing, not CPUs proper. (Back then, CPUs didn't have integrated memory controllers, instead just talking to the chipset northbridge over a frontside bus.)

Relevant CPU instructions include Intel's SSE and SSE2 instruction sets, whose streaming (NT = non-temporal) stores are good for storing large amounts of data that won't be re-read by the CPU any time soon, if at all.
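
For example, a copy loop into video memory using SSE2 NT stores might look something like this sketch (my illustration, not code from the answer), assuming dst points into a WC-mapped framebuffer, both pointers are 16-byte aligned, and bytes is a multiple of 16:

    /* Hedged sketch: SSE2 non-temporal copy into write-combining memory.
     * Function name and the alignment/size assumptions are mine. */
    #include <emmintrin.h>  /* SSE2 intrinsics */
    #include <stddef.h>

    void copy_to_vram(void *dst, const void *src, size_t bytes)
    {
        __m128i       *d = (__m128i *)dst;
        const __m128i *s = (const __m128i *)src;
        for (size_t i = 0; i < bytes / 16; i++) {
            /* movntdq: a store that bypasses the cache so the
             * write-combining buffers can burst full lines over the bus. */
            _mm_stream_si128(&d[i], _mm_load_si128(&s[i]));
        }
        _mm_sfence();  /* make the NT stores globally visible before returning */
    }

Bypassing the cache avoids polluting it with pixel data the CPU won't re-read, while the WC buffers merge the 16-byte stores into full-line bursts.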

SSE4.1 in 2nd-gen Core2 (2008 ish) added a streaming load instruction (movntdqa) which (still) only does anything special if used on memory regions marked in the CPU's page tables or MTRR as WC (aka USWC: uncacheable, write-combining). Copying back from GPU memory to the host was the intended use-case. (Non-temporal loads and the hardware prefetcher, do they work together?)
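
A readback loop using that instruction might look roughly like this (again my sketch, with made-up function names), assuming src is a WC mapping of GPU memory, everything is 16-byte aligned, and bytes is a multiple of 64 so each group of four loads covers one 64-byte line:

    /* Hedged sketch: SSE4.1 streaming loads from WC-mapped GPU memory.
     * movntdqa pulls a whole 64-byte line into a small streaming buffer
     * instead of the cache, so reading the line's four 16-byte chunks
     * back to back reuses that buffer. */
    #include <smmintrin.h>  /* SSE4.1 intrinsics */
    #include <stddef.h>

    void copy_from_vram(void *dst, void *src, size_t bytes)
    {
        __m128i *d = (__m128i *)dst;
        __m128i *s = (__m128i *)src;
        for (size_t i = 0; i < bytes / 16; i += 4) {
            __m128i a = _mm_stream_load_si128(&s[i]);
            __m128i b = _mm_stream_load_si128(&s[i + 1]);
            __m128i c = _mm_stream_load_si128(&s[i + 2]);
            __m128i e = _mm_stream_load_si128(&s[i + 3]);
            _mm_store_si128(&d[i],     a);
            _mm_store_si128(&d[i + 1], b);
            _mm_store_si128(&d[i + 2], c);
            _mm_store_si128(&d[i + 3], e);
        }
    }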

x86 CPUs introducing the MTRR (Memory Type Range Register) is another feature that improved CPU -> GPU write bandwidth. Again, this came after 3D graphics were commercially successful for gaming.
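
As a concrete illustration (mine, not part of the original answer): on Linux x86 the current MTRR setup is visible in /proc/mtrr, and an entry marking a video-memory aperture as write-combining looks something like this:

    reg02: base=0xd0000000 ( 3328MB), size= 256MB, count=1: write-combining

The register number and addresses here are invented; the point is just that one range register covers the whole aperture with the WC memory type.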
