当前位置：文江博客话题详情

硬盘直接内存访问的目的是什么？

发布于 2024-09-19 03:12:44 字数 467 浏览 9 评论 0原文

乍一看，让硬盘自行写入 RAM 似乎是一个好主意，而无需 CPU 指令复制数据，特别是考虑到异步网络的成功。但是关于直接内存访问 (DMA) 的维基百科文章指出了这一点：

通过 DMA，CPU 可以摆脱这种开销，并可以在数据传输期间执行有用的任务（尽管 CPU 总线会被 DMA 部分阻塞）。

我不明白公交线路如何被“部分阻塞”。据推测，内存当时可以由一台设备访问，然后 CPU 似乎几乎没有什么有用的工作可以做。第一次尝试读取未缓存的内存时，它会被阻止，我预计在 2 MB 缓存的情况下，速度会很快。

释放 CPU 来执行其他任务的目标似乎是没有道理的。硬盘 DMA 在实践中是否会促进性能提升？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

蝶舞 2024-09-26 03:12:45

1：PIO（编程IO）扰动CPU缓存。大多数时候，从磁盘读取的数据不会立即被处理。应用程序通常以大块的形式读取数据，但 PIO 是以较小的块（通常为 64K IIRC）完成的。因此，数据读取应用程序将等待直到大块被传输，并且在从控制器获取小块之后，不会从缓存中的小块中受益。同时，其他应用程序将遭受大部分缓存因传输而被逐出的影响。这可能可以通过使用特殊指令来避免，这些指令指示 CPU 不要缓存数据，而是将其“直接”写入主内存，但是我非常确定这会减慢复制循环的速度。因此造成的伤害甚至比缓存抖动还要严重。

2：PIO 由于是在 x86 系统以及可能大多数其他系统上实现的，因此与 DMA 相比确实很慢。问题不在于 CPU 不够快。问题源于总线和磁盘控制器的 PIO 模式的设计方式。如果我没记错的话，CPU 必须从所谓的 IO 端口读取每个字节（或使用 32 位 PIO 模式时的每个 DWORD）。这意味着对于每个 DWORD 数据，端口的地址必须放在总线上，并且控制器必须通过将数据 DWORD 放在总线上进行响应。而使用 DMA 时，控制器可以利用总线和/或内存控制器的全部带宽传输数据突发。当然，这种传统的 PIO 设计还有很大的优化空间。 DMA 传输就是这样一种优化。仍被视为 PIO 的其他解决方案也可能是可能的，但它们仍然会遇到其他问题（例如上面提到的缓存抖动）。

3：内存和/或总线带宽不是大多数应用的限制因素，因此 DMA 传输不会造成任何延迟。它可能会稍微减慢某些应用程序的速度，但通常应该很难注意到。毕竟，与总线和/或内存控制器的带宽相比，磁盘相当慢。 “磁盘”（SSD、RAID 阵列）可提供 > 500 MB/s 确实很快。不能至少提供 10 倍该数字的总线或内存子系统必定来自石器时代。 OTOH PIO 在传输数据块时确实会完全停止 CPU。

1: PIO (programmed IO) thrashes the CPU caches. The data read from the disk will, most of the time, not be processed immediately afterwards. Data is often read in large chunks by the application, but PIO is done in smaller blocks (typically 64K IIRC). So the data-reading application will wait until the large chunk has been transferred, and not benefit from the smaller blocks being in the cache just after they have been fetched from the controller. Meanwhile other applications will suffer from large parts of the cache being evicted by the transfer. This could probably be avoided by using special instructions which instruct the CPU not to cache data but write it "directly" to the main memory, however I'm pretty certain that this would slow down the copy-loop. And thereby hurt even more than the cache-thrashing.

2: PIO, as it's implemented on x86 systems, and probably most other systems, is really slow compared to DMA. The problem is not that the CPU wouldn't be fast enough. The problem stems from the way the bus and the disk controller's PIO modes are designed. If I'm not mistaken, the CPU has to read every byte (or every DWORD when using 32 bit PIO modes) from a so-called IO port. That means for every DWORD of data, the port's address has to be put on the bus, and the controller must respond by putting the data DWORD on the bus. Whereas when using DMA, the controller can transfer bursts of data, utilizing the full bandwidth of the bus and/or memory controller. Of course there is much room for optimizing this legacy PIO design. DMA transfers are such an optimization. Other solutions that would still be considered PIO might be possible too, but then again they would still suffer from other problems (e.g. the cache thrashing mentioned above).

3: Memory- and/or bus-bandwidth is not the limiting factor for most applications, so the DMA transfer will not stall anything. It might slow some applications down a little, but usually it should be hardly noticeable. After all disks are rather slow compared with the bandwidth of the bus and/or memory controller. A "disk" (SSD, RAID array) that delivers > 500 MB/s is really fast. A bus or memory subsystem that cannot at least deliver 10 times that number must be from the stone ages. OTOH PIO really stalls the CPU completely while it's transferring a block of data.

回复收藏 0 原文

栖迟 2024-09-26 03:12:45

我不知道我是否遗漏了什么。

假设我们没有 DMA 控制器。从“慢速”设备到内存的每次传输对于 CPU 来说都是一个循环，

ask_for_a_block_to_device 
wait_until_device_answer (or change_task_and_be_interrupted_when_ready)
write_to_memory

因此 CPU 应该自己写入内存。一块一块地。

是否需要使用CPU来进行内存传输？不。我们使用另一个设备（或 DMA 总线主控等机制）将数据传输到内存或从内存传输数据。

同时，CPU 可能会做一些不同的事情，例如：使用缓存做事情，但甚至花大量时间访问内存的其他部分。

这是关键部分：数据并未 100% 地传输，因为其他设备速度非常慢（与内存和 CPU 相比）。

尝试表示共享内存总线使用情况的示例（C 当由 CPU 访问时，D，当由 DMA 访问时）

Memory Bus ----CCCCCCCC---D----CCCCCCCCCDCCCCCCCCC----D

正如您所看到的，内存一次访问一个设备。有时由 CPU，有时由 DMA 控制器。 DMA 的次数很少。

I don't know if I'm missing anything.

Let's suppose we don't have DMA controller. Every transfer from the "slow" devices to the memory would be for the CPU a loop

ask_for_a_block_to_device 
wait_until_device_answer (or change_task_and_be_interrupted_when_ready)
write_to_memory

So the CPU should have to write the memory itself. Chunk by chunk.

Is it necessary the use of a CPU for doing memory transfers? No. We use another device (or mecanism like DMA bus mastering) which transfers data to/from the memory.

Meanwhile CPU could be doing something different like : doing things with cache, but even accessing other parts of the memory a great share of the time.

This is the crucial part: data is not being transfered 100% of the time, because the other device is very slow (compared to memory and CPU).

Trying to represent an example of the shared memory bus usage (C when accesed by CPU, D, when accesed by DMA)

Memory Bus ----CCCCCCCC---D----CCCCCCCCCDCCCCCCCCC----D

As you can see memory is accesed one device at a time. Sometimes by CPU, sometimes by the DMA controller. The DMA very few times.

回复收藏 0 原文

趁年轻赶紧闹 2024-09-26 03:12:45

我不明白公交线路如何“部分阻塞”

在许多时钟周期的一段时间内，有些会被阻塞，有些不会。引用墨尔本大学：

第二季度。什么是循环盗窃？为什么有循环可以窃取？
A2。当 DMA 设备向内存传输数据或从内存传输数据时，它会
（在大多数架构中）使用与 CPU 使用相同的总线
来访问内存。如果CPU同时想使用总线
作为 DMA 设备时，CPU 将停滞一个周期，因为
DMA 设备具有更高的优先级。这是为了防止
小 DMA 缓冲区溢出。（CPU 永远不会出现超限问题。）
大多数现代 CPU 都具有满足大多数内存引用的缓存
无需通过总线访问主存。因此 DMA 将
对他们的影响要小得多。

即使在 DMA 块传输发生时 CPU 完全处于饥饿状态，它也会比 CPU 必须处于循环中将字节移入 I/O 设备或从 I/O 设备移出的速度更快。