How do disk controllers handle concurrent writes to the same sector in the absence of write barriers?
When I open a file with O_DIRECT|O_ASYNC and do two concurrent writes to the same disk sector, without an fsync or fdatasync in between, does the Linux disk subsystem or the hardware disk controller offer any guarantee that the final data on that disk sector will be from the second write?
While it's true that O_DIRECT bypasses the OS buffer cache, data ultimately ends up in the low-level IO queues (disk scheduler queue, disk driver's queue, hardware controller's cache/queues, etc.). I have traced the IO stack all the way down to the elevator algorithm.
For example, if the following sequence of requests ends up in the disk scheduler queue:
write sector 1 from buffer 1
write sector 2 from buffer 2
write sector 1 from buffer 3 [it's not buffer 1!]
the elevator code would do a "back merge" to coalesce sectors 1 and 2 from buffers 1 and 2 respectively, and then issue two disk IOs. But I am not sure whether the final data on disk sector 1 comes from buffer 1 or buffer 3 (as I don't know the write-reordering semantics of drivers/controllers).
Scenario 2:
write sector 1 from buffer 1
write sector 500 from buffer 2
write sector 1 from buffer 3
How will this scenario be handled?
A more basic question: when doing writes in O_DIRECT mode with AIO, can this sequence of requests end up in the disk scheduler's queue in the absence of explicit write barriers?
If yes, is there any ordering guarantee like "multiple writes to the same sector will result in the last write being the final write"?
Or is that ordering non-deterministic [left at the mercy of the disk controller and its caches, which reorder writes within barriers to optimize seek time]?
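As a rough illustration of the two request sequences above, here is a toy Python model of back merging. The names `Request`, `submit`, `dispatch` and `run` are hypothetical, and this is not the kernel's elevator code: it only merges a write into a queued request that ends exactly where the new write begins. It shows which requests would remain in the queue, and says nothing about the order in which a driver or controller applies them.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    start: int                                   # first sector covered
    buffers: list = field(default_factory=list)  # one buffer label per sector

    @property
    def end(self):
        # First sector past the end of this request.
        return self.start + len(self.buffers)


def submit(queue, sector, buffer):
    """Toy back merge: extend a queued request that ends exactly at `sector`."""
    for req in queue:
        if req.end == sector:
            req.buffers.append(buffer)
            return
    # No adjacent request found, so queue the write as its own request.
    queue.append(Request(sector, [buffer]))


def dispatch(queue):
    """Return the queued requests as (start_sector, buffers) tuples."""
    return [(req.start, list(req.buffers)) for req in queue]


def run(sequence):
    queue = []
    for sector, buffer in sequence:
        submit(queue, sector, buffer)
    return dispatch(queue)


scenario_1 = [(1, "buffer 1"), (2, "buffer 2"), (1, "buffer 3")]
scenario_2 = [(1, "buffer 1"), (500, "buffer 2"), (1, "buffer 3")]

if __name__ == "__main__":
    print(run(scenario_1))
    print(run(scenario_2))
```

In this model, scenario 1 leaves two queued requests that both cover sector 1 (the merged 1-2 request and the separate write from buffer 3), and scenario 2 leaves three unmerged requests. In both cases two overlapping writes to sector 1 are outstanding at once, which is exactly the situation the question asks about.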
Comments (2)
Barriers are going away. If you require ordering among overlapping writes, you're supposed to wait for completion of the first before issuing the second.
In the general case I believe there is no guarantee. The final result is non-deterministic from the application perspective, depending on timing, state of the host and storage device, etc.
The request queue will merge requests in a predictable fashion, but hardware is not required to provide consistent results for writes that are in the drive's queue at the same time.
Depending on how fast the storage device is and how slow the host CPU is, you can't necessarily guarantee that merging will take place in the request queue before commands are sent to the storage device.
Unfortunately, how applications using O_DIRECT (as opposed to filesystems that directly construct bios) are supposed to wait for completion is not clear to me.
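One possible reading of the "wait for completion" advice, as a minimal Python sketch rather than a definitive answer: the thread pool below is only a stand-in for an asynchronous submission interface (it is not Linux AIO), and `open_direct`, `aligned_block`, `ordered_overwrite`, `demo` and the 4096-byte `SECTOR` size are all assumptions made for illustration. The point is simply that the second overlapping write is not submitted until the first has reported completion. Completion here does not imply durability; that would still need something like fsync or O_DSYNC.

```python
import mmap
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

SECTOR = 4096  # assumed alignment unit; real devices may differ


def open_direct(path):
    """Open with O_DIRECT where available, falling back to a plain open."""
    flags = os.O_RDWR | os.O_CREAT
    try:
        return os.open(path, flags | getattr(os, "O_DIRECT", 0), 0o600)
    except OSError:
        # Some filesystems (tmpfs, for example) may reject O_DIRECT.
        return os.open(path, flags, 0o600)


def aligned_block(fill):
    """Return a page-aligned buffer of SECTOR bytes filled with `fill`."""
    buf = mmap.mmap(-1, SECTOR)  # anonymous mappings are page aligned
    buf.write(bytes([fill]) * SECTOR)
    return buf


def ordered_overwrite(path):
    """Write the same block twice, waiting for the first write to complete."""
    fd = open_direct(path)
    first, second = aligned_block(0x11), aligned_block(0x33)
    try:
        with ThreadPoolExecutor(max_workers=2) as pool:
            done_first = pool.submit(os.pwrite, fd, first, 0)
            done_first.result()   # block until the first write has completed
            done_second = pool.submit(os.pwrite, fd, second, 0)
            done_second.result()  # only now is the second write outstanding
    finally:
        os.close(fd)
        first.close()
        second.close()
    with open(path, "rb") as f:
        return f.read(SECTOR)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        return ordered_overwrite(os.path.join(tmp, "block.bin"))
```

Because the two writes are never in flight at the same time, neither the request queue nor the device ever sees them together, so there is nothing for either to reorder.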
OK, write requests end up in a linear elevator queue. At this point it's not relevant whether they came from different threads. The same arrangement could be the result of a single thread issuing three sequential writes. Now, would you trust your files to an OS or to a controller that reorders sequential writes to the same sector in some arbitrary fashion? I wouldn't, but I might be wrong of course :)