When should I use REQ_OP_FLUSH in a kernel blockdev driver? (Do REQ_OP_FLUSH bios flush dirty RAID controller caches?)

Posted 2025-01-31 17:19:23

When should I use REQ_OP_FLUSH in my kernel blockdev driver, and what is the expected behavior of the hardware that receives the REQ_OP_FLUSH (or equivalent SCSI cmd)?

In the Linux kernel, when a struct bio flagged as REQ_OP_FLUSH is passed to a RAID controller volume in writeback mode, is the RAID controller supposed to flush its dirty caches?

It seems to me that this is the purpose of REQ_OP_FLUSH but that is at odds with wanting to be fast with writeback: If the cache is battery-backed, shouldn't the controller ignore the flush?

In the ext4_sync_fs() function in ext4's super.c, the code skips the call to blkdev_issue_flush() when barriers are disabled via the barrier=0 mount option. This seems to imply that RAID controllers will flush their caches when they are told to... but does RAID firmware ever break the rules?

  • Is the flush behavior dependent on the firmware implementation and manufacturer?
  • Where is the SAS/SCSI specification on the subject?
  • Other considerations?

Comments (1)

忆离笙 2025-02-07 17:19:23

Christoph Hellwig on the linux-block mailing list said:

Devices with power fail protection will advertise that (using VWC flag in NVMe for example) and [the Linux kernel] will never send flushes.

Keith Busch at kernel.org:

You can check the queue attribute, /sys/block/<disk>/queue/write_cache. If the
value is "write through", then the device is reporting it doesn't have a
volatile cache. If it is "write back", then it has a volatile cache.

If this sounds backwards, then consider this using a RAID
controller cache as an example:

  1. A RAID controller with a non-volatile "writeback" cache (from the
    controller's perspective, ie, with battery) is a "write through"
    device as far as the kernel is concerned because the controller will
    return the write as complete as soon as it is in the persistent cache.

  2. A RAID controller with a volatile "writeback" cache (from the
    controller's perspective, ie without battery) is a "write back"
    device as far as the kernel is concerned because the controller will
    return the write as complete as soon as it is in the cache, but the
    cache is not persistent! So in that case flush/FUA is necessary.

[ Reference: https://lore.kernel.org/all/[email protected]/ ]
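To make the mapping in #1 and #2 concrete, here is a minimal shell sketch that reads the attribute and prints what it means for flushes. The names `flush_policy`, `report_disk` and the `SYSFS_ROOT` override are hypothetical helpers for illustration (the override only exists so the function can be pointed at a scratch directory); the only real interface used is the `queue/write_cache` file quoted above.

```shell
#!/bin/sh
# Hypothetical helper: translate a queue/write_cache value into the
# flush behaviour described in the quotes above.
flush_policy() {
    case "$1" in
        "write through") echo "no volatile cache reported; kernel does not send flushes" ;;
        "write back")    echo "volatile cache reported; kernel sends flush/FUA" ;;
        *)               echo "unrecognized write_cache value: $1" ;;
    esac
}

# Hypothetical helper: report the policy for one disk name (e.g. sda).
# SYSFS_ROOT is an assumption added for testing; it defaults to /sys.
report_disk() {
    attr="${SYSFS_ROOT:-/sys}/block/$1/queue/write_cache"
    if [ -r "$attr" ]; then
        printf '%s: %s\n' "$1" "$(flush_policy "$(cat "$attr")")"
    else
        printf '%s: cannot read %s\n' "$1" "$attr"
    fi
}
```

After sourcing it, something like `report_disk sda` should tell you which of the two cases the kernel believes your volume is in.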

From personal experience, not all RAID controllers will properly set queue/write_cache as indicated by Keith above. If you know your array has a non-volatile cache running in write-back mode, then make sure it is in "write through" so flushes will be dropped:

]# cat /sys/block/<disk>/queue/write_cache
<cache status>

and fix it if it isn't in the proper mode. The settings below might seem backwards, but if they do, then re-read #1 and #2 above because these are correct:

If you have a non-volatile cache (ie, with BBU):

]# echo "write through" > /sys/block/<disk>/queue/write_cache

If you have a volatile cache (ie, without BBU):

]# echo "write back" > /sys/block/<disk>/queue/write_cache
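If you end up doing this on several volumes, the two commands above can be wrapped so that a typo in the mode string is caught before anything is written. This is only a sketch: `set_write_cache` and the `SYSFS_ROOT` override are hypothetical names, and writing the real attribute needs root. As far as I know, writing this file only changes the kernel's view of the device, not the controller's own cache configuration.

```shell
#!/bin/sh
# Hypothetical helper: set queue/write_cache for a disk after validating
# the mode string. Usage: set_write_cache <disk> "write through"|"write back"
# SYSFS_ROOT is an assumption added for testing; it defaults to /sys.
set_write_cache() {
    disk="$1"
    mode="$2"
    case "$mode" in
        "write through"|"write back") ;;
        *) echo "invalid mode: $mode" >&2; return 2 ;;
    esac
    attr="${SYSFS_ROOT:-/sys}/block/$disk/queue/write_cache"
    if [ ! -w "$attr" ]; then
        echo "cannot write $attr" >&2
        return 1
    fi
    # Write the new mode, then read it back so the result is visible.
    printf '%s\n' "$mode" > "$attr" && cat "$attr"
}
```

For example, `set_write_cache sda "write through"` for a BBU-backed volume. A setting made this way would need to be reapplied after a reboot, for instance from a udev rule or boot script.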

So the answer to the question about when to flag REQ_OP_FLUSH in your kernel code is this: whenever you think your code should commit to disk. Since the block layer can re-order any bio request,

  1. Send a WRITE IO, wait for its completion
  2. Send a flush, wait for flush completion

and then you are guaranteed to have the IO from #1 on disk.
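The same write-then-flush ordering can be seen from user space, which is a handy way to exercise a driver's flush path. The sketch below is an analogy, not kernel code: it assumes GNU dd, whose `conv=fsync` calls fsync() on the output after the data is written, and on a filesystem with barriers enabled that fsync is what typically results in a flush being issued to the block device. `durable_write` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: copy src to dst and do not return until dd has
# written the data and fsync()ed the output file.
# Assumes GNU dd (conv=fsync).
durable_write() {
    src="$1"
    dst="$2"
    # Step 1 (write) and step 2 (flush) happen inside dd: it writes all
    # blocks, then calls fsync() on the output before exiting.
    dd if="$src" of="$dst" bs=4096 conv=fsync 2>/dev/null
}
```

Running `durable_write ./data.bin /mnt/test/data.bin` (paths are placeholders) while watching the device with a block tracing tool should show the writes followed by a flush when the queue is in "write back" mode.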

However, if the device being written to has queue/write_cache in "write through" mode, then the flush will complete immediately, and it's up to your controller to do its job and keep the non-volatile cache active, even after a power loss (BBU, supercap, flashcache, etc).
