Creating unflushed file output buffers
I am trying to clear up an issue that occurs with unflushed file I/O buffers in a couple of programs, in different languages, running on Linux. The solution of flushing buffers is easy enough, but this issue of unflushed buffers happens quite randomly. Rather than seek help on what may cause it, I am interested in how to create (reproduce) and diagnose this kind of situation.
This leads to a two-part question:
1. Is it feasible to artificially and easily construct instances where, for a given period of time, one can have output buffers that are known to be unflushed? My searches are turning up empty. A trivial baseline is to hammer the hard drive (e.g. swapping) in one process while trying to write a large amount of data from another process. While this "works", it makes the system practically unusable: I can't poke around and see what's going on.
2. Are there commands from within Linux that can identify that a given process has unflushed file output buffers? Is this something that can be run at the command line, or is it necessary to query the kernel directly? I have been looking at fsync, sync, ioctl, flush, bdflush, and others. However, lacking a method for creating unflushed buffers, it's not clear what these may reveal.
In order to reproduce for others, an example for #1 in C would be excellent, but the question is truly language agnostic - just knowing an approach to create this situation would help in the other languages I'm working in.
Update 1: My apologies for any confusion. As several people have pointed out, buffers can be in the kernel space or the user space. This helped pinpoint the problems: we're creating big dirty kernel buffers. This distinction and the answers completely resolve #1: it now seems clear how to re-create unflushed buffers in either user space or kernel space. Identifying which process ID has dirty kernel buffers is not yet clear, though.
If you are interested in the kernel-buffered data, then you can tune the VM writeback through the sysctls in /proc/sys/vm/dirty_*. In particular, dirty_expire_centisecs is the age, in hundredths of a second, at which dirty data becomes eligible for writeback. Increasing this value will give you a larger window of time in which to do your investigation. You can also increase dirty_ratio and dirty_background_ratio (which are percentages of system memory, defining the points at which synchronous and asynchronous writeback start, respectively).
Actually creating dirty pages is easy: just write(2) to a file and exit without syncing, or dirty some pages in a MAP_SHARED mapping of a file.
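The write(2) route can be sketched in C roughly as follows. This is only an illustration under assumptions: the helper names (vm_sysctl, dirty_kb, make_dirty_pages) and the file path passed in are hypothetical, and the system-wide Dirty counter in /proc/meminfo is used as a rough indicator, not a per-process measurement.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Read one integer sysctl from /proc/sys/vm, e.g. "dirty_expire_centisecs".
 * Returns -1 if the file cannot be read. (Helper name is hypothetical.) */
long vm_sysctl(const char *name)
{
    char path[128];
    long value = -1;
    FILE *f;

    snprintf(path, sizeof path, "/proc/sys/vm/%s", name);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Return the system-wide "Dirty:" value from /proc/meminfo in kB,
 * or -1 if it cannot be found. */
long dirty_kb(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    char line[256];
    long kb = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "Dirty: %ld kB", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}

/* Write `mib` MiB to `path` with write(2) and return without calling
 * fsync(2), so the data is left as dirty pages in the page cache.
 * Returns 0 on success, -1 on failure. */
int make_dirty_pages(const char *path, int mib)
{
    static char block[1 << 20];
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    memset(block, 'x', sizeof block);
    for (int i = 0; i < mib; i++) {
        if (write(fd, block, sizeof block) != (ssize_t)sizeof block) {
            close(fd);
            return -1;
        }
    }
    /* No fsync() here: on a local filesystem close() typically does not
     * wait for writeback. */
    return close(fd);
}
```

Calling dirty_kb() before and after make_dirty_pages() on a disk-backed path should usually show the counter rising, and it should fall again after running sync or once dirty_expire_centisecs has elapsed. Other processes also dirty pages, so the difference is only approximate.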
A simple program that would have an unflushed buffer would be:
By default, stdio flushes stdout only on newlines when it is connected to a terminal.
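The same user-space buffering can be observed deterministically on a regular file, where stdio uses full buffering. The following is a minimal sketch, not the answerer's own listing; the function names file_size and stdio_buffer_demo and the path argument are hypothetical, and the explicit setvbuf call is there only to make the buffering mode unambiguous.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Size of `path` as the kernel currently sees it, or -1 on failure. */
long file_size(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/* Write three bytes through stdio and record the file size before and
 * after fflush(). Returns 0 on success, -1 on failure. */
int stdio_buffer_demo(const char *path, long *before, long *after)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    setvbuf(f, NULL, _IOFBF, BUFSIZ);  /* request full buffering explicitly */
    fputs("moo", f);                   /* bytes sit in the user-space buffer */
    *before = file_size(path);         /* the kernel has not seen them yet */
    fflush(f);                         /* stdio now issues write(2) */
    *after = file_size(path);          /* the kernel now holds the 3 bytes */
    return fclose(f);
}
```

To keep the buffer unflushed long enough to inspect the process, one could put a sleep() or pause() between the fputs() and fflush() calls; tracing the process with strace during that window should show no write(2) for the data.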
It is very easy to cause unflushed buffers by controlling the receiving side. The beauty of *nix systems is that everything looks like a file, so you can use special files to do what you want. The easiest option is a pipe. If you just want to control stdout, this is the simplest option: unflushed_program | slow_consumer. Otherwise, you can use named pipes:
slow_consumer is most likely a program you design to read data slowly, or just read X bytes and stop.
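The named-pipe idea can be sketched in C as below. This is an illustration under assumptions, not the answerer's setup: the function name fifo_backlog and the FIFO path are hypothetical, and for brevity a single process holds both ends of the FIFO, with the read end standing in for a slow_consumer that never reads. Non-blocking writes are used so the function returns once the kernel's pipe buffer is full instead of blocking.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a FIFO at `path`, open both ends, and write until the kernel
 * refuses more data because nobody is reading. Returns the number of
 * bytes left queued in the pipe, or -1 on error. */
long fifo_backlog(const char *path)
{
    char chunk[4096];
    long queued = 0;
    int rfd, wfd;

    unlink(path);                              /* ignore error if absent */
    if (mkfifo(path, 0600) != 0)
        return -1;

    /* Open the consumer end first so the producer's open can succeed. */
    rfd = open(path, O_RDONLY | O_NONBLOCK);   /* consumer: never reads */
    wfd = open(path, O_WRONLY | O_NONBLOCK);   /* producer */
    if (rfd < 0 || wfd < 0) {
        if (rfd >= 0)
            close(rfd);
        if (wfd >= 0)
            close(wfd);
        unlink(path);
        return -1;
    }

    memset(chunk, 'x', sizeof chunk);
    for (;;) {
        ssize_t n = write(wfd, chunk, sizeof chunk);

        if (n < 0) {
            if (errno != EAGAIN)               /* EAGAIN means "pipe full" */
                queued = -1;
            break;
        }
        queued += n;
    }

    close(wfd);
    close(rfd);
    unlink(path);
    return queued;
}
```

With a blocking producer in a separate process, the same situation would leave that producer stuck in write(2), and any data still held in its stdio buffer would stay unflushed until the consumer reads, which gives a stable window for inspection.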