Consequences of unbuffered file I/O
What are the consequences of writing large amounts of data to disk using unbuffered file I/O (at least for everything above the operating system level)?
Details:
I'm writing a Ruby script that will execute another piece of code, capturing its stdout and stderr and writing them to a file. Apparently (in Ruby, at least), stderr is unbuffered and stdout is buffered, which in my case results in out-of-order output as the stderr lines get printed before some stdout lines.
It seems the solution is to make this portion of the code use unbuffered IO (with IO.sync = true). However, the piece of code my script is running will also be writing large amounts of text to disk. So I'm wondering what the consequence is of not using the Ruby buffer (only the OS buffer and below), and if it is significant, how else I can get around the ordering problem?
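A minimal sketch of the situation described above, assuming the child program is itself a Ruby script (the question does not say so). CHILD_BODY and run_child are hypothetical names used only for illustration. The child's stdout and stderr are merged into a single pipe, once with default buffering and once with stdout switched to sync mode inside the child.

```ruby
require "open3"
require "rbconfig"

# Hypothetical child program: writes to stdout, then stderr, then stdout.
CHILD_BODY = 'puts "out 1"; warn "err 1"; puts "out 2"'

# Runs the child with stdout and stderr merged into one pipe and returns
# the combined text. With sync: true the child turns on sync mode for its
# own stdout before printing anything.
def run_child(sync:)
  script = (sync ? '$stdout.sync = true; ' : "") + CHILD_BODY
  output, _status = Open3.capture2e(RbConfig.ruby, "-e", script)
  output
end

puts "default buffering in the child:"
puts run_child(sync: false)
puts "with $stdout.sync = true in the child:"
puts run_child(sync: true)
```

Note that the buffering that matters here lives in the child process, so the setting has to be applied there. If the child is not a Ruby program, this particular approach does not carry over.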
Comments (2)
Unbuffered I/O is slower than buffered I/O when each write operation transfers only a small number of bytes, and the situation is reversed when each operation transfers a large number of bytes. In a middle range of around 1,000 to 10,000 bytes per operation, it doesn't make much difference.
You will also see slightly better performance when operations are aligned.
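One way to check these claims on your own machine is a rough timing sketch like the one below. The helper name timed_write, the chunk sizes, and the 4 MiB total are assumptions chosen for illustration. Results will vary with the OS, filesystem, and Ruby version, so treat the numbers as indicative only.

```ruby
require "benchmark"
require "tempfile"

# Hypothetical helper: writes `count` chunks of `chunk_size` bytes to a
# temporary file and returns [elapsed seconds, bytes on disk].
def timed_write(chunk_size, count, buffered:)
  chunk = "x" * chunk_size
  Tempfile.create("io_bench") do |f|
    elapsed = Benchmark.realtime do
      if buffered
        count.times { f.write(chunk) }    # goes through Ruby's internal buffer
        f.flush                           # hand whatever is left to the OS
      else
        count.times { f.syswrite(chunk) } # bypasses Ruby's buffer on every call
      end
    end
    [elapsed, File.size(f.path)]
  end
end

# Write roughly 4 MiB in total using small, medium, and large chunks.
[16, 4_096, 1_048_576].each do |size|
  count = [(4 * 1_048_576) / size, 1].max
  buffered_time, = timed_write(size, count, buffered: true)
  unbuffered_time, = timed_write(size, count, buffered: false)
  puts format("%8d bytes/write: buffered %.4fs, syswrite %.4fs",
              size, buffered_time, unbuffered_time)
end
```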
What IO.sync does is toggle automatic flushing of the buffer, but it doesn't change the fact that the stream is still buffered. What you might want instead is to bypass the buffering system altogether and use IO#syswrite.
As the documentation says, you should pick either buffered or unbuffered and stick with it, as mixing and matching can cause issues.
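A minimal sketch of the two approaches mentioned in this answer, side by side. The helper names write_with_sync and write_with_syswrite are hypothetical. Each helper sticks to a single style of writing on its file handle, in line with the advice above.

```ruby
require "tempfile"

# Hypothetical helper: ordinary IO#write calls with sync mode switched on,
# so each write is flushed to the OS automatically.
def write_with_sync(lines)
  Tempfile.create("sync_demo") do |f|
    f.sync = true
    lines.each { |line| f.write(line) }
    File.read(f.path) # read back without calling flush ourselves
  end
end

# Hypothetical helper: IO#syswrite only, bypassing Ruby's buffered writes.
def write_with_syswrite(lines)
  Tempfile.create("syswrite_demo") do |f|
    lines.each { |line| f.syswrite(line) }
    File.read(f.path)
  end
end

lines = ["line 1\n", "line 2\n"]
puts write_with_sync(lines)
puts write_with_syswrite(lines)
```

In both cases the data is visible to another reader of the file as soon as each write returns. The difference is whether the call still passes through Ruby's buffering layer.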