Consequences of unbuffered file I/O

Posted 2024-11-24 04:06:13

What are the consequences of writing large amounts of data to disk using unbuffered file I/O (at least for everything above the operating system level)?

Details:

I'm writing a Ruby script that will execute another piece of code, capturing its stdout and stderr and writing them to a file. Apparently (in Ruby, at least), stderr is unbuffered and stdout is buffered, which in my case results in out-of-order output as the stderr lines get printed before some stdout lines.

It seems the solution is to make this portion of the code use unbuffered IO by setting sync = true on the stream (for example, $stdout.sync = true). However, the piece of code my script is running will also be writing large amounts of text to disk. So I'm wondering what the consequences are of not using the Ruby buffer (only the OS buffer and below), and, if they are significant, how else I can get around the ordering problem.

Comments (2)

橪书 2024-12-01 04:06:14

Unbuffered I/O is slower than buffered I/O when each write operation transfers only a small number of bytes, and the situation is reversed when each operation writes a large number of bytes. In a middle range of roughly 1,000 to 10,000 bytes per operation, it doesn't make much difference.

You will also see slightly better performance when operations are aligned.
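
If you want to check this on your own machine, here is a rough sketch of a benchmark. The helper name time_writes, the 1 MiB total, and the three chunk sizes are my own choices for illustration, not anything from the question. It only measures how long it takes to hand the data to the OS, not how long the data takes to reach the physical disk, and the numbers will vary by system.

```ruby
require "benchmark"
require "tempfile"

TOTAL_BYTES = 1 << 20 # 1 MiB per run; an arbitrary size chosen for illustration

# Hypothetical helper: writes TOTAL_BYTES in chunks of `chunk_size` bytes
# and returns the elapsed wall-clock time in seconds.
def time_writes(chunk_size, sync:)
  chunk = "x" * chunk_size
  Tempfile.create("io_bench") do |f|
    f.sync = sync # true: every write is handed to the OS immediately
    Benchmark.realtime do
      (TOTAL_BYTES / chunk_size).times { f.write(chunk) }
      f.flush # push out anything still held in Ruby's buffer
    end
  end
end

[16, 1_024, 65_536].each do |chunk_size|
  buffered = time_writes(chunk_size, sync: false)
  synced   = time_writes(chunk_size, sync: true)
  printf("%6d bytes/write: buffered %.4fs, sync %.4fs\n", chunk_size, buffered, synced)
end
```

If the program you are running mostly writes large chunks, the cost of turning on sync is probably small; if it writes many tiny strings, it is worth measuring first.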

情何以堪。 2024-12-01 04:06:14

What IO#sync= does is toggle automatic flushing of the buffer after each write; it doesn't change the fact that the stream is still buffered.

What you might want is to bypass the buffering system altogether and use IO#syswrite instead:

STDERR.syswrite("Look ma, no buffers")

As the documentation says, you should pick either buffered or unbuffered and stick with it, as mixing and matching can cause issues.
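
To see the difference in practice, here is a small sketch that writes to a temporary file and checks how many bytes are visible on disk after each step. The helper name visible_sizes and the sample strings are made up for illustration. Note that it flushes explicitly before calling syswrite, so the buffered and unbuffered writes are never interleaved with data still pending in Ruby's buffer.

```ruby
require "tempfile"

# Hypothetical helper: records the on-disk file size after each kind of write.
def visible_sizes
  Tempfile.create("sync_demo") do |f|
    sizes = {}

    f.write("buffered ")            # 9 bytes, normally held in Ruby's buffer
    sizes[:after_buffered_write] = File.size(f.path)

    f.flush                         # hand the buffered bytes to the OS
    sizes[:after_flush] = File.size(f.path)

    f.sync = true                   # from here on, each write is flushed automatically
    f.write("synced ")              # 7 bytes
    sizes[:after_sync_write] = File.size(f.path)

    f.syswrite("raw")               # 3 bytes, bypasses Ruby's buffer entirely
    sizes[:after_syswrite] = File.size(f.path)

    sizes
  end
end

visible_sizes.each { |step, size| puts "#{step}: #{size} bytes" }
```

On a typical setup the first size should be smaller than the second, since the small write has not left Ruby's buffer yet, while the sync and syswrite steps become visible immediately.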
