file.flush() 到底在做什么?
我在 Python 文件对象文档中找到了这一点:
flush()
不一定将文件的数据写入磁盘。使用flush()
后跟os .fsync()
以确保此行为。
所以我的问题是:Python 的 flush
到底在做什么?我以为它会强制将数据写入磁盘,但现在我发现它没有。为什么?
I found this in the Python documentation for File Objects:
flush()
does not necessarily write the file’s data to disk. Useflush()
followed byos.fsync()
to ensure this behavior.
So my question is: what exactly is Python's flush
doing? I thought that it forces to write data to the disk, but now I see that it doesn't. Why?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
通常涉及两个级别的缓冲:
内部缓冲区是由您正在编程的运行时/库/语言创建的缓冲区,旨在通过避免每次写入的系统调用来加快速度。相反,当您写入文件对象时,您会写入其缓冲区,每当缓冲区填满时,数据就会使用系统调用写入实际文件。
但是,由于操作系统缓冲区的原因,这可能并不意味着数据已写入磁盘。这可能只是意味着数据从运行时维护的缓冲区复制到操作系统维护的缓冲区中。
如果你写了一些东西,它最终(仅)出现在缓冲区中,并且机器电源被切断,那么当机器关闭时,该数据不会在磁盘上。
因此,为了帮助解决这个问题,您可以在各自的对象上使用
flush
和fsync
方法。第一个,
flush
,将简单地将程序缓冲区中滞留的任何数据写出到实际文件中。通常,这意味着数据将从程序缓冲区复制到操作系统缓冲区。具体来说,这意味着如果另一个进程打开相同的文件进行读取,它将能够访问您刚刚刷新到该文件的数据。然而,这并不一定意味着它已经“永久”存储在磁盘上。
为此,您需要调用 os.fsync 方法,该方法确保所有操作系统缓冲区与其所属的存储设备同步,换句话说,该方法将从操作系统复制数据缓冲区到磁盘。
通常,您不需要费心使用这两种方法,但如果您对磁盘上实际最终的内容抱有偏执是一件好事,那么您应该按照指示进行这两种调用。
2018 年附录。
请注意,具有缓存机制的磁盘现在比 2013 年更加常见,因此现在涉及更多级别的缓存和缓冲区。我假设这些缓冲区也将由同步/刷新调用处理,但我真的不知道。
There's typically two levels of buffering involved:
The internal buffers are buffers created by the runtime/library/language that you're programming against and is meant to speed things up by avoiding system calls for every write. Instead, when you write to a file object, you write into its buffer, and whenever the buffer fills up, the data is written to the actual file using system calls.
However, due to the operating system buffers, this might not mean that the data is written to disk. It may just mean that the data is copied from the buffers maintained by your runtime into the buffers maintained by the operating system.
If you write something, and it ends up in the buffer (only), and the power is cut to your machine, that data is not on disk when the machine turns off.
So, in order to help with that you have the
flush
andfsync
methods, on their respective objects.The first,
flush
, will simply write out any data that lingers in a program buffer to the actual file. Typically this means that the data will be copied from the program buffer to the operating system buffer.Specifically what this means is that if another process has that same file open for reading, it will be able to access the data you just flushed to the file. However, it does not necessarily mean it has been "permanently" stored on disk.
To do that, you need to call the
os.fsync
method which ensures all operating system buffers are synchronized with the storage devices they're for, in other words, that method will copy data from the operating system buffers to the disk.Typically you don't need to bother with either method, but if you're in a scenario where paranoia about what actually ends up on disk is a good thing, you should make both calls as instructed.
Addendum in 2018.
Note that disks with cache mechanisms is now much more common than back in 2013, so now there are even more levels of caching and buffers involved. I assume these buffers will be handled by the sync/flush calls as well, but I don't really know.
因为操作系统可能不会这样做。刷新操作迫使文件数据进入 RAM 中的文件缓存,然后操作系统的工作就是将其实际发送到磁盘。
Because the operating system may not do so. The flush operation forces the file data into the file cache in RAM, and from there it's the OS's job to actually send it to the disk.
它刷新内部缓冲区,这应该会导致操作系统将缓冲区写入文件。[1] Python 使用操作系统的默认缓冲,除非您进行其他配置。
但有时操作系统仍然选择不配合。尤其是 Windows/NTFS 中的写入延迟等美妙的事情。基本上内部缓冲区已被刷新,但操作系统缓冲区仍然保留着它。因此,在这些情况下,您必须告诉操作系统使用 os.fsync() 将其写入磁盘。
[1] http://docs.python.org/library/stdtypes.html
It flushes the internal buffer, which is supposed to cause the OS to write out the buffer to the file.[1] Python uses the OS's default buffering unless you configure it do otherwise.
But sometimes the OS still chooses not to cooperate. Especially with wonderful things like write-delays in Windows/NTFS. Basically the internal buffer is flushed, but the OS buffer is still holding on to it. So you have to tell the OS to write it to disk with
os.fsync()
in those cases.[1] http://docs.python.org/library/stdtypes.html
基本上,flush()会清理你的RAM缓冲区,它的真正威力是它可以让你在之后继续写入它——但它不应该被认为是最好/最安全的写入文件功能。它会刷新你的内存以获取更多数据,仅此而已。如果您想确保数据安全地写入文件,请改用 close() 。
Basically, flush() cleans out your RAM buffer, its real power is that it lets you continue to write to it afterwards - but it shouldn't be thought of as the best/safest write to file feature. It's flushing your RAM for more data to come, that is all. If you want to ensure data gets written to file safely then use close() instead.