Is OS file buffering harmful?
I wrote a download library for my colleagues. It writes the downloaded data to files.
My colleagues noticed that the file stays small for a long time, even after 100 MB of data has been downloaded.
They suggest that I call flush() after every write() so that no memory is used to buffer the data.
But I don't think 100 MB of virtual memory is a lot, and I suspect Windows has its reasons for buffering that much data.
What do you think?
Comments (5)
Personally, I would trust the operating system to tune itself appropriately.
As for "flush immediately so as not to lose data if power dies" - if the power dies halfway through a file, would you trust that the data you'd written was okay and resume the download from there? If so, maybe it's worth flushing early - but I'd weigh the complexity of resuming against the relative rarity of power failures, and just close the file when I'd read everything. If you see a half-written file, delete it and download it again from scratch.
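The "delete a half-written file and start over" approach could look roughly like the sketch below. The question doesn't say what language the library is written in, so Python is used purely for illustration; `save_download` and the `.part` suffix are made-up names, not part of any real library.

```python
import os

def save_download(chunks, final_path):
    """Write chunks to '<final_path>.part', then rename once complete."""
    part_path = final_path + ".part"  # hypothetical naming convention
    # A leftover .part file means an earlier run was interrupted: discard it.
    if os.path.exists(part_path):
        os.remove(part_path)
    with open(part_path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)  # no explicit flush; buffering is left alone
    # Leaving the with-block closes the file, which flushes Python's buffer.
    os.replace(part_path, final_path)  # overwrites an existing final file
    return final_path
```

With this layout a file under its final name is always a complete download, so nothing has to decide whether partially written data can be trusted.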
Well, first you should investigate and debug what is actually going on. The problem might be elsewhere; for example, Windows Explorer might not refresh the displayed file size fast enough.
That said, you are right: generally, if the OS's virtual memory system decides to buffer data in RAM, it has a good reason to do so, and you should not normally interfere. After all, if there is a lot of free memory, it makes sense to use it.
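One quick way to investigate is to check whether the bytes are still sitting in the library's own buffer or have already been handed to the OS. The sketch below does this in Python, which is an assumption since the library's language isn't stated; `sizes_before_and_after_flush` is a hypothetical helper name.

```python
import os

def sizes_before_and_after_flush(path, payload=b"0123456789"):
    """Return the file size the OS reports before and after flush()."""
    with open(path, "wb") as f:
        f.write(payload)                # a small write typically stays in Python's buffer
        before = os.path.getsize(path)  # size the OS knows about so far
        f.flush()                       # hand the buffered bytes to the OS
        after = os.path.getsize(path)
    return before, after
```

If the size only catches up after `flush()`, the data was being held in the process rather than by the OS. If the size is already correct and only Explorer shows a small number, the display is simply stale.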
If it were me, I'd want to ensure that all data was persisted to a non-volatile location as soon as possible. I'd definitely flush the streams to make sure I didn't lose anything in the event of a power failure. You didn't specify whether the data needs to be accessed later on, but I assume it does; otherwise why would you want to store it?
To answer the original question, though - it isn't "harmful" to the OS, but you do risk losing data.
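If surviving a power failure is the goal, note that in many runtimes flush() only moves data from the program's buffer to the OS, which may still hold it in its own cache. A minimal Python sketch of going one step further (Python is an assumption here, and `write_durably` is a hypothetical helper name):

```python
import os

def write_durably(f, data):
    """Write data and ask the OS to push it to the storage device."""
    f.write(data)
    f.flush()             # program buffer -> OS
    os.fsync(f.fileno())  # OS cache -> storage device (can be slow)
```

Calling this for every small write is likely to cost noticeable throughput, so it's worth measuring before adopting it.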
Flushing at specific intervals (by time, size, or line count) might be better than flushing after every write. It helps reduce the memory footprint and also makes sure the actual file is updated periodically. For example, you could flush every 100 lines.
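A size-based version of this idea might look like the sketch below, again in Python for illustration only. `PeriodicFlushWriter`, `flush_every`, and `flush_count` are hypothetical names, and the 1 MiB default is an arbitrary choice.

```python
class PeriodicFlushWriter:
    """Wrap a file object and flush it once enough bytes have been written."""

    def __init__(self, f, flush_every=1024 * 1024):
        self._f = f
        self._flush_every = flush_every  # threshold in bytes (arbitrary default)
        self._pending = 0                # bytes written since the last flush
        self.flush_count = 0             # exposed only to make behavior observable

    def write(self, data):
        self._f.write(data)
        self._pending += len(data)
        if self._pending >= self._flush_every:
            self._f.flush()
            self._pending = 0
            self.flush_count += 1
```

A downloader would wrap its output file once and call `write()` for every received chunk; the threshold then bounds how much unflushed data can accumulate in the program's buffer.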
If there is a way to reduce the memory requirements with negligible performance impact, I'd prefer the less greedy version. I might need that memory for something more important, and a 100 MB footprint is pretty huge for a downloader.