Win32: Write to file without buffering?

Posted on 2024-07-10 01:10:04

I need to create a new file handle so that any write operations to that handle get written to disk immediately.

Extra info: The handle will be the inherited STDOUT of a child process, so I need any output from that process to immediately be written to disk.

Studying the CreateFile documentation, the FILE_FLAG_WRITE_THROUGH flag looked like exactly what I need:

"Write operations will not go through any intermediate cache, they will go directly to disk."

I wrote a very basic test program and, well, it's not working. I used the flag on CreateFile, then called WriteFile(myHandle, ...) in a long loop, writing about 100MB of data in about 15 seconds (I added some Sleep() calls).

I then set up a professional monitoring environment consisting of continuously hitting 'F5' in Explorer. The result: the file stays at 0kB, then jumps to 100MB around the time the test program ends.

Next thing I tried was to manually flush the file after each write, with FlushFileBuffers(myHandle). This makes the observed file size grow nice and steady, as expected.
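
For reference, a test along the lines described above might look roughly like the sketch below. It is not the original test program: the function name `write_test`, the `CHUNK_BYTES` size and the one-second pause are all assumptions made for illustration. The Win32 calls (`CreateFileW`, `WriteFile`, `FlushFileBuffers`, `Sleep`) are the documented ones, and the body is guarded so that it only compiles on Windows.

```c
#include <assert.h>
#include <stddef.h>

enum { CHUNK_BYTES = 10240 };  /* hypothetical chunk size, for illustration only */

#ifdef _WIN32
#include <windows.h>

/* Writes `chunks` blocks of CHUNK_BYTES to `path`, pausing between writes.
   When flush_each is nonzero, FlushFileBuffers is called after every write.
   Returns 0 on success, -1 on failure. */
static int write_test(const wchar_t *path, int chunks, int flush_each)
{
    static char block[CHUNK_BYTES];  /* zero-filled payload */
    HANDLE h;
    int i;

    assert(path != NULL && chunks >= 0);

    h = CreateFileW(path, GENERIC_WRITE, FILE_SHARE_READ, NULL,
                    CREATE_ALWAYS,
                    FILE_ATTRIBUTE_NORMAL | FILE_FLAG_WRITE_THROUGH, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return -1;

    for (i = 0; i < chunks; ++i) {
        DWORD written = 0;
        if (!WriteFile(h, block, (DWORD)sizeof block, &written, NULL) ||
            written != (DWORD)sizeof block) {
            CloseHandle(h);
            return -1;
        }
        /* The manual flush that made the observed size grow steadily. */
        if (flush_each && !FlushFileBuffers(h)) {
            CloseHandle(h);
            return -1;
        }
        Sleep(1000);  /* leave time to observe the file from outside */
    }

    CloseHandle(h);
    return 0;
}
#endif
```

Running `write_test(L"out.bin", 10, 0)` and then `write_test(L"out.bin", 10, 1)` while watching the file from another process should correspond to the two variants compared in this question.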

My question is, then, shouldn't FILE_FLAG_WRITE_THROUGH have done this without manually flushing the file? Am I missing something? In the 'real world' program, I can't flush the file, because I don't have any control over the child process that's using it.

There's also the FILE_FLAG_NO_BUFFERING flag, which I can't use for the same reason: I have no control over the process that's using the handle, so I can't manually align the writes as required by this flag.

EDIT:
I have made a separate project specifically for watching how the size of the file changes. It uses the .NET FileSystemWatcher class. I also write less data - around 100kB in total.

Here's the output. Check out the seconds in the timestamps.

The 'builtin no-buffers' version:

25.11.2008 7:03:22 PM: 10230 bytes added.
25.11.2008 7:03:31 PM: 10240 bytes added.
25.11.2008 7:03:31 PM: 10240 bytes added.
25.11.2008 7:03:31 PM: 10240 bytes added.
25.11.2008 7:03:31 PM: 10200 bytes added.
25.11.2008 7:03:42 PM: 10240 bytes added.
25.11.2008 7:03:42 PM: 10240 bytes added.
25.11.2008 7:03:42 PM: 10240 bytes added.
25.11.2008 7:03:42 PM: 10240 bytes added.
25.11.2008 7:03:42 PM: 10190 bytes added.

... and the 'forced (manual) flush' version (FlushFileBuffers() is called every ~2.5 seconds):

25.11.2008 7:06:10 PM: 10230 bytes added.
25.11.2008 7:06:12 PM: 10230 bytes added.
25.11.2008 7:06:15 PM: 10230 bytes added.
25.11.2008 7:06:17 PM: 10230 bytes added.
25.11.2008 7:06:19 PM: 10230 bytes added.
25.11.2008 7:06:21 PM: 10230 bytes added.
25.11.2008 7:06:23 PM: 10230 bytes added.
25.11.2008 7:06:25 PM: 10230 bytes added.
25.11.2008 7:06:27 PM: 10230 bytes added.
25.11.2008 7:06:29 PM: 10230 bytes added.

Comments (5)

栖迟 2024-07-17 01:10:04

I've been bitten by this, too, in the context of crash logging.

FILE_FLAG_WRITE_THROUGH only guarantees that the data you're sending gets sent to the filesystem before WriteFile returns; it doesn't guarantee that it's actually sent to the physical device. So, for example, if you execute a ReadFile after a WriteFile on a handle with this flag, you're guaranteed that the read will return the bytes you wrote, whether it got the data from the filesystem cache or from the underlying device.

If you want to guarantee that the data has been written to the device, then you need FILE_FLAG_NO_BUFFERING, with all the attendant extra work. Those writes have to be aligned, for example, because the buffer is going all the way down to the device driver before returning.
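
To make the alignment requirement concrete: with FILE_FLAG_NO_BUFFERING, the transfer length and the file offset have to be multiples of the volume's sector size, and the buffer address has to be suitably aligned as well. A minimal sketch of that arithmetic follows. The helper names are hypothetical, and the sector size is a parameter because the real value has to be queried from the volume (for example with GetDiskFreeSpace); the 512 used in the note below is only a typical value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounds n up to the next multiple of sector.
   sector must be a nonzero power of two. */
static size_t round_up_to_sector(size_t n, size_t sector)
{
    assert(sector != 0 && (sector & (sector - 1)) == 0);
    return (n + sector - 1) & ~(sector - 1);
}

/* Returns 1 when the buffer address, the file offset and the length
   are all multiples of sector, 0 otherwise. */
static int is_sector_aligned(const void *buf, uint64_t offset,
                             size_t len, size_t sector)
{
    assert(sector != 0 && (sector & (sector - 1)) == 0);
    return ((uintptr_t)buf % sector == 0) &&
           (offset % sector == 0) &&
           (len % sector == 0);
}
```

For example, `round_up_to_sector(10230, 512)` yields 10240, so a 10230-byte write would need 10 bytes of padding before it could go through an unbuffered handle. A child process writing arbitrary amounts to its STDOUT will not do this on its own.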

The Knowledge Base has a terse but informative article on the difference.

In your case, if the parent process is going to outlive the child, then you can:

  1. Use the CreatePipe API to create an inheritable, anonymous pipe.
  2. Use CreateFile to create a file with FILE_FLAG_NO_BUFFERING set.
  3. Provide the writable handle of the pipe to the child as its STDOUT.
  4. In the parent process, read from the readable handle of the pipe into aligned buffers, and write them to the file.
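
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a drop-in implementation: `run_and_log` and `RELAY_CHUNK` are hypothetical names, and step 2 is deliberately swapped for a simpler technique. Instead of a FILE_FLAG_NO_BUFFERING handle, the parent opens the file with FILE_FLAG_WRITE_THROUGH and calls FlushFileBuffers after each chunk, which avoids having to pad the final partial sector and trim the file afterwards. The pipe setup follows the usual CreatePipe / STARTF_USESTDHANDLES pattern, and the body is guarded so that it only compiles on Windows.

```c
#include <assert.h>
#include <stddef.h>

enum { RELAY_CHUNK = 4096 };  /* hypothetical read size per ReadFile call */

#ifdef _WIN32
#include <windows.h>

/* Runs cmdline with stdout and stderr redirected into an anonymous pipe and
   copies everything read from the pipe to log_path, flushing after each
   chunk. Returns 0 on success, -1 on failure. */
static int run_and_log(wchar_t *cmdline, const wchar_t *log_path)
{
    SECURITY_ATTRIBUTES sa = { sizeof sa, NULL, TRUE };  /* inheritable */
    HANDLE rd = NULL, wr = NULL, file;
    STARTUPINFOW si;
    PROCESS_INFORMATION pi;
    char buf[RELAY_CHUNK];
    DWORD got = 0, put = 0;
    int rc = 0;

    assert(cmdline != NULL && log_path != NULL);

    if (!CreatePipe(&rd, &wr, &sa, 0))
        return -1;

    /* Only the write end should be inherited by the child. */
    if (!SetHandleInformation(rd, HANDLE_FLAG_INHERIT, 0)) {
        CloseHandle(rd); CloseHandle(wr);
        return -1;
    }

    file = CreateFileW(log_path, GENERIC_WRITE, FILE_SHARE_READ, NULL,
                       CREATE_ALWAYS,
                       FILE_ATTRIBUTE_NORMAL | FILE_FLAG_WRITE_THROUGH, NULL);
    if (file == INVALID_HANDLE_VALUE) {
        CloseHandle(rd); CloseHandle(wr);
        return -1;
    }

    ZeroMemory(&si, sizeof si);
    si.cb = sizeof si;
    si.dwFlags = STARTF_USESTDHANDLES;
    si.hStdInput = GetStdHandle(STD_INPUT_HANDLE);
    si.hStdOutput = wr;
    si.hStdError = wr;

    if (!CreateProcessW(NULL, cmdline, NULL, NULL, TRUE, 0,
                        NULL, NULL, &si, &pi)) {
        CloseHandle(rd); CloseHandle(wr); CloseHandle(file);
        return -1;
    }

    /* Drop the parent's copy of the write end so that ReadFile fails
       once the child has exited and the pipe is drained. */
    CloseHandle(wr);

    while (ReadFile(rd, buf, sizeof buf, &got, NULL) && got > 0) {
        if (!WriteFile(file, buf, got, &put, NULL) || put != got ||
            !FlushFileBuffers(file)) {
            rc = -1;
            break;
        }
    }

    /* Close the read end before waiting, so a child that is still
       writing gets an error instead of blocking forever. */
    CloseHandle(rd);
    WaitForSingleObject(pi.hProcess, INFINITE);
    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
    CloseHandle(file);
    return rc;
}
#endif
```

Because the parent owns the file handle here, it decides when to flush, and the child needs no changes at all. Switching the file to FILE_FLAG_NO_BUFFERING as the steps suggest would additionally require sector-aligned buffers and lengths.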

℉服软 2024-07-17 01:10:04

This is an old question, but I thought I might add a bit to it. Actually, I believe everyone here is wrong. When you write to a stream with write-through and unbuffered I/O, it does write to the disk, but it does NOT update the metadata associated with the file system (e.g. what Explorer shows you).

You can find a good reference on this kind of stuff here http://winntfs.com/2012/11/29/windows-write-caching-part-2-an-overview-for-application-developers/

Cheers,

Greg

信愁 2024-07-17 01:10:04

Perhaps you could be satisfied enough with FlushFileBuffers:

Flushes the buffers of a specified file and causes all buffered data to be written to a file.

Typically the WriteFile and WriteFileEx functions write data to an internal buffer that the operating system writes to a disk or communication pipe on a regular basis. The FlushFileBuffers function writes all the buffered information for a specified file to the device or pipe.

The documentation does warn that flushing the buffers frequently is inefficient, and that it's better to just disable caching (i.e. Tim's answer):

Due to disk caching interactions within the system, the FlushFileBuffers function can be inefficient when used after every write to a disk drive device when many writes are being performed separately. If an application is performing multiple writes to disk and also needs to ensure critical data is written to persistent media, the application should use unbuffered I/O instead of frequently calling FlushFileBuffers. To open a file for unbuffered I/O, call the CreateFile function with the FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH flags. This prevents the file contents from being cached and flushes the metadata to disk with each write. For more information, see CreateFile.

If it's not a high-performance situation, and you won't be flushing too frequently, then FlushFileBuffers might be sufficient (and easier).

扶醉桌前 2024-07-17 01:10:04

The size you're looking at in Explorer may not be entirely in sync with what the file system knows about the file, so this isn't the best way to measure it. It just so happens that FlushFileBuffers will cause the file system to update the information that Explorer is looking at; closing the file and reopening it may end up doing the same thing.

Aside from the disk caching issues mentioned by others, write-through is doing what you were hoping it would do. It's just that doing a 'dir' in the directory may not show up-to-date information.

Answers suggesting that write-through only writes it "to the file system" are not quite right. It does write it into the file system cache, but it also sends the data down to the disk. Write-through might mean that a subsequent read is satisfied from the cache, but it doesn't mean that we skipped a step and aren't writing it to the disk. Read the article's summary very carefully. This is a confusing bit for just about everyone.
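
One way to check this from a separate observer process is to compare the size recorded in the directory entry with the size reported for an open handle to the file. The sketch below is only an illustration: the function names are hypothetical, and the expectation that the two values can differ while the file is still being written is the behavior described in this answer, not something the code proves. The Win32 part is guarded so that it only compiles on Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Joins the high and low 32-bit halves that Win32 uses for file sizes. */
static uint64_t combine_size(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

#ifdef _WIN32
#include <windows.h>

/* Size recorded in the directory entry, i.e. what a listing reports.
   Returns 0 on success, -1 on failure. */
static int size_from_directory(const wchar_t *path, uint64_t *out)
{
    WIN32_FIND_DATAW fd;
    HANDLE h;

    assert(path != NULL && out != NULL);

    h = FindFirstFileW(path, &fd);
    if (h == INVALID_HANDLE_VALUE)
        return -1;
    FindClose(h);
    *out = combine_size(fd.nFileSizeHigh, fd.nFileSizeLow);
    return 0;
}

/* Size reported by the file system for an open handle to the file.
   Returns 0 on success, -1 on failure. */
static int size_from_handle(const wchar_t *path, uint64_t *out)
{
    LARGE_INTEGER size;
    HANDLE h;
    BOOL ok;

    assert(path != NULL && out != NULL);

    /* Attribute access only, with permissive sharing, so the open does
       not conflict with the process that is writing the file. */
    h = CreateFileW(path, FILE_READ_ATTRIBUTES,
                    FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                    NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return -1;
    ok = GetFileSizeEx(h, &size);
    CloseHandle(h);
    if (!ok)
        return -1;
    *out = (uint64_t)size.QuadPart;
    return 0;
}
#endif
```

When comparing the two, it is probably best to call `size_from_directory` first, since opening and closing a handle may itself cause the directory entry to be refreshed.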

习惯成性 2024-07-17 01:10:04

Perhaps you wanna consider memory mapping that file. As soon as you write to the memory mapped region, the file gets updated.

Win API File Mapping
