Ensuring a file is flushed when it was created in an external process (Win32)

Posted on 2024-08-13 09:05:19 | Length: 1383 | Views: 10 | Comments: 0

Windows Win32 C++ question about flushing file activity to disk.

I have an external application (run using CreateProcess) which does some file creation, i.e., when it returns it will have created a file with some content.

How can I ensure that the file the process created was really flushed to disk, before I proceed?

By this I mean not flushing the C++ buffers but really flushing to disk (e.g. FlushFileBuffers).

Remember that I don't have access to any file HANDLE - this is all of course hidden inside the external process.

I guess I could open up a handle of my own to the file and then use FlushFileBuffers, but it's not clear this would work (since my handle doesn't actually contain anything which needs flushing).

Finally, I want this to run in non-admin userspace so I cannot use FlushFileBuffers on a whole volume.

Any ideas?

UPDATE: Why do I think this is a problem?

I'm working on a data backup application. Essentially it has to create some files as described. It then has to update its internal DB (using the SQLite embedded DB).

I recently had a data corruption issue which occurred during a bluescreen (the cause of which was unrelated to my app).

What I'm concerned about is application integrity during a system crash. And yes, I do care about this because this app is a data backup app.

The use case I'm concerned about is this:

1. A small data file is created using an external process. This write is waiting in the OS cache to be written to disk.
2. I update the DB and commit. This is a disk activity. This write is also waiting in the OS cache.
3. A system failure occurs.

As I see it, we're now in a potential race condition. If "1" gets flushed and "2" doesn't, then we're fine (as the DB transaction wasn't committed). If neither gets flushed or both get flushed, then we're also OK.

As I understand it, the writes will be non-deterministic. i.e., I'm not aware that the OS will guarantee to write "1" before "2". (Am I wrong?)

So, if "2" gets flushed but "1" doesn't, then we have a problem.

What I observed was that the DB was correctly updated, but that the file had garbage in it: the last two thirds of the data was binary zeroes. Now, I don't know what a file looks like when it has only been partly flushed at the time of a bluescreen, but I wouldn't be surprised if it looked like that.

Can I guarantee this is the cause? No, I cannot. I'm just speculating. It could just be that the file was "naturally" corrupted due to disk failure or as a result of the bluescreen.

With regard to performance, this is something I believe I can deal with.

For example, the default behaviour of SQLite is to do a full file flush (using FlushFileBuffers) every time you commit a transaction. The SQLite documentation is quite clear that if you don't do this, then at the time of a system crash you might end up with a corrupted DB.

Also, I believe I can mitigate the performance hit by only flushing at "checkpoints". For example, writing 50 files, flushing the lot and then writing to the DB.

How likely is all this to be a problem? Beats me. But then my app might well be archiving at or around the time of a system failure, so it might be more likely than you think.

Hope that explains why I want to do this.
