Ensuring a file created by an external process is flushed to disk (Win32)
Windows Win32 C++ question about flushing file activity to disk.
I have an external application (run using CreateProcess) which does some file creation, i.e., when it returns it will have created a file with some content.
How can I ensure that the file the process created was really flushed to disk, before I proceed?
By this I mean not the C++ buffers but really flushing to disk (e.g. FlushFileBuffers).
Remember that I don't have access to any file HANDLE - this is all of course hidden inside the external process.
I guess I could open up a handle of my own to the file and then use FlushFileBuffers, but it's not clear this would work (since my handle doesn't actually contain anything which needs flushing).
Finally, I want this to run in non-admin userspace so I cannot use FlushFileBuffers on a whole volume.
Any ideas?
UPDATE: Why do I think this is a problem?
I'm working on a data backup application. Essentially it has to create some files as described. It then has to update its internal DB (using the SQLite embedded DB).
I recently had a data corruption issue which occurred during a bluescreen (the cause of which was unrelated to my app).
What I'm concerned about is application integrity during a system crash. And yes, I do care about this because this app is a data backup app.
The use case I'm concerned about is this:
1. A small data file is created using an external process. This write is waiting in the OS cache to be written to disk.
2. I update the DB and commit. This is a disk activity. This write is also waiting in the OS cache.
3. A system failure occurs.
As I see it, we're now in a potential race condition. If "1" gets flushed and "2" doesn't, then we're fine (as the DB transaction wasn't committed). If neither gets flushed or both get flushed, then we're also OK.
As I understand it, the writes will be non-deterministic. i.e., I'm not aware that the OS will guarantee to write "1" before "2". (Am I wrong?)
So, if "2" gets flushed, but "1" doesn't then we have a problem.
What I observed was that the DB was correctly updated, but the file had garbage in it: the last two thirds of the data was binary zeroes. Now, I don't know what a file looks like when it was only partly flushed at the time of a bluescreen, but I wouldn't be surprised if it looked like that.
Can I guarantee this is the cause? No I cannot guarantee this. I'm just speculating. It could just be that the file was "naturally" corrupted due to disk failure or as a result of the blue screen.
With regard to performance, this is something I believe I can deal with.
For example, the default behaviour of SQLite is to do a full file flush (using FlushFileBuffers) every time you commit a transaction. They are quite clear that if you don't do this then at the time of system crash, you might have a corrupted DB.
Also, I believe I can mitigate the performance hit by only flushing at "checkpoints". For example, writing 50 files, flushing the lot and then writing to the DB.
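A minimal sketch of that checkpoint idea, not a definitive implementation. The names `flush_file`, `checkpoint` and `commit_db` are hypothetical. It rests on an assumption: that calling FlushFileBuffers through a freshly opened handle (opened with GENERIC_WRITE) also writes out data that another process left in the OS cache, because the cache belongs to the file rather than to a handle. That should be verified on the target system before relying on it. The `fsync` branch is only a stand-in so the sketch compiles on non-Windows systems.

```cpp
#include <functional>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <unistd.h>
#endif

// Open our own handle to a file that another process wrote, and flush it.
// Assumption: the OS cache is per file, so any writable handle will do.
bool flush_file(const std::string& path) {
#ifdef _WIN32
    HANDLE h = CreateFileA(path.c_str(), GENERIC_WRITE,
                           FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE) return false;
    const bool ok = FlushFileBuffers(h) != 0;
    CloseHandle(h);
    return ok;
#else
    const int fd = open(path.c_str(), O_RDONLY);  // stand-in for non-Windows builds
    if (fd < 0) return false;
    const bool ok = fsync(fd) == 0;
    close(fd);
    return ok;
#endif
}

// Flush every file in the batch first; run the DB commit only if all
// flushes succeeded, so the DB never refers to an unflushed file.
bool checkpoint(const std::vector<std::string>& files,
                const std::function<bool()>& commit_db) {
    for (const std::string& f : files) {
        if (!flush_file(f)) return false;
    }
    return commit_db();
}
```

Here `commit_db` would wrap the SQLite transaction commit; if any flush fails, the commit is skipped and the batch can be retried.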
How likely is all this to be a problem? Beats me. But then my app might well be archiving at or around the time of a system failure, so it might be more likely than you think.
Hope that explains why I want to do this.
Comments (3)
Why would you want this? The OS will make sure that the data is flushed to the disk in due time. If you access it, it will either return the data from the cache or from disk, so this is transparent for you.
If you need some safety in case of disaster, then you must call FlushFileBuffers, for example by creating a process with admin rights after running the external process. But that can severely impact the performance of the whole machine.
Your only other option is to modify the source of the other process.
[EDIT] The simplest solution is probably to copy the file in your process and then flush the copy (since you have the handle). Save the copy under a name which says "not committed in the database".
Then update the database. Write into the database, "updated from file ...". If this entry already exists next time, don't update the database and skip this step.
Flush the database to disk.
Rename the file to "file has been processed into database". Rename is an atomic operation (so it either happens or not).
If you can't think of a good filename for the different states, then use subfolders and move the file between them.
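The steps above could be sketched roughly like this. It is an illustration under stated assumptions, not a tested recipe: the function names and the `.pending` / `.committed` suffixes are made up for the example, and the database update and flush are left to the caller. `_commit` is the C runtime call that flushes a descriptor to disk on Windows; the `fsync` branch only exists so the sketch also builds elsewhere.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#include <io.h>      // _commit
#else
#include <unistd.h>  // fsync
#endif

// Copy src to dst through a stream this process owns, then force it to disk.
void copy_and_flush(const std::string& src, const std::string& dst) {
    std::FILE* in = std::fopen(src.c_str(), "rb");
    if (!in) throw std::runtime_error("cannot open " + src);
    std::FILE* out = std::fopen(dst.c_str(), "wb");
    if (!out) { std::fclose(in); throw std::runtime_error("cannot create " + dst); }

    char buf[64 * 1024];
    std::size_t n;
    bool ok = true;
    while ((n = std::fread(buf, 1, sizeof buf, in)) > 0) {
        if (std::fwrite(buf, 1, n, out) != n) { ok = false; break; }
    }
    ok = ok && !std::ferror(in);
    ok = ok && std::fflush(out) == 0;        // C runtime buffer -> OS cache
#ifdef _WIN32
    ok = ok && _commit(_fileno(out)) == 0;   // OS cache -> disk
#else
    ok = ok && fsync(fileno(out)) == 0;      // stand-in for non-Windows builds
#endif
    std::fclose(in);
    if (std::fclose(out) != 0) ok = false;
    if (!ok) throw std::runtime_error("copy or flush failed for " + dst);
}

// Step 1: flushed copy under a name meaning "not committed in the database".
std::string stage_pending(const std::string& src) {
    const std::string pending = src + ".pending";
    copy_and_flush(src, pending);
    return pending;
}

// Last step: call only after the database update has been flushed to disk.
std::string mark_committed(const std::string& src) {
    const std::string pending = src + ".pending";
    const std::string done = src + ".committed";
    if (std::rename(pending.c_str(), done.c_str()) != 0)
        throw std::runtime_error("rename failed for " + pending);
    return done;
}
```

After a crash, any file still carrying the `.pending` suffix can be checked against the database and either re-processed or renamed. Note that `std::rename` fails on Windows if the target already exists, and the rename itself may still sit in the OS cache; `MoveFileEx` with `MOVEFILE_WRITE_THROUGH` may be worth considering if that matters.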
Well, there are no attractive options here. There is no documented way to retrieve the file handle you need from the process. Although there are undocumented ones, go there (via DuplicateHandle) only with careful consideration.
Yes, calling FlushFileBuffers on a volume handle is the documented way. You can avoid the privilege problem by letting a service make the call. Talk to it from your app with one of the standard process interop mechanisms. A named pipe whose name is prefixed with Global\ is probably the easiest way to get that going.
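For reference, the volume flush that such a service would perform might look like the sketch below. The helper names are hypothetical, the service and pipe plumbing are omitted, and the call is expected to fail without administrative rights. The non-Windows branch simply returns false so the snippet compiles elsewhere.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Build the device path for a drive letter, e.g. L'C' -> \\.\C:
std::wstring volume_device_path(wchar_t drive_letter) {
    return std::wstring(L"\\\\.\\") + drive_letter + L":";
}

// Flush all cached data for the whole volume.
// The caller needs administrative privileges for this to succeed.
bool flush_volume(wchar_t drive_letter) {
#ifdef _WIN32
    HANDLE h = CreateFileW(volume_device_path(drive_letter).c_str(),
                           GENERIC_WRITE,
                           FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
                           OPEN_EXISTING, 0, nullptr);
    if (h == INVALID_HANDLE_VALUE) return false;  // e.g. access denied
    const bool ok = FlushFileBuffers(h) != 0;
    CloseHandle(h);
    return ok;
#else
    (void)drive_letter;
    return false;  // volume handles are a Windows concept
#endif
}
```

Because this flushes every open file on the volume, it is best called once per batch rather than once per file.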
After your update I think http://sqlite.org/atomiccommit.html gives you the answers you need.
The way SQLite ensures that everything is flushed to disk works, so the same approach will work for you as well. Take a look at the source.