Atomic deletion of a large number of files

Posted on 2024-08-08 10:41:23 · 218 words · 2 views · 0 comments

I am trying to delete 10,000+ files atomically, i.e. either all of them are deleted at once or all of them stay in place.

Of course, the obvious answer is to move all the files into a temporary directory, and delete it recursively on success, but that doubles the amount of I/O required.

Compression doesn't work, because 1) I don't know which files will need to be deleted, and 2) the files need to be edited frequently.

Is there anything out there that can help reduce the I/O cost? Any platform will do.

EDIT: let's assume a power outage can happen anytime.

Comments (9)

诠释孤独 2024-08-15 10:41:23

Kibbee is correct: you're looking for a transaction. However, you needn't depend on either databases or special file system features if you don't want to. The essence of a transaction is this:

  1. Write out a record to a special file (often called the "log") that lists the files you are going to remove.
  2. Once this record is safely written, make sure your application acts just as if the files have actually been removed.
  3. Later on, start removing the files named in the transaction record.
  4. After all files are removed, delete the transaction record.

Note that, any time after step (1), you can restart your application and it will continue removing the logically deleted files until they're finally all gone.

Please note that you shouldn't pursue this path very far: otherwise you're starting to reimplement a real transaction system. However, if you only need a very few simple transactions, the roll-your-own approach might be acceptable.
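Step 2 above (acting as if the files are already gone) can be sketched as a lookup against the log before every open. This is only an illustration: the names `is_logically_deleted` and `open_if_live` and the one-path-per-line log format are assumptions, not something the answer specifies.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns true when `path` is listed in the pending
 * deletion log at `log_path` (assumed format: one path per line).
 * A missing log means no deletion is pending. */
bool is_logically_deleted(const char *log_path, const char *path) {
    FILE *f = fopen(log_path, "r");
    if (!f) return false;
    char line[4096];
    bool found = false;
    while (!found && fgets(line, sizeof line, f)) {
        line[strcspn(line, "\n")] = '\0';   /* strip the trailing newline */
        found = strcmp(line, path) == 0;
    }
    fclose(f);
    return found;
}

/* Hypothetical open wrapper: a file named in a committed log is reported
 * as missing even if it still exists on disk. */
FILE *open_if_live(const char *log_path, const char *path, const char *mode) {
    if (is_logically_deleted(log_path, path)) {
        errno = ENOENT;
        return NULL;
    }
    return fopen(path, mode);
}
```

A linear scan per open is slow with 10,000+ entries; a real application would more likely load the log into an in-memory set once at startup.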

裸钻 2024-08-15 10:41:23

On *nix, moving files within a single filesystem is a very low-cost operation: conceptually, it makes a hard link under the new name and then unlinks the original name, so no file data is copied. It doesn't even change the files' modification times.

If you could move the files into a single directory, then you could rename that directory to get it out of the way as a truly atomic op, and then delete the files (and directory) later in a slower, non-atomic fashion.
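A minimal POSIX sketch of that idea, assuming the files to delete already live in one flat directory. The function names `retire_dir` and `purge_dir` are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Atomic step: a single rename() makes the whole directory disappear from
 * its old name. Both names must be on the same filesystem. */
int retire_dir(const char *live_dir, const char *trash_dir) {
    return rename(live_dir, trash_dir);
}

/* Slow, non-atomic cleanup of a flat directory (no subdirectories).
 * It can be re-run after a crash: a missing directory counts as done. */
int purge_dir(const char *trash_dir) {
    DIR *d = opendir(trash_dir);
    if (!d) return errno == ENOENT ? 0 : -1;
    struct dirent *e;
    int rc = 0;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        /* Remove the entry relative to the open directory. */
        if (unlinkat(dirfd(d), e->d_name, 0) != 0 && errno != ENOENT) {
            rc = -1;
            break;
        }
    }
    closedir(d);
    if (rc == 0 && rmdir(trash_dir) != 0) rc = -1;
    return rc;
}
```

To survive a power outage, the rename would also need to be made durable, typically by calling `fsync()` on the parent directory before treating the delete as committed. On startup, the application would call `purge_dir` on any leftover trash directory.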

Are you sure you don't just want a database? They all have transaction commit and rollback built-in.

等风来 2024-08-15 10:41:23

I think what you are really looking for is the ability to have a transaction. Because the disk can only write one sector at a time, you can only delete the files one at a time. What you need is the ability to roll back the previous deletions if one of the deletes fails. Tasks like this are usually reserved for databases. Whether or not your file system can do transactions depends on which file system and OS you are using. NTFS on Windows Vista supports Transactional NTFS. I'm not too sure how it works, but it could be useful.

Also, there is something called shadow copy for Windows, which in the Linux world is called an LVM snapshot (http://tldp.org/HOWTO/LVM-HOWTO/snapshots_backup.html). Basically it's a snapshot of the disk at a point in time. You could take a snapshot directly before doing the delete and, if the delete isn't successful, copy the files back out of the snapshot. I've created shadow copies using WMI in VBScript; I'm sure that similar APIs exist for C/C++ too.

One thing about shadow copies and LVM snapshots: they work on the whole partition, so you can't take a snapshot of just a single directory. However, taking a snapshot of the whole disk takes only a couple of seconds. So you would take a snapshot, delete the files, and then, if unsuccessful, copy the files back out of the snapshot. This would be slow, but depending on how often you plan to roll back, it might be acceptable. The other idea would be to restore the entire snapshot. This may or may not be good, as it would roll back all changes on the entire disk. That's not good if your OS or other important files are located there. If the partition only contains the files you want to delete, recovering the entire snapshot may be easier and quicker.

网名女生简单气质 2024-08-15 10:41:23

Instead of moving the files, make symbolic links into the temporary directory. Then if things are OK, delete the files. Or, just make a list of the files somewhere and then delete them.

舂唻埖巳落 2024-08-15 10:41:23

Couldn't you just build the list of pathnames to delete, write this list out to a file to_be_deleted.log, make sure that file has hit the disk (fsync()), and then start doing the deletes? After all the deletes have been done, remove the to_be_deleted.log transaction log.

When you start up the application, it should check for the existence of to_be_deleted.log, and if it's there, replay the deletes in that file (ignoring "does not exist" errors).
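A rough POSIX sketch of this log-then-delete scheme follows. Two details are assumptions layered on top of the answer: the log is first written to a temporary name and then renamed into place, so that a half-written log is never replayed, and the containing directory is fsync'ed so the rename itself survives a power outage. The helper names and the one-path-per-line format (which breaks on paths containing newlines) are also made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define LOG_PATH "to_be_deleted.log"
#define LOG_TMP  "to_be_deleted.log.tmp"   /* assumed temporary name */

/* Flush a directory so that a rename or unlink inside it is durable. */
static int sync_dir(const char *dir) {
    int fd = open(dir, O_RDONLY);
    if (fd < 0) return -1;
    int rc = fsync(fd);
    close(fd);
    return rc;
}

/* write() may be partial or interrupted; loop until everything is written. */
static int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        buf += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Commit point: once the rename below is durable, the delete is decided. */
int log_commit(const char *const *paths, size_t n) {
    int fd = open(LOG_TMP, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    for (size_t i = 0; i < n; i++) {
        if (write_all(fd, paths[i], strlen(paths[i])) != 0 ||
            write_all(fd, "\n", 1) != 0) {
            close(fd);
            return -1;
        }
    }
    if (fsync(fd) != 0) { close(fd); return -1; }
    if (close(fd) != 0) return -1;
    if (rename(LOG_TMP, LOG_PATH) != 0) return -1;
    return sync_dir(".");
}

/* Run at startup and right after log_commit(): delete every listed file,
 * ignoring "does not exist" errors, then remove the log itself. */
int log_replay(void) {
    FILE *f = fopen(LOG_PATH, "r");
    if (!f) return errno == ENOENT ? 0 : -1;   /* no log: nothing pending */
    char line[4096];
    while (fgets(line, sizeof line, f)) {
        line[strcspn(line, "\n")] = '\0';
        if (line[0] == '\0') continue;
        if (unlink(line) != 0 && errno != ENOENT) { fclose(f); return -1; }
    }
    fclose(f);
    if (unlink(LOG_PATH) != 0) return -1;
    return sync_dir(".");
}
```

The extra I/O here is one sequential write of the path list plus a few fsync calls, which should be far cheaper than moving or copying the files themselves.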

牛↙奶布丁 2024-08-15 10:41:23

The basic answer to your question is "No." The more complex answer is that this requires support from the filesystem, and very few filesystems have that kind of support. Apparently NT has a transactional FS which does support this. It's possible that Btrfs for Linux will support this as well.

In the absence of direct support, I think the hardlink, move, remove option is the best you're going to get.

思念满溢 2024-08-15 10:41:23

I think the copy-and-then-delete method is pretty much the standard way to do this. Do you know for a fact that you can't tolerate the additional I/O?

I wouldn't count myself an expert on file systems, but I would imagine that any implementation of a transaction would first need to attempt all of the desired actions, and would then need to go back and commit those actions. In other words, you can't avoid performing more I/O than you would when doing it non-atomically.

赴月观长安 2024-08-15 10:41:23

Do you have an abstraction layer (e.g. a database) for reaching the files? (If your software goes directly to the filesystem, then my proposal does not apply.)

If the condition is "right" to delete the files, change the state to "deleted" in your abstraction layer and begin a background job to "really" delete them from the filesystem.

Of course this proposal incurs a certain cost at opening/closing of the files but saves you some I/O on symlink creation etc.

从此见与不见 2024-08-15 10:41:23

On Windows Vista or newer, Transactional NTFS should do what you need:

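    #include <windows.h>
    #include <ktmw32.h>  /* Kernel Transaction Manager API; link with KtmW32.lib */

    /* Description is an LPWSTR, so pass a wide, writable buffer. */
    WCHAR description[] = L"Deleting stuff";
    /* Timeout is a DWORD; 0 means no timeout. */
    HANDLE txn = CreateTransaction(NULL, 0, 0, 0, 0, 0, description);
    if (txn == INVALID_HANDLE_VALUE) {
      /* explode */
    }
    /* Repeat this call for every file that belongs to the transaction. */
    if (!DeleteFileTransacted(filename, txn)) {
      RollbackTransaction(txn); // You saw nothing.
      CloseHandle(txn);
      die_horribly();
    }
    /* Nothing is actually deleted until the commit succeeds. */
    if (!CommitTransaction(txn)) {
      CloseHandle(txn);
      die_horribly();
    }
    CloseHandle(txn);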