Pickup directory: how to avoid picking up files that are still being written?

Posted 2024-12-06 19:36:55 · 549 characters · 1 view · 0 comments

I have a Python script that checks on a pickup directory and processes any files that it finds, and then deletes them.

How can I make sure not to pick up a file that is still being written by the process that drops files into that directory?

My test case is pretty simple. I copy-paste 300MB of files into the pickup directory, and frequently the script grabs a file that is still being written. It operates on only the partial file and then deletes it, which triggers a file-operation error in the writing process because the file it was writing to has disappeared.

  • I've tried acquiring a lock on the file (using the FileLock module) before I open/process/delete it. But that hasn't helped.

  • I've considered checking the modification time on the file to avoid anything within X seconds of now. But that seems clunky.

My test is on OSX, but I'm trying to find a solution that will work across the major platforms.

I see a similar question here (How to check if a file is still being written?), but there was no clear solution.

Thank you

Comments (5)

開玄 2024-12-13 19:36:55

As a workaround, you could listen for file-modified events (watchdog is cross-platform). The modified event (on OS X at least) isn't fired for each write; it's only fired on close. So when you detect a modified event, you can assume all writes are complete.

Of course, if the file is being written in chunks, and saved after each chunk, this won't work.
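A minimal sketch of that approach, assuming the third-party watchdog package is installed (pip install watchdog); the processing step inside on_modified is a placeholder for your own logic:

```python
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class PickupHandler(FileSystemEventHandler):
    """React to 'modified' events, which (on OS X) fire on close."""
    def on_modified(self, event):
        if not event.is_directory:
            # Hypothetical processing step; replace with your own.
            print(f"candidate for pickup: {event.src_path}")

def watch(pickup_dir):
    """Start watching pickup_dir; caller must stop()/join() the observer."""
    observer = Observer()
    observer.schedule(PickupHandler(), pickup_dir, recursive=False)
    observer.start()
    return observer
```

Note the caveat above still applies: a writer that closes the file between chunks will fire spurious modified events.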

阪姬 2024-12-13 19:36:55

One solution to this problem would be to change the program writing the files to write the files to a temporary file first, and then move that temporary file to the destination when it is done. On most operating systems, when the source and destination are on the same file system, move is atomic.
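If you control the writer, a minimal sketch of that pattern might look like this (write_atomically is a hypothetical helper name; os.replace is atomic as long as source and destination are on the same file system, which is why the temp file is created in the destination directory):

```python
import os
import tempfile

def write_atomically(dest_path, data: bytes):
    """Write data to a temp file in dest's directory, then atomically rename."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ensure data is on disk before the rename
        os.replace(tmp_path, dest_path)  # atomic within one file system
    except BaseException:
        os.unlink(tmp_path)  # don't leave the partial temp file behind
        raise
```

The pickup script can then ignore `*.tmp` names entirely; any file matching its expected pattern is, by construction, complete.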

无语# 2024-12-13 19:36:55

If you have no control over the writing side, about all you can do is watch the file yourself: when it stops growing for a certain amount of time, call it good. I've had to use that method myself, and found that 40 seconds is safe for my conditions.
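A rough sketch of that heuristic (wait_until_stable and its parameters are illustrative; the 40-second quiet window is just what worked for this answerer, so tune it for your workload):

```python
import os
import time

def wait_until_stable(path, quiet_seconds=40.0, poll_interval=1.0, timeout=600.0):
    """Block until path's size is unchanged for quiet_seconds; return final size."""
    deadline = time.monotonic() + timeout
    last_size = os.path.getsize(path)
    stable_since = time.monotonic()
    while time.monotonic() < deadline:
        time.sleep(poll_interval)
        size = os.path.getsize(path)
        if size != last_size:
            # Still growing: record the new size and restart the quiet window.
            last_size = size
            stable_since = time.monotonic()
        elif time.monotonic() - stable_since >= quiet_seconds:
            return size
    raise TimeoutError(f"{path} never stopped growing")
```

This is purely a heuristic: a writer that stalls longer than the quiet window will still be picked up early.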

甲如呢乙后呢 2024-12-13 19:36:55

Each OS will have a different solution, because file locking mechanisms are not portable.

  • On Windows, you can use OS locking.
  • On Linux you can peek at the open files (similarly to how lsof does) and, if the file is open, leave it alone.
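On Linux, the lsof-style check can be approximated by scanning /proc directly (is_open_somewhere is a hypothetical name; reading the fd tables of processes you don't own generally requires root, and those processes are silently skipped here):

```python
import os

def is_open_somewhere(path):
    """Return True if any visible process holds an open fd on path (Linux only)."""
    target = os.path.realpath(path)
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue  # skip non-process entries like /proc/meminfo
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except (PermissionError, FileNotFoundError):
            continue  # no permission, or the process already exited
        for fd in fds:
            try:
                if os.path.realpath(os.path.join(fd_dir, fd)) == target:
                    return True
            except FileNotFoundError:
                continue  # fd was closed while we were scanning
    return False
```

This is racy by nature: a writer can open the file right after the scan, so it is a best-effort filter rather than a guarantee.
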
耶耶耶 2024-12-13 19:36:55

Have you tried opening the file before copying it? If the file is still in use, then open() should throw an exception. (Note this mainly helps on Windows, where writers typically hold the file exclusively; on POSIX systems open() usually succeeds even while another process is writing.)

try:
    with open(filename, "rb") as fp:
        pass
    # Copy the file
except IOError:
    pass  # Don't copy; the file is probably still in use