Using Python select kqueue on OSX to monitor file creation by external apps
Typically, transcoding one of my 1 hr long audio recording sessions to an mp3 file takes twenty-odd minutes.
I want to use a Python script to run some Python code when the OSX application GarageBand finishes writing that mp3 file.
What are the best ways in Python to detect that an external application is done writing data to a file and has closed that file? I read about kqueue and epoll, but since I have no background in OS event detection and couldn't find a good example, I am asking for one here.
The code I am using right now does the following, and I am looking for something more elegant.
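import time

while True:
    try:
        # open() raises IOError while the file does not exist yet
        with open("todays_recording.mp3", "rb") as today_file:
            my_custom_function_to_process_file(today_file)
        break  # processed once; stop polling
    except IOError:
        print("File not ready yet..continuing to wait")
        time.sleep(5)  # avoid a busy loop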
Comments (2)
You could popen lsof and filter by either the process or file you're interested in...
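A rough sketch of what that lsof suggestion might look like, assuming lsof is on the PATH and that the writing application keeps the file open for the whole export (neither is guaranteed). The function name pids_with_file_open and its run parameter are made up for this illustration; only the lsof -t option (terse output, one PID per line) is relied on.

```python
import subprocess


def pids_with_file_open(path, run=subprocess.run):
    """Return the PIDs that lsof reports as holding `path` open.

    `run` is injectable only so the parsing can be exercised without lsof.
    """
    # -t asks lsof for terse output: PIDs only, one per line.
    # lsof exits non-zero when nothing has the file open, so the exit
    # status is not checked; an empty stdout simply yields an empty list.
    result = run(
        ["lsof", "-t", "--", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    return [int(line) for line in result.stdout.split()]


# Hypothetical usage: poll until no process holds the file open any more.
#   while pids_with_file_open("todays_recording.mp3"):
#       time.sleep(5)
```

This is still polling, just with a better question than "can I open it yet?". lsof can only see processes you have permission to inspect, so an empty list is a hint rather than proof.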
The answer is a bit nuanced.
How to watch for file writes using kqueue/kevent
Technically, you can wait for data to be written to a file like this:
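The original snippet did not survive on this page, so what follows is a reconstruction sketch rather than the answer's exact code. It assumes a BSD-derived system (select.kqueue does not exist elsewhere), and the wrapper name wait_for_write is made up for illustration; the select.kqueue, select.kevent and KQ_* names are the standard-library ones.

```python
import os
import select


def wait_for_write(path, timeout=10.0):
    """Block until `path` is written to or `timeout` seconds pass.

    Returns the list of kevents received (empty on timeout).
    """
    if not hasattr(select, "kqueue"):
        raise NotImplementedError("select.kqueue needs a BSD-derived OS such as macOS")
    fd = os.open(path, os.O_RDONLY)
    try:
        kq = select.kqueue()
        try:
            ev = select.kevent(
                fd,
                filter=select.KQ_FILTER_VNODE,  # watch the file itself
                flags=select.KQ_EV_ADD | select.KQ_EV_CLEAR,
                fflags=select.KQ_NOTE_WRITE | select.KQ_NOTE_EXTEND,
            )
            # Register the event and wait for at most one result.
            return kq.control([ev], 1, timeout)
        finally:
            kq.close()
    finally:
        os.close(fd)


# Example (macOS/BSD only):
#   res = wait_for_write("test.txt", timeout=10)
```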
After a write is made to test.txt, res will be returned containing one or more events. If 10 seconds go by without a write being detected, an empty array will be returned in res. If the watch queue overflows, which is extremely unlikely in this case as you're not watching for huge numbers of events on e.g. a directory of thousands of files, an overflow event will be returned in res.
But is that a good idea?
Why watching for writes is a bad idea
Watching for writes to determine when an external program is "done" with a file is generally a poor idea. Just because a write occurs doesn't mean that the program (GarageBand in this case) isn't going to make any more writes. Worse, from the perspective of that other program, its function call to write data may have already succeeded, but underlying buffering (at the language platform or OS level) may cause the notification to be delivered later or not at all in some rare cases.
If all you watch for is writes, then you end up having to poll the file to see if it's completely written (reinventing the code you already have) or use fallible and over-specific heuristics to guess when the writer is done (e.g. "writes were observed, then no more writes for 1 second; that probably means the writer is finished ... or else it means the computer's under heavy load and GarageBand is taking a long time to encode the next chunk of MP3 data before writing it").
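To make the "no more writes for 1 second" heuristic concrete, here is a minimal polling sketch. It is shown to illustrate the weakness described above, not as a recommendation; the name wait_until_quiet and all of its parameters are invented for this example.

```python
import os
import time


def wait_until_quiet(path, quiet_seconds=1.0, poll_interval=0.1, timeout=None):
    """Return True once the size and mtime of `path` have stayed unchanged
    for `quiet_seconds`; return False if `timeout` seconds pass first."""
    deadline = None if timeout is None else time.monotonic() + timeout
    last = None
    stable_since = time.monotonic()
    while True:
        try:
            st = os.stat(path)
            current = (st.st_size, st.st_mtime_ns)
        except FileNotFoundError:
            current = None  # file has not been created yet
        now = time.monotonic()
        if current != last:
            # Something changed (or the file just appeared): restart the clock.
            last = current
            stable_since = now
        elif current is not None and now - stable_since >= quiet_seconds:
            return True
        if deadline is not None and now >= deadline:
            return False
        time.sleep(poll_interval)
```

A True result only says the file was quiet for that long. A slow encoder pausing between chunks looks exactly the same, which is the failure mode described above.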
Watching for closes
That brings us to the second part of your inner question: can you watch for external programs to close a file? If we can do that, we can get a much more reliable hint that the external program is done working with a given file.
The answer is yes ... but not easily in Python, and not at all on MacOS.
kqueue(3) supports the NOTE_CLOSE and NOTE_CLOSE_WRITE fflags, which fire when a reader or writer handle to a file is closed. However, the Python stdlib doesn't supply those flags in the select module in the latest version as of the time of this writing (3.12).
Fortunately, this is an old BSD API and unlikely to change, so grabbing the raw value of those flags from the kernel source of a BSD (I found them in the NetBSD source) is easy:
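The header excerpt itself is missing from this page. As a stand-in, the two flags can be written as Python constants. The names mirror the C macros, and the hex values are just the 256 and 512 quoted in the next sentence, so treat this as a transcription and not as a copy of the header.

```python
# Not exported by Python's select module; values as given for BSD's sys/event.h.
NOTE_CLOSE = 0x0100        # 256: a descriptor without write access was closed
NOTE_CLOSE_WRITE = 0x0200  # 512: a descriptor with write access was closed
```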
Those values are 256 and 512 as unsigned (or signed) 16/32-bit integers, so we should be able to wait for them manually, like this:
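The code that followed here is also missing, so this is a reconstruction sketch under the same assumptions as before: a BSD-derived system, hand-copied flag values, and a made-up wrapper name wait_for_close. Passing raw integers as fflags is not something the select module documents for these flags.

```python
import os
import select

# Hand-copied values; Python's select module does not define these names.
NOTE_CLOSE = 0x0100
NOTE_CLOSE_WRITE = 0x0200


def wait_for_close(path, timeout=10.0):
    """Wait up to `timeout` seconds for some process to close `path`.

    Returns the list of kevents received (empty on timeout).
    """
    if not hasattr(select, "kqueue"):
        raise NotImplementedError("select.kqueue needs a BSD-derived OS")
    fd = os.open(path, os.O_RDONLY)
    try:
        kq = select.kqueue()
        try:
            ev = select.kevent(
                fd,
                filter=select.KQ_FILTER_VNODE,
                flags=select.KQ_EV_ADD | select.KQ_EV_CLEAR,
                fflags=NOTE_CLOSE | NOTE_CLOSE_WRITE,  # raw values, see above
            )
            return kq.control([ev], 1, timeout)
        finally:
            kq.close()
    finally:
        os.close(fd)


# Example (BSD only):
#   res = wait_for_close("todays_recording.mp3", timeout=10)
```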
However, that doesn't work (it doesn't wake up when other programs close the file), because NOTE_CLOSE and NOTE_CLOSE_WRITE are not available on MacOS. Unfortunately, it doesn't seem like the MacOS-native FSEvents file monitoring API publishes events relating to file closure either.
The verdict is that this is not possible on MacOS, but is likely possible (with a bit of unauthorized mucking about with select.kqueue internal flags) on other BSDs.